Chapter 11: Distributed Training

Before we can serve machine learning models, a topic we discussed extensively in the previous chapter, we first need to train them. In Chapter 3, Deep CNN Architectures; Chapter 4, Deep Recurrent Model Architectures; and Chapter 5, Hybrid Advanced Models, we saw the vast expanse of increasingly complex deep learning model architectures.

Such gigantic models often have millions, or even billions, of parameters. The recent (at the time of writing) Generative Pre-Trained Transformer 3 (GPT-3) language model has 175 billion parameters. Using backpropagation to tune this many parameters requires enormous amounts of memory and compute power, and even then, model training can take days to finish.
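
To put that memory requirement into perspective, here is a rough back-of-the-envelope calculation; the figures below are illustrative estimates, not exact measurements:

# Rough estimate of the memory needed just to hold GPT-3-sized weights
# in 32-bit floating point (illustrative figures only).
num_params = 175e9        # 175 billion parameters
bytes_per_param = 4       # fp32 uses 4 bytes per parameter

weights_gb = num_params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")   # roughly 700 GB

# Training additionally needs gradients and optimizer state (Adam, for
# example, keeps two extra buffers per parameter), multiplying this figure
# several times over, which is far more than any single device can hold.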

In this chapter, we will explore ways of speeding up the model training process by distributing the training task across machines and processes within machines. We will learn about the distributed training APIs offered by PyTorch –...

Technical requirements

We will be using Python scripts for all our exercises. The following is a list of the Python libraries that must be installed for this chapter using pip; for example, run pip install torch==1.4.0 on the command line to install torch:

jupyter==1.0.0
torch==1.4.0
torchvision==0.5.0

All the code files that are relevant to this chapter are available at https://github.com/PacktPublishing/Mastering-PyTorch/tree/master/Chapter11.

Distributed training with PyTorch

In the previous exercises in this book, we have implicitly assumed that model training happens on a single machine, in a single Python process on that machine. In this section, we will revisit the exercise from Chapter 1, Overview of Deep Learning Using PyTorch – the handwritten digit classification model – and transform the model training routine from regular training into distributed training. While doing so, we will explore the tools PyTorch offers for distributing the training process, thereby making it both faster and more hardware-efficient.

First, let's look at how the MNIST model can be trained without using distributed training. We will then contrast this with a distributed training PyTorch pipeline.

Training the MNIST model in a regular fashion

The handwritten digit classification model that we built in Chapter 1, Overview of Deep Learning Using PyTorch, was in the form of a Jupyter notebook. Here, we will put that...
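
As a point of reference, a regular, single-process training script for such a model has roughly the following shape. This is a minimal sketch with an illustrative ConvNet and placeholder hyperparameters, not the book's exact code:

# Illustrative single-process MNIST training script (sketch only)
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16 * 28 * 28, 10)

    def forward(self, x):
        x = F.relu(self.conv(x))
        return self.fc(x.flatten(start_dim=1))

def main():
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ])
    train_set = datasets.MNIST('.', train=True, download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

    model = ConvNet()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    model.train()
    for epoch in range(3):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
        print(f"Epoch {epoch}: last batch loss = {loss.item():.4f}")

if __name__ == '__main__':
    main()

Everything here runs in one Python process on one machine; distributed training restructures this same loop so that several processes work on the data in parallel.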

Distributed training on GPUs with CUDA

Throughout the various exercises in this book, you may have noticed a common line of PyTorch code:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

This code simply looks for the available compute device and prefers cuda (which uses the GPU) over cpu. This preference stems from the computational speedups that GPUs provide, through parallelization, on regular neural network operations such as matrix multiplications and additions.
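
As a quick, self-contained illustration of how the device variable is then used, both the model and each batch of data are moved onto the selected device before the forward pass. The toy model and batch below are placeholders, not code from the book:

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(784, 10).to(device)     # toy model, moved to the GPU if one is available
batch = torch.randn(64, 784).to(device)   # toy batch of flattened 28x28 images
output = model(batch)                     # the forward pass runs on the chosen device
print(output.device)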

In this section, we will learn how to speed this up further with the help of distributed training on GPUs. We will build upon the work done in the previous exercise. Note that most of the code looks the same; in the following steps, we will highlight the changes. Executing the script has been left to you as an exercise. The full code is available here: https://github.com/PacktPublishing/Mastering-PyTorch/blob/master/Chapter11/convnet_distributed_cuda.py. Let's...
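
As a hedged sketch only, the changes typically boil down to launching one process per GPU, initializing a process group, pinning each process to its own device, and wrapping the model in DistributedDataParallel. The toy model, hyperparameters, and addresses below are illustrative and are not taken from the linked script:

# Illustrative multi-GPU training sketch with DistributedDataParallel
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size):
    # One process per GPU: initialize the process group and pin this
    # process to its own device.
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group('nccl', rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.Linear(784, 10).cuda(rank)      # toy model for illustration
    ddp_model = DDP(model, device_ids=[rank])  # gradients are synced across GPUs
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    # A real script would use a DistributedSampler so that each process
    # sees a different shard of the dataset; random tensors stand in here.
    for _ in range(10):
        inputs = torch.randn(64, 784).cuda(rank)
        labels = torch.randint(0, 10, (64,)).cuda(rank)
        optimizer.zero_grad()
        loss = F.cross_entropy(ddp_model(inputs), labels)
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = torch.cuda.device_count()   # assumes at least one CUDA GPU is present
    mp.spawn(train, args=(world_size,), nprocs=world_size)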

Summary

In this chapter, we covered an important practical aspect of machine learning: how to optimize the model training process. We explored the extent and power of distributed training using PyTorch. First, we discussed distributed training on CPUs. We retrained the model from Chapter 1, Overview of Deep Learning Using PyTorch, using the principles of distributed training.

While working on this exercise, we learned about some of the useful PyTorch APIs that make distributed training work once we've made a few code changes. Finally, we ran the new training script and observed a significant speedup by distributing the training across multiple processes.

In the second half of this chapter, we briefly discussed distributed training on GPUs using PyTorch. We highlighted the basic code changes needed for model training to work on multiple GPUs in a distributed fashion, while leaving out the actual execution for you as an exercise.

In the next chapter,...
