Chapter 11: Distributed Training

Before we can serve machine learning models, a topic we discussed extensively in the previous chapter, we first need to train them. In Chapter 3, Deep CNN Architectures; Chapter 4, Deep Recurrent Model Architectures; and Chapter 5, Hybrid Advanced Models, we saw the vast expanse of increasingly complex deep learning model architectures.

Such gigantic models often have millions, or even billions, of parameters. The recent (at the time of writing) Generative Pre-Trained Transformer 3 (GPT-3) language model has 175 billion parameters. Using backpropagation to tune this many parameters requires enormous amounts of memory and compute power, and even then, model training can take days to finish.
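
To put that memory requirement into perspective, here is a rough back-of-the-envelope calculation; the figures below are illustrative estimates, not exact measurements:

# Rough estimate of the memory needed just to hold GPT-3-sized weights
# in 32-bit floating point (illustrative figures only).
num_params = 175e9        # 175 billion parameters
bytes_per_param = 4       # fp32 uses 4 bytes per parameter

weights_gb = num_params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")   # roughly 700 GB

# Training additionally needs gradients and optimizer state (Adam, for
# example, keeps two extra buffers per parameter), multiplying this figure
# several times over, which is far more than any single device can hold.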

In this chapter, we will explore ways of speeding up the model training process by distributing the training task across machines and processes within machines. We will learn about the distributed training APIs offered by PyTorch –...

Technical requirements

We will be using Python scripts for all our exercises. The following is a list of the Python libraries that must be installed for this chapter using pip; for example, run pip install torch==1.4.0 on the command line to install torch:

jupyter==1.0.0
torch==1.4.0
torchvision==0.5.0

All the code files that are relevant to this chapter are available at https://github.com/PacktPublishing/Mastering-PyTorch/tree/master/Chapter11.

Distributed training with PyTorch

In the previous exercises in this book, we have implicitly assumed that model training happens on a single machine, in a single Python process on that machine. In this section, we will revisit the exercise from Chapter 1, Overview of Deep Learning Using PyTorch – the handwritten digit classification model – and transform the model training routine from regular training into distributed training. While doing so, we will explore the tools PyTorch offers for distributing the training process, thereby making it both faster and more hardware-efficient.

First, let's look at how the MNIST model can be trained without using distributed training. We will then contrast this with a distributed training PyTorch pipeline.

Training the MNIST model in a regular fashion

The handwritten digit classification model that we built in Chapter 1, Overview of Deep Learning Using PyTorch, was in the form of a Jupyter notebook. Here, we will put that...
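
As a point of reference, a regular, single-process training script for such a model has roughly the following shape. This is a minimal sketch with an illustrative ConvNet and placeholder hyperparameters, not the book's exact code:

# Illustrative single-process MNIST training script (sketch only)
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16 * 28 * 28, 10)

    def forward(self, x):
        x = F.relu(self.conv(x))
        return self.fc(x.flatten(start_dim=1))

def main():
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ])
    train_set = datasets.MNIST('.', train=True, download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

    model = ConvNet()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    model.train()
    for epoch in range(3):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
        print(f"Epoch {epoch}: last batch loss = {loss.item():.4f}")

if __name__ == '__main__':
    main()

Everything here runs in one Python process on one machine; distributed training restructures this same loop so that several processes work on the data in parallel.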

Distributed training on GPUs with CUDA

Throughout the various exercises in this book, you may have noticed a common line of PyTorch code:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

This code simply looks for the available compute device and prefers cuda (which uses the GPU) over cpu. This preference stems from the computational speedups that GPUs provide, through parallelization, on regular neural network operations such as matrix multiplications and additions.
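
As a quick, self-contained illustration of how the device variable is then used, both the model and each batch of data are moved onto the selected device before the forward pass. The toy model and batch below are placeholders, not code from the book:

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(784, 10).to(device)     # toy model, moved to the GPU if one is available
batch = torch.randn(64, 784).to(device)   # toy batch of flattened 28x28 images
output = model(batch)                     # the forward pass runs on the chosen device
print(output.device)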

In this section, we will learn how to speed this up further with the help of distributed training on GPUs. We will build upon the work done in the previous exercise. Note that most of the code looks the same; in the following steps, we will highlight the changes. Executing the script has been left to you as an exercise. The full code is available here: https://github.com/PacktPublishing/Mastering-PyTorch/blob/master/Chapter11/convnet_distributed_cuda.py. Let's...
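
As a hedged sketch only, the changes typically boil down to launching one process per GPU, initializing a process group, pinning each process to its own device, and wrapping the model in DistributedDataParallel. The toy model, hyperparameters, and addresses below are illustrative and are not taken from the linked script:

# Illustrative multi-GPU training sketch with DistributedDataParallel
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size):
    # One process per GPU: initialize the process group and pin this
    # process to its own device.
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group('nccl', rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.Linear(784, 10).cuda(rank)      # toy model for illustration
    ddp_model = DDP(model, device_ids=[rank])  # gradients are synced across GPUs
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    # A real script would use a DistributedSampler so that each process
    # sees a different shard of the dataset; random tensors stand in here.
    for _ in range(10):
        inputs = torch.randn(64, 784).cuda(rank)
        labels = torch.randint(0, 10, (64,)).cuda(rank)
        optimizer.zero_grad()
        loss = F.cross_entropy(ddp_model(inputs), labels)
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = torch.cuda.device_count()   # assumes at least one CUDA GPU is present
    mp.spawn(train, args=(world_size,), nprocs=world_size)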

Summary

In this chapter, we covered an important practical aspect of machine learning: how to optimize the model training process. We explored the extent and power of distributed training using PyTorch. First, we discussed distributed training on CPUs. We retrained the model from Chapter 1, Overview of Deep Learning Using PyTorch, using the principles of distributed training.

While working on this exercise, we learned about some of the useful PyTorch APIs that make distributed training work once we've made a few code changes. Finally, we ran the new training script and observed a significant speedup by distributing the training across multiple processes.

In the second half of this chapter, we briefly discussed distributed training on GPUs using PyTorch. We highlighted the basic code changes needed for model training to work on multiple GPUs in a distributed fashion, while leaving out the actual execution for you as an exercise.

In the next chapter,...
