Deep Learning with PyTorch Lightning

You're reading from  Deep Learning with PyTorch Lightning

Product type Book
Published in Apr 2022
Publisher Packt
ISBN-13 9781800561618
Pages 366 pages
Edition 1st Edition
Author Kunal Sawarkar

Table of Contents (15 Chapters)

Preface
Section 1: Kickstarting with PyTorch Lightning
Chapter 1: PyTorch Lightning Adventure
Chapter 2: Getting off the Ground with the First Deep Learning Model
Chapter 3: Transfer Learning Using Pre-Trained Models
Chapter 4: Ready-to-Cook Models from Lightning Flash
Section 2: Solving using PyTorch Lightning
Chapter 5: Time Series Models
Chapter 6: Deep Generative Models
Chapter 7: Semi-Supervised Learning
Chapter 8: Self-Supervised Learning
Section 3: Advanced Topics
Chapter 9: Deploying and Scoring Models
Chapter 10: Scaling and Managing Training
Other Books You May Enjoy

Chapter 10: Scaling and Managing Training

So far, we have been on an exciting journey through the realm of Deep Learning (DL). We have learned how to recognize images, create new images, generate new text, and train machines without fully labeled datasets. It's an open secret that achieving good results with a DL model requires a massive amount of compute power, often with the help of a Graphics Processing Unit (GPU). We have come a long way since the early days of DL, when data scientists had to manually distribute training across GPU nodes. PyTorch Lightning abstracts away most of the complexity of managing the underlying hardware and pushing training down to the GPU.

In the earlier chapters, we pushed training down to the hardware by brute force. However, doing so is not practical when you have to deal with a massive training effort over large-scale data. In this chapter, we will take a nuanced view of the challenges of training a model at scale and managing...

Technical Requirements

In this chapter, we will be using version 1.5.2 of PyTorch Lightning. Please install this version using the following command:

!pip install pytorch-lightning==1.5.2

Managing training

In this section, we will go through some common challenges that you may encounter while managing the training of DL models, including troubleshooting the saving of model parameters and debugging the model logic efficiently.

Saving model hyperparameters

There is often a need to save a model's hyperparameters, for reasons of reproducibility and consistency, and because some models' network architectures are extremely sensitive to hyperparameters.

On more than one occasion, you may find yourself unable to load a model from its checkpoint: the load_from_checkpoint method of the LightningModule class fails with an error.

Solution

A checkpoint is nothing more than a saved state of the model. Checkpoints contain the precise values of all parameters used by the model. However, the hyperparameter arguments passed to the model's __init__ method are not saved in the checkpoint by default. Calling self.save_hyperparameters inside __init__ of the...

Scaling up training

Scaling up training requires us to speed up the training process for large amounts of data and to make better use of GPUs and TPUs. In this section, we will cover some tips on how to use the provisions in PyTorch Lightning efficiently to accomplish this.

Speeding up model training using a number of workers

How can the PyTorch Lightning framework help speed up model training? One useful parameter to know is num_workers, which comes from PyTorch, and PyTorch Lightning builds on top of it by giving advice about the number of workers.

Solution

The PyTorch Lightning framework offers a number of provisions for speeding up model training, such as the following:

  • You can set a non-zero value for the num_workers argument to speed up model training. The following code snippet provides an example of this:
    import torch.utils.data as data
    ...
    dataloader = data.DataLoader(num_workers=4, ...)

The optimal num_workers value depends on the batch size and configuration...
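As a concrete sketch of the snippet above (the toy dataset and sizes are invented for illustration), a plain PyTorch DataLoader with num_workers=4 prepares batches in four background processes while the main process runs the training step:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 1,000 samples with 16 features and a binary label each.
dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))

# num_workers=4 spawns four subprocesses that load and collate
# batches in parallel; pin_memory speeds up host-to-GPU copies.
dataloader = DataLoader(dataset, batch_size=64, num_workers=4, pin_memory=True)
```

Note that the workers are only spawned once iteration begins, and setting num_workers too high can hurt throughput on machines with few CPU cores.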

Controlling training

There is often a need for an audit, balance, and control mechanism during the training process. Imagine you are training a model for 1,000 epochs and a network failure causes an interruption after 500 epochs. How do you resume training from a given point without losing all your progress? And how do you save a model checkpoint from a cloud environment? Let's see how to deal with these practical challenges, which are often part and parcel of an engineer's life.

Saving model checkpoints when using the cloud

Notebooks hosted in cloud environments such as Google Colab have resource limits and idle timeout periods. If these limits are exceeded during the development of a model, the notebook is deactivated. Owing to the inherently elastic nature of the cloud (which is one of its value propositions), the underlying compute and storage resources are decommissioned when a notebook is deactivated. If you refresh...

Further reading

We have mentioned some key tips and tricks that we have found useful for common troubleshooting. You can always refer to the Speed up model training documentation for more details on how to speed up training or on other topics. Here is a link to the documentation: https://pytorch-lightning.readthedocs.io/en/latest/guides/speed.html.

We have described how PyTorch Lightning supports the TensorBoard logging framework by default. Here is a link to the TensorBoard website: https://www.tensorflow.org/tensorboard.

Additionally, PyTorch Lightning supports CometLogger, CSVLogger, MLflowLogger, and other logging frameworks. You can refer to the Logging documentation for details of how those other logger types can be enabled. Here is a link to the documentation: https://pytorch-lightning.readthedocs.io/en/stable/extensions/logging.html.

Summary

We began this book with nothing more than curiosity about what DL and PyTorch Lightning are. Anyone new to Deep Learning, or a curious beginner to PyTorch Lightning, can get their feet wet by trying simple image recognition models and then raise their game by learning skills such as Transfer Learning (TL) and how to make use of other pre-trained architectures. We went on to leverage the PyTorch Lightning framework not just for image recognition models but also for Natural Language Processing (NLP) models, time series, and other traditional Machine Learning (ML) challenges. Along the way, we learned about RNNs, LSTMs, and Transformers.

In the next section of the book, we explored exotic DL models such as Generative Adversarial Networks (GANs), semi-supervised learning, and self-supervised learning, which expand the art of what is possible in the domain of ML. These are not just advanced models but also super cool ways to create art, and lots of fun to work with. We wrapped...

Why subscribe?

  • Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
  • Improve your learning with Skill Plans built especially for you
  • Get a free eBook or video every month
  • Fully searchable for easy access to vital information
  • Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
