Deep Learning with PyTorch Lightning

You're reading from  Deep Learning with PyTorch Lightning

Product type Book
Published in Apr 2022
Publisher Packt
ISBN-13 9781800561618
Pages 366 pages
Edition 1st Edition
Author Kunal Sawarkar

Table of Contents (15 Chapters)

Preface
Section 1: Kickstarting with PyTorch Lightning
Chapter 1: PyTorch Lightning Adventure
Chapter 2: Getting off the Ground with the First Deep Learning Model
Chapter 3: Transfer Learning Using Pre-Trained Models
Chapter 4: Ready-to-Cook Models from Lightning Flash
Section 2: Solving using PyTorch Lightning
Chapter 5: Time Series Models
Chapter 6: Deep Generative Models
Chapter 7: Semi-Supervised Learning
Chapter 8: Self-Supervised Learning
Section 3: Advanced Topics
Chapter 9: Deploying and Scoring Models
Chapter 10: Scaling and Managing Training
Other Books You May Enjoy

Chapter 10: Scaling and Managing Training

So far, we have been on an exciting journey through the realm of Deep Learning (DL). We have learned how to recognize images, create new images, generate new text, and train machines without fully labeled datasets. It's an open secret that achieving good results with a DL model requires a massive amount of compute power, often with the help of a Graphics Processing Unit (GPU). We have come a long way since the early days of DL, when data scientists had to manually distribute training across GPU nodes. PyTorch Lightning abstracts away most of the complexity of managing the underlying hardware and pushing training down to the GPU.

In the earlier chapters, we pushed training down to the hardware by brute force. However, doing so is not practical when you have to deal with a massive training effort over large-scale data. In this chapter, we will take a nuanced view of the challenges of training a model at scale and managing...

Technical Requirements

In this chapter, we will be using version 1.5.2 of PyTorch Lightning. Please install this version using the following command:

!pip install pytorch-lightning==1.5.2

Managing training

In this section, we will go through some common challenges that you may encounter while managing the training of DL models, including troubleshooting the saving of model parameters and debugging the model logic efficiently.

Saving model hyperparameters

There is often a need to save a model's hyperparameters, for reasons of reproducibility and consistency, and because some models' network architectures are extremely sensitive to hyperparameters.

On more than one occasion, you may find yourself unable to load a model from its checkpoint: the load_from_checkpoint method of the LightningModule class fails with an error.

Solution

A checkpoint is nothing more than a saved state of the model. Checkpoints contain the precise values of all parameters used by the model. However, the hyperparameter arguments passed to the model's __init__ method are not saved in the checkpoint by default. Calling self.save_hyperparameters inside __init__ of the...

Scaling up training

Scaling up training requires us to speed up the training process for large amounts of data and to make better use of GPUs and TPUs. In this section, we will cover some tips on how to use the provisions in PyTorch Lightning efficiently to accomplish this.

Speeding up model training using a number of workers

How can the PyTorch Lightning framework help speed up model training? One useful parameter to know is num_workers, which comes from PyTorch, and PyTorch Lightning builds on top of it by giving advice about the number of workers.

Solution

The PyTorch Lightning framework offers a number of provisions for speeding up model training, such as the following:

  • You can set a non-zero value for the num_workers argument to speed up model training. The following code snippet provides an example of this:
    import torch.utils.data as data
    ...
    dataloader = data.DataLoader(num_workers=4, ...)

The optimal num_workers value depends on the batch size and configuration...
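As a concrete sketch of the snippet above (the toy dataset and sizes are invented for illustration), a plain PyTorch DataLoader with num_workers=4 prepares batches in four background processes while the main process runs the training step:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 1,000 samples with 16 features and a binary label each.
dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))

# num_workers=4 spawns four subprocesses that load and collate
# batches in parallel; pin_memory speeds up host-to-GPU copies.
dataloader = DataLoader(dataset, batch_size=64, num_workers=4, pin_memory=True)
```

Note that the workers are only spawned once iteration begins, and setting num_workers too high can hurt throughput on machines with few CPU cores.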

Controlling training

There is often a need for an audit, balance, and control mechanism during the training process. Imagine you are training a model for 1,000 epochs and a network failure causes an interruption after 500 epochs. How do you resume training from a given point without losing all your progress? And how do you save a model checkpoint from a cloud environment? Let's see how to deal with these practical challenges, which are often part and parcel of an engineer's life.

Saving model checkpoints when using the cloud

Notebooks hosted in cloud environments such as Google Colab have resource limits and idle timeout periods. If these limits are exceeded during the development of a model, the notebook is deactivated. Owing to the inherently elastic nature of the cloud (which is one of its value propositions), the underlying compute and storage resources are decommissioned when a notebook is deactivated. If you refresh...

Further reading

We have mentioned some key tips and tricks that we have found useful for common troubleshooting. You can always refer to the Speed up model training documentation for more details on how to speed up training or on other topics. Here is a link to the documentation: https://pytorch-lightning.readthedocs.io/en/latest/guides/speed.html.

We have described how PyTorch Lightning supports the TensorBoard logging framework by default. Here is a link to the TensorBoard website: https://www.tensorflow.org/tensorboard.

Additionally, PyTorch Lightning supports CometLogger, CSVLogger, MLflowLogger, and other logging frameworks. You can refer to the Logging documentation for details of how those other logger types can be enabled. Here is a link to the documentation: https://pytorch-lightning.readthedocs.io/en/stable/extensions/logging.html.

Summary

We began this book with nothing more than curiosity about what DL and PyTorch Lightning are. Anyone new to Deep Learning, or a curious beginner to PyTorch Lightning, can get their feet wet by trying simple image recognition models and then raise their game by learning skills such as Transfer Learning (TL) and how to make use of other pre-trained architectures. We went on to leverage the PyTorch Lightning framework not just for image recognition models but also for Natural Language Processing (NLP) models, time series, and other traditional Machine Learning (ML) challenges. Along the way, we learned about RNNs, LSTMs, and Transformers.

In the next section of the book, we explored exotic DL models such as Generative Adversarial Networks (GANs), semi-supervised learning, and self-supervised learning, which expand the art of what is possible in the domain of ML. These are not just advanced models but also super cool ways to create art, and lots of fun to work with. We wrapped...

Why subscribe?

  • Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
  • Improve your learning with Skill Plans built especially for you
  • Get a free eBook or video every month
  • Fully searchable for easy access to vital information
  • Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
