
You're reading from Deep Learning with PyTorch Lightning

Product type: Book
Published in: Apr 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800561618
Edition: 1st Edition
Author (1)
Kunal Sawarkar

Kunal Sawarkar is a chief data scientist and AI thought leader. He leads the worldwide partner ecosystem in building innovative AI products. He also serves as an advisory board member and an angel investor. He holds a master's degree from Harvard University with major coursework in applied statistics. He has been applying machine learning to solve previously unsolved problems in industry and society, with a special focus on deep learning and self-supervised learning. Kunal has led various AI product R&D labs and has 20+ patents and papers published in this field. When not diving into data, he loves doing rock climbing and learning to fly aircraft, in addition to an insatiable curiosity for astronomy and wildlife.

Chapter 4: Ready-to-Cook Models from Lightning Flash

Building a Deep Learning (DL) model often involves recreating existing architectures or experiments from top-notch research papers in the field. For example, AlexNet was the winning Convolutional Neural Network (CNN) architecture in the 2012 ImageNet computer vision challenge. Many data scientists have recreated that architecture for their business applications or built newer and better algorithms on top of it. It is common practice to reuse existing experiments on your data before conducting your own. Doing so typically involves either reading the original research paper and coding it yourself or digging through the authors' GitHub repository to work out what's what, both of which are time-consuming. What if the most popular DL architectures and experiments were readily available for executing various common DL tasks as part of a framework? Meet PyTorch Lightning Flash!

Flash provides out-of-the-box...

Technical requirements

The code for this chapter has been developed and tested on macOS with Anaconda, and in Google Colab, with Python 3.6. If you are using another environment, please make the appropriate changes to your environment variables.

In this chapter, we will primarily be using the following Python modules, mentioned with their versions:

  • PyTorch Lightning (version 1.5.10)
  • Flash (version 0.7.1)
  • Seaborn (version 0.11.2)
  • NumPy (version 1.21.5)
  • Torch (version 1.10.0)
  • pandas (version 1.3.5)

Working examples for this chapter can be found at this GitHub link: https://github.com/PacktPublishing/Deep-Learning-with-PyTorch-Lightning/tree/main/Chapter04.

The source datasets can be found at the Kinetics 400 dataset source: https://deepmind.com/research/open-source/kinetics.

This is a video classification dataset created by DeepMind by scraping YouTube videos. The Kinetics dataset was made available by Google Inc. and is one...

Getting started with Lightning Flash

Imagine you are in the mood to eat Indian food. There are various ways you can go about cooking it. You can get all the veggies, the flour to make dough, and the all-important spices, which you then crush in the right quantities one by one. Once ready, you can cook it by following the proper process. Needless to say, doing so requires immense knowledge of spices and which one goes into which curry, in what quantity, in what sequence, and how long it needs to be cooked.

If you think you are not so much of an expert, the second option is to use ready-made spice mixes (such as chicken tikka masala or biryani masala) and just add them to your raw ingredients while cooking. While this is definitely simpler than the first option, it still requires a bit of cooking; however, you can get good results without worrying too much about the nitty-gritty.

But even the second option is a bit time-consuming, and if you want to get it quickly, then...

Flash is as simple as 1-2-3

We started the book by creating our first DL model in the form of a CNN. We then used transfer learning and saw that, by reusing representations learned on popular datasets, we can get higher accuracy and train models even quicker. Lightning Flash takes this to another level by providing a standardized framework for quickly accessing all the pre-trained model architectures, as well as some popular datasets.

Using Flash means writing some of the most minimal forms of code to train a DL model. In fact, a simple Flash model can be as lightweight as five lines of code.

Once the libraries are imported, we only have to perform three basic steps:

  1. Supply your data: Create a data module to provide data to the framework:
    datamodule = yourData.from_json(
        "yourFile",
        "text",
  2. Define your task and backbone: Now, it's time to define what you want to do with the data. You can select from...
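To make step 1 concrete, here is a small standard-library sketch of the kind of JSON-lines file that a `from_json`-style data module typically consumes: one JSON object per line, with an input field and a target field. The file name and field names (`text`, `label`) are illustrative assumptions, not fixed Flash requirements:

```python
import json
import os
import tempfile

# Two toy records of the shape a from_json-style loader expects:
# one JSON object per line, with an input field and a target field.
records = [
    {"text": "the food was great", "label": "positive"},
    {"text": "the service was slow", "label": "negative"},
]

path = os.path.join(tempfile.mkdtemp(), "train.json")
with open(path, "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Reading the file back, line by line, as a data module would:
with open(path) as f:
    loaded = [json.loads(line) for line in f]
```

Once such a file exists, the data module's job is simply to parse it and expose the input and target fields to the task you define in step 2.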

Video classification using Flash

Video classification is one of the most interesting yet challenging problems in DL. Simply speaking, it tries to classify an action in a video clip and recognize it (such as walking, bowling, or golfing):

Figure 4.1 – The Kinetics human action video dataset released by DeepMind comprises annotated ~10-second video clips sourced from YouTube

Training such a DL model is challenging because of the sheer compute power it takes, given the large size of video files compared to tabular or image data. Using a pre-trained model and architecture is a great way to start your video classification experiments.

PyTorch Lightning Flash relies internally on the PyTorchVideo library for its backbone. PyTorchVideo caters to the video understanding ecosystem, and Lightning Flash simplifies its use by providing predefined, configurable hooks into the underlying framework. There are hooks...
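Folder-based video classification loaders typically infer class labels from a directory layout with one subfolder per action class. The following standard-library sketch builds such a Kinetics-style layout; the class names and clip file names are hypothetical placeholders, not real data:

```python
import os
import tempfile

# Kinetics-style layout that folder-based video loaders typically expect:
# one subfolder per action class, with the clips for that class inside it.
root = tempfile.mkdtemp()
for action in ["archery", "bowling", "golf_driving"]:
    clip_dir = os.path.join(root, "train", action)
    os.makedirs(clip_dir)
    # Empty placeholder files standing in for real .mp4 clips.
    open(os.path.join(clip_dir, "clip_0001.mp4"), "wb").close()

# The class labels can be read straight off the folder names.
classes = sorted(os.listdir(os.path.join(root, "train")))
```

Organizing clips this way means no separate label file is needed: the path of each clip encodes its action class.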

Automatic speech recognition using Flash

Recognizing speech from an audio file is perhaps one of the most widely used applications of AI. It is part of voice assistants such as Alexa, automatically generated captions on video streaming platforms such as YouTube, and many music platforms. It detects speech in an audio file and converts it into text. Speech detection involves various challenges, such as speaker modalities, pitch, and pronunciation, as well as dialect and the language itself:

Figure 4.6 – A concept of automatic speech recognition

To train a model for Automatic Speech Recognition (ASR), we need a training dataset consisting of audio files along with the corresponding text transcriptions describing that audio. The more diverse the audio files are, with speakers from different age groups, ethnicities, dialects, and so on, the more robust the ASR model will be on unseen audio files.
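One common way to organize such a dataset is a manifest file that pairs each audio file path with its transcript. Here is a toy standard-library sketch of that idea; the column names (`file`, `text`) and paths are illustrative assumptions, not a fixed format:

```python
import csv
import io

# A toy ASR training manifest: each row pairs an audio file with its
# transcription. Column names and file paths are illustrative only.
manifest = io.StringIO()
writer = csv.DictWriter(manifest, fieldnames=["file", "text"])
writer.writeheader()
writer.writerow({"file": "clips/sample_0001.wav", "text": "turn on the lights"})
writer.writerow({"file": "clips/sample_0002.wav", "text": "what is the weather today"})

# Reading the manifest back, as a speech recognition data module would:
manifest.seek(0)
rows = list(csv.DictReader(manifest))
```

In practice, the audio paths would point at real recordings, and the transcript column is exactly the supervision signal the ASR model learns from.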

In the previous...

Further learning

  • Other languages: The ASR dataset whose Scottish language subset we used also contains many other languages, such as Sinhala, and many Indian languages, such as Hindi, Marathi, and Bengali. The next logical step would be to try this ASR model on another language and compare the results. It is also a great way to learn how to manage training requirements, as some of the audio files in these datasets are bigger and will therefore need more compute power.

Many non-English languages (for example, Marathi, spoken in India) don't have widely available mobile apps, and the lack of technical tools in native languages limits the adoption of many tools in remote parts of the world. Creating an ASR model in your local language can therefore add great value to the technical ecosystem as well.

  • Audio and video together: Another interesting task is to combine the audio speech recognition and video classification tasks that we have seen today and use...

Summary

Lightning Flash is still in the early stages of development and will continue to evolve rapidly. Flash is also a community project to which data science practitioners contribute model code, so the quality of code may vary from architecture to architecture. We advise you to perform due diligence on the source of any model code, as it may not always come from the PyTorch Lightning team, so as to avoid bugs.

However, Flash is extremely useful whether you are a beginner in DL or an advanced practitioner looking to establish a baseline for a new project. The first order of business is to start with the latest and greatest architecture in the field. Flash helps you get off the ground easily with your dataset and sets the baseline for the different algorithms of your use case. With its out-of-the-box support for state-of-the-art DL architectures, Flash is not just a timesaver but a big productivity booster.

Vision neural networks are widely used and are...

