
You're reading from Deep Learning with PyTorch Lightning

Product type: Book
Published in: Apr 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800561618
Edition: 1st Edition
Author (1)
Kunal Sawarkar

Kunal Sawarkar is a chief data scientist and AI thought leader. He leads the worldwide partner ecosystem in building innovative AI products. He also serves as an advisory board member and an angel investor. He holds a master's degree from Harvard University with major coursework in applied statistics. He has been applying machine learning to solve previously unsolved problems in industry and society, with a special focus on deep learning and self-supervised learning. Kunal has led various AI product R&D labs and has 20+ patents and papers published in this field. When not diving into data, he loves doing rock climbing and learning to fly aircraft, in addition to an insatiable curiosity for astronomy and wildlife.

Chapter 4: Ready-to-Cook Models from Lightning Flash

Building a Deep Learning (DL) model often involves recreating existing architectures or experiments from top-notch research papers in the field. For example, AlexNet was the winning Convolutional Neural Network (CNN) architecture in the 2012 ImageNet computer vision challenge. Many data scientists have recreated that architecture for their business applications or built newer and better algorithms on top of it. It is common practice to reuse existing experiments on your data before conducting your own. Doing so typically involves either reading the original research paper and coding it yourself or digging through the authors' GitHub repository to work out what's what, both of which are time-consuming. What if the most popular DL architectures and experiments were readily available for executing various common DL tasks as part of a framework? Meet PyTorch Lightning Flash!

Flash provides out-of-the-box...

Technical requirements

The code for this chapter has been developed and tested on macOS with Anaconda, and in Google Colab, with Python 3.6. If you are using another environment, please make the appropriate changes to your environment variables.

In this chapter, we will primarily be using the following Python modules, mentioned with their versions:

  • PyTorch Lightning (version 1.5.10)
  • Flash (version 0.7.1)
  • Seaborn (version 0.11.2)
  • NumPy (version 1.21.5)
  • Torch (version 1.10.0)
  • pandas (version 1.3.5)

Working examples for this chapter can be found at this GitHub link: https://github.com/PacktPublishing/Deep-Learning-with-PyTorch-Lightning/tree/main/Chapter04.

The source datasets can be found at the Kinetics 400 dataset source: https://deepmind.com/research/open-source/kinetics.

This is a video classification dataset created by DeepMind by scraping YouTube videos. The Kinetics dataset was made available by Google Inc. and is one...

Getting started with Lightning Flash

Imagine you are in the mood to eat Indian food. There are various ways you can go about cooking it. You can get all the veggies, the flour to make dough, and the all-important spices, which you then crush in the right quantities one by one. Once ready, you can cook it by following the proper process. Needless to say, doing so requires immense knowledge of spices and which one goes into which curry, in what quantity, in what sequence, and how long it needs to be cooked.

If you think you are not so much of an expert, the second option is to use ready-made spice mixes (such as chicken tikka masala or biryani masala) and just add them to your raw ingredients while cooking. While this is definitely simpler than the first option, it still requires a bit of cooking; however, you can get good results without worrying too much about the nitty-gritty.

But even the second option is a bit time-consuming, and if you want to get it quickly, then...

Flash is as simple as 1-2-3

We started the book by creating our first DL model in the form of a CNN. We then used transfer learning and saw that, by reusing representations learned on popular datasets, we can get higher accuracy and train models even quicker. Lightning Flash takes this to another level by providing a standardized framework for quickly accessing all the pre-trained model architectures, as well as some popular datasets.

Using Flash means writing some of the most minimal forms of code to train a DL model. In fact, a simple Flash model can be as lightweight as five lines of code.

Once the libraries are imported, we only have to perform three basic steps:

  1. Supply your data: Create a data module to provide data to the framework:
    datamodule = yourData.from_json(
        "yourFile",
        "text",
  2. Define your task and backbone: Now, it's time to define what you want to do with the data. You can select from...
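To make step 1 concrete, here is a small standard-library sketch of the kind of JSON-lines file that a `from_json`-style data module typically consumes: one JSON object per line, with an input field and a target field. The file name and field names (`text`, `label`) are illustrative assumptions, not fixed Flash requirements:

```python
import json
import os
import tempfile

# Two toy records of the shape a from_json-style loader expects:
# one JSON object per line, with an input field and a target field.
records = [
    {"text": "the food was great", "label": "positive"},
    {"text": "the service was slow", "label": "negative"},
]

path = os.path.join(tempfile.mkdtemp(), "train.json")
with open(path, "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Reading the file back, line by line, as a data module would:
with open(path) as f:
    loaded = [json.loads(line) for line in f]
```

Once such a file exists, the data module's job is simply to parse it and expose the input and target fields to the task you define in step 2.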

Video classification using Flash

Video classification is one of the most interesting yet challenging problems in DL. Simply speaking, it tries to classify an action in a video clip and recognize it (such as walking, bowling, or golfing):

Figure 4.1 – The Kinetics human action video dataset released by DeepMind comprises annotated ~10-second video clips sourced from YouTube

Training such a DL model is challenging because of the sheer compute power it takes, given the large size of video files compared to tabular or image data. Using a pre-trained model and architecture is a great way to start your video classification experiments.

PyTorch Lightning Flash relies internally on the PyTorchVideo library for its backbone. PyTorchVideo caters to the video understanding ecosystem, and Lightning Flash simplifies its use by providing predefined, configurable hooks into the underlying framework. There are hooks...
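Folder-based video classification loaders typically infer class labels from a directory layout with one subfolder per action class. The following standard-library sketch builds such a Kinetics-style layout; the class names and clip file names are hypothetical placeholders, not real data:

```python
import os
import tempfile

# Kinetics-style layout that folder-based video loaders typically expect:
# one subfolder per action class, with the clips for that class inside it.
root = tempfile.mkdtemp()
for action in ["archery", "bowling", "golf_driving"]:
    clip_dir = os.path.join(root, "train", action)
    os.makedirs(clip_dir)
    # Empty placeholder files standing in for real .mp4 clips.
    open(os.path.join(clip_dir, "clip_0001.mp4"), "wb").close()

# The class labels can be read straight off the folder names.
classes = sorted(os.listdir(os.path.join(root, "train")))
```

Organizing clips this way means no separate label file is needed: the path of each clip encodes its action class.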

Automatic speech recognition using Flash

Recognizing speech from an audio file is perhaps one of the most widely used applications of AI. It is part of voice assistants such as Alexa, automatically generated captions on video streaming platforms such as YouTube, and many music platforms. It detects speech in an audio file and converts it into text. Speech detection involves various challenges, such as speaker modalities, pitch, and pronunciation, as well as dialect and the language itself:

Figure 4.6 – A concept of automatic speech recognition

To train a model for Automatic Speech Recognition (ASR), we need a training dataset consisting of audio files along with the corresponding text transcriptions describing that audio. The more diverse the audio files are, with speakers from different age groups, ethnicities, dialects, and so on, the more robust the ASR model will be on unseen audio files.
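One common way to organize such a dataset is a manifest file that pairs each audio file path with its transcript. Here is a toy standard-library sketch of that idea; the column names (`file`, `text`) and paths are illustrative assumptions, not a fixed format:

```python
import csv
import io

# A toy ASR training manifest: each row pairs an audio file with its
# transcription. Column names and file paths are illustrative only.
manifest = io.StringIO()
writer = csv.DictWriter(manifest, fieldnames=["file", "text"])
writer.writeheader()
writer.writerow({"file": "clips/sample_0001.wav", "text": "turn on the lights"})
writer.writerow({"file": "clips/sample_0002.wav", "text": "what is the weather today"})

# Reading the manifest back, as a speech recognition data module would:
manifest.seek(0)
rows = list(csv.DictReader(manifest))
```

In practice, the audio paths would point at real recordings, and the transcript column is exactly the supervision signal the ASR model learns from.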

In the previous...

Further learning

  • Other languages: The ASR dataset whose Scottish language subset we used also contains many other languages, such as Sinhala, and many Indian languages, such as Hindi, Marathi, and Bengali. The next logical step would be to try this ASR model on another language and compare the results. It is also a great way to learn how to manage training requirements, as some of the audio files in these datasets are bigger and will therefore need more compute power.

Many non-English languages (for example, Marathi, spoken in India) don't have widely available mobile apps, and the lack of technical tools in native languages limits the adoption of many tools in remote parts of the world. Creating an ASR model in your local language can therefore add great value to the technical ecosystem as well.

  • Audio and video together: Another interesting task is to combine the audio speech recognition and video classification tasks that we have seen today and use...

Summary

Lightning Flash is still in the early stages of development and will continue to evolve rapidly. Flash is also a community project to which data science practitioners contribute model code, so the quality of code may vary from architecture to architecture. We advise you to perform due diligence on the source of any model code, as it may not always come from the PyTorch Lightning team, so as to avoid bugs.

However, Flash is extremely useful whether you are a beginner in DL or an advanced practitioner looking to establish a baseline for a new project. The first order of business is to start with the latest and greatest architecture in the field. Flash helps you get off the ground easily with your dataset and sets the baseline for the different algorithms of your use case. With its out-of-the-box support for state-of-the-art DL architectures, Flash is not just a timesaver but a big productivity booster.

Vision neural networks are widely used and are...

