Deep Learning with PyTorch Lightning

You're reading from Deep Learning with PyTorch Lightning

Product type: Book
Published: Apr 2022
Publisher: Packt
ISBN-13: 9781800561618
Pages: 366
Edition: 1st
Author: Kunal Sawarkar

Table of Contents (15 Chapters)

Preface
Section 1: Kickstarting with PyTorch Lightning
Chapter 1: PyTorch Lightning Adventure
Chapter 2: Getting off the Ground with the First Deep Learning Model
Chapter 3: Transfer Learning Using Pre-Trained Models
Chapter 4: Ready-to-Cook Models from Lightning Flash
Section 2: Solving using PyTorch Lightning
Chapter 5: Time Series Models
Chapter 6: Deep Generative Models
Chapter 7: Semi-Supervised Learning
Chapter 8: Self-Supervised Learning
Section 3: Advanced Topics
Chapter 9: Deploying and Scoring Models
Chapter 10: Scaling and Managing Training
Other Books You May Enjoy

Chapter 7: Semi-Supervised Learning

Machine learning has long been used to recognize patterns. Recently, however, the idea that machines can be used to create patterns has captured everyone's imagination. The idea of machines being able to create art by mimicking known artistic styles or, given any input, provide a human-like perspective as output has become the new frontier in machine learning.

Most of the deep learning models we have seen thus far have been about either recognizing images (using the Convolutional Neural Network (CNN) architecture), generating text (with Transformers), or generating images (using Generative Adversarial Networks (GANs)). However, we as humans don't always view objects purely as text or images in real life but rather as a combination of them. For example, an image in a Facebook post or a news article will likely be accompanied by some comments describing it. Memes are a popular way of creating humor by combining catchy images with smart text...

Technical requirements

In this chapter, we will primarily be using the following Python modules listed with their versions:

  • PyTorch Lightning (version 1.5.2)
  • NumPy (version 1.19.5)
  • torch (version 1.10)
  • torchvision (version 0.11.1)
  • NLTK (version 3.2.5)
  • Matplotlib (version 3.2.2)

To make sure that these modules work together and do not go out of sync, we have used specific versions of torch, torchvision, torchtext, and torchaudio with PyTorch Lightning 1.5.2. You can also use the latest versions of PyTorch Lightning and torch that are compatible with each other. More details can be found at the GitHub link: https://github.com/PacktPublishing/Deep-Learning-with-PyTorch-Lightning.

!pip install torch==1.10.0 torchvision==0.11.1 torchtext==0.11.0 torchaudio==0.10.0 --quiet
!pip install pytorch-lightning==1.5.2 --quiet

Working code examples for this chapter can be found at this GitHub link: https://github.com/PacktPublishing/Deep-Learning-with-PyTorch-Lightning...

Getting started with semi-supervised learning

As we saw in the introduction, one of the most amazing applications of semi-supervised learning is the possibility of teaching machines how to interpret images. This can be done not just to create captions for given images but also to ask the machine to write a poetic description of how it perceives the images.

Check out the following results. On the left are some random images passed to the model, and on the right are poems generated by the model. The results are interesting, as it is hard to tell whether these lyrical stanzas were created by a machine or a human:

Figure 7.1 – Generating poems for a given image by analyzing context

For example, in the top image, the machine could detect the door and street and wrote a stanza about it. In the second image, it detected sunshine and wrote a lyrical stanza about sunsets and love. In the bottom image, the machine detected a couple...

Going through the CNN–RNN architecture

While there are many possible applications of semi-supervised learning and a number of possible neural architectures, we will start with one of the most popular: an architecture that combines a CNN and an RNN.

Simply put, we start with an image, use the CNN to recognize it, and then pass the output of the CNN to an RNN, which in turn generates the text:

Figure 7.2 – CNN–RNN cascaded architecture

Intuitively speaking, the model is trained on images paired with their sentence descriptions so that it learns the intermodal correspondence between language and visual data. It uses a CNN and a multimodal RNN to generate descriptions of the images. As mentioned above, an LSTM is used to implement the RNN.
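The cascade described above can be sketched in a few lines of PyTorch. This is a minimal illustration of the encoder–decoder idea, not the book's implementation: the tiny CNN, the layer sizes, and the vocabulary size are all hypothetical placeholders (a real system would use a pre-trained CNN such as a ResNet as the encoder).

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """Tiny illustrative CNN that maps an image to a fixed-size feature vector."""
    def __init__(self, embed_size=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # collapse spatial dims
        )
        self.fc = nn.Linear(32, embed_size)

    def forward(self, images):
        x = self.conv(images).flatten(1)      # (B, 32)
        return self.fc(x)                     # (B, embed_size)

class RNNDecoder(nn.Module):
    """LSTM that predicts caption tokens conditioned on the image embedding."""
    def __init__(self, embed_size=256, hidden_size=512, vocab_size=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image feature as the first "token" of the sequence.
        tokens = self.embed(captions)                        # (B, T, E)
        inputs = torch.cat([features.unsqueeze(1), tokens], dim=1)
        hidden, _ = self.lstm(inputs)                        # (B, T+1, H)
        return self.fc(hidden)                               # (B, T+1, V)

encoder = CNNEncoder()
decoder = RNNDecoder()
images = torch.randn(4, 3, 64, 64)           # dummy batch of images
captions = torch.randint(0, 1000, (4, 12))   # dummy token IDs
logits = decoder(encoder(images), captions)
print(logits.shape)  # torch.Size([4, 13, 1000])
```

Training would then minimize the cross-entropy between these logits and the shifted ground-truth caption tokens; at inference time, tokens are fed back one at a time to generate a caption.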

This architecture was proposed by Andrej Karpathy and his doctoral advisor Fei-Fei Li in their 2015 Stanford paper titled Generative Text Using...

Generating captions for images

This model will involve the following steps:

  1. Downloading the dataset
  2. Assembling the data
  3. Training the model
  4. Generating the caption
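A core part of step 2 (assembling the data) is turning caption strings into sequences of token IDs that the decoder can learn from. The following is an illustrative sketch of that preprocessing, not the book's exact code: the special token names and the frequency threshold are assumptions.

```python
from collections import Counter

def build_vocab(captions, min_freq=2):
    """Map each sufficiently frequent word to an integer ID."""
    counts = Counter(word for cap in captions for word in cap.lower().split())
    # Reserved tokens: padding, start/end of caption, and unknown words.
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for word, freq in counts.items():
        if freq >= min_freq:
            vocab[word] = len(vocab)
    return vocab

def encode(caption, vocab):
    """Turn a caption string into a list of token IDs, wrapped in start/end."""
    ids = [vocab.get(w, vocab["<unk>"]) for w in caption.lower().split()]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]

captions = ["A dog runs on grass", "A dog catches a ball", "Sunset over the sea"]
vocab = build_vocab(captions)
print(encode("A dog on the beach", vocab))  # rare words map to <unk>
```

A real pipeline would also pad the encoded captions to a common length per batch so they can be stacked into a single tensor.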

Downloading the dataset

In this step, we will download the COCO dataset that we will use to train our model.

COCO dataset

The COCO dataset is a large-scale object detection, segmentation, and captioning dataset (https://cocodataset.org). It has 1.5 million object instances, 80 object categories, and 5 captions per image. You can explore the dataset at https://cocodataset.org/#explore by filtering on one or more object types, such as the images of dogs shown in the following screenshot. Each image has tiles above it to show/hide URLs, segmentations, and captions:

Figure 7.4 – COCO dataset

Here are a few more images from the dataset:

Figure 7.5 – Random dataset examples from the COCO website home page

Extracting the dataset

...

Summary

In this chapter, we have seen how PyTorch Lightning can be used to create semi-supervised learning models easily, with a lot of out-of-the-box capabilities. We have seen an example of using machines to generate captions for images as if they were written by humans. We have also seen a code implementation of an advanced neural network that combines the CNN and RNN architectures.

Creating art using machine learning algorithms opens new possibilities for what can be done in this field. What we have done in this project is a modest wrapper around recently developed algorithms, extending them to different areas. One challenge that often comes up with generated text is contextual accuracy: a measure of whether the created lyrics make sense to humans. Proposing a technical criterion for measuring the accuracy of such models in this regard is a very important area of research...
