You're reading from  Deep Learning with PyTorch Lightning

Product type: Book
Published in: Apr 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800561618
Edition: 1st Edition
Author (1)
Kunal Sawarkar

Kunal Sawarkar is a chief data scientist and AI thought leader. He leads the worldwide partner ecosystem in building innovative AI products. He also serves as an advisory board member and an angel investor. He holds a master's degree from Harvard University with major coursework in applied statistics. He has been applying machine learning to solve previously unsolved problems in industry and society, with a special focus on deep learning and self-supervised learning. Kunal has led various AI product R&D labs and has 20+ patents and papers published in this field. When not diving into data, he loves doing rock climbing and learning to fly aircraft, in addition to an insatiable curiosity for astronomy and wildlife.

Chapter 6: Deep Generative Models

It has always been the dream of mankind to build a machine that can match human ingenuity. Intelligence has many dimensions, such as calculation, object recognition, speech, understanding context, and reasoning, but no aspect of human intelligence makes us more human than our creativity. The ability to create a piece of art, be it a piece of music, a poem, a painting, or a movie, has always been the epitome of human intelligence, and people who excel at it are often treated as "geniuses." The question that remains unanswered is: can a machine learn creativity?

We have seen machines learn to predict images using a variety of information and sometimes even with little information. A machine learning model can learn from a set of training images and labels to recognize various objects in an image; however, the success of vision models depends on their capability for vast generalizations –...

Technical requirements

In this chapter, we will primarily be using the following Python modules, mentioned with their versions:

  • pytorch-lightning (version 1.5.2)
  • torch (version 1.10.0)
  • matplotlib (version 3.2.2)

Working examples for this chapter can be found at this GitHub link: https://github.com/PacktPublishing/Deep-Learning-with-PyTorch-Lightning/tree/main/Chapter06.

To make sure that these modules work together and do not go out of sync, we have used specific versions of torch, torchvision, torchtext, and torchaudio with PyTorch Lightning 1.5.2. You can also use the latest versions of PyTorch Lightning and torch, provided they are compatible with each other. More details can be found at the GitHub link: https://github.com/PacktPublishing/Deep-Learning-with-PyTorch-Lightning

!pip install torch==1.10.0 torchvision==0.11.1 torchtext==0.11.0 torchaudio==0.10.0 --quiet
!pip install pytorch-lightning==1.5.2 --quiet
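After installing, it can help to confirm that the environment really contains the pinned versions. The following is a minimal sketch using only the Python standard library; the helper names installed_version and compare_versions are hypothetical and not part of the book's code.

```python
from importlib import metadata

# Versions pinned by the pip commands in this chapter.
PINNED = {
    "torch": "1.10.0",
    "torchvision": "0.11.1",
    "torchtext": "0.11.0",
    "torchaudio": "0.10.0",
    "pytorch-lightning": "1.5.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(pinned):
    """Map each package name to a (pinned, installed, matches) tuple."""
    report = {}
    for name, wanted in pinned.items():
        found = installed_version(name)
        # Local build tags such as "+cu111" are ignored in the comparison.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


for name, (wanted, found, ok) in compare_versions(PINNED).items():
    print(f"{name}: pinned {wanted}, installed {found}, match={ok}")
```

A package that reports match=False is not necessarily a problem if you deliberately chose newer, mutually compatible releases.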

We will be using the Food dataset, which contains a...

Getting started with GAN models

One of the most striking applications of GANs is generating realistic images. Just look at the following picture of a girl; can you guess whether she is real or simply generated by a machine?

Figure 6.1 – Fake face generation using StyleGAN (image credit – https://thispersondoesnotexist.com)

Creating such incredibly realistic faces is one of the most successful use cases of GANs. However, GANs are not limited to generating pretty faces or deepfake videos; they also have key commercial applications, such as generating images of houses or creating new models of cars or paintings.

While generative models have been used in the past in statistics, deep generative models such as GANs are relatively new. Deep generative models also include Variational Autoencoders (VAEs) and auto-regressive models. However, since GANs are the most popular method, we will focus on them here.

What is a GAN?

Interestingly, GAN originated...

Creating new food items using a GAN

GANs are one of the most common and powerful algorithms used in generative modeling. GANs are used widely to generate fake faces, pictures, anime/cartoon characters, image style translations, semantic image translation, and so on.

We will start by creating an architecture for our GAN model:

Figure 6.3 – GAN architecture for creating a new food

First, we will define the neural networks for the generator and the discriminator using multiple convolutional and fully connected layers. In the architecture that we will be building, the discriminator has four convolutional layers and one fully connected layer, and the generator has five transposed convolution layers. We will generate fake images by feeding Gaussian noise to the generator and use the discriminator to detect these fake images. Then, we will use the Adam optimizer to optimize the neural network. For this use, we will use cross...
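The layer counts described above can be sketched in plain PyTorch as follows. This is only an illustration under stated assumptions, not the book's implementation: the channel widths, 4x4 kernels, batch normalization, activations, the 64x64 RGB image size, and the Adam learning rate of 0.0002 are all assumed here for the sake of a runnable example.

```python
import torch
import torch.nn as nn

latent_size = 256  # length of the Gaussian noise vector (assumed value)

# Discriminator: four stride-2 convolutions followed by one fully connected layer.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),     # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),   # 32x32 -> 16x16
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),  # 16x16 -> 8x8
    nn.BatchNorm2d(256),
    nn.LeakyReLU(0.2),
    nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1),  # 8x8 -> 4x4
    nn.BatchNorm2d(512),
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(512 * 4 * 4, 1),  # the single fully connected layer
    nn.Sigmoid(),               # probability that the input is real
)

# Generator: five transposed convolutions that upsample noise to an image.
generator = nn.Sequential(
    nn.ConvTranspose2d(latent_size, 512, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(512),
    nn.ReLU(),
    nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.BatchNorm2d(256),
    nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 8x8 -> 16x16
    nn.BatchNorm2d(128),
    nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 16x16 -> 32x32
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # 32x32 -> 64x64
    nn.Tanh(),  # pixel values in [-1, 1]
)

# One Adam optimizer per network; the learning rate is an assumed value.
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.0002)
opt_g = torch.optim.Adam(generator.parameters(), lr=0.0002)

# Gaussian noise in, fake images out, then one score per image.
noise = torch.randn(8, latent_size, 1, 1)
fake_images = generator(noise)
scores = discriminator(fake_images)
print(fake_images.shape, scores.shape)
```

Keeping the generator output in [-1, 1] with Tanh pairs naturally with training images that have been normalized to the same range.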

Creating new butterfly species using a GAN

In this section, we are going to use the same GAN model that we built in the previous section with a minor tweak to generate new species of butterflies.

Since we are following the same steps here, we will keep the description concise and observe the outputs. (The full code can be found in the GitHub repository for this chapter.)

We will first try the architecture that we used for generating food images (4 convolution layers and 1 fully connected layer in the discriminator, and 5 transposed convolution layers in the generator). We will then try another architecture with 5 convolution layers and 5 transposed convolution layers:

  1. Download the dataset:
    # `od` is assumed to be the opendatasets package, imported as below.
    import opendatasets as od

    dataset_url = 'https://www.kaggle.com/gpiosenka/butterfly-images40-species'
    od.download(dataset_url)
  2. Initialize the variables for the images:
    image_size = 64
    batch_size = 128
    normalize = [(0.5, 0.5, 0.5), (0.5, 0.5, 0.5)]
    latent_size = 256
    butterfly_data_directory = "/content/butterfly...
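To see what the normalize pair above does, assume (as is common, though not confirmed by the excerpt) that the first tuple holds per-channel means and the second per-channel standard deviations applied to pixel values already scaled to [0, 1]. The following minimal sketch uses the hypothetical helpers apply_normalize and denormalize to show the arithmetic for a single channel; the same expressions also work elementwise on tensors.

```python
def apply_normalize(x, mean=0.5, std=0.5):
    """Map a pixel value from [0, 1] to [-1, 1] when mean = std = 0.5."""
    return (x - mean) / std


def denormalize(x, mean=0.5, std=0.5):
    """Invert apply_normalize, e.g. before displaying a generated image."""
    return x * std + mean


for pixel in (0.0, 0.25, 0.5, 1.0):
    scaled = apply_normalize(pixel)
    print(pixel, "->", scaled, "->", denormalize(scaled))
```

With mean and standard deviation both 0.5, black (0.0) maps to -1.0 and white (1.0) maps to 1.0, which matches the output range of a Tanh-terminated generator.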

GAN training challenges

Training a GAN to produce good results requires a lot of compute resources, especially when the dataset is not very clean and the representations in the images are not easy to learn. To get clean output with sharp representations in the generated images, we need to pass higher-resolution images as input to the GAN model. However, higher resolution means many more parameters in the model, which in turn requires much more memory for training.

Here is an example scenario. We have trained our models using an image size of 64 pixels, but if we increase the image size to 128 pixels, the number of parameters in the GAN model increases drastically from 15.9 M to 93.4 M. This, in turn, requires much more compute power to train the model, and with the limited resources in the Google Colab environment, you might get an error similar to this after 20–25 epochs:
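A rough way to reason about this growth is to count parameters by hand. The sketch below does not reproduce the 15.9 M and 93.4 M figures, which depend on the chapter's exact layers; it uses a hypothetical discriminator tail (four stride-2 convolutions ending in 512 channels, then a fully connected layer) to show which quantities depend on image size. The helper names are illustrative only.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Weights of a 2D convolution; note this is independent of image size."""
    return out_ch * in_ch * kernel * kernel + (out_ch if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Weights of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def fc_input_features(image_size, n_stride2_convs=4, channels=512):
    """Flattened feature count after halving the image side n times."""
    side = image_size // (2 ** n_stride2_convs)
    return channels * side * side


for image_size in (64, 128):
    features = fc_input_features(image_size)
    print(
        f"image {image_size}px: FC input features = {features}, "
        f"FC params (1 output) = {linear_params(features, 1)}, "
        f"pixels per image = {3 * image_size * image_size}"
    )

# A convolution's parameter count stays the same at either resolution.
print("conv 256->512, 4x4 kernel:", conv2d_params(256, 512, 4))
```

Because convolution weights do not depend on resolution, extra parameters at 128 pixels come from larger fully connected layers or from additional or wider layers added to reach the new size, while activation memory grows with the fourfold increase in pixels per image.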

RuntimeError: CUDA out of...

Creating images using DCGAN

A DCGAN is a direct extension of the GAN model discussed in the previous section, except that it explicitly uses convolutional layers in the discriminator and transposed convolutional layers in the generator. DCGAN was first proposed in the paper Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, by Alec Radford, Luke Metz, and Soumith Chintala:

Figure 6.13 – A DCGAN architecture overview

The DCGAN architecture consists of 5 convolution layers and 5 transposed convolution layers. There is no fully connected layer in this architecture. We will also use a learning rate of 0.0002 for training the model.
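The reason five layers on each side suffice without a fully connected layer can be checked with the standard output-size formulas for convolution and transposed convolution (dilation 1, no output padding). The kernel, stride, and padding values below are assumptions commonly used with 64x64 images, not values confirmed by this excerpt.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a 2D convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding):
    """Output side length of a 2D transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel


# Assumed (kernel, stride, padding) per layer.
generator_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]
discriminator_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

# Generator: start from a 1x1 latent "image" and upsample.
generator_sizes = [1]
for kernel, stride, padding in generator_layers:
    generator_sizes.append(conv_transpose_out(generator_sizes[-1], kernel, stride, padding))

# Discriminator: start from a 64x64 image and downsample.
discriminator_sizes = [64]
for kernel, stride, padding in discriminator_layers:
    discriminator_sizes.append(conv_out(discriminator_sizes[-1], kernel, stride, padding))

print("generator:", generator_sizes)          # [1, 4, 8, 16, 32, 64]
print("discriminator:", discriminator_sizes)  # [64, 32, 16, 8, 4, 1]
```

Under these assumptions, the fifth convolution reduces the feature map to 1x1, so it can emit the real-or-fake score directly and no fully connected layer is needed.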

We can also take a more in-depth look at the generator architecture of the DCGAN to see how it works:

Figure 6.14 – The DCGAN generator architecture from the paper

It can be observed from the DCGAN generator architecture diagram...

Summary

A GAN is a powerful method for generating not only images but also paintings and even 3D objects (using newer variants of a GAN). We saw how, using a combination of discriminator and generator networks (each with five convolutional layers), we can start with random noise and generate an image that mimics real images. The interplay between the generator and the discriminator keeps producing better images by minimizing the loss function over multiple iterations. The end result is fake pictures of things that never existed in real life.

It's a powerful method, and there are concerns about its ethical use. Fake images and objects can be used to defraud people; however, it also creates endless new opportunities. For example, imagine looking at a picture of fashion models while shopping for a new outfit. Instead of relying on endless image shoots, using a GAN (and DCGAN), you can generate realistic pictures of models with all body types, sizes, shapes, and colors, helping both...
