You're reading from Modern Computer Vision with PyTorch
Product type: Book | Published in: Nov 2020 | Publisher: Packt | Edition: 1st | ISBN-13: 9781839213472 | Reading level: Beginner
Authors (2):

V Kishore Ayyadevara
V Kishore Ayyadevara leads a team focused on using AI to solve problems in the healthcare space. He has 10 years' experience in data science, solving problems to improve customer experience in leading technology companies. In his current role, he is responsible for developing a variety of cutting-edge analytical solutions that have an impact at scale while building strong technical teams. Prior to this, Kishore authored three books: Pro Machine Learning Algorithms, Hands-on Machine Learning with Google Cloud Platform, and SciPy Recipes. Kishore is an active learner with a keen interest in identifying problems that can be solved using data, simplifying complexity, and transferring techniques across domains to achieve quantifiable results.

Yeshwanth Reddy

Yeshwanth is a highly accomplished data scientist manager with 9+ years of experience in deep learning and document analysis. He has made significant contributions to the field, including building software for end-to-end document digitization, resulting in substantial cost savings. Yeshwanth's expertise extends to developing modules in OCR, word detection, and synthetic document generation. His groundbreaking work has been recognized through multiple patents. He also created a few Python libraries. With a passion for disrupting unsupervised and self-supervised learning, Yeshwanth is dedicated to reducing reliance on manual annotation and driving innovative solutions in the field of data science.

Image Generation Using GANs

In the previous chapter, we learned about manipulating an image using neural style transfer and about superimposing the expression from one image onto another. However, what if we gave the network a bunch of images and asked it to come up with an entirely new image, all on its own?

A Generative Adversarial Network (GAN) is a step toward achieving the feat of generating an image given a collection of images. In this chapter, we will start by learning about the idea that makes GANs work before building one from scratch. GANs are a vast field that is still expanding as we write this book. This chapter lays the foundation of GANs through three GAN variants, while more advanced GANs and their applications are covered in the next chapter.

In this chapter, we will explore the following topics:

  • Introducing GANs
  • Using GANs to generate handwritten digits
  • Using DCGANs to generate face images
  • Implementing conditional GANs

Introducing GANs

To understand GANs, we need to understand two terms: generator and discriminator. First, we need a reasonable sample of images of an object. A generative network (generator) learns a representation from the sample of images and then generates images similar to those in the sample. A discriminator network (discriminator) is one that looks at the images generated (by the generator network) and the original sample of images and classifies each image as an original (real) one or a generated (fake) one.

The generator network generates images in such a way that the discriminator classifies them as real. The discriminator network classifies the generated images as fake and the images in the original sample as real.

Essentially, the adversarial term in GAN represents the opposing nature of the two networks: a generator network, which generates images to fool the discriminator network, and a discriminator network, which classifies each image by saying whether the image is real or fake.
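
To make the alternating objectives concrete, here is a minimal sketch of such a training loop in PyTorch. It assumes a simple fully connected generator and discriminator, binary cross-entropy loss, and illustrative layer sizes for a flattened 28 x 28 image; it is not the book's notebook code.

import torch
import torch.nn as nn

# Illustrative fully connected generator: 100-dim noise -> flattened 28 x 28 image in [-1, 1]
generator = nn.Sequential(
    nn.Linear(100, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 784), nn.Tanh()
)

# Illustrative discriminator: flattened image -> probability that the image is real
discriminator = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid()
)

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images):
    batch_size = len(real_images)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Discriminator step: push real images toward "real" and generated images toward "fake"
    d_opt.zero_grad()
    fake_images = generator(torch.randn(batch_size, 100)).detach()  # detach: do not update the generator here
    d_loss = loss_fn(discriminator(real_images), real_labels) + \
             loss_fn(discriminator(fake_images), fake_labels)
    d_loss.backward()
    d_opt.step()

    # Generator step: generate images that the discriminator classifies as "real"
    g_opt.zero_grad()
    g_loss = loss_fn(discriminator(generator(torch.randn(batch_size, 100))), real_labels)
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()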

Using GANs to generate handwritten digits

To generate images of handwritten digits, we will leverage the same kind of network that we learned about in the previous section. The strategy we will adopt is as follows:

  1. Import MNIST data.
  2. Initialize random noise.
  3. Define the generator model.
  4. Define the discriminator model.
  5. Train the two models alternately.
  6. Let the model train until the generator and discriminator losses are largely the same.

Let's execute each of the preceding steps in the following code:

The following code is available as Handwritten_digit_generation_using_GAN.ipynb in the Chapter12 folder of this book's GitHub repository: https://tinyurl.com/mcvp-packt. The code is moderately lengthy, so we strongly recommend that you execute the notebook from GitHub to reproduce the results while you follow the steps and the explanations of the various code components in the text.
  1. Import the relevant packages and define the device:
!pip install -q torch_snippets
from torch_snippets import *
device...
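
The notebook continues beyond this excerpt. As a rough companion to steps 1 and 2, here is a minimal sketch of loading MNIST and normalizing it to the [-1, 1] range that a Tanh-output generator produces, assuming torchvision is available; the exact code in the notebook may differ.

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Scale pixels from [0, 1] to [-1, 1] so they match the Tanh output range of the generator
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5,), std=(0.5,))
])
mnist = datasets.MNIST(root='data', train=True, download=True, transform=transform)
dataloader = DataLoader(mnist, batch_size=128, shuffle=True)

for images, _ in dataloader:
    real_images = images.view(len(images), -1)   # flatten 28 x 28 images into 784-dim vectors
    # d_loss, g_loss = train_step(real_images)   # alternate the two updates, as sketched earlier
    break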

Using DCGANs to generate face images

In the previous section, we learned about generating images using GANs. However, we have already seen that Convolutional Neural Networks (CNNs) perform better than vanilla neural networks in the context of images. In this section, we will learn about generating images using Deep Convolutional Generative Adversarial Networks (DCGANs), which use convolution and pooling operations in the model.

First, let's understand the technique we will leverage to generate an image from a set of 100 random numbers. We first reshape the noise into a shape of batch size x 100 x 1 x 1. The reason for adding the extra height and width dimensions in DCGANs, and not in the previous GAN section, is that we leverage CNNs here, which require inputs in the form of batch size x channels x height x width.
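
For instance, the reshape can be done as follows (the batch size here is illustrative):

import torch

batch_size = 64                                # illustrative batch size
noise = torch.randn(batch_size, 100)           # vanilla GAN input: batch size x 100
noise = noise.view(batch_size, 100, 1, 1)      # DCGAN input: batch size x channels x height x width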

Next, we convert the generated noise into an image by leveraging ConvTranspose2d.

As we learned in Chapter 9, Image Segmentation, ConvTranspose2d...
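
As an illustration, here is a minimal sketch of a DCGAN-style generator that upsamples the batch size x 100 x 1 x 1 noise into a 64 x 64 image using a stack of ConvTranspose2d layers. The channel sizes and output resolution are assumptions for illustration and are not necessarily those used in the book's notebook.

import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # Each ConvTranspose2d roughly doubles the spatial size: 1 -> 4 -> 8 -> 16 -> 32 -> 64
        self.model = nn.Sequential(
            nn.ConvTranspose2d(100, 512, kernel_size=4, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(512), nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1, bias=False),
            nn.Tanh()                          # outputs in [-1, 1]
        )

    def forward(self, noise):
        return self.model(noise)

noise = torch.randn(16, 100, 1, 1)             # batch size x 100 x 1 x 1, as described above
fake_faces = DCGANGenerator()(noise)           # shape: 16 x 3 x 64 x 64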

Implementing conditional GANs

Imagine a scenario where we want to generate an image of a class of interest; for example, an image of a cat, an image of a dog, or an image of a man with spectacles. How do we specify that we want to generate the image of interest to us? Conditional GANs come to the rescue in this scenario.

For now, let's assume that we have images of male and female faces only, along with their corresponding labels. In this section, we will learn about generating images of a specified class of interest from random noise.

The strategy we adopt is as follows:

  1. Specify the label of the image we want to generate as a one-hot-encoded vector.
  2. Pass the label through an embedding layer to generate a multi-dimensional representation of each class.
  3. Generate random noise and concatenate it with the embedding generated in the previous step.
  4. Train the model just as we did in the previous sections, but this time with the noise vector concatenated with the class embedding, as sketched in the code after this list.
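
Here is a minimal sketch of such a conditional generator, assuming two classes (male and female faces), an illustrative embedding size of 32, and a fully connected architecture; the book's notebook may differ. Note that nn.Embedding takes integer labels directly, which is equivalent to multiplying a one-hot vector by the embedding matrix.

import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, num_classes=2, embedding_dim=32, noise_dim=100):
        super().__init__()
        # Step 2: map each class label to a multi-dimensional embedding
        self.label_embedding = nn.Embedding(num_classes, embedding_dim)
        # Steps 3-4: the concatenated (noise + embedding) vector drives image generation
        self.model = nn.Sequential(
            nn.Linear(noise_dim + embedding_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 784), nn.Tanh()
        )

    def forward(self, noise, labels):
        label_vec = self.label_embedding(labels)       # batch size x embedding_dim
        x = torch.cat([noise, label_vec], dim=1)       # concatenate along the feature dimension
        return self.model(x)

# Ask for class 1 (say, female faces in the two-class setup above)
noise = torch.randn(8, 100)
labels = torch.ones(8, dtype=torch.long)
images = ConditionalGenerator()(noise, labels)          # 8 x 784, reshapeable to 8 x 1 x 28 x 28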

Summary

In this chapter, we have learned about leveraging two different neural networks to generate new images of handwritten digits using GANs. Next, we generated realistic faces using DCGANs. Finally, we learned about conditional GANs, which help us generate images of a certain class. Having generated images using different techniques, we could still see that the generated images were not sufficiently realistic. Furthermore, while we specified the class of images to generate in conditional GANs, we are still not in a position to perform image translation, where we ask for one object in an image to be replaced with another while everything else is left as is. In addition, we do not yet have an image generation mechanism where the classes (styles) to generate are learned in a more unsupervised manner.

In the next chapter, we will learn about generating images that are more realistic using some of the latest variants of GANs. In addition, we will learn about generating...

Questions

  1. What happens if the learning rates of the generator and discriminator models are high?
  2. In a scenario where the generator and discriminator are very well trained, what is the probability of a given image being real?
  3. Why do we use ConvTranspose2d when generating images?
  4. Why do we use embeddings with a high embedding size compared with the number of classes in conditional GANs?
  5. How can we generate images of men who have a beard?
  6. Why do we use Tanh activation in the last layer of the generator and not ReLU or sigmoid?
  7. Why did we get realistic images even though we did not denormalize the generated data?
  8. What happens if we do not crop the faces out of the images prior to training the GAN?
  9. Why do the weights of the discriminator not get updated when training the generator (given that the generator_train_step function involves the discriminator network)?
  10. Why do we fetch the losses on both real and fake images while training the discriminator, but only the loss on fake images while training the generator?