Modern Computer Vision with PyTorch

Product type: Book
Published: Nov 2020
Publisher: Packt
ISBN-13: 9781839213472
Pages: 824
Edition: 1st
Authors (2): V Kishore Ayyadevara, Yeshwanth Reddy

Table of Contents (25 chapters)

Preface
Section 1 - Fundamentals of Deep Learning for Computer Vision
  • Artificial Neural Network Fundamentals
  • PyTorch Fundamentals
  • Building a Deep Neural Network with PyTorch
Section 2 - Object Classification and Detection
  • Introducing Convolutional Neural Networks
  • Transfer Learning for Image Classification
  • Practical Aspects of Image Classification
  • Basics of Object Detection
  • Advanced Object Detection
  • Image Segmentation
  • Applications of Object Detection and Segmentation
Section 3 - Image Manipulation
  • Autoencoders and Image Manipulation
  • Image Generation Using GANs
  • Advanced GANs to Manipulate Images
Section 4 - Combining Computer Vision with Other Techniques
  • Training with Minimal Data Points
  • Combining Computer Vision and NLP Techniques
  • Combining Computer Vision and Reinforcement Learning
  • Moving a Model to Production
  • Using OpenCV Utilities for Image Analysis
Other Books You May Enjoy
Appendix
Advanced GANs to Manipulate Images

In the previous chapter, we learned how to leverage Generative Adversarial Networks (GANs) to generate realistic images. In this chapter, we will learn how to leverage GANs to manipulate images. We will cover two ways of generating images with GANs: supervised and unsupervised. In the supervised method, we provide input and output pair combinations and generate an image from an input image, which we will learn about with the Pix2Pix GAN. In the unsupervised method, we specify the inputs and the outputs but do not provide one-to-one correspondences between them; instead, we expect the GAN to learn the structure of the two classes and convert an image from one class to the other, which we will learn about with CycleGAN.

Another class of unsupervised image manipulation involves generating images...

Leveraging the Pix2Pix GAN

Imagine a scenario where we have pairs of images that are related to each other (for example, an image of the edges of an object as the input and an actual image of the object as the output). The challenge is to generate an image given an input image of the edges of an object. In a traditional setting, this would be a simple input-to-output mapping and hence a supervised learning problem. However, imagine that you are working with a creative team that is trying to come up with a fresh look for products. In such a scenario, supervised learning does not help as much, as it learns only from history. A GAN comes in handy here because it ensures that the generated image looks realistic enough while leaving room for experimentation (since we are interested only in checking whether the generated image looks like it belongs to one of the classes of interest).
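The chapter implements this with PyTorch models; purely as a sketch of the generator's objective, the snippet below combines an adversarial term (fool the discriminator into predicting "real") with a lambda-weighted L1 term that keeps the output close to the paired target, as in the original Pix2Pix paper. The function names and toy arrays are illustrative, not the book's code:

```python
import numpy as np

def bce(pred, target):
    # Binary cross-entropy, averaged over all patch predictions.
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def pix2pix_generator_loss(d_fake, fake_img, real_img, lam=100.0):
    """Adversarial term plus a lambda-weighted L1 term that keeps the
    generated image close to its paired ground-truth image."""
    adv = bce(d_fake, np.ones_like(d_fake))
    l1 = float(np.mean(np.abs(fake_img - real_img)))
    return adv + lam * l1

# Toy example with hypothetical discriminator patch scores and images.
rng = np.random.default_rng(0)
d_fake = rng.uniform(0.1, 0.9, size=(1, 16, 16))   # patch-wise scores
fake = rng.uniform(0, 1, size=(3, 64, 64))
real = rng.uniform(0, 1, size=(3, 64, 64))
loss = pix2pix_generator_loss(d_fake, fake, real)
```

When the generated image matches the target exactly, the L1 term vanishes and only the adversarial term remains, which is what pushes the output toward the paired ground truth during training.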

In this section, we will learn about the architecture to generate the image of a shoe from a hand...

Leveraging CycleGAN

Imagine a scenario where you are asked to perform image translation from one class to another without being given input images and their corresponding output images to train the model; instead, you are given only the images of the two classes in two distinct folders. CycleGAN comes in handy in such a scenario.

In this section, we will learn how to train CycleGAN to convert the image of an apple into the image of an orange and vice versa. The Cycle in CycleGAN refers to the fact that we are translating (converting) an image from one class to another and back to the original class.

At a high level, we will have three separate loss values in this architecture:

  • Discriminator loss: This ensures that the object class is modified while training the model (as seen in the previous section).
  • Cycle loss: The loss incurred when cycling from the generated image back to the original image, which ensures that the surrounding pixels are not changed.
  • Identity loss: The loss when an image of...
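To make the cycle and identity terms concrete, here is a minimal NumPy sketch with toy stand-in "generators" in place of trained networks (all names and values are illustrative; the chapter trains real convolutional generators in PyTorch):

```python
import numpy as np

def l1(a, b):
    return float(np.mean(np.abs(a - b)))

def cyclegan_losses(g_ab, g_ba, real_a, real_b):
    """Cycle loss: translate A -> B -> A (and B -> A -> B) and penalize any
    drift from the original image.  Identity loss: feeding a generator an
    image that is already in its target class should change nothing."""
    cycle = l1(g_ba(g_ab(real_a)), real_a) + l1(g_ab(g_ba(real_b)), real_b)
    identity = l1(g_ab(real_b), real_b) + l1(g_ba(real_a), real_a)
    return cycle, identity

# Toy stand-ins: a perfectly invertible pair of "generators".
g_ab = lambda x: x + 0.5
g_ba = lambda x: x - 0.5
a = np.zeros((3, 8, 8))
b = np.ones((3, 8, 8))
cycle, identity = cyclegan_losses(g_ab, g_ba, a, b)
```

Because the toy generators invert each other exactly, the cycle loss is zero, while the identity loss is nonzero: each generator shifts an image that is already in its target class, which is exactly the behavior the identity term penalizes.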

Leveraging StyleGAN on custom images

Let's first understand a few historical developments prior to the invention of StyleGAN. As we saw in the previous chapter, generating fake faces involves the usage of GANs. The biggest problem that researchers faced was that the images that could be generated were small (typically 64 x 64). Any attempt to generate larger images caused the generator or discriminator to fall into a local minimum that stopped training and produced gibberish. One of the major leaps in generating high-quality images came from a research paper on ProGAN (short for Progressive GAN), which involved a clever trick.

The size of both the generator and discriminator is progressively increased. In the first step, you create a generator and discriminator to generate 4 x 4 images from a latent vector. After this, additional convolution (and upscaling) layers are added to the trained generator and discriminator, which will be responsible for accepting the...
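The fade-in that makes this growth stable can be sketched in a few lines: when a new, higher-resolution block is added, its output is blended with an upscaled copy of the previous stage's output, and the blending weight alpha is ramped from 0 to 1 so training never sees an abrupt jump in resolution. A toy NumPy illustration (shapes and values are hypothetical):

```python
import numpy as np

def upsample2x(img):
    # Nearest-neighbour 2x upscale, (C, H, W) -> (C, 2H, 2W).
    return img.repeat(2, axis=1).repeat(2, axis=2)

def faded_output(old_rgb, new_rgb, alpha):
    """Blend the new high-resolution block's output with an upscaled copy
    of the previous stage's output; alpha ramps from 0 to 1 during the
    growth phase."""
    return alpha * new_rgb + (1 - alpha) * upsample2x(old_rgb)

old = np.full((3, 4, 4), 0.2)   # output of the trained 4x4 stage
new = np.full((3, 8, 8), 0.8)   # output of the newly added 8x8 stage
mid = faded_output(old, new, alpha=0.25)
```

At alpha = 0 the network behaves exactly like the already-trained lower-resolution model, and at alpha = 1 the new block has fully taken over.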

Super-resolution GAN

In the previous section, we saw a scenario where we leveraged the pre-trained StyleGAN to generate images in a given style. In this section, we will take it a step further and learn about leveraging pre-trained models to perform image super-resolution. We will gain an understanding of the architecture of the Super-resolution GAN (SRGAN) model before implementing it on images.
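Before diving into the architecture, it helps to see the shape of SRGAN's generator objective (from the original SRGAN paper by Ledig et al.): a content loss computed between VGG feature maps of the super-resolved and ground-truth high-resolution images, plus a small adversarial term. Below is a NumPy sketch with toy arrays standing in for the outputs of a hypothetical pretrained feature extractor:

```python
import numpy as np

def srgan_generator_loss(sr_feats, hr_feats, d_sr, w_adv=1e-3):
    """Content loss: MSE between VGG feature maps of the super-resolved
    and true high-resolution images (rather than raw pixels).
    Adversarial term: encourage the discriminator score on the
    super-resolved image to be high."""
    eps = 1e-7
    content = float(np.mean((sr_feats - hr_feats) ** 2))
    adv = float(-np.mean(np.log(np.clip(d_sr, eps, 1.0))))
    return content + w_adv * adv

rng = np.random.default_rng(1)
sr = rng.normal(size=(256, 14, 14))        # toy "VGG features" of SR output
hr = sr + 0.1 * rng.normal(size=sr.shape)  # toy features of the true image
d_sr = np.full((1,), 0.5)                  # discriminator score on SR image
loss = srgan_generator_loss(sr, hr, d_sr)
```

Computing the content loss in feature space rather than pixel space is what lets SRGAN favor perceptually convincing textures over the overly smooth outputs that a plain pixel-wise MSE tends to produce.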

First, let's understand why a GAN is a good solution for the task of super-resolution. Imagine a scenario where you are given an image and asked to increase its resolution. Intuitively, you would consider various interpolation techniques to perform super-resolution. Here's a sample low-resolution image along with the outputs of various techniques (image source: https://arxiv.org/pdf/1609.04802.pdf):

From the preceding image, we can see that traditional interpolation techniques such as bicubic interpolation do not help as much when reconstructing the image from a low resolution (a 4X...
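A toy experiment illustrates why interpolation alone falls short: once high-frequency detail (here, a sharp edge that does not align with the downsampling grid) is averaged away by downsampling, upsampling cannot recover it. A minimal NumPy sketch (the image sizes and the sharp-edge test image are illustrative):

```python
import numpy as np

def downsample4x(img):
    # Average-pool by 4 in each spatial dimension, (H, W) -> (H/4, W/4).
    h, w = img.shape
    return img.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))

def upsample4x_nearest(img):
    # Nearest-neighbour 4x upscale back to the original size.
    return img.repeat(4, axis=0).repeat(4, axis=1)

# A toy image with a sharp vertical edge at column 18, which does not
# align with the 4-pixel pooling blocks, so averaging blurs it.
hr = np.zeros((32, 32))
hr[:, 18:] = 1.0
lr = downsample4x(hr)
restored = upsample4x_nearest(lr)
err = float(np.mean(np.abs(restored - hr)))   # nonzero: the edge is lost
```

The reconstruction error is irreducible for any interpolation scheme, because the information was discarded during downsampling; a GAN instead learns to hallucinate plausible high-frequency detail, which is what makes it a good fit for super-resolution.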

Summary

In this chapter, we have learned about generating images from a given contour using the Pix2Pix GAN. Further, we learned about the various loss functions in CycleGAN that are used to convert images of one class into another. Next, we learned how StyleGAN helps generate realistic faces and copy the style from one image to another, depending on how the generator is trained. Finally, we learned about leveraging the pre-trained SRGAN model to generate high-resolution images.

In the next chapter, we will switch to training an image classification model from very few (typically fewer than 20) images.

Questions

  1. Why do we need the Pix2Pix GAN when a supervised learning algorithm such as U-Net could have worked to generate images from contours?
  2. Why do we need to optimize for three different loss functions in CycleGAN?
  3. How do the tricks leveraged in ProgressiveGAN help in building StyleGAN?
  4. How do we identify latent vectors that correspond to a given custom image?