Modern Computer Vision with PyTorch

Product type: Book
Published: Nov 2020
Publisher: Packt
ISBN-13: 9781839213472
Pages: 824
Edition: 1st
Authors (2): V Kishore Ayyadevara, Yeshwanth Reddy

Table of Contents (25 chapters)

Preface
Section 1 - Fundamentals of Deep Learning for Computer Vision
  • Artificial Neural Network Fundamentals
  • PyTorch Fundamentals
  • Building a Deep Neural Network with PyTorch
Section 2 - Object Classification and Detection
  • Introducing Convolutional Neural Networks
  • Transfer Learning for Image Classification
  • Practical Aspects of Image Classification
  • Basics of Object Detection
  • Advanced Object Detection
  • Image Segmentation
  • Applications of Object Detection and Segmentation
Section 3 - Image Manipulation
  • Autoencoders and Image Manipulation
  • Image Generation Using GANs
  • Advanced GANs to Manipulate Images
Section 4 - Combining Computer Vision with Other Techniques
  • Training with Minimal Data Points
  • Combining Computer Vision and NLP Techniques
  • Combining Computer Vision and Reinforcement Learning
  • Moving a Model to Production
  • Using OpenCV Utilities for Image Analysis
Other Books You May Enjoy
Appendix
Advanced GANs to Manipulate Images

In the previous chapter, we learned how to leverage Generative Adversarial Networks (GANs) to generate realistic images. In this chapter, we will learn how to leverage GANs to manipulate images. We will cover two ways of generating images with GANs: supervised and unsupervised. In the supervised method, we provide input and output pair combinations and generate an image from an input image, which we will learn about with the Pix2Pix GAN. In the unsupervised method, we specify the inputs and the outputs but do not provide one-to-one correspondences between them; instead, we expect the GAN to learn the structure of the two classes and convert an image from one class to the other, which we will learn about with CycleGAN.

Another class of unsupervised image manipulation involves generating images...

Leveraging the Pix2Pix GAN

Imagine a scenario where we have pairs of images that are related to each other (for example, an image of the edges of an object as the input and an actual image of the object as the output). The challenge is to generate an image given an input image of the edges of an object. In a traditional setting, this would be a simple input-to-output mapping and hence a supervised learning problem. However, imagine that you are working with a creative team that is trying to come up with a fresh look for products. In such a scenario, supervised learning does not help as much, as it learns only from history. A GAN comes in handy here because it ensures that the generated image looks realistic enough while leaving room for experimentation (since we are interested only in checking whether the generated image looks like it belongs to one of the classes of interest).
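The chapter implements this with PyTorch models; purely as a sketch of the generator's objective, the snippet below combines an adversarial term (fool the discriminator into predicting "real") with a lambda-weighted L1 term that keeps the output close to the paired target, as in the original Pix2Pix paper. The function names and toy arrays are illustrative, not the book's code:

```python
import numpy as np

def bce(pred, target):
    # Binary cross-entropy, averaged over all patch predictions.
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def pix2pix_generator_loss(d_fake, fake_img, real_img, lam=100.0):
    """Adversarial term plus a lambda-weighted L1 term that keeps the
    generated image close to its paired ground-truth image."""
    adv = bce(d_fake, np.ones_like(d_fake))
    l1 = float(np.mean(np.abs(fake_img - real_img)))
    return adv + lam * l1

# Toy example with hypothetical discriminator patch scores and images.
rng = np.random.default_rng(0)
d_fake = rng.uniform(0.1, 0.9, size=(1, 16, 16))   # patch-wise scores
fake = rng.uniform(0, 1, size=(3, 64, 64))
real = rng.uniform(0, 1, size=(3, 64, 64))
loss = pix2pix_generator_loss(d_fake, fake, real)
```

When the generated image matches the target exactly, the L1 term vanishes and only the adversarial term remains, which is what pushes the output toward the paired ground truth during training.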

In this section, we will learn about the architecture to generate the image of a shoe from a hand...

Leveraging CycleGAN

Imagine a scenario where you are asked to perform image translation from one class to another without being given input images and their corresponding output images to train the model; instead, you are given only the images of the two classes in two distinct folders. CycleGAN comes in handy in such a scenario.

In this section, we will learn how to train CycleGAN to convert the image of an apple into the image of an orange and vice versa. The Cycle in CycleGAN refers to the fact that we are translating (converting) an image from one class to another and back to the original class.

At a high level, we will have three separate loss values in this architecture:

  • Discriminator loss: This ensures that the object class is modified while training the model (as seen in the previous section).
  • Cycle loss: The loss incurred when cycling from the generated image back to the original image, which ensures that the surrounding pixels are not changed.
  • Identity loss: The loss when an image of...
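To make the cycle and identity terms concrete, here is a minimal NumPy sketch with toy stand-in "generators" in place of trained networks (all names and values are illustrative; the chapter trains real convolutional generators in PyTorch):

```python
import numpy as np

def l1(a, b):
    return float(np.mean(np.abs(a - b)))

def cyclegan_losses(g_ab, g_ba, real_a, real_b):
    """Cycle loss: translate A -> B -> A (and B -> A -> B) and penalize any
    drift from the original image.  Identity loss: feeding a generator an
    image that is already in its target class should change nothing."""
    cycle = l1(g_ba(g_ab(real_a)), real_a) + l1(g_ab(g_ba(real_b)), real_b)
    identity = l1(g_ab(real_b), real_b) + l1(g_ba(real_a), real_a)
    return cycle, identity

# Toy stand-ins: a perfectly invertible pair of "generators".
g_ab = lambda x: x + 0.5
g_ba = lambda x: x - 0.5
a = np.zeros((3, 8, 8))
b = np.ones((3, 8, 8))
cycle, identity = cyclegan_losses(g_ab, g_ba, a, b)
```

Because the toy generators invert each other exactly, the cycle loss is zero, while the identity loss is nonzero: each generator shifts an image that is already in its target class, which is exactly the behavior the identity term penalizes.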

Leveraging StyleGAN on custom images

Let's first understand a few historical developments prior to the invention of StyleGAN. As we saw in the previous chapter, generating fake faces involves the usage of GANs. The biggest problem that researchers faced was that the images that could be generated were small (typically 64 x 64). Any attempt to generate larger images caused the generator or discriminator to fall into a local minimum that stopped training and produced gibberish. One of the major leaps in generating high-quality images came from a research paper on ProGAN (short for Progressive GAN), which involved a clever trick.

The size of both the generator and discriminator is progressively increased. In the first step, you create a generator and discriminator to generate 4 x 4 images from a latent vector. After this, additional convolution (and upscaling) layers are added to the trained generator and discriminator, which will be responsible for accepting the...
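The fade-in that makes this growth stable can be sketched in a few lines: when a new, higher-resolution block is added, its output is blended with an upscaled copy of the previous stage's output, and the blending weight alpha is ramped from 0 to 1 so training never sees an abrupt jump in resolution. A toy NumPy illustration (shapes and values are hypothetical):

```python
import numpy as np

def upsample2x(img):
    # Nearest-neighbour 2x upscale, (C, H, W) -> (C, 2H, 2W).
    return img.repeat(2, axis=1).repeat(2, axis=2)

def faded_output(old_rgb, new_rgb, alpha):
    """Blend the new high-resolution block's output with an upscaled copy
    of the previous stage's output; alpha ramps from 0 to 1 during the
    growth phase."""
    return alpha * new_rgb + (1 - alpha) * upsample2x(old_rgb)

old = np.full((3, 4, 4), 0.2)   # output of the trained 4x4 stage
new = np.full((3, 8, 8), 0.8)   # output of the newly added 8x8 stage
mid = faded_output(old, new, alpha=0.25)
```

At alpha = 0 the network behaves exactly like the already-trained lower-resolution model, and at alpha = 1 the new block has fully taken over.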

Super-resolution GAN

In the previous section, we saw a scenario where we leveraged the pre-trained StyleGAN to generate images in a given style. In this section, we will take it a step further and learn about leveraging pre-trained models to perform image super-resolution. We will gain an understanding of the architecture of the Super-resolution GAN (SRGAN) model before implementing it on images.
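Before diving into the architecture, it helps to see the shape of SRGAN's generator objective (from the original SRGAN paper by Ledig et al.): a content loss computed between VGG feature maps of the super-resolved and ground-truth high-resolution images, plus a small adversarial term. Below is a NumPy sketch with toy arrays standing in for the outputs of a hypothetical pretrained feature extractor:

```python
import numpy as np

def srgan_generator_loss(sr_feats, hr_feats, d_sr, w_adv=1e-3):
    """Content loss: MSE between VGG feature maps of the super-resolved
    and true high-resolution images (rather than raw pixels).
    Adversarial term: encourage the discriminator score on the
    super-resolved image to be high."""
    eps = 1e-7
    content = float(np.mean((sr_feats - hr_feats) ** 2))
    adv = float(-np.mean(np.log(np.clip(d_sr, eps, 1.0))))
    return content + w_adv * adv

rng = np.random.default_rng(1)
sr = rng.normal(size=(256, 14, 14))        # toy "VGG features" of SR output
hr = sr + 0.1 * rng.normal(size=sr.shape)  # toy features of the true image
d_sr = np.full((1,), 0.5)                  # discriminator score on SR image
loss = srgan_generator_loss(sr, hr, d_sr)
```

Computing the content loss in feature space rather than pixel space is what lets SRGAN favor perceptually convincing textures over the overly smooth outputs that a plain pixel-wise MSE tends to produce.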

First, let's understand why a GAN is a good solution for the task of super-resolution. Imagine a scenario where you are given an image and asked to increase its resolution. Intuitively, you would consider various interpolation techniques to perform super-resolution. Here's a sample low-resolution image along with the outputs of various techniques (image source: https://arxiv.org/pdf/1609.04802.pdf):

From the preceding image, we can see that traditional interpolation techniques such as bicubic interpolation do not help as much when reconstructing the image from a low resolution (a 4X...
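A toy experiment illustrates why interpolation alone falls short: once high-frequency detail (here, a sharp edge that does not align with the downsampling grid) is averaged away by downsampling, upsampling cannot recover it. A minimal NumPy sketch (the image sizes and the sharp-edge test image are illustrative):

```python
import numpy as np

def downsample4x(img):
    # Average-pool by 4 in each spatial dimension, (H, W) -> (H/4, W/4).
    h, w = img.shape
    return img.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))

def upsample4x_nearest(img):
    # Nearest-neighbour 4x upscale back to the original size.
    return img.repeat(4, axis=0).repeat(4, axis=1)

# A toy image with a sharp vertical edge at column 18, which does not
# align with the 4-pixel pooling blocks, so averaging blurs it.
hr = np.zeros((32, 32))
hr[:, 18:] = 1.0
lr = downsample4x(hr)
restored = upsample4x_nearest(lr)
err = float(np.mean(np.abs(restored - hr)))   # nonzero: the edge is lost
```

The reconstruction error is irreducible for any interpolation scheme, because the information was discarded during downsampling; a GAN instead learns to hallucinate plausible high-frequency detail, which is what makes it a good fit for super-resolution.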

Summary

In this chapter, we have learned about generating images from a given contour using the Pix2Pix GAN. Further, we learned about the various loss functions in CycleGAN that are used to convert images of one class into another. Next, we learned how StyleGAN helps generate realistic faces and copy the style from one image to another, depending on how the generator is trained. Finally, we learned about leveraging the pre-trained SRGAN model to generate high-resolution images.

In the next chapter, we will switch to training an image classification model from very few (typically fewer than 20) images.

Questions

  1. Why do we need the Pix2Pix GAN when a supervised learning algorithm such as U-Net could have worked to generate images from contours?
  2. Why do we need to optimize for three different loss functions in CycleGAN?
  3. How do the tricks leveraged in ProgressiveGAN help in building StyleGAN?
  4. How do we identify latent vectors that correspond to a given custom image?