You're reading from Deep Learning for Computer Vision

Product type: Book
Published in: Jan 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788295628
Edition: 1st
Author: Rajalingappaa Shanmugamani

Rajalingappaa Shanmugamani is currently working as an Engineering Manager for a deep learning team at Kairos. Previously, he worked as a Senior Machine Learning Developer at SAP, Singapore, and at various startups developing machine learning products. He holds a Masters from the Indian Institute of Technology Madras. He has published articles in peer-reviewed journals and conferences and filed several patent applications in the area of machine learning. In his spare time, he coaches school students and engineers in programming and machine learning.


Chapter 8. Generative Models

Generative models have become an important application of computer vision. Unlike the applications discussed in the previous chapters, which made predictions from images, generative models can create images for specific objectives. In this chapter, we will cover:

  • The applications of generative models
  • Algorithms for style transfer
  • Training a model for super-resolution of images
  • Implementation and training of generative models
  • Drawbacks of current models 

By the end of the chapter, you will be able to implement some great applications for transferring style and understand the possibilities, as well as difficulties, associated with generative models.

Applications of generative models


Let's start this chapter with the possible applications of generative models. The range of applications is enormous; we will look at a few of them to understand the motivation and the possibilities.

Artistic style transfer

Artistic style transfer is the process of transferring the style of a piece of art onto another image. For example, an image can be created with the artistic style of one image and the content of another. An example of one image combined with several different styles, illustrated by Gatys et al. (https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf), is shown here. Image A is the photo to which the styles are applied, and the results are shown in the other images:

Reproduced from Gatys et al.

This application has caught the public's attention, and there are several mobile apps on the market offering this feature.

Predicting the next frame in a video 

Predicting future frames from synthetic video sets...

Neural artistic style transfer


The first application we will implement is neural artistic style transfer. Here, we will transfer the style of Van Gogh's art onto an image. An image can be considered a combination of style and content. The artistic style transfer technique transforms an image so that it looks like a painting with a specific painting style. We will see how to code up this idea. The loss function compares the generated image with the content of the photo and the style of the painting. Hence, the optimization is carried out on the image pixels, rather than on the weights of the network. Two loss values are calculated: one by comparing the content of the photo with the generated image, and the other by comparing the style of the painting with the generated image.
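The idea of optimizing the pixels rather than the network weights can be sketched in plain Python. This is a minimal illustration, not the chapter's implementation: `total_loss`, `optimize_pixels`, and the weights `alpha`/`beta` are hypothetical names, and `grad_fn` stands in for the gradients that a deep learning framework would compute automatically from the content and style losses.

```python
def total_loss(content_l, style_l, alpha=1.0, beta=1e3):
    # Weighted sum of the two loss values; alpha and beta are
    # hypothetical trade-off weights between content and style.
    return alpha * content_l + beta * style_l

def optimize_pixels(pixels, grad_fn, lr=0.01, steps=100):
    # Gradient descent on the image pixels themselves,
    # not on the weights of the network.
    for _ in range(steps):
        grads = grad_fn(pixels)
        pixels = [p - lr * g for p, g in zip(pixels, grads)]
    return pixels
```

In a real implementation, `grad_fn` would backpropagate the total loss through a pretrained CNN down to the input image.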

Content loss

Since raw pixels are not a good representation of content, we will use the CNN features of various layers instead, as they represent the content better. As seen in Chapter 3, Image Retrieval, the initial layers capture high-frequency features such as edges, corners...
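As a minimal sketch (not the book's code), the content loss can be written as a mean squared error between a layer's feature maps for the content photo and the generated image. `content_loss` is a hypothetical helper, and the flat Python lists here stand in for flattened CNN activations:

```python
def content_loss(content_feats, generated_feats):
    # Mean squared error between flattened CNN feature maps
    # of the content photo and the generated image.
    n = len(content_feats)
    return sum((c - g) ** 2
               for c, g in zip(content_feats, generated_feats)) / n
```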

Generative Adversarial Networks


Generative Adversarial Networks (GANs) were invented by Ian Goodfellow in 2014. It is an unsupervised technique in which two neural networks, a generator and a discriminator, are trained simultaneously. The generator can produce an image from random noise, and the discriminator evaluates whether that image is real or generated. After sufficient training, the generator network can produce photo-realistic images. The generator is typically a deconvolutional neural network, and the discriminator is a convolutional neural network.

An excellent analogy for understanding this is to think of the generator as someone who prints counterfeit money and the discriminator as a police officer who determines whether the money is fake. The generator keeps improving the quality of the fake money based on the feedback from the police until the police can no longer differentiate between real and fake money. Now, let's start with the implementation.
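The objective behind this analogy can be sketched with the standard GAN losses in plain Python. This is an illustrative sketch, not the chapter's implementation: `discriminator_loss` and `generator_loss` are hypothetical names, `d_real`/`d_fake` are the discriminator's probability outputs, and the non-saturating generator loss is used, as is common in practice:

```python
import math

def discriminator_loss(d_real, d_fake):
    # The police officer: wants real samples scored near 1
    # and fake samples scored near 0.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    # The counterfeiter: wants the discriminator to score
    # its fakes near 1 (non-saturating form).
    return -math.log(d_fake)
```

Training alternates between minimizing these two losses: one gradient step for the discriminator, then one for the generator.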

Vanilla GAN

The original GAN is called a...

Visual dialogue model


The visual dialogue model (VDM) enables chat based on images. VDM combines techniques from computer vision, Natural Language Processing (NLP), and chatbots. It has major applications such as describing images to blind people, explaining medical scans to doctors, virtual companions, and so on. Next, we will see an algorithm that solves this challenge.

Algorithm for VDM

The algorithm discussed here was proposed by Lu et al. (https://research.fb.com/wp-content/uploads/2017/11/camera_ready_nips2017.pdf), who proposed a GAN-based VDM in which the generator generates answers and the discriminator ranks those answers. The following is a schematic representation of the process:

Architecture of the VDMs based on GAN techniques [Reproduced from Lu et al.]

The chat history, the current question, and the image are fed as inputs to the generator. Next, we will see how the generator works.
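The data flow above can be roughly sketched as follows. These are hypothetical helpers, not Lu et al.'s code: `encode_dialogue_state` fuses the three inputs into a single encoder input (here, simple concatenation of feature vectors), and `rank_answers` orders candidate answers by discriminator score:

```python
def encode_dialogue_state(image_feat, question_feat, history_feat):
    # Hypothetical fusion: concatenate the image, question, and
    # chat-history feature vectors into one encoder input.
    return image_feat + question_feat + history_feat

def rank_answers(scores):
    # Discriminator-style ranking: indices of candidate answers,
    # best-scored first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```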

Generator

The generator has an encoder and a decoder. The encoder takes an image, question, and history...

Summary


In this chapter, we have learned about generative models and their vast number of applications. We implemented style transfer from one image to another while preserving the content. We saw the intuition behind GANs and trained models to generate images. Finally, we learned about the visual dialogue model.

In the next chapter, we will learn about deep learning methods for video analysis. We will see how to access video content through cameras, files, and so on. We will implement video classification by applying classification at the frame level and on the video as a whole. Later, we will see how to track objects in a video.

