StackGAN - Text to Photo-Realistic Image Synthesis

Text-to-image synthesis is one of the use cases for Generative Adversarial Networks (GANs) that has many industrial applications, just like the GANs described in previous chapters. Synthesizing images from text descriptions is hard, because it is difficult to build a model that generates images that accurately reflect the meaning of the text. One network that tries to solve this problem is StackGAN. In this chapter, we will implement a StackGAN in the Keras framework, using TensorFlow as the backend.

In this chapter, we will cover the following topics:

  • Introduction to StackGAN
  • The architecture of StackGAN
  • Data gathering and preparation
  • A Keras implementation of StackGAN
  • Training a StackGAN
  • Evaluating the model
  • Practical applications of StackGAN

Introduction to StackGAN

A StackGAN is named as such because it has two GANs that are stacked together to form a network that is capable of generating high-resolution images. It has two stages, Stage-I and Stage-II. The Stage-I network generates low-resolution images with basic colors and rough sketches, conditioned on a text embedding, while the Stage-II network takes the image generated by the Stage-I network and generates a high-resolution image that is conditioned on a text embedding. Basically, the second network corrects defects and adds compelling details, yielding a more realistic high-resolution image.

We can compare a StackGAN network to the work of a painter. As a painter starts working, they draw primitive shapes such as lines, circles, and rectangles. Then, they try to fill in the colors. As the painting progresses, more and more detail is added. In a StackGAN, Stage...

Architecture of StackGAN

StackGAN is a two-stage network, with one generator and one discriminator per stage, so two generators and two discriminators in total. StackGAN is made up of several networks, which are as follows:

  • Stage-I GAN: text encoder, Conditioning Augmentation network, generator network, discriminator network, embedding compressor network
  • Stage-II GAN: text encoder, Conditioning Augmentation network, generator network, discriminator network, embedding compressor network
Figure: The two-stage architecture of StackGAN (source: arXiv:1612.03242 [cs.CV])

The preceding figure represents both stages of the StackGAN network. As you can see, the first stage generates images with dimensions of 64x64. The second stage then takes these low-resolution images and generates high-resolution images with dimensions of 256x256. In the next few sections, we will explore the different components of the StackGAN network. Before doing this, however, let...
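One of the components listed above, the Conditioning Augmentation network, deserves a closer look: instead of using the fixed text embedding directly, it samples a conditioning variable from a Gaussian distribution whose mean and log standard deviation are predicted from the embedding. The following is a minimal Keras sketch of such a block, assuming a 1024-dimensional char-CNN-RNN embedding and the 128-dimensional conditioning variable used later in this chapter; the function and layer arrangement are illustrative, not the book's exact code:

from keras import backend as K
from keras.layers import Input, Dense, Lambda, LeakyReLU
from keras.models import Model

embedding_dim = 1024   # size of the char-CNN-RNN text embedding (assumed)
condition_dim = 128    # size of the conditioning variable

def sample_c(args):
    # Reparameterization trick: c = mu + exp(log_sigma) * epsilon
    mu, log_sigma = args
    epsilon = K.random_normal(shape=K.shape(mu))
    return mu + K.exp(log_sigma) * epsilon

def build_ca_network():
    embedding = Input(shape=(embedding_dim,))
    # Predict the mean and log standard deviation from the text embedding
    x = Dense(condition_dim * 2)(embedding)
    x = LeakyReLU(alpha=0.2)(x)
    mu = Lambda(lambda t: t[:, :condition_dim])(x)
    log_sigma = Lambda(lambda t: t[:, condition_dim:])(x)
    c = Lambda(sample_c)([mu, log_sigma])
    # mu and log_sigma are returned as well so that a KL-divergence
    # regularizer can be added to the generator loss during training
    return Model(inputs=embedding, outputs=[c, mu, log_sigma])

The KL-divergence term keeps the predicted Gaussian close to a standard normal distribution, which smooths the conditioning manifold and helps the generator cope with the small amount of text-image training data.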

Setting up the project

If you haven't already cloned the repository with the complete code for all chapters, clone the repository now. The downloaded code has a directory called Chapter06, which contains the entire code for this chapter. Execute the following commands to set up the project:

  1. Start by navigating to the parent directory as follows:
cd Generative-Adversarial-Networks-Projects
  2. Now, change the directory from the current directory to Chapter06:
cd Chapter06
  3. Next, create a Python virtual environment for this project:
virtualenv venv
virtualenv venv -p python3 # Create a virtual environment using the python3 interpreter
virtualenv venv -p python2 # Create a virtual environment using the python2 interpreter

We will be using this newly created virtual environment for this project. Each chapter has its own separate virtual environment.

  4. Activate the...

Data preparation

In this section, we will be working with the CUB dataset, which is an image dataset of different bird species and can be found at the following link: http://www.vision.caltech.edu/visipedia/CUB-200-2011.html. The CUB dataset contains 11,788 high-resolution images. We will also need the char-CNN-RNN text embeddings, which can be found at the following link: https://drive.google.com/open?id=0B3y_msrWZaXLT1BZdVdycDY5TEE. These are pretrained text embeddings. Follow the instructions given in the next few sections to download and extract the dataset.
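Once the embeddings have been downloaded and extracted, they can be loaded from their pickle files. The following is a minimal sketch, assuming each split contains a char-CNN-RNN-embeddings.pickle file and a filenames.pickle file; these file names, and the latin1 encoding used when reading Python 2 pickles under Python 3, are assumptions rather than instructions from the book:

import os
import pickle

def load_embeddings(data_dir):
    # Return the text embeddings and the matching image file names
    embeddings_path = os.path.join(data_dir, "char-CNN-RNN-embeddings.pickle")
    filenames_path = os.path.join(data_dir, "filenames.pickle")
    with open(embeddings_path, "rb") as f:
        embeddings = pickle.load(f, encoding="latin1")
    with open(filenames_path, "rb") as f:
        filenames = pickle.load(f, encoding="latin1")
    return embeddings, filenames

# Example usage, following the Data/birds directory layout used later on:
# train_embeddings, train_filenames = load_embeddings("Data/birds/train")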

Downloading the dataset

A Keras implementation of StackGAN

The Keras implementation of StackGAN is divided into two parts: Stage-I and Stage-II. We will implement these stages in the following sections.

Stage-I

A Stage-I StackGAN contains both a generator network and a discriminator network. It also has a text encoder network and a Conditioning Augmentation network (CA network), which are explained in detail in the following section. The generator network gets the text-conditioning variable (ĉ), along with a noise vector (z). After a set of upsampling layers, it produces a low-resolution image with dimensions of 64x64x3. The discriminator network takes this low-resolution image and tries to identify whether the image is real or fake. The generator...
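To make this data flow concrete, here is a minimal Keras sketch of a Stage-I generator: it concatenates the conditioning variable and the noise vector, projects them to a small spatial feature map, and upsamples four times to reach 64x64x3. The filter counts and layer arrangement are illustrative assumptions, not the book's exact architecture:

from keras.layers import (Input, Dense, Reshape, Concatenate, Activation,
                          UpSampling2D, Conv2D, BatchNormalization)
from keras.models import Model

z_dim = 100          # size of the noise vector z
condition_dim = 128  # size of the conditioning variable

def build_stage1_generator():
    c = Input(shape=(condition_dim,))
    z = Input(shape=(z_dim,))
    x = Concatenate()([c, z])

    # Project and reshape to a 4x4 feature map
    x = Dense(1024 * 4 * 4, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = Reshape((4, 4, 1024))(x)

    # Four upsampling blocks: 4x4 -> 8x8 -> 16x16 -> 32x32 -> 64x64
    for filters in [512, 256, 128, 64]:
        x = UpSampling2D(size=(2, 2))(x)
        x = Conv2D(filters, kernel_size=3, padding="same", use_bias=False)(x)
        x = BatchNormalization()(x)
        x = Activation("relu")(x)

    # Final convolution to a 3-channel image in the range [-1, 1]
    x = Conv2D(3, kernel_size=3, padding="same")(x)
    image = Activation("tanh")(x)
    return Model(inputs=[c, z], outputs=image)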

Training a StackGAN

In this section, we will learn how to train both stages of the StackGAN. In the first subsection, we will train the Stage-I StackGAN, and in the second subsection, we will train the Stage-II StackGAN.

Training the Stage-I StackGAN

Before starting the training, we need to specify the essential hyperparameters. Hyperparameters are values that remain fixed during training. Let's do this first:

data_dir = "Specify your dataset directory here/Data/birds"
train_dir = data_dir + "/train"
test_dir = data_dir + "/test"
image_size = 64          # Stage-I images are 64x64
batch_size = 64
z_dim = 100              # dimension of the noise vector z
stage1_generator_lr = 0.0002
stage1_discriminator_lr = 0.0002
stage1_lr_decay_step = 600
epochs = 1000
condition_dim = 128      # dimension of the conditioning variable

embeddings_file_path_train...
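With the hyperparameters in place, the next step is to build the networks and their optimizers. The snippet below is a minimal sketch of that step, reusing the learning rates defined above with the Adam optimizer; beta_1=0.5 is a common setting for GAN training and is an assumption here, as is the stage1_discriminator model name in the commented compile call:

from keras.optimizers import Adam

gen_optimizer = Adam(lr=stage1_generator_lr, beta_1=0.5, beta_2=0.999)
dis_optimizer = Adam(lr=stage1_discriminator_lr, beta_1=0.5, beta_2=0.999)

# The discriminator is typically compiled with a binary cross-entropy loss,
# for example (assuming stage1_discriminator is the model built earlier):
# stage1_discriminator.compile(loss='binary_crossentropy', optimizer=dis_optimizer)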

Practical applications of StackGAN

The industry applications of a StackGAN include the following:

  • Generating high-resolution images automatically for entertainment or educational purposes
  • Creating comics: with the help of a StackGAN, the comic-creation process can be shortened to days, as the network can generate comic content automatically and assist in the creative process
  • Movie creation: A StackGAN can assist a movie creator by generating frames from text descriptions
  • Art creation: A StackGAN can assist an artist by generating sketches from text descriptions

Summary

In this chapter, we have learned about and implemented a StackGAN network to generate high-resolution images from text descriptions. We started with a basic introduction to StackGAN, in which we explored its architectural details and the losses used for training. Then, we downloaded and prepared the dataset. After that, we implemented the StackGAN in the Keras framework and trained the Stage-I and Stage-II StackGANs sequentially. After successfully training the network, we evaluated the model and saved it for further use.

In the next chapter, we will work with CycleGAN, a network that can convert paintings into photos.
