Hands-On Image Generation with TensorFlow

Product type: Book
Published in: Dec 2020
Publisher: Packt
ISBN-13: 9781838826789
Pages: 306
Edition: 1st
Author: Soon Yau Cheong

Table of Contents (15 Chapters)

Preface
Section 1: Fundamentals of Image Generation with TensorFlow
Chapter 1: Getting Started with Image Generation Using TensorFlow
Chapter 2: Variational Autoencoder
Chapter 3: Generative Adversarial Network
Section 2: Applications of Deep Generative Models
Chapter 4: Image-to-Image Translation
Chapter 5: Style Transfer
Chapter 6: AI Painter
Section 3: Advanced Deep Generative Techniques
Chapter 7: High Fidelity Face Generation
Chapter 8: Self-Attention for Image Generation
Chapter 9: Video Synthesis
Chapter 10: Road Ahead
Other Books You May Enjoy

Chapter 2: Variational Autoencoder

In the previous chapter, we looked at how a computer sees an image as pixels, and we devised a probabilistic model of the pixel distribution for image generation. However, this is not the most efficient way to generate an image. Instead of scanning an image pixel by pixel, we first look at the whole image and try to understand what is in it (for example, a girl sitting, wearing a hat, and smiling), and then we use that information to draw a portrait. This is how autoencoders work.

In this chapter, we will first learn how to use an autoencoder to encode pixels into latent variables that we can sample from to generate images. Then we will learn how to tweak it to create a more powerful model known as a variational autoencoder (VAE). Finally, we will train our VAE to generate faces and perform face editing. The following topics will be covered in this chapter:

  • Learning latent variables with autoencoders
  • Variational autoencoders
  • Generating faces with VAEs

Technical requirements

The Jupyter notebooks and codes can be found at https://github.com/PacktPublishing/Hands-On-Image-Generation-with-TensorFlow-2.0/tree/master/Chapter02.

The notebooks used in this chapter are as follows:

  • ch2_autoencoder.ipynb
  • ch2_vae_mnist.ipynb
  • ch2_vae_faces.ipynb

Learning latent variables with autoencoders

Autoencoders were first introduced in the 1980s, and one of their inventors is Geoffrey Hinton, one of the godfathers of modern deep learning. The underlying hypothesis is that a high-dimensional input space contains many redundancies that can be compressed into a few low-dimensional variables. Traditional machine learning techniques such as Principal Component Analysis (PCA) perform exactly this kind of dimensionality reduction.

However, for image generation, we also want to map the low-dimensional space back into the high-dimensional space. Although the mechanism is quite different, you can think of it like image compression, where a raw image is compressed into a file format such as JPEG that is small and easy to store and transfer, and the computer can then restore the JPEG into pixels that we can see and manipulate. In other words, raw pixels are compressed into a low-dimensional JPEG representation and restored to high-dimensional raw pixels for display.
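The PCA technique mentioned earlier can illustrate this compress-then-restore idea numerically. The following is a minimal sketch, not the book's code, using random stand-in data: 64-dimensional vectors are projected onto their top 8 principal components ("encode") and then mapped back into the original 64-dimensional space ("decode").

```python
import numpy as np

# Random stand-in data: 500 samples, each 64-dimensional.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))

# Center the data and find its principal components via SVD.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt[:8]                               # keep the top 8 components

codes = X_centered @ components.T                 # "encode": 64 -> 8 dimensions
X_restored = codes @ components + X.mean(axis=0)  # "decode": 8 -> 64 dimensions

print(codes.shape)       # the low-dimensional representation
print(X_restored.shape)  # restored to the original space
```

An autoencoder replaces these two linear projections with learned, nonlinear neural networks, which is what makes it far more expressive than PCA.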

Autoencoders...

Variational autoencoders

In an autoencoder, the decoder samples directly from the latent variables. Variational autoencoders (VAEs), which were invented in 2014, differ in that the decoder's input is sampled from a distribution parameterized by the latent variables. To make this concrete, say we have an autoencoder with two latent variables, and drawing from them randomly gives us two values, 0.4 and 1.2, which we send to the decoder to generate an image.

In a VAE, these values don't go to the decoder directly. Instead, they are used as the mean and variance of a Gaussian distribution, and we draw a sample from this distribution to send to the decoder for image generation. As this is one of the most important distributions in machine learning, let's go over some basics of Gaussian distributions before creating a VAE.
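The sampling step above can be sketched in a few lines. This is an illustrative sketch with our own variable names, not the book's code: the two latent values are reinterpreted as the mean and variance of a Gaussian, and the decoder receives a sample drawn from it.

```python
import numpy as np

rng = np.random.default_rng(42)

# The two latent values from the example, treated as Gaussian parameters.
mean, var = 0.4, 1.2

# Reparameterization: z = mean + sigma * epsilon, with epsilon ~ N(0, 1).
# Writing the sample this way keeps it differentiable with respect to
# mean and var, which is what lets a VAE be trained by backpropagation.
epsilon = rng.standard_normal()
z = mean + np.sqrt(var) * epsilon  # this z is what goes to the decoder

print(z)
```

In a real VAE the encoder typically outputs the log-variance rather than the variance itself, for numerical stability, but the sampling idea is the same.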

Gaussian distribution

A Gaussian distribution is characterized by two parameters – mean and variance. I think we are all familiar with the different...
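As a quick numerical check of what these two parameters mean (a sketch of ours, not the book's code): samples drawn from a Gaussian with a given mean and variance should exhibit roughly that sample mean and variance.

```python
import numpy as np

rng = np.random.default_rng(0)
mean, var = 2.0, 0.25

# NumPy's normal() takes the standard deviation (scale), not the variance.
samples = rng.normal(loc=mean, scale=np.sqrt(var), size=100_000)

print(samples.mean())  # close to 2.0
print(samples.var())   # close to 0.25
```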

Generating faces with VAEs

Now that you have learned the theory of VAEs and have built one for MNIST, it is time to grow up, ditch the toy, and generate some serious stuff. We will use a VAE to generate some faces. Let's get started! The code is in ch2_vae_faces.ipynb. There are a few face datasets available for training.

In this exercise, we will only assume the dataset contains RGB images; feel free to use any dataset that suits your needs.

Network architecture

We reuse the MNIST VAE and training pipeline with some modifications given that the dataset is now different from MNIST. Feel free to reduce...

Controlling face attributes

Everything we have done in this chapter serves only one purpose: to prepare us for face editing! This is the climax of this chapter!

Latent space arithmetic

We have talked about the latent space several times now but haven't given it a proper definition. Essentially, it is the set of every possible value of the latent variables. In our VAE, it is a vector of 200 dimensions, or simply 200 variables. As much as we might hope that each variable has a distinctive semantic meaning, such as z[0] controlling the eyes, z[1] dictating the eye color, and so on, things are never that straightforward. We will simply have to assume that the information is encoded across all the latent variables and use vector arithmetic to explore the space.

Before diving into high-dimensional space, let's try to understand it using a two-dimensional example. Imagine you are now at point (0,0) on a map and your home is at (x,y). Therefore, the direction toward your home is (x – 0 ,y...
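Carrying that direction idea back into our 200-dimensional latent space gives a simple recipe for attribute editing. The sketch below uses random stand-in vectors rather than real encoder outputs, and names such as smile_direction are our own, not the book's: in a real run you would average the latent codes of faces that have the attribute and of faces that don't, then move a face's latent code along the difference before decoding.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 200                               # our VAE uses 200 latent variables

# Stand-ins for latent codes produced by the trained encoder.
z_smiling = rng.normal(size=(50, dim))  # codes of faces labeled "smiling"
z_neutral = rng.normal(size=(50, dim))  # codes of faces without the attribute

# The attribute direction: mean smiling code minus mean neutral code.
smile_direction = z_smiling.mean(axis=0) - z_neutral.mean(axis=0)

z = rng.normal(size=dim)                # latent code of one face to edit
alpha = 1.5                             # strength of the edit
z_edited = z + alpha * smile_direction  # decode(z_edited) -> a smiling face

print(z_edited.shape)
```

Varying alpha smoothly interpolates the attribute, which is exactly the kind of face editing this section builds toward.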

Summary

We started this chapter by learning how to use an encoder to compress high-dimensional data into low-dimensional latent variables, and a decoder to reconstruct the data from those latent variables. We learned that an autoencoder's limitation is that it cannot guarantee a continuous and uniform latent space, which makes it difficult to sample from. We then incorporated Gaussian sampling to build a VAE to generate MNIST digits.

Finally, we built a bigger VAE to train on the face dataset and had fun creating and manipulating faces. We learned the importance of the sampling distribution in the latent space, latent space arithmetic, and the Kullback-Leibler divergence (KLD), which lay the foundation for Chapter 3, Generative Adversarial Network.

Although GANs are more powerful than VAEs at generating photorealistic images, the earlier GANs were difficult to train, so we will first learn about the fundamentals of GANs. By the end of the next chapter, you will have learned the fundamentals of...
