Hands-On Image Generation with TensorFlow

Product type: Book
Published in: Dec 2020
Publisher: Packt
ISBN-13: 9781838826789
Pages: 306
Edition: 1st
Author: Soon Yau Cheong

Table of Contents (15 Chapters)

Preface
Section 1: Fundamentals of Image Generation with TensorFlow
Chapter 1: Getting Started with Image Generation Using TensorFlow
Chapter 2: Variational Autoencoder
Chapter 3: Generative Adversarial Network
Section 2: Applications of Deep Generative Models
Chapter 4: Image-to-Image Translation
Chapter 5: Style Transfer
Chapter 6: AI Painter
Section 3: Advanced Deep Generative Techniques
Chapter 7: High Fidelity Face Generation
Chapter 8: Self-Attention for Image Generation
Chapter 9: Video Synthesis
Chapter 10: Road Ahead
Other Books You May Enjoy

Chapter 2: Variational Autoencoder

In the previous chapter, we looked at how a computer sees an image as pixels, and we devised a probabilistic model of the pixel distribution for image generation. However, this is not the most efficient way to generate an image. Instead of scanning an image pixel by pixel, we first look at the whole image and try to understand what is in it (for example, a girl sitting, wearing a hat, and smiling), and then we use that information to draw a portrait. This is how autoencoders work.

In this chapter, we will first learn how to use an autoencoder to encode pixels into latent variables that we can sample from to generate images. Then we will learn how to tweak it to create a more powerful model known as a variational autoencoder (VAE). Finally, we will train our VAE to generate faces and perform face editing. The following topics will be covered in this chapter:

  • Learning latent variables with autoencoders
  • Variational autoencoders
  • Generating faces with VAEs

Technical requirements

The Jupyter notebooks and codes can be found at https://github.com/PacktPublishing/Hands-On-Image-Generation-with-TensorFlow-2.0/tree/master/Chapter02.

The notebooks used in this chapter are as follows:

  • ch2_autoencoder.ipynb
  • ch2_vae_mnist.ipynb
  • ch2_vae_faces.ipynb

Learning latent variables with autoencoders

Autoencoders were first introduced in the 1980s, and one of their inventors is Geoffrey Hinton, one of the godfathers of modern deep learning. The underlying hypothesis is that a high-dimensional input space contains many redundancies that can be compressed into a few low-dimensional variables. Traditional machine learning techniques such as Principal Component Analysis (PCA) perform exactly this kind of dimensionality reduction.

However, for image generation, we also want to map the low-dimensional space back into the high-dimensional space. Although the mechanism is quite different, you can think of it like image compression, where a raw image is compressed into a file format such as JPEG that is small and easy to store and transfer, and the computer can then restore the JPEG into pixels that we can see and manipulate. In other words, raw pixels are compressed into a low-dimensional JPEG representation and restored to high-dimensional raw pixels for display.
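The PCA technique mentioned earlier can illustrate this compress-then-restore idea numerically. The following is a minimal sketch, not the book's code, using random stand-in data: 64-dimensional vectors are projected onto their top 8 principal components ("encode") and then mapped back into the original 64-dimensional space ("decode").

```python
import numpy as np

# Random stand-in data: 500 samples, each 64-dimensional.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))

# Center the data and find its principal components via SVD.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt[:8]                               # keep the top 8 components

codes = X_centered @ components.T                 # "encode": 64 -> 8 dimensions
X_restored = codes @ components + X.mean(axis=0)  # "decode": 8 -> 64 dimensions

print(codes.shape)       # the low-dimensional representation
print(X_restored.shape)  # restored to the original space
```

An autoencoder replaces these two linear projections with learned, nonlinear neural networks, which is what makes it far more expressive than PCA.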

Autoencoders...

Variational autoencoders

In an autoencoder, the decoder samples directly from the latent variables. Variational autoencoders (VAEs), which were invented in 2014, differ in that the decoder's input is sampled from a distribution parameterized by the latent variables. To make this concrete, say we have an autoencoder with two latent variables, and drawing from them randomly gives us two values, 0.4 and 1.2, which we send to the decoder to generate an image.

In a VAE, these values don't go to the decoder directly. Instead, they are used as the mean and variance of a Gaussian distribution, and we draw a sample from this distribution to send to the decoder for image generation. As this is one of the most important distributions in machine learning, let's go over some basics of Gaussian distributions before creating a VAE.
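The sampling step above can be sketched in a few lines. This is an illustrative sketch with our own variable names, not the book's code: the two latent values are reinterpreted as the mean and variance of a Gaussian, and the decoder receives a sample drawn from it.

```python
import numpy as np

rng = np.random.default_rng(42)

# The two latent values from the example, treated as Gaussian parameters.
mean, var = 0.4, 1.2

# Reparameterization: z = mean + sigma * epsilon, with epsilon ~ N(0, 1).
# Writing the sample this way keeps it differentiable with respect to
# mean and var, which is what lets a VAE be trained by backpropagation.
epsilon = rng.standard_normal()
z = mean + np.sqrt(var) * epsilon  # this z is what goes to the decoder

print(z)
```

In a real VAE the encoder typically outputs the log-variance rather than the variance itself, for numerical stability, but the sampling idea is the same.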

Gaussian distribution

A Gaussian distribution is characterized by two parameters – mean and variance. I think we are all familiar with the different...
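As a quick numerical check of what these two parameters mean (a sketch of ours, not the book's code): samples drawn from a Gaussian with a given mean and variance should exhibit roughly that sample mean and variance.

```python
import numpy as np

rng = np.random.default_rng(0)
mean, var = 2.0, 0.25

# NumPy's normal() takes the standard deviation (scale), not the variance.
samples = rng.normal(loc=mean, scale=np.sqrt(var), size=100_000)

print(samples.mean())  # close to 2.0
print(samples.var())   # close to 0.25
```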

Generating faces with VAEs

Now that you have learned the theory of VAEs and have built one for MNIST, it is time to grow up, ditch the toy, and generate some serious stuff. We will use a VAE to generate some faces. Let's get started! The code is in ch2_vae_faces.ipynb. There are a few face datasets available for training.

In this exercise, we will only assume the dataset contains RGB images; feel free to use any dataset that suits your needs.

Network architecture

We reuse the MNIST VAE and training pipeline with some modifications given that the dataset is now different from MNIST. Feel free to reduce...

Controlling face attributes

Everything we have done in this chapter serves only one purpose: to prepare us for face editing! This is the climax of this chapter!

Latent space arithmetic

We have talked about the latent space several times now but haven't given it a proper definition. Essentially, it is the set of every possible value of the latent variables. In our VAE, it is a vector of 200 dimensions, or simply 200 variables. As much as we might hope that each variable has a distinctive semantic meaning, such as z[0] controlling the eyes, z[1] dictating the eye color, and so on, things are never that straightforward. We will simply have to assume that the information is encoded across all the latent variables and use vector arithmetic to explore the space.

Before diving into high-dimensional space, let's try to understand it using a two-dimensional example. Imagine you are now at point (0,0) on a map and your home is at (x,y). Therefore, the direction toward your home is (x – 0 ,y...
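Carrying that direction idea back into our 200-dimensional latent space gives a simple recipe for attribute editing. The sketch below uses random stand-in vectors rather than real encoder outputs, and names such as smile_direction are our own, not the book's: in a real run you would average the latent codes of faces that have the attribute and of faces that don't, then move a face's latent code along the difference before decoding.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 200                               # our VAE uses 200 latent variables

# Stand-ins for latent codes produced by the trained encoder.
z_smiling = rng.normal(size=(50, dim))  # codes of faces labeled "smiling"
z_neutral = rng.normal(size=(50, dim))  # codes of faces without the attribute

# The attribute direction: mean smiling code minus mean neutral code.
smile_direction = z_smiling.mean(axis=0) - z_neutral.mean(axis=0)

z = rng.normal(size=dim)                # latent code of one face to edit
alpha = 1.5                             # strength of the edit
z_edited = z + alpha * smile_direction  # decode(z_edited) -> a smiling face

print(z_edited.shape)
```

Varying alpha smoothly interpolates the attribute, which is exactly the kind of face editing this section builds toward.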

Summary

We started this chapter by learning how to use an encoder to compress high-dimensional data into low-dimensional latent variables, and a decoder to reconstruct the data from those latent variables. We learned that an autoencoder's limitation is that it cannot guarantee a continuous and uniform latent space, which makes it difficult to sample from. We then incorporated Gaussian sampling to build a VAE to generate MNIST digits.

Finally, we built a bigger VAE to train on the face dataset and had fun creating and manipulating faces. We learned the importance of the sampling distribution in the latent space, latent space arithmetic, and the Kullback-Leibler divergence (KLD), which lay the foundation for Chapter 3, Generative Adversarial Network.

Although GANs are more powerful than VAEs at generating photorealistic images, the earlier GANs were difficult to train, so we will first learn about the fundamentals of GANs. By the end of the next chapter, you will have learned the fundamentals of...
