Hands-On Image Generation with TensorFlow

Chapter 8: Self-Attention for Image Generation

You may have heard about some popular Natural Language Processing (NLP) models, such as the Transformer, BERT, or GPT-3. They all have one thing in common: they are built on the transformer architecture, which is made up of self-attention modules.

Self-attention is gaining widespread adoption in computer vision, including classification tasks, which makes it an important topic to master. As we will learn in this chapter, self-attention helps us capture important features in an image without needing very deep stacks of convolutional layers to obtain a large effective receptive field. StyleGAN is great at generating faces, but it struggles to generate images from ImageNet.

In a way, faces are easy to generate, as eyes, noses, and lips all have similar shapes and are in similar positions across various faces. In contrast, the 1,000 classes of ImageNet contain varied objects (dogs, trucks, fish, and pillows, for instance) and backgrounds. Therefore...

Technical requirements

The Jupyter notebooks can be found here (https://github.com/PacktPublishing/Hands-On-Image-Generation-with-TensorFlow-2.0/tree/master/Chapter08):

  • ch8_sagan.ipynb
  • ch8_big_gan.ipynb

Spectral normalization

Spectral normalization is an important method for stabilizing GAN training, and it has been used in many recent state-of-the-art GANs. Unlike batch normalization and other normalization methods that normalize the activations, spectral normalization normalizes the weights instead. The aim of spectral normalization is to limit the growth of the weights so that the network adheres to the 1-Lipschitz constraint. This has proved effective in stabilizing GAN training, as we learned in Chapter 3, Generative Adversarial Network.

We will revisit WGANs to get a better understanding of the idea behind spectral normalization. The WGAN discriminator (also known as the critic) needs to satisfy the 1-Lipschitz constraint, which bounds how quickly its output can change with respect to its input. WGANs enforce this naively by clipping the weights to the range of [-0.01, 0.01].
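
To make that clipping step concrete, it could look like the following sketch, applied to every trainable weight of the critic after each training step (the critic model here is just a placeholder name, not code from the notebooks):

import tensorflow as tf

def clip_critic_weights(critic: tf.keras.Model, clip_value: float = 0.01):
    # Naive WGAN weight clipping: force every weight into [-clip_value, clip_value].
    for var in critic.trainable_variables:
        var.assign(tf.clip_by_value(var, -clip_value, clip_value))

# Usage: call clip_critic_weights(critic) right after each optimizer step on the critic.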

This is not a reliable method as we need to fine-tune the clipping range, which is a hyperparameter. It would be nice if there was a systematic...
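
Spectral normalization offers such a systematic approach: divide each weight matrix by an estimate of its largest singular value (its spectral norm), typically obtained with power iteration. The following is a minimal sketch of the idea for a dense layer; it is not the exact implementation used in the notebooks, and a convolutional kernel would first be reshaped to 2D before the same normalization is applied:

import tensorflow as tf

class SpectralNormDense(tf.keras.layers.Layer):
    # A dense layer whose kernel is divided by an estimate of its largest
    # singular value, obtained with one power-iteration step per call.
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        self.w = self.add_weight(name="kernel", shape=(in_dim, self.units),
                                 initializer="glorot_uniform")
        self.b = self.add_weight(name="bias", shape=(self.units,),
                                 initializer="zeros")
        # Running estimate of the leading singular vector, updated at each call.
        self.u = self.add_weight(name="u", shape=(1, self.units),
                                 initializer="random_normal", trainable=False)

    def call(self, x):
        # One power-iteration step to estimate the spectral norm of w.
        v = tf.math.l2_normalize(tf.matmul(self.u, self.w, transpose_b=True))
        u = tf.math.l2_normalize(tf.matmul(v, self.w))
        sigma = tf.matmul(tf.matmul(v, self.w), u, transpose_b=True)
        self.u.assign(u)
        # Normalize the kernel so that its spectral norm is approximately 1.
        w_sn = self.w / sigma
        return tf.matmul(x, w_sn) + self.b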

Self-attention modules

Self-attention modules became popular with the introduction of an NLP model known as the Transformer. In NLP applications such as language translation, the model often needs to read a sentence word by word to understand it before producing the output. Before the advent of the Transformer, the neural network of choice was typically a variant of the recurrent neural network (RNN), such as long short-term memory (LSTM). The RNN keeps internal states to remember words as it reads a sentence.

One drawback of this approach is that as the number of words increases, the gradients for the earliest words vanish. That is to say, the words at the start of the sentence gradually become less important as the RNN reads more words.

The Transformer does things differently. It reads all the words at once and weights the importance of each individual word. Therefore, more attention is given to words that are more important, and hence the name attention. Self-attention is a cornerstone of state-of...
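
Before we move on to building the SAGAN, here is a minimal sketch of what a SAGAN-style self-attention block for image feature maps can look like: 1x1 convolutions produce query, key, and value tensors, a softmax over the query-key similarities gives the attention map, and a learnable gamma (initialized to zero) scales the attended features before they are added back to the input. The channel-reduction factor and layer names below are illustrative, not the exact code from the notebook:

import tensorflow as tf

class SelfAttention(tf.keras.layers.Layer):
    # Sketch of a self-attention block for (H, W, C) feature maps with
    # statically known spatial dimensions.
    def build(self, input_shape):
        c = int(input_shape[-1])
        self.f = tf.keras.layers.Conv2D(c // 8, 1)  # query
        self.g = tf.keras.layers.Conv2D(c // 8, 1)  # key
        self.h = tf.keras.layers.Conv2D(c, 1)       # value
        self.gamma = self.add_weight(name="gamma", shape=(),
                                     initializer="zeros")

    def call(self, x):
        h, w, c = x.shape[1], x.shape[2], x.shape[3]
        n = h * w  # number of spatial locations

        q = tf.reshape(self.f(x), (-1, n, c // 8))  # (B, N, C/8)
        k = tf.reshape(self.g(x), (-1, n, c // 8))  # (B, N, C/8)
        v = tf.reshape(self.h(x), (-1, n, c))       # (B, N, C)

        # Every spatial location attends to every other location.
        attn = tf.nn.softmax(tf.matmul(q, k, transpose_b=True), axis=-1)  # (B, N, N)
        o = tf.reshape(tf.matmul(attn, v), (-1, h, w, c))

        # gamma starts at zero, so the block initially acts as an identity.
        return self.gamma * o + x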

Building a SAGAN

The SAGAN has a simple architecture that looks like DCGAN's. However, it is a class-conditional GAN that uses class labels to both generate and discriminate between images. In the following figure, each row of images is generated from a different class label:

Figure 8.3 – Images generated by a SAGAN by using different class labels. (Source: A. Brock et al., 2018, "Large Scale GAN Training for High Fidelity Natural Image Synthesis," https://arxiv.org/abs/1809.11096)

In this example, we will use the CIFAR10 dataset, which contains 10 classes of images with a resolution of 32x32. We will deal with the conditioning part later. Now, let's first complete the simplest part – the generator.
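
Loading CIFAR10 takes only a few lines with tf.keras; the scaling to [-1, 1] below assumes the generator will end with a tanh activation, and the batch size is just an example:

import tensorflow as tf

# CIFAR10: 50,000 training images, 10 classes, 32x32x3.
(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()

# Scale images to [-1, 1] to match a tanh generator output (an assumption here).
x_train = x_train.astype("float32") / 127.5 - 1.0
y_train = y_train.squeeze().astype("int32")  # class labels 0-9

dataset = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
           .shuffle(50000)
           .batch(64, drop_remainder=True))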

Building a SAGAN generator

At a high level, the SAGAN generator doesn't look very different from other GAN generators: it takes noise as input and goes through a dense layer, followed by multiple levels of upsampling...
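
As a rough illustration of that flow (the layer sizes and the placement of the self-attention block below are illustrative, not the exact architecture from the notebook), a generator for 32x32 CIFAR10 images could project the noise with a dense layer, reshape it into a small feature map, and then upsample it a few times:

import tensorflow as tf

def build_generator(z_dim=128):
    # Sketch of a SAGAN-style generator for 32x32 images.
    z = tf.keras.Input(shape=(z_dim,))

    # Project and reshape the noise into a 4x4 feature map.
    x = tf.keras.layers.Dense(4 * 4 * 256)(z)
    x = tf.keras.layers.Reshape((4, 4, 256))(x)

    # Upsample 4 -> 8 -> 16 -> 32.
    for filters in (256, 128, 64):
        x = tf.keras.layers.UpSampling2D()(x)
        x = tf.keras.layers.Conv2D(filters, 3, padding="same")(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.ReLU()(x)
        if filters == 128:
            x = SelfAttention()(x)  # the self-attention block sketched earlier

    # Output an RGB image in [-1, 1].
    out = tf.keras.layers.Conv2D(3, 3, padding="same", activation="tanh")(x)
    return tf.keras.Model(z, out, name="generator")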

Implementing BigGAN

BigGAN is an improved version of the SAGAN. It raises the image resolution significantly, from 128×128 to 512×512, and it does so without progressively growing the layers! The following are some sample images generated by BigGAN:

Figure 8.5 – Class-conditioned samples generated by BigGAN at 512x512 (Source: A. Brock et al., 2018, "Large Scale GAN Training for High Fidelity Natural Image Synthesis," https://arxiv.org/abs/1809.11096)

BigGAN is considered the state-of-the-art class-conditional GAN. We'll now look into the changes and modify the SAGAN code to make ourselves a BigGAN.

Scaling GANs

Older GANs tended to use small batch sizes, as this appeared to produce better-quality images. We now know that the quality problem was caused by the batch statistics used in batch normalization, and it is addressed by using other normalization techniques. Still, the batch size has remained small, as it is physically...

Summary

In this chapter, we learned about an important network architecture known as self-attention. The effectiveness of a convolutional layer is limited by its receptive field, and self-attention helps to capture important features that are spatially distant from one another, something conventional convolutional layers struggle to do. We learned how to write a custom self-attention layer and insert it into a SAGAN, a state-of-the-art class-conditional GAN. We also implemented conditional batch normalization to learn normalization parameters specific to each class. Finally, we looked at the bulked-up version of the SAGAN known as BigGAN, which surpasses the SAGAN significantly in terms of both image resolution and class variation.

We have now learned about most, if not all, of the important GANs for image generation. In recent years, two major components have gained popularity in the GAN world – they are AdaIN for the StyleGAN as covered in Chapter 7, High Fidelity...
