The detailed architecture of a GAN

The architecture of a GAN has two basic elements: the generator network and the discriminator network. Each network can be any neural network, such as an Artificial Neural Network (ANN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), or a Long Short-Term Memory (LSTM) network. The discriminator typically ends with fully connected layers and a classifier.

Let's take a closer look at the components of the architecture of a GAN. In this example, we will imagine that we are creating a dummy GAN.

The architecture of the generator

The generator network in our dummy GAN is a simple feed-forward neural network with five layers: an input layer, three hidden layers, and an output layer. Let's take a closer look at the configuration of the generator (dummy) network:

| Layer # | Layer name | Configuration |
|---|---|---|
| 1 | Input layer | input_shape=(batch_size, 100), output_shape=(batch_size, 100) |
| 2 | Dense layer | neurons=500, input_shape=(batch_size, 100), output_shape=(batch_size, 500) |
| 3 | Dense layer | neurons=500, input_shape=(batch_size, 500), output_shape=(batch_size, 500) |
| 4 | Dense layer | neurons=784, input_shape=(batch_size, 500), output_shape=(batch_size, 784) |
| 5 | Reshape layer | input_shape=(batch_size, 784), output_shape=(batch_size, 28, 28) |

The preceding table shows the configuration of each layer in the network, including the input and output layers.

The following diagram shows the flow of tensors and the input and output shapes of the tensors for each layer in the generator network:

The architecture of the generator network.

Let's discuss how this feed-forward neural network processes information during forward propagation of the data:

  • The input layer takes a 100-dimensional vector sampled from a Gaussian (normal) distribution and passes the tensor to the first hidden layer without any modifications.
  • The three hidden layers are dense layers with 500, 500, and 784 units, respectively. The first hidden (dense) layer converts a tensor of shape (batch_size, 100) to a tensor of shape (batch_size, 500).
  • The second dense layer generates a tensor of shape (batch_size, 500).
  • The third dense layer generates a tensor of shape (batch_size, 784).
  • In the last (output) layer, this tensor is reshaped from (batch_size, 784) to (batch_size, 28, 28). This means that our network generates a batch of images, where each image has a shape of (28, 28). A minimal code sketch of this generator follows.
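To make this concrete, here is a minimal Keras sketch of the generator described above. The layer sizes come from the preceding table; the ReLU and tanh activations are assumptions, as the text does not specify them:

```python
from keras.layers import Dense, Input, Reshape
from keras.models import Model

# Input layer: a 100-dimensional noise vector sampled from a Gaussian distribution
noise = Input(shape=(100,))

# Three hidden dense layers with 500, 500, and 784 units
x = Dense(500, activation='relu')(noise)   # (batch_size, 100) -> (batch_size, 500)
x = Dense(500, activation='relu')(x)       # (batch_size, 500) -> (batch_size, 500)
x = Dense(784, activation='tanh')(x)       # (batch_size, 500) -> (batch_size, 784)

# Output layer: reshape the 784-dimensional vector into a 28 x 28 image
generated_image = Reshape((28, 28))(x)     # (batch_size, 784) -> (batch_size, 28, 28)

generator = Model(inputs=noise, outputs=generated_image)
generator.summary()
```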

The architecture of the discriminator

The discriminator in our GAN is also a feed-forward neural network with five layers: an input layer, a flattening layer, two hidden dense layers, and a dense output layer. The discriminator network is a classifier and is slightly different from the generator network. It processes an image and outputs the probability of the image belonging to a particular class (real or fake).

The following diagram shows the flow of tensors and the input and output shapes of the tensors for each layer in the discriminator network:

The architecture of the discriminator network

Let's discuss how the discriminator processes data in forward propagation during the training of the network:

  1. Initially, the network receives an input image with a shape of 28x28.
  2. The input layer takes this input tensor, which has a shape of (batch_size, 28, 28), and passes it on to the next layer without any modifications.
  3. Next, a flattening layer flattens the tensor into a 784-dimensional vector, which is passed to the first hidden (dense) layer. The first and second hidden layers each transform it into a 500-dimensional vector.
  4. The last layer is the output layer, which is again a dense layer, with one unit (a neuron) and sigmoid as the activation function. It outputs a single value between 0 and 1: a value close to 0 indicates that the provided image is likely fake, while a value close to 1 indicates that it is likely real. A minimal code sketch of this discriminator follows.
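Here is a matching minimal Keras sketch of this discriminator. Only the single sigmoid output unit is specified by the text; the ReLU activations in the hidden layers are an assumption:

```python
from keras.layers import Dense, Flatten, Input
from keras.models import Model

# Input layer: a batch of 28 x 28 images
image = Input(shape=(28, 28))

# Flattening layer: each image becomes a 784-dimensional vector
x = Flatten()(image)                         # (batch_size, 28, 28) -> (batch_size, 784)

# Two hidden dense layers with 500 units each
x = Dense(500, activation='relu')(x)         # (batch_size, 784) -> (batch_size, 500)
x = Dense(500, activation='relu')(x)         # (batch_size, 500) -> (batch_size, 500)

# Output layer: one sigmoid unit giving the probability that the image is real
validity = Dense(1, activation='sigmoid')(x)

discriminator = Model(inputs=image, outputs=validity)
discriminator.summary()
```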

Important concepts related to GANs

Now that we have understood the architecture of GANs, let's take a brief look at a few important related concepts. We will first look at KL divergence, which is essential for understanding JS divergence, an important measure for assessing the quality of generative models. We will then look at the Nash equilibrium, which is a state that we try to reach during training. Finally, we will take a closer look at objective functions, which are very important to understand in order to implement GANs well.

Kullback-Leibler divergence

Kullback-Leibler divergence (KL divergence), also known as relative entropy, is a method used to identify the similarity between two probability distributions. It measures how one probability distribution p diverges from a second expected probability distribution q.

The equation used to calculate the KL divergence between two probability distributions p(x) and q(x) is as follows:

D_{KL}(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}

The KL divergence is zero, its minimum value, when p(x) is equal to q(x) at every point.

Because KL divergence is asymmetric (D_{KL}(p \| q) is generally not equal to D_{KL}(q \| p)), it should not be used to measure the distance between two probability distributions, and it is therefore not a true distance metric.

Jensen-Shannon divergence

The Jensen-Shannon divergence (also called the information radius (IRaD) or the total divergence to the average) is another measure of similarity between two probability distributions. It is based on KL divergence. Unlike KL divergence, however, JS divergence is symmetric, so it can be used to measure the distance between two probability distributions. The square root of the Jensen-Shannon divergence gives the Jensen-Shannon distance, which is a proper distance metric.

The following equation represents the Jensen-Shannon divergence between two probability distributions, p and q:

D_{JS}(p \| q) = \frac{1}{2} D_{KL}\left(p \,\middle\|\, \frac{p+q}{2}\right) + \frac{1}{2} D_{KL}\left(q \,\middle\|\, \frac{p+q}{2}\right)

In the preceding equation, \frac{p+q}{2} is the midpoint measure (the average of the two distributions), while D_{KL} is the Kullback-Leibler divergence.
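To make these two measures concrete, here is a minimal NumPy sketch that computes the KL and JS divergences for two discrete probability distributions; it also illustrates that KL divergence is asymmetric while JS divergence is symmetric:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions given as probability vectors."""
    p, q = np.asarray(p, dtype=np.float64), np.asarray(q, dtype=np.float64)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def js_divergence(p, q):
    """Jensen-Shannon divergence, built from KL divergence and the midpoint measure."""
    m = 0.5 * (np.asarray(p, dtype=np.float64) + np.asarray(q, dtype=np.float64))
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])

print(kl_divergence(p, q), kl_divergence(q, p))  # asymmetric: the two values differ
print(js_divergence(p, q), js_divergence(q, p))  # symmetric: the two values are equal
```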

Now that we have learned about the KL divergence and the Jensen-Shannon divergence, let's discuss the Nash equilibrium for GANs.

Nash equilibrium

The Nash equilibrium describes a particular state in game theory. This state can be achieved in a non-cooperative game in which each player tries to pick the best possible strategy to gain the best possible outcome for themselves, based on what they expect the other players to do. Eventually, all the players reach a point at which they have all picked the best possible strategy for themselves based on the decisions made by the other players. At this point in the game, they would gain no benefit from changing their strategy. This state is the Nash equilibrium.

A famous example of how the Nash equilibrium can be reached is with the Prisoner's Dilemma. In this example, two criminals (A and B) have been arrested for committing a crime. Both have been placed in separate cells with no way of communicating with each other. The prosecutor only has enough evidence to convict them for a smaller offense and not the principal crime, which would see them go to jail for a long time. To get a conviction, the prosecutor gives them an offer:

  • If A and B both implicate each other in the principal crime, they both serve 2 years in jail.
  • If A implicates B but B remains silent, A will be set free and B will serve 3 years in jail (and vice versa).
  • If A and B both keep quiet, they both serve only 1 year in jail on the lesser charge.

From these three scenarios, it is obvious that the best joint outcome for A and B is to keep quiet and serve only 1 year each. However, the risk of keeping quiet is 3 years, as neither A nor B has any way of knowing that the other will also keep quiet. Confessing gives each of them a better outcome no matter what the other does, so they settle on confessing as their strategy. Once this state has been reached, neither criminal would gain any advantage by changing their strategy; they have therefore reached a Nash equilibrium.
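The three scenarios can be summarized in the following payoff matrix, where each cell lists the years in jail for A and B, respectively:

|  | B keeps quiet | B implicates A |
|---|---|---|
| A keeps quiet | 1, 1 | 3, 0 |
| A implicates B | 0, 3 | 2, 2 |

Reading along each row and column, implicating the other prisoner is the better individual choice whatever the other does, which is why (implicate, implicate) is the Nash equilibrium, even though (quiet, quiet) would be better for both.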

Objective functions

To create a generator network that generates images similar to real images, we try to increase the similarity of the data generated by the generator to the real data. To measure this similarity, we use objective functions. Both networks have their own objective functions and, during training, they try to minimize their respective objective functions. The following equation represents the final objective function for GANs:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]

In the preceding equation, D is the discriminator model, G is the generator model, p_{data}(x) is the real data distribution, p_z(z) is the noise distribution from which the generator produces its data, and \mathbb{E} denotes the expected value.

During training, D (the discriminator) wants to maximize this value and G (the generator) wants to minimize it; training a GAN therefore means driving the two networks toward an equilibrium between the generator and the discriminator. When this equilibrium is reached, we say that the model has converged. This equilibrium is the Nash equilibrium. Once the training is complete, we get a generator model that is capable of generating realistic-looking images.
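In practice, this minimax objective is usually implemented with binary cross-entropy losses on the discriminator's output. The following is a minimal Keras training-step sketch, assuming the generator and discriminator models sketched earlier in this chapter; the learning rate and the random array standing in for a batch of real images are placeholder assumptions:

```python
import numpy as np
from keras.layers import Input
from keras.models import Model
from keras.optimizers import Adam

# Assumes the `generator` and `discriminator` models sketched earlier.
# D maximizes log D(x) + log(1 - D(G(z))) by minimizing binary cross-entropy;
# G minimizes it by pushing D(G(z)) toward 1 through the combined model.
discriminator.compile(optimizer=Adam(0.0002), loss='binary_crossentropy')

discriminator.trainable = False                   # freeze D when training G
z = Input(shape=(100,))
combined = Model(z, discriminator(generator(z)))
combined.compile(optimizer=Adam(0.0002), loss='binary_crossentropy')

batch_size = 64
real_images = np.random.uniform(-1.0, 1.0, (batch_size, 28, 28))  # placeholder real batch
noise = np.random.normal(0, 1, (batch_size, 100))
fake_images = generator.predict(noise)

# One discriminator update (real labeled 1, fake labeled 0), then one generator update
d_loss_real = discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
d_loss_fake = discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))
g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1)))
```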

Scoring algorithms

Measuring how well a GAN performs is not straightforward. The objective function for a GAN is not a fixed, task-specific function, such as mean squared error or cross-entropy; instead, GANs effectively learn their objective functions during training. For this reason, researchers have proposed many scoring algorithms to measure how well a model fits. Let's look at some of these scoring algorithms in detail.

The inception score

The inception score is the most widely used scoring algorithm for GANs. It uses a pre-trained Inception V3 network (trained on ImageNet) to obtain class predictions for the generated images. It was originally proposed by Tim Salimans and others in their paper, Improved Techniques for Training GANs (https://arxiv.org/abs/1606.03498), and was analyzed in detail by Shane Barratt and Rishi Sharma in their paper, A Note on the Inception Score (https://arxiv.org/pdf/1801.01973.pdf). The inception score, or IS for short, measures both the quality and the diversity of the generated images. Let's look at the equation for IS:

IS(G) = \exp\left(\mathbb{E}_{x \sim p_g}\left[D_{KL}\left(p(y \mid x) \,\middle\|\, p(y)\right)\right]\right)

In the preceding equation, x represents a sample drawn from the generator's distribution p_g (written x ~ p_g), p(y|x) is the conditional class distribution predicted by the Inception network for that sample, and p(y) is the marginal class distribution.

To calculate the inception score, perform the following steps:

  1. Start by sampling N images generated by the model, denoted as x^(i), for i = 1, ..., N.
  2. Then, construct the marginal class distribution, using the following equation:

     p(y) \approx \frac{1}{N} \sum_{i=1}^{N} p(y \mid x^{(i)})

  3. Then, calculate the KL divergence between each conditional class distribution and the marginal class distribution, and take its expected value over the N samples:

     \mathbb{E}_{x \sim p_g}\left[D_{KL}\left(p(y \mid x) \,\middle\|\, p(y)\right)\right] \approx \frac{1}{N} \sum_{i=1}^{N} D_{KL}\left(p(y \mid x^{(i)}) \,\middle\|\, p(y)\right)

  4. Finally, calculate the exponential of the result to obtain the inception score.
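The following is a minimal NumPy sketch of these steps. It assumes that preds is an (N, num_classes) array of softmax class probabilities, p(y|x), produced by a pre-trained Inception network for N generated images; the random Dirichlet samples at the end are only placeholders for such predictions:

```python
import numpy as np

def inception_score(preds, eps=1e-12):
    """Compute the inception score from an (N, num_classes) array of p(y|x) values."""
    # Step 2: the marginal class distribution p(y) is the average over all samples
    p_y = np.mean(preds, axis=0, keepdims=True)
    # Step 3: KL divergence between each p(y|x) and p(y), averaged over the N samples
    kl = np.sum(preds * (np.log(preds + eps) - np.log(p_y + eps)), axis=1)
    # Step 4: the exponential of the expected KL divergence is the inception score
    return float(np.exp(np.mean(kl)))

# Placeholder predictions for 1,000 generated images and 10 classes
preds = np.random.dirichlet(np.ones(10), size=1000)
print(inception_score(preds))
```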

A high inception score indicates that the model is good. Even though this is an important measure, it has certain problems. For example, the score can remain high even when the model generates only one image per class, which means the model lacks diversity. To resolve this problem, other performance measures have been proposed. We will look at one of these in the following section.

The Fréchet inception distance

To overcome the various shortcomings of the Inception Score, the Fréchet Inception Distance (FID) was proposed by Martin Heusel and others in their paper, GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (https://arxiv.org/pdf/1706.08500.pdf).

The equation used to calculate the FID score is as follows:

FID(x, g) = \left\| \mu_x - \mu_g \right\|_2^2 + \mathrm{Tr}\left( \Sigma_x + \Sigma_g - 2\left( \Sigma_x \Sigma_g \right)^{1/2} \right)

The preceding equation represents the FID score between the real images, x, and the generated images, g. To calculate the FID score, we use the Inception network to extract feature maps from an intermediate layer for both the real and the generated images. We then model each set of feature maps with a multivariate Gaussian distribution with a mean of \mu and a covariance of \Sigma, and use these statistics (\mu_x, \Sigma_x for the real images and \mu_g, \Sigma_g for the generated images) to calculate the FID score. The lower the FID score, the better the model, and the more able it is to generate diverse images of higher quality. A perfect generative model has an FID score of zero. The advantage of using the FID score over the Inception Score is that it is robust to noise and that it can easily measure the diversity of the images.
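As a rough illustration, here is a minimal NumPy/SciPy sketch of this calculation. It computes the Fréchet distance between two sets of feature vectors; the random arrays are only placeholders for the Inception feature maps described above:

```python
import numpy as np
from scipy import linalg

def frechet_distance(real_feats, gen_feats):
    """Fréchet distance between two sets of feature vectors of shape (num_samples, feature_dim)."""
    mu_x, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma_x = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(gen_feats, rowvar=False)
    # Matrix square root of the product of the two covariance matrices
    covmean, _ = linalg.sqrtm(sigma_x.dot(sigma_g), disp=False)
    covmean = covmean.real  # discard tiny imaginary parts caused by numerical error
    return float(np.sum((mu_x - mu_g) ** 2) + np.trace(sigma_x + sigma_g - 2.0 * covmean))

# Placeholder features standing in for Inception activations of real and generated images
real_feats = np.random.randn(500, 64)
gen_feats = np.random.randn(500, 64) + 0.1
print(frechet_distance(real_feats, gen_feats))
```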

The TensorFlow implementation of FID can be found at the following link: https://www.tensorflow.org/api_docs/python/tf/contrib/gan/eval/frechet_classifier_distance

There are more scoring algorithms available that have recently been proposed by researchers in academia and industry. We won't be covering all of these here. Before reading any further, take a look at another scoring algorithm called the Mode Score, information about which can be found at the following link: https://arxiv.org/pdf/1612.02136.pdf.