Chapter 8. Variational Autoencoders (VAEs)

Similar to the Generative Adversarial Networks (GANs) that we've discussed in the previous chapters, Variational Autoencoders (VAEs) [1] belong to the family of generative models. The generator of a VAE is able to produce meaningful outputs while navigating its continuous latent space. The possible attributes of the decoder outputs are explored through the latent vector.

In GANs, the focus is on how to arrive at a model that approximates the input distribution. VAEs attempt to model the input distribution from a decodable continuous latent space. This is one of the possible underlying reasons why GANs are able to generate more realistic signals when compared to VAEs. For example, in image generation, GANs are able to produce more realistic looking images while VAEs in comparison generate images that are less sharp.

Within VAEs, the focus is on the variational inference of latent codes. Therefore, VAEs provide a suitable framework...

Principles of VAEs

In a generative model, we're often interested in approximating the true distribution of our inputs using neural networks:

$x \sim P_\theta(x)$ (Equation 8.1.1)

In the preceding equation, $\theta$ are the parameters determined during training. For example, in the context of the celebrity faces dataset, this is equivalent to finding a distribution that can draw faces. Similarly, in the MNIST dataset, this distribution can generate recognizable handwritten digits.

In machine learning, to perform a certain level of inference, we're interested in finding $P_\theta(x, z)$, a joint distribution between inputs, x, and the latent variables, z. The latent variables are not part of the dataset but instead encode certain properties observable from the inputs. In the context of celebrity faces, these might be facial expressions, hairstyles, hair color, gender, and so on. In the MNIST dataset, the latent variables may represent the digit and writing styles.

$P_\theta(x, z)$ is practically a distribution of input data points...

Conditional VAE (CVAE)

Conditional VAE [2] is similar in concept to CGAN. In the context of the MNIST dataset, if the latent space is randomly sampled, a VAE has no control over which digit will be generated. A CVAE is able to address this problem by including a condition (the one-hot label of the digit to produce). The condition is imposed on both the encoder and decoder inputs.

Formally, the core equation of VAE in Equation 8.1.10 is modified to include the condition c:

$\log P_\theta(x \mid c) - D_{KL}\big(Q_\phi(z \mid x, c) \parallel P_\theta(z \mid x, c)\big) = \mathbb{E}_{z \sim Q}\big[\log P_\theta(x \mid z, c)\big] - D_{KL}\big(Q_\phi(z \mid x, c) \parallel P_\theta(z \mid c)\big)$ (Equation 8.2.1)

Similar to VAEs, Equation 8.2.1 means that if we want to maximize the output conditioned on c, $P_\theta(x \mid c)$, then the two loss terms must be minimized:

  • Reconstruction loss of the decoder given both the latent vector and the condition.
  • KL loss between the encoder distribution of the latent vector, given both the input and the condition, and the prior distribution given the condition. Similar to a VAE, we typically choose $P_\theta(z \mid c) = P(z) = \mathcal{N}(0, I)$.

Listing 8.2.1, cvae-cnn-mnist-8.2.1.py shows us the Keras code of CVAE using CNN layers. In the code that is highlighted...
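As a minimal sketch of the idea (not the book's full cvae-cnn-mnist-8.2.1.py listing), the following uses dense layers instead of CNNs for brevity and assumes the TensorFlow 2.x tf.keras functional API (the add_loss pattern of the classic Keras VAE examples), flattened 28x28 MNIST inputs, and a 10-class one-hot condition. The key CVAE detail is that the one-hot label is concatenated to both the encoder input and the decoder input:

# Minimal CVAE sketch (dense layers for brevity; the book's listing uses CNNs).
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras import backend as K

input_dim, cond_dim, latent_dim = 784, 10, 2

# Encoder Q(z | x, c): the condition is concatenated to the input
x_in = layers.Input(shape=(input_dim,), name="encoder_input")
c_in = layers.Input(shape=(cond_dim,), name="condition")
h = layers.Dense(512, activation="relu")(layers.concatenate([x_in, c_in]))
z_mean = layers.Dense(latent_dim, name="z_mean")(h)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(h)

def sampling(args):
    # Reparameterization trick: z = mean + stddev * epsilon
    mean, log_var = args
    eps = K.random_normal(shape=K.shape(mean))
    return mean + K.exp(0.5 * log_var) * eps

z = layers.Lambda(sampling, name="z")([z_mean, z_log_var])

# Decoder P(x | z, c) as a reusable model: the condition is concatenated to z
z_in = layers.Input(shape=(latent_dim,), name="z_sampled")
c_dec = layers.Input(shape=(cond_dim,), name="condition_dec")
h_dec = layers.Dense(512, activation="relu")(layers.concatenate([z_in, c_dec]))
x_out = layers.Dense(input_dim, activation="sigmoid")(h_dec)
decoder = Model([z_in, c_dec], x_out, name="decoder")

reconstructed = decoder([z, c_in])
cvae = Model([x_in, c_in], reconstructed, name="cvae")

# Loss = reconstruction loss + KL divergence to the prior N(0, I)
rec_loss = input_dim * tf.keras.losses.binary_crossentropy(x_in, reconstructed)
kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
cvae.add_loss(K.mean(rec_loss + kl_loss))
cvae.compile(optimizer="adam")

# After training with cvae.fit([x_train, y_train_onehot], ...), a digit of a
# chosen class can be generated with:
# decoder.predict([np.random.normal(size=(1, latent_dim)), label_onehot])

Building the decoder as a separate Model is a convenience choice: after training, it can be fed a latent sample drawn from N(0, I) together with the one-hot label of the desired digit to generate that specific class.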

β-VAE: VAE with disentangled latent representations

In Chapter 6, Disentangled Representation GANs, we discussed the concept and importance of disentangled representations of latent codes. Recall that a disentangled representation is one where single latent units are sensitive to changes in single generative factors while being relatively invariant to changes in other factors [3]. Varying one latent code results in a change to one attribute of the generated output while the rest of the properties remain the same.

In the same chapter, InfoGAN [4] demonstrated that on the MNIST dataset it is possible to control which digit to generate as well as the tilt and thickness of the writing style. Observing the results in the previous section, it can be noticed that the VAE intrinsically disentangles the latent vector dimensions to a certain extent. For example, looking at digit 8 in Figure 8.2.6, navigating z[1] from top to bottom decreases the width and roundness...
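The change that a β-VAE makes to the plain VAE is deliberately small: the architecture and the reconstruction term stay the same, and only the KL term of the loss is weighted by a constant β > 1 (β = 1 recovers the ordinary VAE), which encourages a more factorized, disentangled latent code at some cost in reconstruction quality. In the notation used above, the loss to minimize is:

$\mathcal{L}_{\beta\text{-VAE}} = \mathbb{E}_{z \sim Q_\phi(z \mid x)}\big[-\log P_\theta(x \mid z)\big] + \beta\, D_{KL}\big(Q_\phi(z \mid x) \parallel P(z)\big)$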

Conclusion

In this chapter, we've covered the principles of variational autoencoders (VAEs). As we learned, VAEs bear a resemblance to GANs in that both attempt to create synthetic outputs from a continuous latent space. However, VAE networks are much simpler and easier to train compared to GANs. It has also become clear how conditional VAE and β-VAE are similar in concept to conditional GAN and disentangled-representation GAN, respectively.

VAEs have an intrinsic mechanism to disentangle the latent vectors. Therefore, building a β-VAE is straightforward. We should note, however, that interpretable and disentangled codes are important in building intelligent agents.

In the next chapter, we're going to focus on reinforcement learning. Without any prior data, an agent learns by interacting with its environment. We'll discuss how the agent can be rewarded for correct actions and punished for the wrong ones.

References

  1. Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114, 2013 (https://arxiv.org/pdf/1312.6114.pdf).
  2. Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning Structured Output Representation Using Deep Conditional Generative Models. Advances in Neural Information Processing Systems, 2015 (http://papers.nips.cc/paper/5775-learning-structured-output-representation-using-deep-conditional-generative-models.pdf).
  3. Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35.8, 2013: 1798-1828 (https://arxiv.org/pdf/1206.5538.pdf).
  4. Xi Chen and others. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. Advances in Neural Information Processing Systems, 2016 (http://papers.nips.cc/paper/6399-infogan-interpretable-representation-learning-by-information-maximizing-generative-adversarial-nets.pdf).