Restricted Boltzmann Machines

Together, we have seen the power of unsupervised learning and, hopefully, convinced ourselves that it can be applied to different problems. We will finish the topic of unsupervised learning with an exciting approach known as Restricted Boltzmann Machines (RBMs). When we do not need a large number of layers, we can use RBMs to learn from the data and find ways to satisfy an energy function, which yields a model that is robust at representing input data.

This chapter complements Chapter 8, Deep Autoencoders, by introducing the backward-forward nature of RBMs and contrasting it with the forward-only nature of autoencoders (AEs). It compares RBMs and AEs on the problem of dimensionality reduction, using MNIST as the case study. Once you have finished this chapter, you should be able to train an RBM using scikit-learn and...

Introduction to RBMs

RBMs are unsupervised models that can be used in different applications that require rich latent representations. They are usually placed in a pipeline before a classification model, where their role is to extract features from the data. They are based on Boltzmann Machines (BMs), which we discuss next (Hinton, G. E., and Sejnowski, T. J. (1983)).

BMs

A BM can be thought of as an undirected dense graph, as depicted in Figure 10.1:

Figure 10.1 – A BM model

This undirected graph has a set of neural units that are modeled as visible, v, and a set of neural units that are hidden, h. Of course, there could be many more units than these. But the point of this model is that all neurons are connected to each other: they all talk among themselves. The training of this model will not be covered here, but essentially it is an iterative process where the input is presented in the visible layers, and every neuron (one at a time) adjusts its connections with other neurons to satisfy...
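
Although the full training algorithm is beyond the scope of this excerpt, it helps to see what "satisfying an energy function" means. In the standard BM formulation (stated here from the general literature, not reproduced from the locked text), the energy of a joint configuration $\mathbf{s}$ of all units, visible and hidden together, is

$$E(\mathbf{s}) = -\sum_{i<j} w_{ij}\, s_i s_j - \sum_i b_i s_i$$

where $w_{ij}$ are the symmetric connection weights and $b_i$ are the unit biases. Since $p(\mathbf{s}) \propto e^{-E(\mathbf{s})}$, training nudges the weights so that configurations resembling the data receive low energy and, therefore, high probability.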

Learning data representations with RBMs

Now that you know the basic idea behind RBMs, we will use scikit-learn's BernoulliRBM model to learn data representations in an unsupervised manner. As before, we will work with the MNIST dataset to facilitate comparisons.

For some people, the task of learning representations can be thought of as feature engineering. The difference is that feature engineering carries an expectation of explainability, while representation learning does not necessarily require us to prescribe meaning to the learned representations.

In scikit-learn, we can create an instance of the RBM by invoking the following instructions:

from sklearn.neural_network import BernoulliRBM
rbm = BernoulliRBM()

The default parameters in the constructor of the RBM are the following:

  • n_components=256, which is the number of hidden units, h, while the number of visible units, v, is inferred from the dimensionality of the input.
  • learning_rate=0.1 controls the strength of the learning algorithm with respect to updates, and it is recommended...
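
To make the workflow concrete, here is a minimal, hedged sketch of fitting a BernoulliRBM and extracting hidden-layer representations. It uses scikit-learn's small load_digits dataset as a stand-in for MNIST, and the parameter values are illustrative rather than the book's:

from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

# load_digits provides 8x8 digit images as a stand-in for MNIST;
# BernoulliRBM expects inputs in [0, 1], so we rescale the pixels
X, _ = load_digits(return_X_y=True)
X = X / 16.0

rbm = BernoulliRBM(n_components=100, learning_rate=0.1,
                   n_iter=20, random_state=42)
rbm.fit(X)

# transform() projects the input onto the hidden units:
# these projections are the learned representations
Z = rbm.transform(X)
print(Z.shape)   # (1797, 100)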

Comparing RBMs and AEs

Now that we have seen how RBMs perform, a comparison with AEs is in order. To make this comparison fair, we can propose the closest configuration to an RBM that an AE can have; that is, we will have the same number of hidden units (neurons in the encoder layer) and the same number of neurons in the visible layer (the decoder layer), as shown in Figure 10.6:

Figure 10.6 – AE configuration that's comparable to RBM

We can model and train our AE using the tools we covered in Chapter 7, Autoencoders, as follows:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inpt_dim = 28*28 # 784 dimensions (a flattened MNIST image)
ltnt_dim = 100   # 100 components, matching the RBM's hidden units

inpt_vec = Input(shape=(inpt_dim,))
encoder = Dense(ltnt_dim, activation='sigmoid')(inpt_vec)
latent_ncdr = Model(inpt_vec, encoder) # encoder-only model for projections
decoder = Dense(inpt_dim, activation='sigmoid')(encoder)
autoencoder = Model(inpt_vec, decoder) # the full autoencoder

# the original excerpt is truncated here; the optimizer is an assumption
autoencoder.compile(loss='binary_crossentropy', optimizer='adam')
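
A hedged usage sketch for training this AE follows; the data loading and training hyperparameters here are assumptions, not taken from the book:

from tensorflow.keras.datasets import mnist

# flatten the images into 784-dimensional vectors scaled to [0, 1]
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.reshape(-1, 28*28).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28*28).astype('float32') / 255.0

# an AE reconstructs its own input, so the inputs double as targets
autoencoder.fit(x_train, x_train, epochs=10, batch_size=256,
                validation_data=(x_test, x_test))

# latent codes directly comparable to the RBM's hidden projections
codes = latent_ncdr.predict(x_test)
print(codes.shape)   # (10000, 100)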

Summary

This intermediate-level chapter has shown you the basic theory behind how RBMs work and their applications. We paid special attention to a Bernoulli RBM that operates on input data that may follow a Bernoulli-like distribution in order to achieve fast learning and efficient computations. We used the MNIST dataset to showcase how interesting the learned representations are for an RBM, and we visualized the learned weights as well. We concluded by comparing the RBM with a very simple AE and showed that both learned high-quality latent spaces while being fundamentally different models.

At this point, you should be able to implement your own RBM model, visualize its learned components, and see the learned latent space by projecting (transforming) the input data and looking at the hidden layer projections. You should feel confident in using an RBM on large datasets, such as MNIST, and even perform a comparison with an AE.
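
As a starting point for visualizing the learned components, here is a hedged sketch; it assumes the rbm fitted on 8x8 digits from the earlier sketch and uses matplotlib, which is an assumption rather than the book's exact code:

import matplotlib.pyplot as plt

# each row of components_ holds one hidden unit's weights, which can be
# reshaped back to the input image size (8x8 for load_digits)
fig, axes = plt.subplots(10, 10, figsize=(8, 8))
for weights, ax in zip(rbm.components_, axes.ravel()):
    ax.imshow(weights.reshape(8, 8), cmap='gray')
    ax.axis('off')
plt.suptitle('Learned RBM components (weights per hidden unit)')
plt.show()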

The next chapter is the beginning of a new group of chapters...

Questions and answers

  1. Why can't we perform data reconstructions with an RBM?

RBMs are fundamentally different from AEs. An RBM aims to optimize an energy function, while an AE aims to optimize a data reconstruction function. Thus, we can't do reconstructions with RBMs. However, this fundamental difference allows for new latent spaces that are interesting and robust.

  2. Can we add more layers to an RBM?

No, not in the model presented here. The concept of stacking layers of neurons is a better fit for deep AEs.

  3. What is so cool about RBMs then?

They are simple. They are fast. They provide rich latent spaces. They have no equal at this point. The closest competitors are AEs.

References

  • Hinton, G. E., and Sejnowski, T. J. (1983, June). Optimal perceptual inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Vol. 448). IEEE, New York.
  • Brooks, S., Gelman, A., Jones, G., and Meng, X. L. (Eds.). (2011). Handbook of Markov Chain Monte Carlo. CRC Press.
  • Tieleman, T. (2008, July). Training restricted Boltzmann machines using approximations to the likelihood gradient. In Proceedings of the 25th International Conference on Machine Learning (pp. 1064-1071).
  • Yamashita, T., Tanaka, M., Yoshida, E., Yamauchi, Y., and Fujiyoshi, H. (2014, August). To be Bernoulli or to be Gaussian, for a restricted Boltzmann machine. In 2014 22nd International Conference on Pattern Recognition (pp. 1520-1525). IEEE.
  • Tieleman, T., and Hinton, G. (2009, June). Using fast weights to improve persistent contrastive divergence. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 1033-1040).
...