Chapter 5. Restricted Boltzmann Machines

 

"What I cannot create, I do not understand."

 
 --Richard Feynman

So far in this book, we have discussed only discriminative models. In deep learning, these are used to model the dependency of an unobserved variable y on an observed variable x; mathematically, this is formulated as P(y|x). In this chapter, we will discuss the deep generative models used in deep learning.

Generative models are models that, given some hidden parameters, can randomly generate observable data values. Such a model works on a joint probability distribution over labels and observations.

Generative models are used in machine learning and deep learning either as an intermediate step to derive a conditional probability density function, or to model observations directly from a probability density function.
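To make the link between the two views concrete, a generative model of the joint distribution recovers the discriminative conditional by simple normalization; this is a standard identity, not specific to any one model:

P(y|x) = P(x, y) / P(x) = P(x, y) / Σ_y' P(x, y')

In other words, anything a discriminative model can answer, a generative model of P(x, y) can answer too, at the cost of modeling the full joint distribution.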

Restricted Boltzmann machines (RBMs) are a popular generative model that will be discussed in this chapter. RBMs are basically probabilistic...

Energy-based models


The main goal of deep learning and statistical modeling is to encode the dependencies between variables. Once a model captures those dependencies, it can use the values of the known variables to answer questions about the unknown variables.

Energy-based models (EBMs) [120] capture the dependencies by associating a scalar energy, which is generally a measure of compatibility, with each configuration of the variables. In EBMs, predictions are made by fixing the values of the observed variables and finding values of the unobserved variables that minimize the overall energy. Learning in EBMs consists of shaping an energy function that assigns low energies to correct values of the unobserved variables and higher energies to incorrect ones. Energy-based learning can be treated as an alternative to probabilistic estimation for classification, decision-making, or prediction tasks.
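In symbols, following the standard formulation of energy-based learning (the notation here is illustrative), inference picks the answer with the lowest energy, and an EBM can be turned into a probabilistic model through the Gibbs distribution:

y* = argmin_y E(x, y)

P(y|x) = exp(-βE(x, y)) / Σ_y' exp(-βE(x, y'))

where β is a positive constant (an inverse temperature) and the sum runs over all candidate values of y.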

To give a clear idea about how EBMs work, let us look at a simple example...

Boltzmann machines


Boltzmann machines [122] are networks of symmetrically connected, neuron-like units that make stochastic decisions about the given data. They were originally introduced to learn probability distributions over binary vectors. Boltzmann machines possess a simple learning algorithm that helps them infer interesting conclusions about input datasets of binary vectors. The learning algorithm becomes very slow in networks with many layers of feature detectors; however, learning one layer of feature detectors at a time can be much faster.

To solve a learning problem, a Boltzmann machine is given a set of binary data vectors and must update the weights on its connections so that those data vectors become good solutions to the optimization problem defined by the weights. The Boltzmann machine solves the learning problem by making lots of small updates to these weights.

The Boltzmann machine over a d-dimensional binary vector can...
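For reference, the standard formulation (the notation below is a common convention, not necessarily the book's) defines the model over a binary vector x in {0, 1}^d through an energy function:

P(x) = exp(-E(x)) / Z

E(x) = - Σ_{i<j} U_ij x_i x_j - Σ_i b_i x_i

Z = Σ_x' exp(-E(x'))

where U is the symmetric matrix of connection weights, b is the vector of biases, and Z, the partition function, normalizes the distribution.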

Restricted Boltzmann machine


The Restricted Boltzmann machine (RBM) is a classic example of a building block for the deep probabilistic models used in deep learning. The RBM itself is not a deep model, but it can be used as a building block to form other deep models. RBMs are undirected probabilistic graphical models that consist of one layer of observed variables and a single layer of hidden variables, which can be used to learn a representation of the input. In this section, we will explain how the RBM can be used to build many deeper models.
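Concretely, the "restriction" is that there are no visible-visible or hidden-hidden connections, only visible-hidden ones. In the standard formulation, with visible units v, hidden units h, weight matrix W, and biases b and c:

E(v, h) = - Σ_i b_i v_i - Σ_j c_j h_j - Σ_{i,j} v_i W_ij h_j

P(v, h) = exp(-E(v, h)) / Z

Because of this bipartite structure, the conditionals factorize, so each hidden unit can be sampled independently given the visible layer:

P(h_j = 1 | v) = σ(c_j + Σ_i v_i W_ij)

where σ is the logistic sigmoid function.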

Let us consider two examples to see the use cases of the RBM. An RBM primarily performs a binary version of factor analysis. Let us say we have a restaurant, and we want to ask our customers to rate the food on a scale of 0 to 5. In the traditional approach, we would try to explain each food item and each customer in terms of a set of hidden factors. For example, foods such as pasta and lasagne will have a strong association with the Italian factors...

Convolutional Restricted Boltzmann machines


Very high-dimensional inputs, such as images or videos, put immense stress on the memory, computation, and operational requirements of traditional machine learning models. In Chapter 3, Convolutional Neural Network, we showed how replacing matrix multiplication with discrete convolution operations using small kernels resolves these problems. Desjardins and Bengio [123] showed that this approach also works well when applied to RBMs. In this section, we will discuss the functionalities of this model.

Figure 5.7: The observed variables, or visible units, of an RBM can be associated with mini-batches of an image to compute the final result. The weight connections represent a set of filters.

Further, in ordinary RBMs, each visible unit is directly related to all the hidden variables through its own parameters and weights. Describing an image in terms of spatially local features ideally needs far fewer parameters...
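To see the scale of the saving, compare the parameter counts (the figures below are illustrative): fully connecting an n_v-pixel image to n_h hidden units requires n_v × n_h weights, whereas K convolution filters of size r × r require only K × r × r weights. For a 200 × 200 image and 100 hidden units, that is 4,000,000 weights in the fully connected case, against just 10,000 weights for 100 filters of size 10 × 10, with each filter additionally being reused at every image location.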

Deep Belief networks


Deep Belief networks (DBNs) were among the most popular non-convolutional models that could be successfully deployed as deep neural networks in 2006-07 [124] [125]. The renaissance of deep learning arguably started with the invention of DBNs back in 2006. Before the introduction of DBNs, it was very difficult to optimize deep models. By outperforming support vector machines (SVMs), DBNs showed that deep models can be really successful. Compared to other generative or unsupervised learning algorithms, the popularity of DBNs has since fallen, and they are rarely used these days; however, they still play a very important role in the history of deep learning.

Note

A DBN with only one hidden layer is just an RBM.

DBNs are generative models composed of more than one layer of hidden variables. The hidden variables are generally binary in nature; however, the visible units may take binary or real values. In DBNs, every unit of each layer is...
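For orientation, in the standard formulation a DBN with two hidden layers h1 and h2 factors its joint distribution as:

P(v, h1, h2) = P(v | h1) P(h1, h2)

where the top two layers, P(h1, h2), form an RBM, and P(v | h1) is a directed, sigmoid belief network layer. This is also why a DBN with a single hidden layer reduces to a plain RBM, as the preceding note states.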

Distributed Deep Belief network


DBNs have so far achieved a lot in numerous applications, such as speech and phone recognition [127], information retrieval [128], human motion modelling [129], and so on. However, the sequential implementations of both RBMs and DBNs come with various limitations. With large-scale datasets, the models show various shortcomings due to the long, time-consuming computation involved, the memory-demanding nature of the algorithms, and so on. To work with big data, RBMs and DBNs require distributed computing to provide scalable, consistent, and efficient learning.

To make DBNs amenable to large-scale datasets stored on a cluster of computers, DBNs should adopt a distributed learning approach with Hadoop and MapReduce. The paper [130] presents a key-value pair approach for each layer of an RBM, where the pre-training is accomplished layer-wise in a distributed MapReduce environment. The learning is performed on Hadoop...
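As a rough illustration of the key-value pattern (a minimal sketch of gradient averaging, not the exact scheme of [130]; the class names and the input format here are hypothetical), a mapper can emit per-shard partial gradients keyed by a parameter identifier, and a reducer can average them before the next weight update:

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical job: each input line holds a partial gradient for one RBM
// weight, formatted as "<paramId>\t<gradient>", precomputed on a data shard.
public class GradientAverager {

    public static class GradientMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t");
            // Key: parameter identifier (for example, "layer0:w_3_7");
            // value: the shard's partial gradient for that parameter.
            context.write(new Text(parts[0]),
                    new DoubleWritable(Double.parseDouble(parts[1])));
        }
    }

    public static class GradientReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values,
                Context context) throws IOException, InterruptedException {
            double sum = 0.0;
            long count = 0;
            for (DoubleWritable v : values) {
                sum += v.get();
                count++;
            }
            // The averaged gradient for this parameter is what the next
            // weight-update step would consume.
            context.write(key, new DoubleWritable(sum / count));
        }
    }
}

In a full pipeline, each shard's partial gradients would come from running contrastive divergence locally on its portion of the data, and the averaged updates would be broadcast back to the workers before the next epoch.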

Implementation using Deeplearning4j


This section of the chapter will provide a basic idea of how to write the code for RBMs and DBNs using Deeplearning4j. Readers will learn the syntax for setting the various hyperparameters mentioned in this chapter.

Implementing RBMs and DBNs using Deeplearning4j is conceptually simple. The overall implementation can be split into three core phases: loading and preparing the data, configuring the network, and training and evaluating the model.

In this section, we will first discuss RBMs on the Iris dataset, and then we will come to the implementation of DBNs.

Restricted Boltzmann machines

For the building and training of RBMs, we first need to define and initialize the hyperparameters needed for the model:

Nd4j.MAX_SLICES_TO_PRINT = -1;           // print all slices of an array
Nd4j.MAX_ELEMENTS_PER_SLICE = -1;        // print all elements per slice
Nd4j.ENFORCE_NUMERICAL_STABILITY = true;
final int numRows = 4;                   // the Iris dataset has 4 features per example
final int numColumns = 1;
int outputNum = 10;
int numSamples = 150;                    // the Iris dataset contains 150 examples
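With the hyperparameters defined, the next phase is network configuration. The following is a minimal sketch assuming the Deeplearning4j 0.x API of this book's era; the training settings and layer sizes are illustrative choices, not prescribed values:

import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.RBM;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)                       // fix the random seed for reproducibility
        .iterations(1)
        .learningRate(1e-2)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .list()
        // RBM layer: Gaussian visible units for the real-valued Iris features,
        // rectified hidden units; trained generatively during pre-training
        .layer(0, new RBM.Builder()
                .nIn(numRows * numColumns)
                .nOut(outputNum)
                .visibleUnit(RBM.VisibleUnit.GAUSSIAN)
                .hiddenUnit(RBM.HiddenUnit.RECTIFIED)
                .lossFunction(LossFunctions.LossFunction.KL_DIVERGENCE)
                .build())
        // Classification head used during supervised fine-tuning
        .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(outputNum)
                .nOut(3)                 // the Iris dataset has 3 classes
                .activation(Activation.SOFTMAX)
                .build())
        .pretrain(true)                  // layer-wise generative pre-training
        .backprop(true)                  // then supervised fine-tuning
        .build();

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();

Training then amounts to calling model.fit(...) on the prepared data, and evaluation uses Deeplearning4j's Evaluation class, mirroring the three phases outlined above.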

Summary


The RBM is a generative model that can randomly produce visible data values when latent or hidden parameters are supplied to it. In this chapter, we discussed the concept and mathematical model of the Boltzmann machine, which is an energy-based model. The chapter then discussed and gave a visual representation of the RBM. Further, the chapter discussed the CRBM, which combines convolution with RBMs to extract features from high-dimensional images. We then moved on to the popular DBNs, which are essentially stacked RBMs. Finally, the chapter discussed how to distribute the training of both RBMs and DBNs in the Hadoop framework.

We concluded the chapter by providing code samples for both models. The next chapter of the book introduces another generative model, the autoencoder, and its various forms, such as the denoising autoencoder, the deep autoencoder, and so on.
