
Chapter 4. Recurrent Neural Network

 

I think the brain is essentially a computer and consciousness is like a computer program. It will cease to run when the computer is turned off. Theoretically, it could be re-created on a neural network, but that would be very difficult, as it would require all one's memories.

 
 --Stephen Hawking

To solve every problem, people do not initiate their thinking process from scratch. Our thoughts are non-volatile and persistent, much like the Read Only Memory (ROM) of a computer. When we read an article, we understand the meaning of every word based on our understanding of the earlier words in the sentence.

Let us take a real-life example to explain this context a bit more. Assume we want to make a classification based on the events happening at every point in a video. As we do not have information about the earlier events of the video, it would be a cumbersome task for a traditional deep neural network to find distinguishing features to classify...

What makes recurrent networks distinctive from others?


You might be curious to know what makes RNNs so special. This section of the chapter discusses exactly that, and from the next section onwards, we will talk about the building blocks of this type of network.

From Chapter 3, Convolutional Neural Network, you have probably got a sense of the harsh limitation of convolutional networks: their API is too constrained. The network can only take a fixed-size vector as input and generate a fixed-size vector as output, and these operations are performed through a predefined number of intermediate layers. What primarily makes RNNs distinctive is their ability to operate over long sequences of vectors and to produce different sequences of vectors as output.

 

"If training vanilla neural nets is optimization over functions, training recurrent nets is optimization over programs"

 
 --Andrej Karpathy

We show different types of input-output relationships of the neural...

Recurrent neural networks (RNNs)


In this section, we will discuss the architecture of the RNN. We will talk about how the recurrence relation is unfolded over time and used to perform the computation in RNNs.

Unfolding recurrent computations

This section explains how unfolding a recurrent relation results in parameters being shared across a deep network structure, and how it converts the recurrence into a computational model.

Let us consider a simple recurrent form of a dynamical system:

$$s^{(t)} = f(s^{(t-1)}; \theta)$$

In the preceding equation, $s^{(t)}$ represents the state of the system at time t, and $\theta$ is the same parameter shared across all time steps.

This equation is called a recurrent equation, as the computation of $s^{(t)}$ requires the value of $s^{(t-1)}$, which in turn requires the value of $s^{(t-2)}$, and so on.
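For instance, unfolding the recurrence for three time steps makes the repeated application of the same function f, with the same shared parameter $\theta$, explicit:

$$s^{(3)} = f(s^{(2)}; \theta) = f\big(f(s^{(1)}; \theta); \theta\big) = f\Big(f\big(f(s^{(0)}; \theta); \theta\big); \theta\Big)$$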

This is a simple representation of a dynamical system, given for understanding purposes. Let us take one more example, where the dynamical system is driven by an external signal $x^{(t)}$ and produces output $y^{(t)}$:

$$s^{(t)} = f(s^{(t-1)}, x^{(t)}; \theta)$$
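To make this concrete, here is a minimal sketch (my own illustration, not the book's code; the choice of f and the parameter theta are assumptions) of the driven recurrence unrolled as a plain loop, with the same parameter reused at every time step:

```java
// A minimal sketch of a driven dynamical system unrolled over time.
// The function f and the parameter theta are illustrative assumptions.
public class UnrolledRecurrence {
    // One step of the recurrence: s(t) = f(s(t-1), x(t); theta)
    static double f(double prevState, double input, double theta) {
        return Math.tanh(theta * prevState + input);
    }

    public static void main(String[] args) {
        double[] x = {0.5, -0.2, 0.7};  // external signal x(t)
        double theta = 0.9;             // shared parameter, same at every step
        double s = 0.0;                 // initial state s(0)
        for (int t = 0; t < x.length; t++) {
            s = f(s, x[t], theta);      // state depends on the previous state
            double y = s;               // here the output y(t) is simply the state
            System.out.printf("t=%d  s=%.4f  y=%.4f%n", t + 1, s, y);
        }
    }
}
```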

RNNs...

Backpropagation through time (BPTT)


You have already learnt that the primary requirement of RNNs is to classify sequential inputs distinctly. Backpropagation of error and gradient descent are the primary tools used to perform this task.

In the case of feedforward neural networks, backpropagation moves in the backward direction from the final error through the outputs, weights, and inputs of each hidden layer. Backpropagation assigns the weights responsibility for generating the error by calculating their partial derivatives $\partial E / \partial w$, where E denotes the error and w the respective weights. These derivatives are then scaled by the learning rate and subtracted from the weights, descending the gradient so as to minimize the error rate.
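In symbols, this is the standard gradient-descent update rule, where $\eta$ is the learning rate:

$$w \leftarrow w - \eta \frac{\partial E}{\partial w}$$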

However, an RNN does not use backpropagation directly; it uses an extension of it, termed backpropagation through time (BPTT). In this section, we will discuss BPTT to explain how the training works for RNNs.
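Before going into the details, here is a minimal sketch of the idea for a hypothetical scalar RNN (all names and values are illustrative assumptions, not the book's code): the forward pass stores every state, and the backward pass walks back through time, accumulating gradients for the same shared parameters at every step.

```java
// A minimal BPTT sketch for a scalar RNN: h(t) = tanh(w*x(t) + u*h(t-1)),
// with a squared-error loss applied at the final time step only.
public class ScalarBpttSketch {
    public static void main(String[] args) {
        double[] x = {0.5, -0.1, 0.3};   // toy input sequence
        double target = 0.2;             // toy target for the final output
        double w = 0.4, u = 0.7;         // shared parameters, reused at every step
        int T = x.length;

        // Forward pass: unroll the recurrence over time, storing every state.
        double[] h = new double[T + 1];  // h[0] = 0 is the initial state
        for (int t = 1; t <= T; t++) {
            h[t] = Math.tanh(w * x[t - 1] + u * h[t - 1]);
        }
        double loss = 0.5 * Math.pow(h[T] - target, 2);

        // Backward pass: walk back through time, accumulating gradients
        // for the SAME w and u at every step (parameter sharing).
        double dw = 0, du = 0;
        double dh = h[T] - target;       // dLoss/dh at the final step
        for (int t = T; t >= 1; t--) {
            double dpre = dh * (1 - h[t] * h[t]); // back through tanh
            dw += dpre * x[t - 1];
            du += dpre * h[t - 1];
            dh = dpre * u;               // propagate to the previous state
        }

        double lr = 0.1;                 // learning rate
        w -= lr * dw;                    // gradient-descent update
        u -= lr * du;
        System.out.printf("loss=%.4f dw=%.4f du=%.4f%n", loss, dw, du);
    }
}
```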

Error computation

The backpropagation through time (BPTT) learning algorithm is a natural...

Long short-term memory


In this section, we will discuss a special unit called Long short-term memory (LSTM), which can be integrated into an RNN. The main purpose of LSTM is to prevent a significant problem of RNNs, called the vanishing gradient problem.

Problem with deep backpropagation with time

Due to the unrolling of an RNN over many time steps, the feedforward network generated this way can be aggressively deep, unlike a traditional feedforward network. This sometimes makes it extremely difficult to train via the backpropagation through time procedure.

In the first chapter, we discussed the vanishing gradient problem. An unfolded RNN suffers from the vanishing (and, conversely, exploding) gradient problem while performing backpropagation through time.
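A quick numeric illustration (my own, with made-up factors) shows why: backpropagating through T time steps multiplies the gradient by the recurrent factor T times, so any factor below one shrinks the gradient towards zero, while any factor above one blows it up.

```java
// A minimal numeric illustration (not from the book) of vanishing and
// exploding gradients: the gradient is multiplied by the recurrent factor
// once per time step, so the product decays or grows geometrically.
public class GradientDecayDemo {
    public static void main(String[] args) {
        int T = 50;
        double small = 0.9, large = 1.1; // hypothetical recurrent factors
        double g1 = 1.0, g2 = 1.0;
        for (int t = 0; t < T; t++) { g1 *= small; g2 *= large; }
        System.out.printf("factor 0.9 after %d steps: %.6f%n", T, g1); // ~0.005 (vanishes)
        System.out.printf("factor 1.1 after %d steps: %.1f%n", T, g2); // ~117.4 (explodes)
    }
}
```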

Every state of an RNN depends on its input and on its previous output, multiplied by the current hidden state vector. The same operations happen to the gradient, in the reverse direction, during backpropagation through time. The layers and numerous time steps of the...

Bi-directional RNNs


This section of the chapter discusses the major limitations of RNNs, and how a bi-directional RNN, a special type of RNN, helps to overcome those shortfalls. Bi-directional neural networks, apart from taking inputs from the past, also take information from the future context for their predictions.

Shortfalls of RNNs

The computational power of standard, unidirectional RNNs is constrained, as the current state cannot reach its future input information. In many cases, the future input information becomes extremely useful for sequence prediction. For example, in speech recognition, due to linguistic dependencies, the appropriate interpretation of a sound as a phoneme might depend on the next few spoken words. The same situation might also arise in handwriting recognition.

In some modified versions of RNNs, this feature is partially attained by inserting a delay of a certain number (N) of time steps into the output. This delay helps to capture the future...
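The bi-directional alternative can be sketched directly (a minimal toy illustration of my own, not the book's code): one recurrent pass reads the sequence left to right, a second reads it right to left, and each position combines both, so every prediction sees past and future context.

```java
// A minimal sketch of the bi-directional idea: a forward pass supplies past
// context, a backward pass supplies future context, and the two are combined.
public class BiDirectionalSketch {
    static double step(double weight, double input, double prevState) {
        return Math.tanh(weight * input + 0.5 * prevState); // toy recurrence
    }

    public static void main(String[] args) {
        double[] x = {0.1, 0.4, -0.2, 0.3};
        int T = x.length;
        double[] fwd = new double[T], bwd = new double[T];
        double s = 0;
        for (int t = 0; t < T; t++) { s = step(0.8, x[t], s); fwd[t] = s; } // past context
        s = 0;
        for (int t = T - 1; t >= 0; t--) { s = step(0.8, x[t], s); bwd[t] = s; } // future context
        for (int t = 0; t < T; t++) { // each output sees both directions
            System.out.printf("t=%d combined=%.4f%n", t, fwd[t] + bwd[t]);
        }
    }
}
```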

Distributed deep RNNs


As you now have an understanding of RNNs, and of their applications, features, and architecture, we can move on to discuss how to use this network in a distributed architecture. Distributing an RNN is not an easy task, and hence only a few researchers have worked on this in the past. Although the primary concept of data parallelism is similar for all networks, distributing RNNs among multiple servers requires some brainstorming, and a fair amount of tedious work too.

Recently, one work from Google [119] has tried to distribute recurrent networks across many servers for a speech recognition task. In this section, we will discuss this work on distributed RNNs with the help of Hadoop.

Asynchronous stochastic gradient descent (ASGD) can be used for large-scale training of an RNN. ASGD has shown particular success in sequence-discriminative training of deep neural networks.
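The essence of the asynchronous scheme can be sketched in a few lines (a simplified, single-process stand-in of my own; real ASGD runs workers on separate machines against a parameter server): several workers compute gradients independently and apply them to shared parameters without waiting for each other.

```java
// A minimal single-process stand-in for asynchronous SGD: two worker threads
// update one shared parameter with no synchronization barrier between steps.
import java.util.concurrent.atomic.AtomicLong;

public class AsgdSketch {
    // Shared parameter stored as raw bits so workers can update it atomically.
    static final AtomicLong sharedW = new AtomicLong(Double.doubleToLongBits(0.0));

    static void applyGradient(double grad, double lr) {
        long prev, next;
        do { // lock-free update: read, compute, compare-and-swap
            prev = sharedW.get();
            next = Double.doubleToLongBits(Double.longBitsToDouble(prev) - lr * grad);
        } while (!sharedW.compareAndSet(prev, next));
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable worker = () -> {
            for (int i = 0; i < 1000; i++) {
                double w = Double.longBitsToDouble(sharedW.get());
                double grad = 2 * (w - 1.0); // toy objective: (w - 1)^2
                applyGradient(grad, 0.001);  // applied without waiting for the other worker
            }
        };
        Thread t1 = new Thread(worker), t2 = new Thread(worker);
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println("w converges towards 1.0: " + Double.longBitsToDouble(sharedW.get()));
    }
}
```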

A two-layer deep LSTM RNN is used to build the network. Each Long short-term...

RNNs with Deeplearning4j


Training an RNN is not a simple task, and it can sometimes be extremely computationally demanding. With long sequences of training data involving many time steps, training sometimes becomes extremely difficult. By now, you have gained a better theoretical understanding of how and why backpropagation through time is primarily used for training an RNN. In this section, we will consider a practical example of the use of an RNN and its implementation using Deeplearning4j.

We now take an example to give an idea of how to perform sentiment analysis of a movie review dataset using an RNN. The main problem statement of this network is to take some raw text of a movie review as input, and classify that movie review as either positive or negative based on its contents. Each word of the raw review text is converted to a vector using the Word2Vec model, and then fed into an RNN. The example uses a large-scale dataset of raw movie reviews taken from http://ai.stanford.edu...
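The network configuration for such a model might look as follows. This is a minimal sketch assuming the Deeplearning4j 0.x-era API that this book targets; exact class names, builder methods, and hyperparameters (such as the 300-dimensional word vectors and the 256-unit layer) are illustrative assumptions and may differ by version:

```java
// A hedged sketch of an LSTM sentiment classifier in Deeplearning4j (0.x-era
// API assumed): word vectors in, positive/negative scores out.
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.GravesLSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class SentimentRnnSketch {
    public static void main(String[] args) {
        int vectorSize = 300; // assumption: Word2Vec vectors of size 300 per word

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .seed(42)
            .list()
            // LSTM layer reads the sequence of word vectors one step at a time
            .layer(0, new GravesLSTM.Builder()
                .nIn(vectorSize).nOut(256)
                .activation(Activation.SOFTSIGN)
                .build())
            // Output layer emits positive/negative class scores
            .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .nIn(256).nOut(2)
                .activation(Activation.SOFTMAX)
                .build())
            .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        System.out.println("Parameters: " + net.numParams());
        // Training would iterate over a DataSetIterator of review sequences:
        // net.fit(trainIterator);
    }
}
```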

Summary


RNNs are special compared to other traditional deep neural networks because of their capability to work over long sequences of vectors and to output different sequences of vectors. RNNs are unfolded over time to work like a feedforward neural network. The training of RNNs is performed with backpropagation through time, which is an extension of the traditional backpropagation algorithm. A special unit of RNNs, called Long short-term memory, helps to overcome the limitations of the backpropagation through time algorithm.

We also talked about the bi-directional RNN, which is an updated version of the unidirectional RNN. Unidirectional RNNs sometimes fail to predict correctly because of the lack of future input information. Later, we discussed the distribution of deep RNNs and their implementation with Deeplearning4j. Asynchronous stochastic gradient descent can be used for the training of distributed RNNs. In the next chapter, we will discuss another model of deep neural network, called the Restricted...
