Recurrent Neural Networks

Among the deep learning architectures in wide use today are the so-called Recurrent Neural Networks (RNNs). The basic idea behind RNNs is to exploit the sequential nature of the input information.

These networks are called recurrent because they perform the same computation for every element of a sequence of inputs, and the output for each element depends not only on the current input but also on all the previous computations.

RNNs have shown excellent performance on problems such as predicting the next character in a text or, similarly, predicting the next word in a sentence.

However, they are also used for more complex problems, such as Machine Translation (MT). In this case, the network takes as input a sequence of words in a source language and outputs the translated sequence in a target language. Finally, other applications of great importance in which...

RNNs basic concepts

Human beings don't start thinking from scratch: the human mind has a so-called persistence of memory, namely the ability to associate past information with recent information. Traditional neural networks, by contrast, ignore past events. Take a classifier of movie scenes as an example: a traditional neural network cannot use past scenes to classify the current one.

RNNs were developed to address this problem: in contrast with Convolutional Neural Networks (CNNs), an RNN is a network with a loop that allows information to persist.

RNNs process a sequential input one element at a time, updating a kind of state vector that contains information about all the past elements of the sequence.

The following figure shows a neural network that takes an input value Xt and produces an output value Ot:

An RNN with its internal loop

St is the network's state vector, which can...

RNNs at work

The state vector St is calculated from the current input and the state vector at the previous time step, through the matrices U and W:

St = f(U·Xt + W·St-1)

Here, f is a nonlinear function such as tanh or ReLU. As you can see, the two terms are added together before being passed through the function itself.

Finally, Ot is the network output, calculated using the matrix V:

Ot = softmax(V·St)
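To make the update concrete, here is a minimal NumPy sketch of a single RNN step. The dimensions and random matrices are purely illustrative (they are not part of the book's code), but the two marked lines map directly onto the equations above:

import numpy as np

# Illustrative dimensions: 4-dimensional input, 3-dimensional state,
# 2 output classes (all hypothetical)
U = np.random.randn(3, 4)   # input-to-state matrix
W = np.random.randn(3, 3)   # state-to-state matrix
V = np.random.randn(2, 3)   # state-to-output matrix

x_t = np.random.randn(4)    # current input Xt
s_prev = np.zeros(3)        # previous state St-1

s_t = np.tanh(U @ x_t + W @ s_prev)   # St = f(U·Xt + W·St-1), with f = tanh
z = V @ s_t
o_t = np.exp(z) / np.sum(np.exp(z))   # Ot = softmax(V·St)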

Unfolding an RNN

The next figure shows an unfolded version of an RNN, obtained by unrolling the network structure over the entire input sequence, at different, discrete points in time. It is immediately clear that this is different from a typical multi-layer neural network, which uses different parameters at each layer: an RNN uses the same parameters, U, V, and W, at every time instant.

Indeed, an RNN performs the same computation at each time instant, applied to different inputs of the same sequence. By sharing the same parameters, an RNN also strongly reduces the number of parameters that the network must learn during the training phase, thereby improving training times.
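Continuing the NumPy sketch above (again with hypothetical shapes), parameter sharing simply means that the same U, W, and V are reused at every step of the unrolled computation:

import numpy as np

def rnn_forward(inputs, U, W, V):
    """Unrolled RNN: one iteration per time step, same U, W, V throughout."""
    s = np.zeros(W.shape[0])
    outputs = []
    for x_t in inputs:                       # t = 0, 1, ..., T-1
        s = np.tanh(U @ x_t + W @ s)         # shared U and W at every t
        z = V @ s
        outputs.append(np.exp(z) / np.sum(np.exp(z)))  # shared V at every t
    return outputs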

Looking at this unfolded version, it is evident that, with only a small change to the backpropagation algorithm, you can train networks of this type.

In fact, because the parameters are shared across all time instants, the computed gradient depends on the current...

The vanishing gradient problem

In the backpropagation algorithm, the weights are adjusted in proportion to the error gradient, and because of the way the gradients are computed, two situations can arise:

  • If the weights are small, the gradient signal can become so small that learning either slows dramatically or stops working altogether. This is referred to as vanishing gradients.
  • If the weights in the recurrent matrix are large, the gradient signal can grow so large that it causes learning to diverge. This is referred to as exploding gradients (both cases are illustrated numerically right after this list).
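A tiny numeric sketch (illustrative only) makes the two cases tangible: backpropagating through T time steps multiplies the gradient by, roughly, the recurrent weight matrix at every step:

import numpy as np

T = 50
g = np.ones(3)                    # some initial gradient signal
for scale, label in [(0.5, "small weights"), (1.5, "large weights")]:
    W = scale * np.eye(3)         # toy recurrent weight matrix
    g_t = g.copy()
    for _ in range(T):            # one multiplication per unrolled step
        g_t = W.T @ g_t
    print(label, "->", np.linalg.norm(g_t))
# small weights -> ~1e-15  (vanishing)
# large weights -> ~1e+09  (exploding)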

The vanishing/exploding gradient problem also afflicts RNNs. In fact, Backpropagation Through Time (BPTT) unrolls the RNN, creating a very deep feed-forward neural network. It is precisely this phenomenon that prevents an RNN from maintaining a long-term context: if the gradient vanishes or explodes within...

LSTM networks

Long Short Term Memory (LSTM) is a special recurrent neural network architecture that was originally conceived by Hochreiter and Schmidhuber in 1997. This type of neural network has recently been rediscovered in the context of deep learning because it is free from the problem of vanishing gradients and offers excellent results and performance. LSTM-based networks are ideal for the prediction and classification of temporal sequences, and they are replacing many traditional approaches to deep learning.

An LSTM network is composed of cells (LSTM blocks) linked to each other. Each LSTM block contains three types of gate: an input gate, an output gate, and a forget gate, which implement, respectively, the functions of writing, reading, and resetting on the cell memory. These gates are not binary but analog (generally driven by a sigmoid activation function mapped to the range [0, 1], where...
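The following NumPy sketch of a single LSTM step shows the three gates at work. The weight matrices and dimensions are hypothetical, and biases are omitted for brevity:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wi, Wf, Wo, Wc):
    """One LSTM step; each W* acts on the concatenated [input, hidden] vector."""
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(Wi @ z)                     # input gate: write to memory
    f = sigmoid(Wf @ z)                     # forget gate: reset memory
    o = sigmoid(Wo @ z)                     # output gate: read from memory
    c = f * c_prev + i * np.tanh(Wc @ z)    # new cell memory
    h = o * np.tanh(c)                      # new hidden state
    return h, c

Note that each gate value lies in (0, 1), so the gates smoothly scale how much is written, read, or erased, rather than switching on and off.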

An image classifier with RNNs

At this point, we introduce our implementation of a recurrent model with LSTM blocks for an image classification problem. The dataset used is the well-known MNIST.

The implemented model is composed of a single LSTM layer followed by a reduce mean operation and a softmax layer, as illustrated in the following figure:

Dataflow in an RNN architecture
The following code computes the mean of elements across dimensions of a tensor and reduces input_tensor along the dimensions given in axis. Unless keep_dims is true, the rank of the tensor is reduced by 1 for each entry in axis. If keep_dims is true, the reduced dimensions are retained with length 1:
tf.reduce_mean(input_tensor, axis=None,
keep_dims=False, name=None, reduction_indices=None)
If axis has no entries, all dimensions are reduced, and a tensor with a single element is returned.
For example:
# 'x' is [[1., 1.], [2., 2.]]
tf.reduce_mean(x)     # ==> 1.5
tf.reduce_mean(x, 0)  # ==> [1.5, 1.5]
tf.reduce_mean(x, 1)  # ==> [1., 2.]
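Putting the pieces together, here is a hedged sketch of the architecture in the figure, written in the TensorFlow 1.x style used throughout this book. The hyperparameters (each 28x28 MNIST image fed as a sequence of 28 rows of 28 pixels, 128 hidden units) are typical choices, not necessarily those of the book's full code:

import tensorflow as tf
from tensorflow.contrib import rnn

n_input, n_steps, n_hidden, n_classes = 28, 28, 128, 10

x = tf.placeholder(tf.float32, [None, n_steps, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
weights = tf.Variable(tf.random_normal([n_hidden, n_classes]))
biases = tf.Variable(tf.random_normal([n_classes]))

# Split the batch into a list of n_steps tensors of shape [batch, n_input]
inputs = tf.unstack(x, n_steps, 1)
lstm_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = rnn.static_rnn(lstm_cell, inputs, dtype=tf.float32)

# reduce_mean across the time dimension, then the softmax (output) layer
mean_output = tf.reduce_mean(tf.stack(outputs), 0)
logits = tf.matmul(mean_output, weights) + biases
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))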

Bidirectional RNNs

Bidirectional RNNs are based on the idea that the output at time t may depend on both previous and future elements in the sequence. To achieve this, the outputs of two RNNs must be combined: one processes the sequence in one direction, and the second processes it in the opposite direction.

The network splits the neurons of a regular RNN into two directions, one for the positive time direction (forward states) and another for the negative time direction (backward states).
With this structure, the output layer can get information from both past and future states.

The unrolled architecture of a B-RNN is depicted in the following figure:

Unrolled bidirectional RNN

Let's now see how to implement a B-RNN for an image classification problem. We begin by importing the needed libraries; notice that the rnn module comes from tensorflow.contrib:

import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np

The network...
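Building on the imports above, a minimal sketch of the network construction might look as follows. The shapes reuse the same hypothetical MNIST settings as before and are not necessarily the book's exact code:

n_input, n_steps, n_hidden, n_classes = 28, 28, 128, 10

x = tf.placeholder(tf.float32, [None, n_steps, n_input])
inputs = tf.unstack(x, n_steps, 1)

# One cell per direction; their outputs are concatenated at each step
lstm_fw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
lstm_bw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, _, _ = rnn.static_bidirectional_rnn(
    lstm_fw_cell, lstm_bw_cell, inputs, dtype=tf.float32)

# Each element of outputs has shape [batch, 2*n_hidden]
weights = tf.Variable(tf.random_normal([2 * n_hidden, n_classes]))
biases = tf.Variable(tf.random_normal([n_classes]))
logits = tf.matmul(outputs[-1], weights) + biases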

Text prediction

Computational language models based on RNNs are nowadays among the most successful techniques for statistical language modeling. They can easily be applied to a wide range of tasks, including automatic speech recognition and machine translation.

In this section, we'll explore an RNN model on a challenging language processing task: guessing the next word in a sequence of text.

You'll find a complete reference for this example on the following page:
https://www.tensorflow.org/versions/r0.8/tutorials/recurrent/index.html.

You can download the source code for this example here (official TensorFlow project GitHub page):
https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb.

The files to download are as follows:

  • ptb_word_lm.py: This file contains code to train the model on the PTB dataset
  • reader.py: This file contains code to read the dataset

Here we present only the main...
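Independently of the PTB code, the task itself can be sketched in a few lines of NumPy. The toy vocabulary and the random (untrained) matrices below are purely illustrative, so the predicted word only becomes meaningful once U, W, and V have been trained:

import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]   # toy vocabulary
V_size, n_hidden = len(vocab), 8
rng = np.random.RandomState(0)
U = rng.randn(n_hidden, V_size) * 0.1        # untrained weights
W = rng.randn(n_hidden, n_hidden) * 0.1
V = rng.randn(V_size, n_hidden) * 0.1

def one_hot(i):
    v = np.zeros(V_size)
    v[i] = 1.0
    return v

# Feed the observed words through the recurrence, then score the vocabulary
s = np.zeros(n_hidden)
for word in ["the", "cat", "sat"]:
    s = np.tanh(U @ one_hot(vocab.index(word)) + W @ s)
scores = V @ s
probs = np.exp(scores) / np.sum(np.exp(scores))
print("guessed next word:", vocab[int(np.argmax(probs))])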

Summary

In this chapter, we provided an overview of RNNs. These are a class of neural networks in which the connections between units form directed cycles, making it possible to handle temporal and sequential data. We also described the LSTM architecture, whose basic idea is to improve the RNN by providing it with an explicit memory.

LSTM networks are equipped with special hidden units, called memory cells, whose behavior is to remember previous inputs for a long time. At each time instant, these cells take as input the previous state and the current input of the network. By combining these with the current contents of memory, and letting a gating mechanism in other units decide what to keep and what to delete from memory, LSTMs have proved very useful and effective at learning long-term dependencies.

We then implemented two neural network models: the LSTM for a classification...
