Recurrent Neural Networks

Among the deep learning architectures in wide use today are the so-called Recurrent Neural Networks (RNNs). The basic idea behind RNNs is to exploit the sequential nature of the input information.

These networks are called recurrent because they perform the same computation for every element of a sequence of inputs, and the output for each element depends not only on the current input but also on all the previous computations.

RNNs have shown excellent performance on problems such as predicting the next character in a text or, similarly, predicting the next word in a sentence.

However, they are also used for more complex problems, such as Machine Translation (MT). In this case, the network takes as input a sequence of words in a source language and outputs the translated sequence in a target language. Finally, other applications of great importance in which...

RNNs basic concepts

Human beings don't start thinking from scratch: the human mind has a so-called persistence of memory, namely the ability to associate past information with recent information. Traditional neural networks, by contrast, ignore past events. Take a classifier of movie scenes as an example: a traditional neural network cannot use past scenes to classify the current one.

RNNs were developed to address this problem: in contrast with Convolutional Neural Networks (CNNs), an RNN is a network with a loop that allows information to persist.

RNNs process a sequential input one element at a time, updating a kind of state vector that contains information about all the past elements of the sequence.

The following figure shows a neural network that takes an input value Xt and produces an output value Ot:

An RNN with its internal loop

St is the network's state vector, which can...

RNNs at work

The state vector St is calculated from the current input and the state vector at the previous time step, through the matrices U and W:

St = f(U·Xt + W·St-1)

Here, f is a nonlinear function such as tanh or ReLU. As you can see, the two terms are added together before being passed through the function itself.

Finally, Ot is the network output, calculated using the matrix V:

Ot = softmax(V·St)
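To make the update concrete, here is a minimal NumPy sketch of a single RNN step. The dimensions and random matrices are purely illustrative (they are not part of the book's code), but the two marked lines map directly onto the equations above:

import numpy as np

# Illustrative dimensions: 4-dimensional input, 3-dimensional state,
# 2 output classes (all hypothetical)
U = np.random.randn(3, 4)   # input-to-state matrix
W = np.random.randn(3, 3)   # state-to-state matrix
V = np.random.randn(2, 3)   # state-to-output matrix

x_t = np.random.randn(4)    # current input Xt
s_prev = np.zeros(3)        # previous state St-1

s_t = np.tanh(U @ x_t + W @ s_prev)   # St = f(U·Xt + W·St-1), with f = tanh
z = V @ s_t
o_t = np.exp(z) / np.sum(np.exp(z))   # Ot = softmax(V·St)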

Unfolding an RNN

The next figure shows an unfolded version of an RNN, obtained by unrolling the network structure over the entire input sequence, at different, discrete points in time. It is immediately clear that this is different from a typical multi-layer neural network, which uses different parameters at each layer: an RNN uses the same parameters, U, V, and W, at every time instant.

Indeed, an RNN performs the same computation at each time instant, applied to different inputs of the same sequence. By sharing the same parameters, an RNN also strongly reduces the number of parameters that the network must learn during the training phase, thereby improving training times.
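Continuing the NumPy sketch above (again with hypothetical shapes), parameter sharing simply means that the same U, W, and V are reused at every step of the unrolled computation:

import numpy as np

def rnn_forward(inputs, U, W, V):
    """Unrolled RNN: one iteration per time step, same U, W, V throughout."""
    s = np.zeros(W.shape[0])
    outputs = []
    for x_t in inputs:                       # t = 0, 1, ..., T-1
        s = np.tanh(U @ x_t + W @ s)         # shared U and W at every t
        z = V @ s
        outputs.append(np.exp(z) / np.sum(np.exp(z)))  # shared V at every t
    return outputs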

Looking at this unfolded version, it is evident that, with only a small change to the backpropagation algorithm, you can train networks of this type.

In fact, because the parameters are shared across all time instants, the computed gradient depends on the current...

The vanishing gradient problem

In the backpropagation algorithm, the weights are adjusted in proportion to the error gradient, and because of the way the gradients are computed, two situations can arise:

  • If the weights are small, the gradient signal can become so small that learning either slows dramatically or stops working altogether. This is referred to as vanishing gradients.
  • If the weights in the recurrent matrix are large, the gradient signal can grow so large that it causes learning to diverge. This is referred to as exploding gradients (both cases are illustrated numerically right after this list).
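A tiny numeric sketch (illustrative only) makes the two cases tangible: backpropagating through T time steps multiplies the gradient by, roughly, the recurrent weight matrix at every step:

import numpy as np

T = 50
g = np.ones(3)                    # some initial gradient signal
for scale, label in [(0.5, "small weights"), (1.5, "large weights")]:
    W = scale * np.eye(3)         # toy recurrent weight matrix
    g_t = g.copy()
    for _ in range(T):            # one multiplication per unrolled step
        g_t = W.T @ g_t
    print(label, "->", np.linalg.norm(g_t))
# small weights -> ~1e-15  (vanishing)
# large weights -> ~1e+09  (exploding)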

The vanishing/exploding gradient problem also afflicts RNNs. In fact, Backpropagation Through Time (BPTT) unrolls the RNN, creating a very deep feed-forward neural network. It is precisely this phenomenon that prevents an RNN from maintaining a long-term context: if the gradient vanishes or explodes within...

LSTM networks

Long Short Term Memory (LSTM) is a special recurrent neural network architecture that was originally conceived by Hochreiter and Schmidhuber in 1997. This type of neural network has recently been rediscovered in the context of deep learning because it is free from the problem of vanishing gradients and offers excellent results and performance. LSTM-based networks are ideal for the prediction and classification of temporal sequences, and they are replacing many traditional approaches to deep learning.

An LSTM network is composed of cells (LSTM blocks) linked to each other. Each LSTM block contains three types of gate: an input gate, an output gate, and a forget gate, which implement, respectively, the functions of writing, reading, and resetting on the cell memory. These gates are not binary but analog (generally driven by a sigmoid activation function mapped to the range [0, 1], where...
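The following NumPy sketch of a single LSTM step shows the three gates at work. The weight matrices and dimensions are hypothetical, and biases are omitted for brevity:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wi, Wf, Wo, Wc):
    """One LSTM step; each W* acts on the concatenated [input, hidden] vector."""
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(Wi @ z)                     # input gate: write to memory
    f = sigmoid(Wf @ z)                     # forget gate: reset memory
    o = sigmoid(Wo @ z)                     # output gate: read from memory
    c = f * c_prev + i * np.tanh(Wc @ z)    # new cell memory
    h = o * np.tanh(c)                      # new hidden state
    return h, c

Note that each gate value lies in (0, 1), so the gates smoothly scale how much is written, read, or erased, rather than switching on and off.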

An image classifier with RNNs

At this point, we introduce our implementation of a recurrent model with LSTM blocks for an image classification problem. The dataset used is the well-known MNIST.

The implemented model is composed of a single LSTM layer followed by a reduce mean operation and a softmax layer, as illustrated in the following figure:

Dataflow in an RNN architecture
The following code computes the mean of elements across dimensions of a tensor and reduces input_tensor along the dimensions given in axis. Unless keep_dims is true, the rank of the tensor is reduced by 1 for each entry in axis. If keep_dims is true, the reduced dimensions are retained with length 1:
tf.reduce_mean(input_tensor, axis=None,
keep_dims=False, name=None, reduction_indices=None)
If axis has no entries, all dimensions are reduced, and a tensor with a single element is returned.
For example:
# 'x' is [[1., 1.], [2., 2.]]
tf.reduce_mean(x)     # ==> 1.5
tf.reduce_mean(x, 0)  # ==> [1.5, 1.5]
tf.reduce_mean(x, 1)  # ==> [1., 2.]
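Putting the pieces together, here is a hedged sketch of the architecture in the figure, written in the TensorFlow 1.x style used throughout this book. The hyperparameters (each 28x28 MNIST image fed as a sequence of 28 rows of 28 pixels, 128 hidden units) are typical choices, not necessarily those of the book's full code:

import tensorflow as tf
from tensorflow.contrib import rnn

n_input, n_steps, n_hidden, n_classes = 28, 28, 128, 10

x = tf.placeholder(tf.float32, [None, n_steps, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
weights = tf.Variable(tf.random_normal([n_hidden, n_classes]))
biases = tf.Variable(tf.random_normal([n_classes]))

# Split the batch into a list of n_steps tensors of shape [batch, n_input]
inputs = tf.unstack(x, n_steps, 1)
lstm_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = rnn.static_rnn(lstm_cell, inputs, dtype=tf.float32)

# reduce_mean across the time dimension, then the softmax (output) layer
mean_output = tf.reduce_mean(tf.stack(outputs), 0)
logits = tf.matmul(mean_output, weights) + biases
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))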

Bidirectional RNNs

Bidirectional RNNs are based on the idea that the output at time t may depend on both previous and future elements in the sequence. To achieve this, the outputs of two RNNs must be combined: one processes the sequence in one direction, and the second processes it in the opposite direction.

The network splits the neurons of a regular RNN into two directions, one for the positive time direction (forward states) and another for the negative time direction (backward states).
With this structure, the output layer can get information from both past and future states.

The unrolled architecture of a B-RNN is depicted in the following figure:

Unrolled bidirectional RNN

Let's now see how to implement a B-RNN for an image classification problem. We begin by importing the needed libraries; notice that the rnn module comes from tensorflow.contrib:

import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np

The network...
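Building on the imports above, a minimal sketch of the network construction might look as follows. The shapes reuse the same hypothetical MNIST settings as before and are not necessarily the book's exact code:

n_input, n_steps, n_hidden, n_classes = 28, 28, 128, 10

x = tf.placeholder(tf.float32, [None, n_steps, n_input])
inputs = tf.unstack(x, n_steps, 1)

# One cell per direction; their outputs are concatenated at each step
lstm_fw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
lstm_bw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, _, _ = rnn.static_bidirectional_rnn(
    lstm_fw_cell, lstm_bw_cell, inputs, dtype=tf.float32)

# Each element of outputs has shape [batch, 2*n_hidden]
weights = tf.Variable(tf.random_normal([2 * n_hidden, n_classes]))
biases = tf.Variable(tf.random_normal([n_classes]))
logits = tf.matmul(outputs[-1], weights) + biases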

Text prediction

Computational language models based on RNNs are nowadays among the most successful techniques for statistical language modeling. They can easily be applied to a wide range of tasks, including automatic speech recognition and machine translation.

In this section, we'll explore an RNN model on a challenging language processing task: guessing the next word in a sequence of text.

You'll find a complete reference for this example on the following page:
https://www.tensorflow.org/versions/r0.8/tutorials/recurrent/index.html.

You can download the source code for this example here (official TensorFlow project GitHub page):
https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb.

The files to download are as follows:

  • ptb_word_lm.py: This file contains code to train the model on the PTB dataset
  • reader.py: This file contains code to read the dataset

Here we present only the main...
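Independently of the PTB code, the task itself can be sketched in a few lines of NumPy. The toy vocabulary and the random (untrained) matrices below are purely illustrative, so the predicted word only becomes meaningful once U, W, and V have been trained:

import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]   # toy vocabulary
V_size, n_hidden = len(vocab), 8
rng = np.random.RandomState(0)
U = rng.randn(n_hidden, V_size) * 0.1        # untrained weights
W = rng.randn(n_hidden, n_hidden) * 0.1
V = rng.randn(V_size, n_hidden) * 0.1

def one_hot(i):
    v = np.zeros(V_size)
    v[i] = 1.0
    return v

# Feed the observed words through the recurrence, then score the vocabulary
s = np.zeros(n_hidden)
for word in ["the", "cat", "sat"]:
    s = np.tanh(U @ one_hot(vocab.index(word)) + W @ s)
scores = V @ s
probs = np.exp(scores) / np.sum(np.exp(scores))
print("guessed next word:", vocab[int(np.argmax(probs))])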

Summary

In this chapter, we provided an overview of RNNs. These are a class of neural networks in which the connections between units form directed cycles, making it possible to handle temporal and sequential data. We also described the LSTM architecture, whose basic idea is to improve the RNN by providing it with an explicit memory.

LSTM networks are equipped with special hidden units, called memory cells, whose behavior is to remember previous inputs for a long time. At each time instant, these cells take as input the previous state and the current input of the network. By combining these with the current contents of memory, and letting a gating mechanism in other units decide what to keep and what to delete from memory, LSTMs have proved very useful and effective at learning long-term dependencies.

We then implemented two neural network models: the LSTM for a classification...
