Recurrent Neural Networks Using TensorFlow 2

One of the main drawbacks of a number of neural network architectures, including ConvNets (CNNs), is that they cannot process sequential data. In other words, a complete feature, for example, an image, has to be presented all at once: the input is a fixed-length tensor, and the output is a fixed-length tensor. Nor do the output values of previous features affect the current feature in any way; all of the input values (and output values) are taken to be independent of one another. For example, in our fashion_mnist model (Chapter 4, Supervised Machine Learning Using TensorFlow 2), each input fashion image is independent of, and totally ignorant of, previous images.

Recurrent Neural Networks (RNNs) overcome this problem and make a wide range of new applications possible.

In this chapter, we will...

Neural network processing modes

The following diagram illustrates the variety of neural network processing modes:

Rectangles represent tensors, arrows represent functions, red is input, blue is output, and green is the tensor state.

From left to right, we have the following (two of these modes are sketched in code after the list):

  • Plain feed-forward network, fixed-size input, and fixed-size output, for example, image classification
  • Sequence output, for example, image captioning that takes one image and outputs a set of words identifying items in the image
  • Sequence input, for example, sentiment identification (like our IMDb application) where a sentence is classed as being of positive or negative sentiment
  • Both sequence input and output, for example, machine translation where an RNN takes an English sentence and translates it into a French output
  • Synced sequence both input and output, for example, video classification that is like...
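Two of these modes can be made concrete with a short Keras sketch. This is a minimal illustration, not code from this chapter; the layer sizes and vocabulary size are arbitrary:

import tensorflow as tf

# Sequence input, single output (many-to-one), as in sentiment classification:
# the GRU returns only its final state, which a dense layer maps to one score.
many_to_one = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    tf.keras.layers.GRU(32),  # return_sequences=False by default
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Synced sequence input and output (many-to-many), as in per-frame labeling:
# return_sequences=True emits one output vector per time step.
many_to_many = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    tf.keras.layers.GRU(32, return_sequences=True),
    tf.keras.layers.Dense(5, activation='softmax')  # applied at every time step
])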

Recurrent architectures

Hence, a new architecture is required for handling data that arrives sequentially, and where its input values, its output values, or both are of variable length, for example, the words in a sentence in a language translation application. In this case, both the input to and the output from the model are of varying lengths, as in the fourth mode previously. Also, in order to predict subsequent words given the current word, previous words need to be known as well. This new neural network architecture is called an RNN, and it is specifically designed to handle sequential data.

The term recurrent arises because such models perform the same computation on every element of a sequence, where each output is dependent on previous output. Theoretically, each output depends on all of the previous output items, but in practical terms, RNNs are limited to looking back just...
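To make the recurrence concrete, here is a minimal NumPy sketch of the update inside a vanilla RNN cell. This is an illustration only; the model in this chapter uses a GRU, which adds gating to this basic scheme. The same weights are applied at every time step, and each new state depends on the previous one:

import numpy as np

np.random.seed(0)
input_dim, hidden_dim, steps = 8, 16, 5

# The same weights are reused at every time step.
W_x = np.random.randn(hidden_dim, input_dim) * 0.1   # input-to-hidden weights
W_h = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden weights
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)  # initial state
for t in range(steps):
    x_t = np.random.randn(input_dim)      # stand-in for the t-th sequence element
    h = np.tanh(W_x @ x_t + W_h @ h + b)  # new state depends on the previous state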

An application of RNNs

In this application, we will see how to create text using a character-based recurrent neural network. It is easy to change the corpus of text to be used (see the example to follow); here, we will use the novel Great Expectations by Charles Dickens. We will train the network on this text so that, if we give it a character sequence such as thousan, it will produce the next character in the sequence, d. This process can be continued, and longer sequences of text can be created by calling the model repeatedly on the evolving sequence.

Here is an example of the text created before the model is trained:

Input: 
 'o else is there to inform?”\n\n“Is there no chance person who might identify you in the street?” said\n'
Next Char Predictions: 
 "dUFdZ!mig())'(ZIon“4g&HZ”@\nWGWtlinnqQY*dGJ7ioU'6(vLKL&...

The code for our RNN example

This application is based on one provided by Google under an Apache 2 license.

As usual, we will break the code down into snippets and refer you to the repository for the license and the full working version. Firstly, we have module imports, as follows:

import tensorflow as tf
import numpy as np
import os
import time

Next, we have the download link for the text file.

You can easily change this to any text you wish by specifying the file name in file and the full URL of the file in url:

file='1400-0.txt'
url='https://www.gutenberg.org/files/1400/1400-0.txt' # Great Expectations by Charles Dickens

And then we set up the Keras get_file() utility for that file, shown as follows:

path = tf.keras.utils.get_file(file,url)

Then, we open and read the file and see how long it is, in characters:

text = open(path, encoding='utf-8').read() # the Gutenberg -0.txt file is UTF-8 encoded
print('Length of text: {} characters'.format(len(text)))
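Before the text can be fed to an embedding layer, each character must be mapped to an integer index. A plausible sketch of that step follows; the names vocabulary, char_to_index, and text_as_int are illustrative, not taken from the repository:

vocabulary = sorted(set(text))        # the unique characters in the corpus
vocabulary_length = len(vocabulary)
char_to_index = {char: index for index, char in enumerate(vocabulary)}
index_to_char = np.array(vocabulary)  # inverse mapping, index back to character
text_as_int = np.array([char_to_index[char] for char in text])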

Building and instantiating our model

As we have seen previously, one technique for building a model is to pass the required layers into the tf.keras.Sequential() constructor. In this instance, we have three layers: an embedding layer, an RNN layer, and a dense layer.

The first layer, the embedding, is a lookup table of vectors, one vector for the numeric value of each character; each vector has dimension embedding_dimension. The middle, recurrent layer is a GRU of size recurrent_nn_units. The last layer is a dense output layer with vocabulary_length units.

What the model does is look up the embedding, run the GRU for a single time step using the embedding as input, and pass the result to the dense layer, which generates logits (unnormalized log probabilities) for the next character.

A diagram showing this is as follows:

The code that implements this model is, therefore, as follows:

def build_model...
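The repository holds the full definition; as a sketch, a build_model function consistent with the three layers just described might look like the following. The batch_size parameter and the stateful and initializer arguments are assumptions for illustration, not necessarily the book's exact code:

def build_model(vocabulary_length, embedding_dimension, recurrent_nn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocabulary_length, embedding_dimension,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(recurrent_nn_units,
                            return_sequences=True,  # one output vector per time step
                            stateful=True,          # carry state between batches
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocabulary_length)    # logits for the next character
    ])
    return model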

Using our model to get predictions

To get the predictions from our model, we need to take a sample from the output distribution. This sampling will get us the characters we need from that output distribution (sampling the output distribution is important because taking the argmax of it, as we would normally do, can easily get the model stuck in a loop).

tf.random.categorical does this sampling and tf.squeeze with axis=-1 removes the last dimension of the tensor, prior to displaying the indices.

The signature of tf.random.categorical is as follows:

tf.random.categorical(logits, num_samples, dtype=None, seed=None, name=None)

Comparing this with the call, we see that we are taking one sample (of length sequence_length = 100) from the predictions (example_batch_predictions[0]). The extra dimension is then removed, so we can look up the characters corresponding to the sample...
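As a sketch of that call, using the names from the text (example_batch_predictions comes from running the untrained model on one batch):

sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()  # drop the extra dimension
# sampled_indices now holds sequence_length integer indices, one per time step,
# which can be looked up (for example, in index_to_char) to recover characters.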

Summary

This concludes our look at RNNs. In this chapter, we first discussed the general principles of RNNs, and then saw how to acquire and prepare some text for use by a model, noting that it is straightforward to use an alternative source of text here. We then saw how to create and instantiate our model, trained it, and used it to produce text from our starting string. We noted that the network has learned that words are units of text, and how to spell quite a variety of words, somewhat in the style of the author of the text, with only a couple of non-words.

In the next chapter, we will look at the use of TensorFlow Hub, which is a library of reusable machine learning modules.
