Chapter 4. Generating Text with a Recurrent Neural Net

In the previous chapter, you learned how to represent discrete inputs as vectors, so that neural networks have the power to understand discrete inputs as well as continuous ones.

Many real-world applications involve variable-length inputs: connected objects and automation (akin to Kalman filters, but much more evolved); natural language processing (understanding, translation, text generation, and image annotation); human behavior reproduction (handwriting generation and chatbots); and reinforcement learning.

The previous networks, named feedforward networks, are able to classify inputs of fixed dimensions only. To extend their power to variable-length inputs, a new category of networks has been designed: recurrent neural networks (RNNs), which are well suited to machine learning tasks on variable-length inputs or sequences.

Three well-known recurrent neural nets (the simple RNN, the GRU, and the LSTM) are presented through the example of text generation...

Need for RNN


Deep learning networks for natural language are numerical and deal well with multidimensional arrays of floats and integers as input values. For categorical values, such as characters or words, the previous chapter demonstrated a technique known as embedding to transform them into numerical values as well.

So far, all inputs have been fixed-sized arrays. In many applications, such as texts in natural language processing, inputs have one semantic meaning but can be represented by sequences of variable length.

There is a need to deal with variable-length sequences as shown in the following diagram:

Recurrent Neural Networks (RNN) are the answer to variable-length inputs.

Recurrence can be seen as applying a feedforward network more than once, at different time steps, with different incoming input data, but with a major difference: the presence of connections to the past (the previous time steps), with one goal, to refine the representation of the input through time.

At each time step, the...
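To make this concrete, here is a minimal NumPy sketch, not the book's code, of the same feedforward step applied at every time step while also receiving the hidden state of the previous step; the sizes and weight names (W_xh, W_hh, b_h) are illustrative assumptions:

import numpy as np

n_in, n_hidden = 4, 8                              # hypothetical sizes
W_xh = 0.01 * np.random.randn(n_in, n_hidden)      # input-to-hidden weights
W_hh = 0.01 * np.random.randn(n_hidden, n_hidden)  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(n_hidden)

def step(x_t, h_prev):
    # The same feedforward computation at every time step,
    # with a connection to the previous hidden state.
    return np.tanh(x_t.dot(W_xh) + h_prev.dot(W_hh) + b_h)

x = np.random.randn(6, n_in)   # a variable-length sequence: here, 6 time steps
h = np.zeros(n_hidden)
for x_t in x:                  # unrolling the recurrence over time
    h = step(x_t, h)           # the representation of the input is refined through time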

A dataset for natural language


As a dataset, any text corpus can be used, such as Wikipedia, web articles, or even texts with symbols, such as computer code, theater plays, or poems; the model will catch and reproduce the different patterns in the data.

In this case, let's use the tiny Shakespeare texts to predict new Shakespeare texts, or at least new texts written in a style inspired by Shakespeare. Two levels of prediction are possible, and they can be handled in the same way (a small preprocessing sketch follows the list):

  • At the character level: Characters belong to an alphabet that includes punctuation. Given the first few characters, the model predicts the following characters from the alphabet, including spaces, to build words and sentences. There is no constraint for the predicted words to belong to a dictionary, and the objective of training is to build words and sentences close to real ones.

  • At the word level: Words belong to a dictionary that includes punctuation, and given the first few words, the model predicts the next word out...
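As announced, here is a small preprocessing sketch contrasting the two levels; the file name tinyshakespeare.txt and the index conventions are assumptions for illustration, not the book's exact loader:

text = open('tinyshakespeare.txt').read()

# Character level: the vocabulary is the alphabet found in the corpus,
# including punctuation and spaces.
chars = sorted(set(text))
char_to_index = {c: i for i, c in enumerate(chars)}
char_sequence = [char_to_index[c] for c in text]

# Word level: the vocabulary is a dictionary of the words found in the corpus
# (a naive whitespace split; a real tokenizer would separate punctuation and
# map rare words to an UNKNOWN token).
words = text.split()
word_to_index = {w: i for i, w in enumerate(sorted(set(words)))}
word_sequence = [word_to_index[w] for w in words]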

Simple recurrent network


An RNN is a network applied at multiple time steps, but with a major difference: a connection to the states of the layers at previous time steps, named hidden states:

This can be written in the following form:
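A standard formulation of such a layer, consistent with the description above (the book's exact notation may differ), with W_{xh}, W_{hh}, and W_{hy} the input, recurrent, and output weights:

h_t = \tanh(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h)
y_t = \mathrm{softmax}(W_{hy}\, h_t + b_y)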

An RNN can be unrolled as a feedforward network applied to the sequence as input, with parameters shared between the different time steps.

The first dimension of the input and output is time, while the following dimensions hold the data inside each time step. As seen in the previous chapter, the value at a time step (a word or a character) can be represented either by an index (an integer, 0-dimensional) or by a one-hot-encoding vector (1-dimensional). The former representation is more compact in memory. In this case, the input and output sequences are 1-dimensional, each represented by a vector whose single dimension is time:

x = T.ivector()   # input sequence of word (or character) indices
y = T.ivector()   # target sequence of indices to predict
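For reference, a minimal sketch of the recurrent layer built on these two vectors could use theano.scan to share the parameters across time steps. The sizes and names below (n_vocab, n_embed, n_hidden, W_xh, and so on) are illustrative assumptions, not the book's exact code, and the biases are omitted for brevity:

import numpy as np
import theano
import theano.tensor as T

n_vocab, n_embed, n_hidden = 10000, 100, 200   # hypothetical sizes
floatX = theano.config.floatX

# Parameters shared by all time steps.
embedding = theano.shared(0.01 * np.random.randn(n_vocab, n_embed).astype(floatX))
W_xh = theano.shared(0.01 * np.random.randn(n_embed, n_hidden).astype(floatX))
W_hh = theano.shared(0.01 * np.random.randn(n_hidden, n_hidden).astype(floatX))
W_hy = theano.shared(0.01 * np.random.randn(n_hidden, n_vocab).astype(floatX))

def step(x_t, h_prev):
    # One feedforward step, reusing the same weights at every position.
    return T.tanh(T.dot(embedding[x_t], W_xh) + T.dot(h_prev, W_hh))

h0 = T.zeros((n_hidden,), dtype=floatX)
h, _ = theano.scan(step, sequences=x, outputs_info=h0)    # unroll over the time dimension
y_pred = T.nnet.softmax(T.dot(h, W_hy))                   # one distribution per time step
cost = -T.mean(T.log(y_pred)[T.arange(y.shape[0]), y])    # negative log-likelihood of the targets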

The structure of the training program remains the same as in Chapter 2, Classifying Handwritten Digits with a Feedforward...

Metrics for natural language performance


The Word Error Rate (WER) or Character Error Rate (CER) is the equivalent, for natural language, of the accuracy error (error rate).
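As a reminder of how such a rate is computed, here is a minimal sketch, not the book's code, of a word error rate based on the edit distance between a reference sentence and a predicted one:

def edit_distance(ref, hyp):
    # Levenshtein distance: minimum number of substitutions, insertions,
    # and deletions needed to turn hyp into ref.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1]

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / float(len(ref))

The Character Error Rate is the same computation applied to sequences of characters instead of words.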

Evaluation of language models is usually expressed with perplexity, which is simply:
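A standard way to write it, as the exponential of the average negative log-likelihood per word (the book's exact notation may differ):

\mathrm{perplexity} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(w_i \mid w_1,\ldots,w_{i-1})\right)

The lower the perplexity, the higher the probability the model assigns to the evaluated text.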

Training loss comparison


During training, the learning rate might be too strong after a certain number of epochs for fine-tuning. Decreasing the learning rate when the loss does not decrease anymore helps during the last steps of training. To decrease the learning rate, we need to define it as an input variable during compilation:

lr = T.scalar('learning_rate')   # symbolic learning rate, passed as an input at call time
train_model = theano.function(inputs=[x, y, lr], outputs=cost, updates=updates)

During training, we adjust the learning rate, decreasing it if the training loss does not improve:

if (len(train_loss) > 1 and train_loss[-1] > train_loss[-2]):
    learning_rate = learning_rate * 0.5
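Since the learning rate is now an input of the compiled function rather than a constant baked into the updates, it must be passed at each call. A minimal sketch of the call site, where train_x and train_y are hypothetical names for a training sequence and its target:

current_loss = train_model(train_x, train_y, learning_rate)
train_loss.append(current_loss)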

As a first experiment, let's see the impact of the size of the hidden layer on the training loss for a simple RNN:

More hidden units improve training speed and might be better in the end. To check this, we should run it for more epochs.

Comparing the training of the different network types, in this case, we do not observe any improvement with LSTM and GRU:

This...

Example of predictions


Let's predict a sentence with the trained model:

sentence = [0]                           # start with the index of the start token
while sentence[-1] != 1:                 # stop when the end token (index 1) is predicted
    pred = predict_model(sentence)[-1]   # most probable next word given the sequence so far
    sentence.append(pred)
print(" ".join([index_[w] for w in sentence[1:-1]]))   # map indices back to words, dropping the start/end tokens

Note that here we take the most probable next word (argmax); to introduce some randomness, we would instead draw the next word according to the predicted probabilities.
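A minimal sketch of such sampling, assuming a hypothetical variant of the prediction function, here called predict_proba_model, that returns the probability distribution over the vocabulary at each position instead of the argmax indices:

import numpy as np

probs = predict_proba_model(sentence)[-1]       # distribution over the vocabulary for the next word
pred = np.random.choice(len(probs), p=probs)    # draw the next word according to its probability
sentence.append(pred)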

At 150 epochs, while the model has still not entirely converged on our Shakespeare writings, we can play with the predictions, initiating them with a few words, and watch the network generate the ends of the sentences:

  • First citizen: A word , i know what a word

  • How now!

  • Do you not this asleep , i say upon this?

  • Sicinius: What, art thou my master?

  • Well, sir, come.

  • I have been myself

  • A most hose, you in thy hour, sir

  • He shall not this

  • Pray you, sir

  • Come, come, you

  • The crows?

  • I'll give you

  • What, ho!

  • Consider you, sir

  • No more!

  • Let us be gone, or your UNKNOWN UNKNOWN, i do me to do

  • We are not now

From these...

Applications of RNN


This chapter introduced the simple RNN, LSTM, and GRU models. Such models have a wide range of applications in sequence generation or sequence understanding:

  • Text generation, such as the automatic generation of Obama-style political speeches (obama-rnn), for example with a text seed on jobs:

    Good afternoon. God bless you. The United States will step up to the cost of a new challenges of the American people that will share the fact that we created the problem. They were attacked and so that they have to say that all the task of the final days of war that I will not be able to get this done. The promise of the men and women who were still going to take out the fact that the American people have fought to make sure that they have to be able to protect our part. It was a chance to stand together to completely look for the commitment to borrow from the American people. And the fact is the men and women in uniform and the millions of our country with the law system that we should be a strong...


Summary


Recurrent neural networks provide the ability to process variable-length inputs and outputs of discrete or continuous data.

While the previous feedforward networks could only process one input to one output (a one-to-one scheme), the recurrent neural nets introduced in this chapter offer the possibility of converting between variable-length and fixed-length representations, adding new operating schemes for deep learning inputs/outputs: one-to-many, many-to-many, or many-to-one.

The range of applications of RNNs is wide. For this reason, we'll study them in more depth in further chapters, in particular how to enhance the predictive power of these three modules or how to combine them to build multi-modal, question-answering, or translation applications.

In particular, in the next chapter, we'll see a practical example using text embedding and recurrent networks for sentiment analysis. This time, there will also be an opportunity to review these recurrence units under another...
