Chapter 4. Generating Text with a Recurrent Neural Net

In the previous chapter, you learned how to represent discrete inputs as vectors, so that neural networks have the power to understand discrete inputs as well as continuous ones.

Many real-world applications involve variable-length inputs: connected objects and automation (akin to Kalman filters, but much more evolved); natural language processing (understanding, translation, text generation, and image annotation); human behavior reproduction (handwriting generation and chatbots); and reinforcement learning.

The previous networks, named feedforward networks, are able to classify inputs of fixed dimensions only. To extend their power to variable-length inputs, a new category of networks has been designed: recurrent neural networks (RNNs), which are well suited to machine learning tasks on variable-length inputs or sequences.

Three well-known recurrent neural nets (the simple RNN, the GRU, and the LSTM) are presented through the example of text generation...

Need for RNN


Deep learning networks for natural language are numerical and deal well with multidimensional arrays of floats and integers as input values. For categorical values, such as characters or words, the previous chapter demonstrated a technique known as embedding to transform them into numerical values as well.

So far, all inputs have been fixed-sized arrays. In many applications, such as texts in natural language processing, inputs have one semantic meaning but can be represented by sequences of variable length.

There is a need to deal with variable-length sequences as shown in the following diagram:

Recurrent Neural Networks (RNN) are the answer to variable-length inputs.

Recurrence can be seen as applying a feedforward network more than once, at different time steps, with different incoming input data, but with a major difference: the presence of connections to the past (the previous time steps), with one goal, to refine the representation of the input through time.

At each time step, the...
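To make this concrete, here is a minimal NumPy sketch, not the book's code, of the same feedforward step applied at every time step while also receiving the hidden state of the previous step; the sizes and weight names (W_xh, W_hh, b_h) are illustrative assumptions:

import numpy as np

n_in, n_hidden = 4, 8                              # hypothetical sizes
W_xh = 0.01 * np.random.randn(n_in, n_hidden)      # input-to-hidden weights
W_hh = 0.01 * np.random.randn(n_hidden, n_hidden)  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(n_hidden)

def step(x_t, h_prev):
    # The same feedforward computation at every time step,
    # with a connection to the previous hidden state.
    return np.tanh(x_t.dot(W_xh) + h_prev.dot(W_hh) + b_h)

x = np.random.randn(6, n_in)   # a variable-length sequence: here, 6 time steps
h = np.zeros(n_hidden)
for x_t in x:                  # unrolling the recurrence over time
    h = step(x_t, h)           # the representation of the input is refined through time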

A dataset for natural language


As a dataset, any text corpus can be used, such as Wikipedia, web articles, or even texts with symbols, such as computer code, theater plays, or poems; the model will catch and reproduce the different patterns in the data.

In this case, let's use the tiny Shakespeare texts to predict new Shakespeare texts, or at least new texts written in a style inspired by Shakespeare. Two levels of prediction are possible, and they can be handled in the same way (a small preprocessing sketch follows the list):

  • At the character level: Characters belong to an alphabet that includes punctuation. Given the first few characters, the model predicts the following characters from the alphabet, including spaces, to build words and sentences. There is no constraint for the predicted words to belong to a dictionary, and the objective of training is to build words and sentences close to real ones.

  • At the word level: Words belong to a dictionary that includes punctuation, and given the first few words, the model predicts the next word out...
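As announced, here is a small preprocessing sketch contrasting the two levels; the file name tinyshakespeare.txt and the index conventions are assumptions for illustration, not the book's exact loader:

text = open('tinyshakespeare.txt').read()

# Character level: the vocabulary is the alphabet found in the corpus,
# including punctuation and spaces.
chars = sorted(set(text))
char_to_index = {c: i for i, c in enumerate(chars)}
char_sequence = [char_to_index[c] for c in text]

# Word level: the vocabulary is a dictionary of the words found in the corpus
# (a naive whitespace split; a real tokenizer would separate punctuation and
# map rare words to an UNKNOWN token).
words = text.split()
word_to_index = {w: i for i, w in enumerate(sorted(set(words)))}
word_sequence = [word_to_index[w] for w in words]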

Simple recurrent network


An RNN is a network applied at multiple time steps, but with a major difference: a connection to the states of the layers at previous time steps, named hidden states:

This can be written in the following form:
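A standard formulation of such a layer, consistent with the description above (the book's exact notation may differ), with W_{xh}, W_{hh}, and W_{hy} the input, recurrent, and output weights:

h_t = \tanh(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h)
y_t = \mathrm{softmax}(W_{hy}\, h_t + b_y)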

An RNN can be unrolled as a feedforward network applied to the sequence as input, with parameters shared between the different time steps.

The first dimension of the input and output is time, while the following dimensions hold the data inside each time step. As seen in the previous chapter, the value at a time step (a word or a character) can be represented either by an index (an integer, 0-dimensional) or by a one-hot-encoding vector (1-dimensional). The former representation is more compact in memory. In this case, the input and output sequences are 1-dimensional, each represented by a vector whose single dimension is time:

x = T.ivector()   # input sequence of word (or character) indices
y = T.ivector()   # target sequence of indices to predict
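For reference, a minimal sketch of the recurrent layer built on these two vectors could use theano.scan to share the parameters across time steps. The sizes and names below (n_vocab, n_embed, n_hidden, W_xh, and so on) are illustrative assumptions, not the book's exact code, and the biases are omitted for brevity:

import numpy as np
import theano
import theano.tensor as T

n_vocab, n_embed, n_hidden = 10000, 100, 200   # hypothetical sizes
floatX = theano.config.floatX

# Parameters shared by all time steps.
embedding = theano.shared(0.01 * np.random.randn(n_vocab, n_embed).astype(floatX))
W_xh = theano.shared(0.01 * np.random.randn(n_embed, n_hidden).astype(floatX))
W_hh = theano.shared(0.01 * np.random.randn(n_hidden, n_hidden).astype(floatX))
W_hy = theano.shared(0.01 * np.random.randn(n_hidden, n_vocab).astype(floatX))

def step(x_t, h_prev):
    # One feedforward step, reusing the same weights at every position.
    return T.tanh(T.dot(embedding[x_t], W_xh) + T.dot(h_prev, W_hh))

h0 = T.zeros((n_hidden,), dtype=floatX)
h, _ = theano.scan(step, sequences=x, outputs_info=h0)    # unroll over the time dimension
y_pred = T.nnet.softmax(T.dot(h, W_hy))                   # one distribution per time step
cost = -T.mean(T.log(y_pred)[T.arange(y.shape[0]), y])    # negative log-likelihood of the targets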

The structure of the training program remains the same as in Chapter 2, Classifying Handwritten Digits with a Feedforward...

Metrics for natural language performance


The Word Error Rate (WER) or Character Error Rate (CER) is the equivalent, for natural language, of the accuracy error (error rate).
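As a reminder of how such a rate is computed, here is a minimal sketch, not the book's code, of a word error rate based on the edit distance between a reference sentence and a predicted one:

def edit_distance(ref, hyp):
    # Levenshtein distance: minimum number of substitutions, insertions,
    # and deletions needed to turn hyp into ref.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1]

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / float(len(ref))

The Character Error Rate is the same computation applied to sequences of characters instead of words.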

Evaluation of language models is usually expressed with perplexity, which is simply:
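A standard way to write it, as the exponential of the average negative log-likelihood per word (the book's exact notation may differ):

\mathrm{perplexity} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(w_i \mid w_1,\ldots,w_{i-1})\right)

The lower the perplexity, the higher the probability the model assigns to the evaluated text.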

Training loss comparison


During training, the learning rate might be too strong after a certain number of epochs for fine-tuning. Decreasing the learning rate when the loss does not decrease anymore helps during the last steps of training. To decrease the learning rate, we need to define it as an input variable during compilation:

lr = T.scalar('learning_rate')   # symbolic learning rate, passed as an input at call time
train_model = theano.function(inputs=[x, y, lr], outputs=cost, updates=updates)

During training, we adjust the learning rate, decreasing it if the training loss does not improve:

if (len(train_loss) > 1 and train_loss[-1] > train_loss[-2]):
    learning_rate = learning_rate * 0.5
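Since the learning rate is now an input of the compiled function rather than a constant baked into the updates, it must be passed at each call. A minimal sketch of the call site, where train_x and train_y are hypothetical names for a training sequence and its target:

current_loss = train_model(train_x, train_y, learning_rate)
train_loss.append(current_loss)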

As a first experiment, let's see the impact of the size of the hidden layer on the training loss for a simple RNN:

More hidden units improve training speed and might be better in the end. To check this, we should run it for more epochs.

Comparing the training of the different network types, in this case, we do not observe any improvement with LSTM and GRU:

This...

Example of predictions


Let's predict a sentence with the trained model:

sentence = [0]                           # start with the index of the start token
while sentence[-1] != 1:                 # stop when the end token (index 1) is predicted
    pred = predict_model(sentence)[-1]   # most probable next word given the sequence so far
    sentence.append(pred)
print(" ".join([index_[w] for w in sentence[1:-1]]))   # map indices back to words, dropping the start/end tokens

Note that here we take the most probable next word (argmax); to introduce some randomness, we would instead draw the next word according to the predicted probabilities.
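A minimal sketch of such sampling, assuming a hypothetical variant of the prediction function, here called predict_proba_model, that returns the probability distribution over the vocabulary at each position instead of the argmax indices:

import numpy as np

probs = predict_proba_model(sentence)[-1]       # distribution over the vocabulary for the next word
pred = np.random.choice(len(probs), p=probs)    # draw the next word according to its probability
sentence.append(pred)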

At 150 epochs, while the model has still not entirely converged on our Shakespeare writings, we can play with the predictions, initiating them with a few words, and watch the network generate the ends of the sentences:

  • First citizen: A word , i know what a word

  • How now!

  • Do you not this asleep , i say upon this?

  • Sicinius: What, art thou my master?

  • Well, sir, come.

  • I have been myself

  • A most hose, you in thy hour, sir

  • He shall not this

  • Pray you, sir

  • Come, come, you

  • The crows?

  • I'll give you

  • What, ho!

  • Consider you, sir

  • No more!

  • Let us be gone, or your UNKNOWN UNKNOWN, i do me to do

  • We are not now

From these...

Applications of RNN


This chapter introduced the simple RNN, LSTM, and GRU models. Such models have a wide range of applications in sequence generation or sequence understanding:

  • Text generation, such as the automatic generation of Obama-style political speeches (obama-rnn), for example with a text seed on jobs:

    Good afternoon. God bless you. The United States will step up to the cost of a new challenges of the American people that will share the fact that we created the problem. They were attacked and so that they have to say that all the task of the final days of war that I will not be able to get this done. The promise of the men and women who were still going to take out the fact that the American people have fought to make sure that they have to be able to protect our part. It was a chance to stand together to completely look for the commitment to borrow from the American people. And the fact is the men and women in uniform and the millions of our country with the law system that we should be a strong...


Summary


Recurrent neural networks provide the ability to process variable-length inputs and outputs of discrete or continuous data.

While the previous feedforward networks could only process one input to one output (a one-to-one scheme), the recurrent neural nets introduced in this chapter offer the possibility of converting between variable-length and fixed-length representations, adding new operating schemes for deep learning inputs/outputs: one-to-many, many-to-many, or many-to-one.

The range of applications of RNNs is wide. For this reason, we'll study them in more depth in further chapters, in particular how to enhance the predictive power of these three modules or how to combine them to build multi-modal, question-answering, or translation applications.

In particular, in the next chapter, we'll see a practical example using text embedding and recurrent networks for sentiment analysis. This time, there will also be an opportunity to review these recurrence units under another...
