You're reading from Natural Language Processing with TensorFlow (1st Edition, Packt, May 2018, ISBN-13: 9781788478311)
Authors (2):
Thushan Ganegedara

Thushan is a seasoned ML practitioner with 4+ years of experience in the industry. He is currently a senior machine learning engineer at Canva, the Australian startup behind the online visual design software of the same name, serving millions of customers. His efforts are concentrated in the search and recommendations group, working on both visual and textual content. Prior to Canva, Thushan was a senior data scientist at QBE Insurance, an Australian insurance company, where he developed ML solutions for use cases related to insurance claims and led efforts to develop a Speech2Text pipeline. He obtained his PhD, specializing in machine learning, from the University of Sydney in 2018.

Chapter 7. Long Short-Term Memory Networks

In this chapter, we will discuss a more advanced RNN variant known as Long Short-Term Memory Networks (LSTMs). LSTMs are widely used in many sequential tasks (including stock market prediction, language modeling, and machine translation) and have proven to perform better than other sequential models (for example, standard RNNs), especially when large amounts of data are available. LSTMs are designed to avoid the vanishing gradient problem that we discussed in the previous chapter.

The main practical limitation posed by the vanishing gradient is that it prevents the model from learning long-term dependencies. By avoiding the vanishing gradient problem, however, LSTMs can store memory for much longer than ordinary RNNs (for hundreds of time steps). In contrast to RNNs, which maintain only a single hidden state, LSTMs have many more parameters, as well as better control over what memory to store and what to discard...

Understanding Long Short-Term Memory Networks


In this section, we will first explain what happens within an LSTM cell. We will see that in addition to the states, an LSTM cell contains a gating mechanism that controls the flow of information inside it. We will then work through a detailed example and see how each gate and state helps at various stages to achieve the desired behavior, ultimately producing the desired output. Finally, we will compare an LSTM against a standard RNN to learn how the two differ.

What is an LSTM?

LSTMs can be seen as a fancier family of RNNs. An LSTM is composed mainly of five different things:

  • Cell state: This is the internal cell state (that is, memory) of an LSTM cell

  • Hidden state: This is the external hidden state used to calculate predictions

  • Input gate: This determines how much of the current input is read into the cell state

  • Forget gate: This determines how much of the previous cell state is sent into the current cell state

  • Output gate: This determines how much of the cell state is output into the hidden state
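The roles of these five components can be sketched as a single forward step in NumPy. This is a minimal illustration, not the book's implementation; the stacked weight layout (`W`, `U`, `b`), the function names, and all dimensions are assumptions made for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold stacked parameters for the
    input gate (i), forget gate (f), output gate (o), and candidate (g)."""
    z = W @ x_t + U @ h_prev + b                 # shape: (4 * hidden,)
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o) # gate values in (0, 1)
    g = np.tanh(g)                               # candidate cell update
    c_t = f * c_prev + i * g   # forget gate scales the old cell state,
                               # input gate scales the new input
    h_t = o * np.tanh(c_t)     # output gate exposes the cell state
    return h_t, c_t

# Toy dimensions: 3 input features, 2 hidden units
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
U = rng.normal(size=(8, 2))
b = np.zeros(8)
h, c = np.zeros(2), np.zeros(2)
h, c = lstm_step(rng.normal(size=3), h, c, W, U, b)
```

Note how the cell state `c_t` is updated additively (scaled old state plus scaled new input) rather than being rewritten wholesale, which is the structural property the rest of this chapter builds on.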

How LSTMs solve the vanishing gradient problem


As we discussed earlier, even though RNNs are theoretically sound, in practice they suffer from a serious drawback: when Backpropagation Through Time (BPTT) is used, the gradient diminishes quickly, meaning gradient information propagates back only a few time steps. Consequently, we can only store information from very few time steps, so the network effectively has only short-term memory. This in turn limits the usefulness of RNNs in real-world sequential tasks.
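To make the decay concrete, here is a toy calculation (the factors 0.5 and 0.99 are illustrative, not from the book) contrasting a vanilla RNN's multiplicative gradient path with the forget-gate-controlled cell-state path that an LSTM provides:

```python
# In BPTT, the gradient reaching time step t - k is scaled by a product
# of k per-step factors. When each factor is below 1 (here 0.5), the
# signal decays geometrically. Along an LSTM's cell state the factor is
# the forget gate itself, which can be held close to 1.
rnn_grad, lstm_grad = 1.0, 1.0
for _ in range(30):          # propagate back through 30 time steps
    rnn_grad *= 0.5          # shrinking Jacobian factor of a vanilla RNN
    lstm_grad *= 0.99        # forget gate held near 1 on the cell state
print(f"{rnn_grad:.2e}, {lstm_grad:.2f}")  # → 9.31e-10, 0.74
```

After only 30 steps the RNN's gradient signal is effectively zero, while the LSTM's cell-state path has retained roughly three quarters of it.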

Often useful and interesting sequential tasks (such as stock market predictions or language modeling) require the ability to learn and store long-term dependencies. Think of the following example for predicting the next word:

John is a talented student. He is an A-grade student and plays rugby and cricket. All the other students envy ______.

For us, this is a very easy task. The answer would be John. However, for an RNN, this is a difficult task. We are trying to predict an answer which...

Other variants of LSTMs


Though we mainly focus on the standard LSTM architecture, many variants have emerged that either simplify the complex architecture of standard LSTMs, improve performance, or both. We will look at two variants that introduce structural modifications to the cell architecture of the LSTM: peephole connections and GRUs.

Peephole connections

Peephole connections allow the gates to see not only the current input and the previous final hidden state, but also the previous cell state. This increases the number of weights in the LSTM cell. Such connections have been shown to produce better results. The equations would look like these:
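The equations themselves were not preserved in this excerpt; a common formulation of an LSTM with peephole connections (following Gers and Schmidhuber; the weight names here are illustrative) is:

```latex
i_t &= \sigma(W_{ix} x_t + W_{ih} h_{t-1} + W_{ic} c_{t-1} + b_i) \\
f_t &= \sigma(W_{fx} x_t + W_{fh} h_{t-1} + W_{fc} c_{t-1} + b_f) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c) \\
o_t &= \sigma(W_{ox} x_t + W_{oh} h_{t-1} + W_{oc} c_t + b_o) \\
h_t &= o_t \odot \tanh(c_t)
```

The input and forget gates peek at the previous cell state $c_{t-1}$, while the output gate peeks at the current cell state $c_t$, through the extra diagonal weight terms $W_{ic}$, $W_{fc}$, and $W_{oc}$.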

Let's briefly look at how this helps the LSTM perform better. So far, the gates see the current input and final hidden state, but not the cell state. However, in this configuration, if the output gate is close to zero, even when the cell state contains important information crucial for better performance, the final hidden state will be close...

Summary


In this chapter, you learned about LSTM networks. First, we discussed what an LSTM is and its high-level architecture. We then delved into the detailed computations that take place in an LSTM and walked through them with an example.

We saw that an LSTM is composed mainly of five different things:

  • Cell state: The internal cell state of an LSTM cell

  • Hidden state: The external hidden state used to calculate predictions

  • Input gate: This determines how much of the current input is read into the cell state

  • Forget gate: This determines how much of the previous cell state is sent into the current cell state

  • Output gate: This determines how much of the cell state is output into the hidden state

Having such a complex structure allows LSTMs to capture both short-term and long-term dependencies quite well.

We compared LSTMs to vanilla RNNs and saw that LSTMs are actually capable of learning long-term dependencies as an inherent part of their structure, whereas RNNs can fail to learn long-term dependencies.

