Understanding Long Short-Term Memory Networks

In this chapter, we will discuss the fundamentals behind a more advanced RNN variant known as Long Short-Term Memory Networks (LSTMs). Here, we will focus on understanding the theory behind LSTMs, so we can discuss their implementation in the next chapter. LSTMs are widely used in many sequential tasks (including stock market prediction, language modeling, and machine translation) and have proven to perform better than older sequential models (for example, standard RNNs), especially given the availability of large amounts of data. LSTMs are designed to avoid the vanishing gradient problem that we discussed in the previous chapter.

The main practical limitation posed by the vanishing gradient is that it prevents the model from learning long-term dependencies. Because they avoid the vanishing gradient problem, however, LSTMs can store memory for much longer than ordinary RNNs (for hundreds of time steps). In contrast to RNNs...

Understanding Long Short-Term Memory Networks

In this section, we will first explain how an LSTM cell operates. We will see that in addition to the hidden states, a gating mechanism is in place to control information flow inside the cell.

Then we will work through a detailed example and see how the gates and states help at various stages to achieve the desired behavior, ultimately leading to the desired output. Finally, we will compare an LSTM against a standard RNN to see how the two differ.

What is an LSTM?

LSTMs can be seen as a more complex and capable family of RNNs. Though LSTMs are a complicated beast, their underlying principles are the same as those of RNNs: they process a sequence of items one input at a time, in sequential order. An LSTM is mainly composed of five different components:

  • Cell state: This is the internal cell state (that is, memory) of an LSTM cell
  • Hidden state: This is the external hidden state, used to calculate predictions
  • Input gate: This determines how much of the current input is read into the cell state
  • Forget gate: This determines how much of the previous cell state is sent into the current cell state
  • Output gate: This determines how much of the cell state is output into the hidden state
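These components surface directly when you use an LSTM layer in practice. As a minimal sketch (assuming TensorFlow 2.x and tf.keras, which this book uses; the toy shapes below are chosen purely for illustration), asking the layer to return its states exposes both the hidden state and the cell state, while the three gates remain internal to the layer:

```python
import tensorflow as tf

# A toy batch: 2 sequences, each 5 time steps long, with 8 features per step.
inputs = tf.random.normal([2, 5, 8])

# return_sequences=True -> hidden state at every time step
# return_state=True     -> additionally return the final hidden and cell states
lstm = tf.keras.layers.LSTM(units=16, return_sequences=True, return_state=True)

all_hidden, final_hidden, final_cell = lstm(inputs)

print(all_hidden.shape)    # (2, 5, 16): hidden state at each of the 5 steps
print(final_hidden.shape)  # (2, 16): external hidden state, used for predictions
print(final_cell.shape)    # (2, 16): internal cell state, the LSTM's memory
```

The input, forget, and output gates are computed inside the layer from the current input and the previous hidden state, so they do not appear among the outputs.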

How LSTMs solve the vanishing gradient problem

As we discussed earlier, even though RNNs are theoretically sound, in practice they suffer from a serious drawback: when Backpropagation Through Time (BPTT) is used, the gradient diminishes quickly, so error signals can be propagated back through only a few time steps. Consequently, the network can retain information from only a handful of recent time steps, giving it only short-term memory. This in turn limits the usefulness of RNNs in real-world sequential tasks.
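To make the shrinking effect concrete, here is a small illustrative sketch (the matrix and the 0.9 activation-derivative factor are toy values chosen here, not taken from the book): during BPTT the gradient is multiplied by the recurrent Jacobian once for every step it travels back, so if that Jacobian's largest singular value is below 1, the gradient decays geometrically:

```python
import numpy as np

np.random.seed(0)

# Toy recurrent weight matrix, rescaled so its largest singular value is 0.8.
W = np.random.randn(4, 4)
W_rec = 0.8 * W / np.linalg.norm(W, 2)

grad = np.ones(4)  # gradient arriving at the final time step
for step in range(1, 21):
    # Each step back multiplies by the transposed recurrent weights and by the
    # activation derivative (tanh' <= 1; 0.9 is used as a typical stand-in).
    grad = (W_rec.T @ grad) * 0.9
    if step % 5 == 0:
        print(f"{step:2d} steps back: gradient norm = {np.linalg.norm(grad):.2e}")
```

After roughly 20 steps the gradient norm has collapsed to a tiny fraction of its original size, which is why plain RNNs struggle to learn dependencies spanning many time steps.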

Often, useful and interesting sequential tasks (such as stock market predictions or language modeling) require the ability to learn and store long-term dependencies. Think of the following example for predicting the next word:

John is a talented student. He is an A-grade student and plays rugby and cricket. All the other students envy ______.

For us, this is a very easy task: the answer is John. For an RNN, however, it is difficult. We are trying to predict...

Improving LSTMs

Having a model backed by solid foundations does not always guarantee pragmatic success in the real world. Natural language is quite complex, and even seasoned writers sometimes struggle to produce quality content, so we can't expect LSTMs to magically output meaningful, well-written text on their own. A sophisticated design that allows better modeling of long-term dependencies in the data does help, but we need additional techniques at inference time to produce better text. Therefore, numerous extensions have been developed to help LSTMs perform better at the prediction stage. Here we will discuss several such improvements: greedy sampling, beam search, using word vectors instead of a one-hot-encoded representation of words, and using bidirectional LSTMs. It is important to note that these optimization techniques are not specific to LSTMs; any sequential model can benefit from them.
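Before looking at each technique in detail, a small sketch may help fix the idea of a decoding strategy at inference time. The predict_next function below is a hypothetical stand-in for a trained LSTM language model (the vocabulary and probabilities are made up for illustration); the sketch contrasts always picking the single most probable next word with sampling from the predicted distribution:

```python
import numpy as np

np.random.seed(42)
vocab = ["john", "is", "a", "talented", "student", "he", "plays", "<eos>"]

def predict_next(context_ids):
    """Hypothetical stand-in for a trained LSTM language model: returns a
    probability distribution over the vocabulary given the context so far."""
    logits = np.random.randn(len(vocab))
    return np.exp(logits) / np.exp(logits).sum()

def decode(strategy, steps=6):
    context = [vocab.index("john")]
    for _ in range(steps):
        probs = predict_next(context)
        if strategy == "argmax":
            next_id = int(np.argmax(probs))                       # always the single best word
        else:
            next_id = int(np.random.choice(len(vocab), p=probs))  # sample from the distribution
        context.append(next_id)
    return " ".join(vocab[i] for i in context)

print("argmax :", decode("argmax"))
print("sampled:", decode("sampled"))
```

With a real trained model, the choice of strategy noticeably affects the generated text; the sections that follow discuss how greedy sampling and beam search refine this basic decoding loop.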

Greedy sampling

If we try to always...

Other variants of LSTMs

Though we will mainly focus on the standard LSTM architecture, many variants have emerged that simplify the complex architecture found in standard LSTMs, deliver better performance, or both. We will look at two variants that introduce structural modifications to the LSTM's cell architecture: peephole connections and GRUs.

Peephole connections

Peephole connections allow gates to see not only the current input and the previous final hidden state, but also the previous cell state. This increases the number of weights in the LSTM cell. Having such connections has been shown to produce better results. The equations would look like these:
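In a standard peephole formulation (the notation here is assumed: $x_t$ is the current input, $h_{t-1}$ the previous final hidden state, $c_{t-1}$ the previous cell state, $\sigma$ the sigmoid function, and $\odot$ element-wise multiplication), each gate gains an extra cell-state term:

$$i_t = \sigma\left(W_{ix} x_t + W_{ih} h_{t-1} + W_{ic} c_{t-1} + b_i\right)$$
$$f_t = \sigma\left(W_{fx} x_t + W_{fh} h_{t-1} + W_{fc} c_{t-1} + b_f\right)$$
$$\tilde{c}_t = \tanh\left(W_{cx} x_t + W_{ch} h_{t-1} + b_c\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = \sigma\left(W_{ox} x_t + W_{oh} h_{t-1} + W_{oc} c_{t-1} + b_o\right)$$
$$h_t = o_t \odot \tanh(c_t)$$

Some formulations instead let the output gate peek at the current cell state $c_t$ rather than $c_{t-1}$; either way, the $W_{ic}$, $W_{fc}$, and $W_{oc}$ terms are the additional peephole weights.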

Let's briefly look at how this helps the LSTM perform better. Without peephole connections, the gates see the current input and the final hidden state, but not the cell state. In that configuration, if the output gate is close to zero, even when the cell state contains information crucial...

Summary

In this chapter, you learned about LSTM networks. First, we discussed what an LSTM is and its high-level architecture. We also delved into the detailed computations that take place in an LSTM and walked through them with an example.

We saw that an LSTM is composed mainly of five different components:

  • Cell state: The internal cell state of an LSTM cell
  • Hidden state: The external hidden state used to calculate predictions
  • Input gate: This determines how much of the current input is read into the cell state
  • Forget gate: This determines how much of the previous cell state is sent into the current cell state
  • Output gate: This determines how much of the cell state is output into the hidden state

Having such a complex structure allows LSTMs to capture both short-term and long-term dependencies quite well.
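One concrete way to see this extra machinery is to compare parameter counts against a vanilla RNN. The sketch below is illustrative (assuming tf.keras; the 64 units and 100 input features are arbitrary choices made here): an LSTM carries four copies of the input and recurrent weights a simple RNN has, one for each gate plus one for the candidate cell state:

```python
import tensorflow as tf

inputs = tf.random.normal([1, 10, 100])  # (batch, time steps, features)

simple_rnn = tf.keras.layers.SimpleRNN(64)
lstm = tf.keras.layers.LSTM(64)

_ = simple_rnn(inputs)  # calling the layers once builds their weights
_ = lstm(inputs)

print("SimpleRNN parameters:", simple_rnn.count_params())  # 64*(100+64) + 64     = 10,560
print("LSTM parameters:     ", lstm.count_params())        # 4*(64*(100+64) + 64) = 42,240
```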

We compared LSTMs to vanilla RNNs and saw that LSTMs are actually capable of learning long-term dependencies as an inherent...
