Understanding Long Short-Term Memory Networks

In this chapter, we will discuss the fundamentals behind a more advanced RNN variant known as Long Short-Term Memory Networks (LSTMs). Here, we will focus on understanding the theory behind LSTMs, so we can discuss their implementation in the next chapter. LSTMs are widely used in many sequential tasks (including stock market prediction, language modeling, and machine translation) and have proven to perform better than older sequential models (for example, standard RNNs), especially given the availability of large amounts of data. LSTMs are designed to avoid the vanishing gradient problem that we discussed in the previous chapter.

The main practical limitation posed by the vanishing gradient is that it prevents the model from learning long-term dependencies. Because they avoid the vanishing gradient problem, however, LSTMs can store memory for much longer than ordinary RNNs (for hundreds of time steps). In contrast to RNNs...

Understanding Long Short-Term Memory Networks

In this section, we will first explain how an LSTM cell operates. We will see that in addition to the hidden states, a gating mechanism is in place to control information flow inside the cell.

Then we will work through a detailed example and see how the gates and states help at various stages to achieve the desired behavior, ultimately leading to the desired output. Finally, we will compare an LSTM against a standard RNN to see how the two differ.

What is an LSTM?

LSTMs can be seen as a more complex and capable family of RNNs. Though LSTMs are a complicated beast, their underlying principles are the same as those of RNNs: they process a sequence of items one input at a time, in sequential order. An LSTM is mainly composed of five different components:

  • Cell state: This is the internal cell state (that is, memory) of an LSTM cell
  • Hidden state: This is the external hidden state, used to calculate predictions
  • Input gate: This determines how much of the current input is read into the cell state
  • Forget gate: This determines how much of the previous cell state is sent into the current cell state
  • Output gate: This determines how much of the cell state is output into the hidden state
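These components surface directly when you use an LSTM layer in practice. As a minimal sketch (assuming TensorFlow 2.x and tf.keras, which this book uses; the toy shapes below are chosen purely for illustration), asking the layer to return its states exposes both the hidden state and the cell state, while the three gates remain internal to the layer:

```python
import tensorflow as tf

# A toy batch: 2 sequences, each 5 time steps long, with 8 features per step.
inputs = tf.random.normal([2, 5, 8])

# return_sequences=True -> hidden state at every time step
# return_state=True     -> additionally return the final hidden and cell states
lstm = tf.keras.layers.LSTM(units=16, return_sequences=True, return_state=True)

all_hidden, final_hidden, final_cell = lstm(inputs)

print(all_hidden.shape)    # (2, 5, 16): hidden state at each of the 5 steps
print(final_hidden.shape)  # (2, 16): external hidden state, used for predictions
print(final_cell.shape)    # (2, 16): internal cell state, the LSTM's memory
```

The input, forget, and output gates are computed inside the layer from the current input and the previous hidden state, so they do not appear among the outputs.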

How LSTMs solve the vanishing gradient problem

As we discussed earlier, even though RNNs are theoretically sound, in practice they suffer from a serious drawback: when Backpropagation Through Time (BPTT) is used, the gradient diminishes quickly, so error signals can be propagated back through only a few time steps. Consequently, the network can retain information from only a handful of recent time steps, giving it only short-term memory. This in turn limits the usefulness of RNNs in real-world sequential tasks.
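To make the shrinking effect concrete, here is a small illustrative sketch (the matrix and the 0.9 activation-derivative factor are toy values chosen here, not taken from the book): during BPTT the gradient is multiplied by the recurrent Jacobian once for every step it travels back, so if that Jacobian's largest singular value is below 1, the gradient decays geometrically:

```python
import numpy as np

np.random.seed(0)

# Toy recurrent weight matrix, rescaled so its largest singular value is 0.8.
W = np.random.randn(4, 4)
W_rec = 0.8 * W / np.linalg.norm(W, 2)

grad = np.ones(4)  # gradient arriving at the final time step
for step in range(1, 21):
    # Each step back multiplies by the transposed recurrent weights and by the
    # activation derivative (tanh' <= 1; 0.9 is used as a typical stand-in).
    grad = (W_rec.T @ grad) * 0.9
    if step % 5 == 0:
        print(f"{step:2d} steps back: gradient norm = {np.linalg.norm(grad):.2e}")
```

After roughly 20 steps the gradient norm has collapsed to a tiny fraction of its original size, which is why plain RNNs struggle to learn dependencies spanning many time steps.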

Often, useful and interesting sequential tasks (such as stock market predictions or language modeling) require the ability to learn and store long-term dependencies. Think of the following example for predicting the next word:

John is a talented student. He is an A-grade student and plays rugby and cricket. All the other students envy ______.

For us, this is a very easy task: the answer is John. For an RNN, however, it is difficult. We are trying to predict...

Improving LSTMs

Having a model backed by solid foundations does not always guarantee pragmatic success in the real world. Natural language is quite complex, and even seasoned writers sometimes struggle to produce quality content, so we can't expect LSTMs to magically output meaningful, well-written text on their own. A sophisticated design that allows better modeling of long-term dependencies in the data does help, but we need additional techniques at inference time to produce better text. Therefore, numerous extensions have been developed to help LSTMs perform better at the prediction stage. Here we will discuss several such improvements: greedy sampling, beam search, using word vectors instead of a one-hot-encoded representation of words, and using bidirectional LSTMs. It is important to note that these optimization techniques are not specific to LSTMs; any sequential model can benefit from them.
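Before looking at each technique in detail, a small sketch may help fix the idea of a decoding strategy at inference time. The predict_next function below is a hypothetical stand-in for a trained LSTM language model (the vocabulary and probabilities are made up for illustration); the sketch contrasts always picking the single most probable next word with sampling from the predicted distribution:

```python
import numpy as np

np.random.seed(42)
vocab = ["john", "is", "a", "talented", "student", "he", "plays", "<eos>"]

def predict_next(context_ids):
    """Hypothetical stand-in for a trained LSTM language model: returns a
    probability distribution over the vocabulary given the context so far."""
    logits = np.random.randn(len(vocab))
    return np.exp(logits) / np.exp(logits).sum()

def decode(strategy, steps=6):
    context = [vocab.index("john")]
    for _ in range(steps):
        probs = predict_next(context)
        if strategy == "argmax":
            next_id = int(np.argmax(probs))                       # always the single best word
        else:
            next_id = int(np.random.choice(len(vocab), p=probs))  # sample from the distribution
        context.append(next_id)
    return " ".join(vocab[i] for i in context)

print("argmax :", decode("argmax"))
print("sampled:", decode("sampled"))
```

With a real trained model, the choice of strategy noticeably affects the generated text; the sections that follow discuss how greedy sampling and beam search refine this basic decoding loop.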

Greedy sampling

If we try to always...

Other variants of LSTMs

Though we will mainly focus on the standard LSTM architecture, many variants have emerged that simplify the complex architecture found in standard LSTMs, deliver better performance, or both. We will look at two variants that introduce structural modifications to the LSTM's cell architecture: peephole connections and GRUs.

Peephole connections

Peephole connections allow gates to see not only the current input and the previous final hidden state, but also the previous cell state. This increases the number of weights in the LSTM cell. Having such connections has been shown to produce better results. The equations would look like these:
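In a standard peephole formulation (the notation here is assumed: $x_t$ is the current input, $h_{t-1}$ the previous final hidden state, $c_{t-1}$ the previous cell state, $\sigma$ the sigmoid function, and $\odot$ element-wise multiplication), each gate gains an extra cell-state term:

$$i_t = \sigma\left(W_{ix} x_t + W_{ih} h_{t-1} + W_{ic} c_{t-1} + b_i\right)$$
$$f_t = \sigma\left(W_{fx} x_t + W_{fh} h_{t-1} + W_{fc} c_{t-1} + b_f\right)$$
$$\tilde{c}_t = \tanh\left(W_{cx} x_t + W_{ch} h_{t-1} + b_c\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = \sigma\left(W_{ox} x_t + W_{oh} h_{t-1} + W_{oc} c_{t-1} + b_o\right)$$
$$h_t = o_t \odot \tanh(c_t)$$

Some formulations instead let the output gate peek at the current cell state $c_t$ rather than $c_{t-1}$; either way, the $W_{ic}$, $W_{fc}$, and $W_{oc}$ terms are the additional peephole weights.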

Let's briefly look at how this helps the LSTM perform better. Without peephole connections, the gates see the current input and the final hidden state, but not the cell state. In that configuration, if the output gate is close to zero, even when the cell state contains information crucial...

Summary

In this chapter, you learned about LSTM networks. First, we discussed what an LSTM is and its high-level architecture. We also delved into the detailed computations that take place in an LSTM and walked through them with an example.

We saw that an LSTM is composed mainly of five different components:

  • Cell state: The internal cell state of an LSTM cell
  • Hidden state: The external hidden state used to calculate predictions
  • Input gate: This determines how much of the current input is read into the cell state
  • Forget gate: This determines how much of the previous cell state is sent into the current cell state
  • Output gate: This determines how much of the cell state is output into the hidden state

Having such a complex structure allows LSTMs to capture both short-term and long-term dependencies quite well.
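One concrete way to see this extra machinery is to compare parameter counts against a vanilla RNN. The sketch below is illustrative (assuming tf.keras; the 64 units and 100 input features are arbitrary choices made here): an LSTM carries four copies of the input and recurrent weights a simple RNN has, one for each gate plus one for the candidate cell state:

```python
import tensorflow as tf

inputs = tf.random.normal([1, 10, 100])  # (batch, time steps, features)

simple_rnn = tf.keras.layers.SimpleRNN(64)
lstm = tf.keras.layers.LSTM(64)

_ = simple_rnn(inputs)  # calling the layers once builds their weights
_ = lstm(inputs)

print("SimpleRNN parameters:", simple_rnn.count_params())  # 64*(100+64) + 64     = 10,560
print("LSTM parameters:     ", lstm.count_params())        # 4*(64*(100+64) + 64) = 42,240
```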

We compared LSTMs to vanilla RNNs and saw that LSTMs are actually capable of learning long-term dependencies as an inherent...
