You're reading from Machine Learning with PyTorch and Scikit-Learn

Product type: Book
Published in: Feb 2022
Publisher: Packt
ISBN-13: 9781801819312
Edition: 1st
Authors (3):
Sebastian Raschka

Sebastian Raschka is an Assistant Professor of Statistics at the University of Wisconsin-Madison focusing on machine learning and deep learning research. As Lead AI Educator at Grid AI, Sebastian plans to continue following his passion for helping people get into machine learning and artificial intelligence.
Read more about Sebastian Raschka

Yuxi (Hayden) Liu

Yuxi (Hayden) Liu was a Machine Learning Software Engineer at Google. Drawing on his experience as a machine learning scientist, he has applied his ML expertise to data-driven domains including computational advertising, cybersecurity, and information retrieval. He is the author of a series of influential machine learning books and an education enthusiast. His debut book, the first edition of Python Machine Learning by Example, was a #1 bestseller on Amazon and has been translated into many different languages.
Read more about Yuxi (Hayden) Liu

Vahid Mirjalili

Vahid Mirjalili is a deep learning researcher focusing on computer vision (CV) applications. Vahid received Ph.D. degrees in both Mechanical Engineering and Computer Science from Michigan State University.
Read more about Vahid Mirjalili


Modeling Sequential Data Using Recurrent Neural Networks

In the previous chapter, we focused on convolutional neural networks (CNNs). We covered the building blocks of CNN architectures and how to implement deep CNNs in PyTorch. Finally, you learned how to use CNNs for image classification. In this chapter, we will explore recurrent neural networks (RNNs) and see their application in modeling sequential data.

We will cover the following topics:

  • Introducing sequential data
  • RNNs for modeling sequences
  • Long short-term memory
  • Truncated backpropagation through time
  • Implementing a multilayer RNN for sequence modeling in PyTorch
  • Project one: RNN sentiment analysis of the IMDb movie review dataset
  • Project two: RNN character-level language modeling with LSTM cells, using text data from Jules Verne’s The Mysterious Island
  • Using gradient clipping to avoid exploding gradients

Introducing sequential data

Let’s begin our discussion of RNNs by looking at the nature of sequential data, which is more commonly known as sequence data or sequences. We will look at the unique properties of sequences that make them different from other kinds of data. We will then see how to represent sequential data and explore the various categories of models for sequential data, which are based on the input and output of a model. This will help us to explore the relationship between RNNs and sequences in this chapter.

Modeling sequential data – order matters

What makes sequences unique, compared to other types of data, is that elements in a sequence appear in a certain order and are not independent of each other. Typical machine learning algorithms for supervised learning assume that the input is independent and identically distributed (IID) data, which means that the training examples are mutually independent and have the same underlying distribution. In...
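
The excerpt above is truncated, but the key contrast can be made concrete: under the IID assumption, the order of examples carries no information, whereas in a sequence the order is part of the signal. Here is a minimal sketch in PyTorch (the toy values are ours, purely for illustration):

```python
import torch

# A toy batch of two sequences: shape (batch_size, seq_len, num_features).
# Both sequences contain the same values, but in opposite order.
x = torch.tensor([[[1.], [2.], [3.], [4.]],
                  [[4.], [3.], [2.], [1.]]])

# Any order-insensitive summary (here, the mean over time) treats the
# elements as interchangeable and cannot distinguish the two sequences:
print(x.mean(dim=1))                      # tensor([[2.5000], [2.5000]])

# A sequence model sees the ordering, so the inputs are genuinely different:
print(torch.equal(x[0], x[1]))            # False
print(torch.equal(x[0], x[1].flip(0)))    # True: one is the reverse of the other
```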

RNNs for modeling sequences

In this section, before we start implementing RNNs in PyTorch, we will discuss the main concepts of RNNs. We will begin by looking at the typical structure of an RNN, which includes a recursive component to model sequence data. Then, we will examine how the neuron activations are computed in a typical RNN. This will provide the context for discussing the common challenges in training RNNs, along with solutions to those challenges, such as long short-term memory (LSTM) cells and gated recurrent units (GRUs).
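
The recurrence at the heart of a basic RNN can be written as h_t = tanh(W_xh x_t + W_hh h_(t-1) + b), with the same weights reused at every time step. We can verify this directly against PyTorch's built-in nn.RNN; the following is a minimal sketch (the dimensions are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
rnn = nn.RNN(input_size=5, hidden_size=3, batch_first=True)

x = torch.randn(1, 4, 5)        # (batch, seq_len, input_size)
out, h_n = rnn(x)               # out: (1, 4, 3); h_n: (1, 1, 3)

# Reproduce the built-in layer manually with the recurrence
# h_t = tanh(x_t @ W_ih^T + b_ih + h_{t-1} @ W_hh^T + b_hh):
W_ih, W_hh = rnn.weight_ih_l0, rnn.weight_hh_l0
b_ih, b_hh = rnn.bias_ih_l0, rnn.bias_hh_l0

h = torch.zeros(3)
for t in range(4):
    h = torch.tanh(x[0, t] @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
    print(torch.allclose(h, out[0, t]))   # True at every step
```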

Understanding the dataflow in RNNs

Let’s start with the architecture of an RNN. Figure 15.3 shows the dataflow in a standard feedforward NN and in an RNN side by side for comparison:

Figure 15.3: The dataflow of a standard feedforward NN and an RNN

Both of these networks have only one hidden layer. In this representation, the units are not displayed, but we assume that the input layer (x), hidden layer (h), and output layer (o) are vectors...
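
Although the figure itself is not reproduced in this excerpt, the distinction it draws can be shown at the level of tensor shapes: a feedforward layer maps each input independently, while a recurrent layer threads a hidden state through the sequence. A minimal sketch (dimensions chosen arbitrarily):

```python
import torch
import torch.nn as nn

x_flat = torch.randn(8, 5)      # 8 independent examples, 5 features each
x_seq = torch.randn(8, 10, 5)   # 8 sequences of length 10

ff = nn.Linear(5, 3)
print(ff(x_flat).shape)         # torch.Size([8, 3]) - no notion of time

rnn = nn.RNN(input_size=5, hidden_size=3, batch_first=True)
out, h_n = rnn(x_seq)
print(out.shape)                # torch.Size([8, 10, 3]) - one output per time step
print(h_n.shape)                # torch.Size([1, 8, 3]) - final hidden state
```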

Implementing RNNs for sequence modeling in PyTorch

Now that we have covered the underlying theory behind RNNs, we are ready to move on to the more practical portion of this chapter: implementing RNNs in PyTorch. During the rest of this chapter, we will apply RNNs to two common tasks:

  1. Sentiment analysis
  2. Language modeling

These two projects, which we will walk through together in the following pages, are both fascinating but also quite involved. Thus, instead of presenting all the code at once, we will break the implementation up into several steps and discuss the code in detail. If you would like a big-picture overview and want to see all the code before diving into the discussion, take a look at the code implementation first.
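
As a preview of the kind of model both projects build on, here is a minimal sketch of a multilayer RNN classifier in PyTorch. The class name and hyperparameters are our own illustrative choices, not the book's exact code:

```python
import torch
import torch.nn as nn

class SequenceClassifier(nn.Module):
    """Embed token IDs, run a stacked LSTM, classify from the last hidden state."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128,
                 num_layers=2, num_classes=1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim,
                           num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        emb = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.rnn(emb)       # h_n: (num_layers, batch, hidden_dim)
        return self.fc(h_n[-1])           # classify from the top layer's final state

model = SequenceClassifier(vocab_size=10000)
logits = model(torch.randint(0, 10000, (4, 20)))  # 4 sequences of 20 token IDs
print(logits.shape)                               # torch.Size([4, 1])
```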

Project one – predicting the sentiment of IMDb movie reviews

You may recall from Chapter 8, Applying Machine Learning to Sentiment Analysis, that sentiment analysis is concerned with analyzing the expressed...
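
The project walkthrough is truncated in this excerpt, but at its core it trains a binary classifier on review text. The following is a hedged sketch of a single training step, including the gradient clipping listed among this chapter's topics; the tiny model and random batch are stand-ins, not the book's actual IMDb pipeline:

```python
import torch
import torch.nn as nn

class TinySentimentRNN(nn.Module):
    """Stand-in model: embedding -> LSTM -> one logit per review."""
    def __init__(self, vocab_size=10000):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, 32)
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, 1)

    def forward(self, ids):
        _, (h_n, _) = self.lstm(self.emb(ids))
        return self.head(h_n[-1]).squeeze(1)

model = TinySentimentRNN()
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a fake batch: 4 "reviews" of 20 token IDs each,
# with binary sentiment labels (1 = positive, 0 = negative).
ids = torch.randint(0, 10000, (4, 20))
labels = torch.tensor([1., 0., 1., 1.])

optimizer.zero_grad()
loss = loss_fn(model(ids), labels)
loss.backward()
# Clip the gradient norm to curb exploding gradients (covered later in the chapter).
nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()
print(f'loss: {loss.item():.4f}')
```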

Summary

In this chapter, you first learned about the properties of sequences that make them different from other types of data, such as structured data or images. We then covered the foundations of RNNs for sequence modeling. You learned how a basic RNN model works, and we discussed its limitations with regard to capturing long-term dependencies in sequence data. Next, we covered LSTM cells, which use a gating mechanism to reduce the effect of the exploding and vanishing gradient problems that are common in basic RNN models.

After discussing the main concepts behind RNNs, we implemented several RNN models with different recurrent layers using PyTorch. In particular, we implemented an RNN model for sentiment analysis, as well as an RNN model for generating text.

In the next chapter, we will see how we can augment an RNN with an attention mechanism, which helps it with modeling long-range dependencies in translation tasks. Then, we will introduce a new deep learning architecture...
