Chapter 4: Deep Recurrent Model Architectures

Neural networks are powerful machine learning tools that are used to help us learn complex patterns between the inputs (X) and outputs (y) of a dataset. In the previous chapter, we discussed convolutional neural networks, which learn a one-to-one mapping between X and y; that is, each input, X, is independent of the other inputs and each output, y, is independent of the other outputs of the dataset.

In this chapter, we will discuss a class of neural networks that can model sequences, where X (or y) is not just a single independent data point but a temporal sequence of data points, [X1, X2, ..., Xt] (or [y1, y2, ..., yt]). Note that X2 (the data point at time step 2) depends on X1, X3 depends on X2 and X1, and so on.

Such networks are classified as recurrent neural networks (RNNs). These networks are capable of modeling the temporal aspect of data by including additional weights in the model that create cycles in the...
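To make the recurrence concrete, the following is a minimal sketch of a vanilla RNN cell in raw PyTorch; the dimensions and weight names (W_xh, W_hh) are illustrative assumptions for this sketch, not the book's code. The key point is that the same weights are applied at every time step, with the hidden state carrying information forward:

import torch

# Illustrative sizes -- assumptions for this sketch, not from the book
input_size, hidden_size, seq_len = 8, 16, 5

# Vanilla RNN recurrence: h_t = tanh(x_t @ W_xh + h_{t-1} @ W_hh + b)
W_xh = torch.randn(input_size, hidden_size) * 0.1
W_hh = torch.randn(hidden_size, hidden_size) * 0.1
b = torch.zeros(hidden_size)

x = torch.randn(seq_len, input_size)  # a toy sequence [x_1, ..., x_t]
h = torch.zeros(hidden_size)          # initial hidden state, h_0

for t in range(seq_len):
    # Reusing the same weights at every step is what creates the
    # "cycle" that lets the network carry context across time.
    h = torch.tanh(x[t] @ W_xh + h @ W_hh + b)

print(h.shape)  # torch.Size([16])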

Technical requirements

We will be using Jupyter notebooks for all our exercises. The following is a list of Python libraries that must be installed for this chapter using pip; for example, run pip install torch==1.4.0 on the command line:

jupyter==1.0.0
torch==1.4.0
tqdm==4.43.0
matplotlib==3.1.2
torchtext==0.5.0

All the code files that are relevant to this chapter are available at https://github.com/PacktPublishing/Mastering-PyTorch/blob/master/Chapter04.

Exploring the evolution of recurrent networks

Recurrent networks have been around since the 1980s. In this section, we will explore the evolution of the recurrent network architecture since its inception. We will discuss and reason about the architectural developments by going through the key milestones in the evolution of RNNs. Before jumping into the timeline, we'll quickly review the different types of RNNs and how they relate to a general feed-forward neural network.

Types of recurrent neural networks

While most supervised machine learning models model one-to-one relationships, RNNs can model the following types of input-output relationships:

  • Many-to-many (instantaneous)

    Example: Named entity recognition: Given a sentence/text, tag each word with a named entity category, such as name, organization, location, and so on (see the shape sketch after this list).

  • Many-to-many (encoder-decoder)

    Example: Machine translation (say, from English text to German text): Takes in...
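To illustrate the shapes involved, here is a minimal sketch of the many-to-many (instantaneous) pattern using torch.nn.RNN, where the network emits one output per input time step; all sizes here are illustrative assumptions:

import torch
import torch.nn as nn

# Illustrative sizes -- not tied to any dataset in this chapter
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(2, 5, 8)   # (batch, seq_len, input_size)
outputs, h_n = rnn(x)

print(outputs.shape)  # torch.Size([2, 5, 16]) -- one hidden state per step
print(h_n.shape)      # torch.Size([1, 2, 16]) -- final hidden state only

For a many-to-many (instantaneous) task such as tagging, a classifier head would be applied to outputs at every step; for a many-to-one task, only h_n would be used.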

Training RNNs for sentiment analysis

In this section, we will train an RNN model using PyTorch for a text classification task – sentiment analysis. In this task, the model takes in a piece of text – a sequence of words – as input and outputs 1 (positive sentiment) or 0 (negative sentiment). For this binary classification task involving sequential data, we will use a unidirectional single-layer RNN.
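As a preview of the overall architecture, here is a minimal sketch of what such a unidirectional single-layer RNN classifier might look like; the layer sizes, class name, and vocabulary size are illustrative assumptions, and the exercise's actual code (linked below) may differ:

import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    """Illustrative many-to-one RNN classifier -- not the book's exact code."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)  # one logit for binary sentiment

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)  # (batch, seq, embed_dim)
        _, h_n = self.rnn(embedded)           # h_n: (1, batch, hidden_dim)
        # Many-to-one: only the final hidden state feeds the classifier.
        return self.fc(h_n.squeeze(0))

model = SentimentRNN(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (4, 20)))  # 4 reviews, 20 tokens each
print(logits.shape)  # torch.Size([4, 1])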

Before training the model, we will manually process the textual data and convert it into a usable numeric form. Upon training the model, we will test it on some sample texts. We will demonstrate the use of various PyTorch functionalities to efficiently perform this task. The code for this exercise can be found at https://github.com/PacktPublishing/Mastering-PyTorch/blob/master/Chapter04/rnn.ipynb.
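Manually processing the text boils down to three steps: tokenize each review, build a vocabulary mapping tokens to integer indices, and pad the resulting sequences to a common length. A minimal sketch of that idea follows; the toy data, padding index, and helper name are illustrative, not the exercise's exact code:

from collections import Counter

reviews = ["a great movie", "a terrible waste of time"]  # toy data

# Build a vocabulary; index 0 is reserved for padding/unknown tokens.
counts = Counter(token for text in reviews for token in text.split())
vocab = {token: i + 1 for i, (token, _) in enumerate(counts.most_common())}

def numericalize(text, max_len=6):
    # Map tokens to indices, truncate, then pad to a fixed length.
    ids = [vocab.get(token, 0) for token in text.split()][:max_len]
    return ids + [0] * (max_len - len(ids))

print(numericalize("a great movie"))  # e.g. [1, 2, 3, 0, 0, 0]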

Loading and preprocessing the text dataset

For this exercise, we will need to import a few dependencies:

  1. First, execute the following import...

Building a bidirectional LSTM

So far, we have trained and tested a simple RNN model on the sentiment analysis task, which is a binary classification task based on textual data. In this section, we will try to improve our performance on the same task by using a more advanced recurrent architecture: the LSTM.

LSTMs, as we know, are better at handling longer sequences due to their memory cell gates, which help retain important information from many time steps back and forget irrelevant information even if it is recent. With the exploding and vanishing gradients problem in check, LSTMs should be able to perform well when processing long movie reviews.

Moreover, we will be using a bidirectional model as it broadens the context window at any time step for the model to make a more informed decision about the sentiment of the movie review. The RNN model we looked at in the previous exercise overfitted the dataset during training, so to tackle that, we will be using dropouts...
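A minimal sketch of such a bidirectional LSTM classifier with dropout is shown below; the sizes and class name are illustrative assumptions, and the exercise's actual code may differ. Note that a bidirectional layer doubles the size of the final representation, because the last hidden states of the forward and backward passes are concatenated:

import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """Illustrative bidirectional LSTM classifier -- not the book's exact code."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        # Forward and backward final states are concatenated: 2 * hidden_dim.
        self.fc = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.lstm(embedded)            # h_n: (2, batch, hidden_dim)
        final = torch.cat((h_n[0], h_n[1]), dim=1)   # (batch, 2 * hidden_dim)
        return self.fc(self.dropout(final))

model = SentimentLSTM(vocab_size=10_000)
print(model(torch.randint(0, 10_000, (4, 20))).shape)  # torch.Size([4, 1])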

Discussing GRUs and attention-based models

In the final section of this chapter, we will briefly look at GRUs, how they are similar to yet different from LSTMs, and how to initialize a GRU model using PyTorch. We will also look at attention-based RNNs. We will conclude this section by describing how attention-only models (with no recurrence or convolutions) outperform the recurrent family of neural models when it comes to sequence modeling tasks.

GRUs and PyTorch

As we discussed in the Exploring the evolution of recurrent networks section, GRUs are a type of memory cell with two gates – a reset gate and an update gate – as well as a single hidden state vector. In terms of configuration, GRUs are simpler than LSTMs, yet equally effective in dealing with the exploding and vanishing gradients problem. A great deal of research has been done to compare the performance of LSTMs and GRUs. While both perform better than simple RNNs on various sequence-related tasks, one is slightly better...
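Initializing a GRU in PyTorch mirrors the nn.RNN and nn.LSTM APIs; a minimal sketch with illustrative sizes follows. Unlike an LSTM, a GRU returns only a hidden state, with no separate cell state:

import torch
import torch.nn as nn

# Illustrative sizes -- a single-layer GRU, same calling convention as nn.RNN
gru = nn.GRU(input_size=8, hidden_size=16, num_layers=1, batch_first=True)

x = torch.randn(2, 5, 8)   # (batch, seq_len, input_size)
outputs, h_n = gru(x)      # no cell state, unlike nn.LSTM

print(outputs.shape)  # torch.Size([2, 5, 16])
print(h_n.shape)      # torch.Size([1, 2, 16])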

Summary

In this chapter, we have extensively explored recurrent neural architectures. First, we learned about the various RNN types: one-to-many, many-to-many, and so on. We then delved into the history and evolution of RNN architectures. From there, we looked at architectures ranging from simple RNNs, LSTMs, and GRUs to bidirectional, multi-dimensional, and stacked models. We also inspected what each of these individual architectures looked like and what was novel about them.

Next, we performed two hands-on exercises on a many-to-one sequence classification task based on sentiment analysis. Using PyTorch, we trained a unidirectional RNN model, followed by a bidirectional LSTM model with dropout on the IMDb movie reviews dataset. In the first exercise, we manually loaded and processed the data. In the second exercise, using PyTorch's torchtext module, we demonstrated how to load the dataset and process the text data, including vocabulary generation, efficiently and concisely.

In the final section of this...
