Understanding Recurrent Networks

In Chapter 1, The Nuts and Bolts of Neural Networks, and Chapter 2, Understanding Convolutional Networks, we took an in-depth look at the properties of general feedforward networks and their specialized incarnation, Convolutional Neural Networks (CNNs). In this chapter, we'll close this story arc with Recurrent Neural Networks (RNNs). The NN architectures we discussed in the previous chapters take a fixed-size input and produce a fixed-size output. RNNs lift this constraint: they can process input sequences of variable length by defining a recurrent relationship over these sequences (hence the name). If you are already familiar with some of the topics discussed in this chapter, you can skip them.

In this chapter, we will cover the following topics:

  • Introduction to RNNs
  • Introducing long short-term memory
  • Introducing...

Introduction to RNNs

RNNs are neural networks that can process sequential data of variable length. Examples of such data include the words of a sentence or the price of a stock at various moments in time. By using the word sequential, we imply that the elements of the sequence are related to each other and that their order matters. For example, if we take a book and randomly shuffle all of the words in it, the text will lose its meaning, even though we'll still know the individual words. Naturally, we can use RNNs to solve tasks that relate to sequential data. Examples of such tasks are language translation, speech recognition, predicting the next element of a time series, and so on.

RNNs get their name because they apply the same function over a sequence recurrently. We can define an RNN as a recurrence relation:

s_t = f(s_{t-1}, x_t)

Here, f is a differentiable function, s_t is a vector of...
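To make the recurrence concrete, the following is a minimal NumPy sketch of a single recurrent step and its application over a sequence. The tanh activation, the weight names U (input-to-state) and W (state-to-state), and the sizes are illustrative assumptions rather than the book's exact implementation:

import numpy as np

def rnn_step(x_t, s_prev, U, W):
    # One step of the recurrence s_t = f(s_{t-1}, x_t), with f chosen as tanh
    return np.tanh(x_t @ U + s_prev @ W)

# Illustrative sizes: 3 time steps, input size 4, state size 5
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 5))    # input-to-state weights
W = rng.normal(size=(5, 5))    # state-to-state (recurrent) weights
xs = rng.normal(size=(3, 4))   # an input sequence of 3 elements

s = np.zeros(5)                # initial state s_0
for x_t in xs:                 # the same function is applied at every step
    s = rnn_step(x_t, s, U, W)

Note how the same weights U and W are reused at every step; this weight sharing is what lets the network handle sequences of any length.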

Introducing long short-term memory

Hochreiter and Schmidhuber studied the problems of vanishing and exploding gradients extensively and came up with a solution called Long Short-Term Memory (LSTM, https://www.bioinf.jku.at/publications/older/2604.pdf). LSTMs can handle long-term dependencies due to a specially crafted memory cell. In fact, they work so well that most of the current accomplishments in training RNNs on a variety of problems are due to the use of LSTMs. In this section, we'll explore how this memory cell works and how it solves the vanishing gradients issue.

The key idea of LSTM is the cell state, c_t (in addition to the hidden RNN state, h_t), where information can only be explicitly written in or removed, so that the state stays constant if there is no outside interference. The cell state can only be modified by specific gates, which are a way to let information...
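To make the gating mechanism concrete, here is a minimal PyTorch sketch of a single LSTM step built from primitive operations. The weight layout and names are illustrative assumptions (biases are omitted for brevity), so this is a sketch of the standard LSTM equations rather than the book's exact implementation:

import torch

def lstm_step(x_t, h_prev, c_prev, W_x, W_h):
    # W_x: (input_size, 4 * hidden_size), W_h: (hidden_size, 4 * hidden_size)
    gates = x_t @ W_x + h_prev @ W_h
    f, i, o, g = gates.chunk(4, dim=-1)          # forget, input, output gates and candidate
    f, i, o = torch.sigmoid(f), torch.sigmoid(i), torch.sigmoid(o)
    g = torch.tanh(g)                            # candidate cell values
    c_t = f * c_prev + i * g                     # explicitly erase (forget) and write (input)
    h_t = o * torch.tanh(c_t)                    # the hidden state is a gated view of c_t
    return h_t, c_t

The additive update of c_t (erasing and writing, rather than recomputing the whole state) is what allows gradients to flow across many time steps without vanishing.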

Introducing gated recurrent units

A Gated Recurrent Unit (GRU) is a type of recurrent block that was introduced in 2014 (Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, https://arxiv.org/abs/1406.1078 and Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, https://arxiv.org/abs/1412.3555) as an improvement over LSTM. A GRU cell usually has performance similar to or better than that of an LSTM, but it achieves this with fewer parameters and operations:

Figure: A GRU cell

Similar to the classic RNN, a GRU cell has a single hidden state, h_t. You can think of it as a combination of the hidden and cell states of an LSTM. The GRU cell has two gates:

  • An update gate, z_t, which combines the input and forget LSTM gates. It decides what information to discard and what new information to include in its place, based on the network input, x_t...
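The following is a minimal PyTorch sketch of a full GRU step, following the standard equations from the papers cited above; the weight names are illustrative and the bias terms are omitted, so this is a sketch rather than the book's exact code:

import torch

def gru_step(x_t, h_prev, W_xz, W_hz, W_xr, W_hr, W_xh, W_hh):
    z = torch.sigmoid(x_t @ W_xz + h_prev @ W_hz)            # update gate
    r = torch.sigmoid(x_t @ W_xr + h_prev @ W_hr)            # reset gate
    h_cand = torch.tanh(x_t @ W_xh + (r * h_prev) @ W_hh)    # candidate state
    # Interpolate between the old and candidate states (some formulations
    # swap the roles of z and 1 - z in this step)
    return z * h_prev + (1 - z) * h_cand

Unlike the LSTM sketch earlier, there is no separate cell state and no output gate, which is where the savings in parameters and operations come from.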

Implementing text classification

Let's recap this chapter so far. We started by implementing an RNN using only numpy. Then, we continued with an LSTM implementation using primitive PyTorch operations. We'll conclude this arc by training the default PyTorch 1.3.1 LSTM implementation on a text classification problem. This example also requires the torchtext 0.4.0 package. Text classification (or categorization) refers to the task of assigning categories (or labels) to a piece of text based on its contents. Text classification tasks include spam detection, topic labeling, and sentiment analysis. This type of problem is an example of a many-to-one relationship, which we defined in the Introduction to RNNs section.
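As an illustration of what such a many-to-one model might look like, here is a minimal sketch of an LSTM-based classifier using the built-in torch.nn.LSTM module. The class name, layer sizes, and the choice to classify from the final hidden state are illustrative assumptions, not the book's exact model:

import torch.nn as nn

class LSTMClassifier(nn.Module):
    # Many-to-one model: embed the tokens, run an LSTM, classify from the last hidden state
    def __init__(self, vocab_size, embedding_size=100, hidden_size=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_size)
        self.lstm = nn.LSTM(embedding_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                    # x: (batch, sequence_length) token indices
        embedded = self.embedding(x)         # (batch, sequence_length, embedding_size)
        _, (h_n, _) = self.lstm(embedded)    # h_n: (1, batch, hidden_size), final hidden state
        return self.fc(h_n.squeeze(0))       # (batch, num_classes) class scores

The model consumes the whole review and produces a single prediction, which is exactly the many-to-one pattern mentioned above.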

In this section, we'll implement a sentiment analysis example over the Large Movie Review Dataset (http://ai.stanford.edu/~amaas/data/sentiment/), which consists of...

Summary

In this chapter, we discussed RNNs. First, we started with the RNN and backpropagation through time theory. Then, we implemented an RNN from scratch to solidify our knowledge on the subject. Next, we moved on to more complex LSTM and GRU cells using the same pattern: a theoretical explanation, followed by a practical PyTorch implementation. Finally, we combined our knowledge from Chapter 6, Language Modeling, with the new material from this chapter for a full-featured sentiment analysis task implementation.

In the next chapter, we'll discuss seq2seq models and their variations—an exciting new development in sequence processing.
