Recurrent Neural Networks

This chapter introduces recurrent neural networks, starting with the basic model and moving on to newer recurrent layers with internal memory that learn to remember, or forget, certain patterns found in datasets. We will begin by showing that recurrent networks are powerful at inferring patterns that are temporal or sequential, and then we will introduce an improvement on the traditional paradigm: a model with internal memory that can be applied in both directions in the temporal space.

We will approach the learning task by looking at a sentiment analysis problem as a sequence-to-vector application, and then we will focus on an autoencoder as a vector-to-sequence and sequence-to-sequence model at the same time. By the end of this chapter, you will be able to explain why a long short-term memory model is better than...

Introduction to recurrent neural networks

Recurrent neural networks (RNNs) are based on the early work of Rumelhart (Rumelhart, D. E., et al. (1986)), a psychologist who worked closely with Hinton, whom we have already mentioned several times. The concept is simple, but revolutionary in the area of pattern recognition over sequences of data.

A sequence of data is any piece of data that has high correlation in either time or space. Examples include audio sequences and images.

The concept of recurrence in RNNs can be illustrated as shown in the following diagram. If you think of a dense layer of neural units, these can be stimulated using some input at different time steps. Figures 13.1 (b) and (c) show an RNN with five time steps. We can see in Figures 13.1 (b) and (c) how the input is accessible at the different time steps but, more importantly, how the output of the neural units is also available to the next layer of neurons:

Figure 13.1. Different representations...
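
To connect the diagram to code, the following is a minimal Keras sketch of a recurrent layer processing five time steps; the layer sizes and feature dimensions here are arbitrary placeholders rather than values taken from this chapter's examples:

import numpy as np
from tensorflow.keras import layers, models

time_steps = 5   # five time steps, as in Figure 13.1 (b) and (c)
features = 8     # size of the input vector at each time step (arbitrary choice)

model = models.Sequential([
    layers.Input(shape=(time_steps, features)),
    layers.SimpleRNN(16, activation='tanh'),   # recurrent layer; each step's output feeds the next step
    layers.Dense(1, activation='sigmoid')      # a single decision made from the final recurrent output
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# A random batch just to confirm the expected tensor shapes
x = np.random.rand(4, time_steps, features).astype('float32')
print(model.predict(x).shape)   # (4, 1)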

Long short-term memory models

Initially proposed by Hochreiter and Schmidhuber, long short-term memory (LSTM) models gained traction as an improved version of recurrent models (Hochreiter, S., et al. (1997)). LSTMs promised to alleviate the following problems associated with traditional RNNs:

  • Vanishing gradients
  • Exploding gradients
  • The inability to remember or forget certain aspects of the input sequences

The following diagram shows a very simplified version of an LSTM. In (b), we can see the additional self-loop that is attached to some memory, and in (c), we can observe what the network looks like when unfolded or expanded:

Figure 13.6. Simplified representation of an LSTM

There is much more to the model, but the most essential elements are shown in Figure 13.6. Observe how an LSTM layer receives from the previous time step not only the previous output, but also something called state, which acts as a type of memory. In the diagram, you can see that while the current output and state are available...
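
To see the output and the state as separate quantities, consider the following minimal sketch, which is illustrative rather than one of this chapter's listings; setting return_state=True in Keras exposes both the last output and the cell state:

import numpy as np
from tensorflow.keras import layers

# An LSTM carries a hidden output h and a cell state c from one time step to the next;
# return_state=True returns both alongside the regular output.
lstm = layers.LSTM(32, return_state=True)

x = np.random.rand(2, 5, 8).astype('float32')   # (batch, time steps, features), arbitrary sizes
output, hidden_state, cell_state = lstm(x)

print(output.shape)        # (2, 32) - output at the last time step
print(hidden_state.shape)  # (2, 32) - identical to the output for an LSTM
print(cell_state.shape)    # (2, 32) - the internal memory passed to the next step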

Sequence-to-vector models

In the previous section, you technically saw a sequence-to-vector model: it took a sequence (of numbers representing words) and mapped it to a vector (of one dimension, corresponding to the sentiment of a movie review). However, to appreciate these models further, we will move back to MNIST as the source of input and build a model that takes one MNIST numeral and maps it to a latent vector.
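
As a reminder of what such a sequence-to-vector model looks like in code, here is a minimal Keras sketch; the vocabulary size, sequence length, and layer sizes are assumptions for illustration, not the exact values used in the previous section:

from tensorflow.keras import layers, models

vocab_size = 10000   # assumed vocabulary size
seq_len = 128        # assumed length that reviews are padded/truncated to

model = models.Sequential([
    layers.Input(shape=(seq_len,), dtype='int32'),
    layers.Embedding(vocab_size, 32),        # each word index becomes a 32-dimensional vector
    layers.LSTM(64),                         # the whole sequence is reduced to a 64-dimensional vector
    layers.Dense(1, activation='sigmoid')    # one dimension: the predicted sentiment
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])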

Unsupervised model

Let's work with the autoencoder architecture shown in the following diagram. We have studied autoencoders before, and we will use them again here because, as we learned, they are powerful at finding robust vectorial representations (latent spaces) in an unsupervised manner:

Figure 13.10. LSTM-based autoencoder architecture for MNIST

The goal here is to take an image and find its latent representation, which, in the example of Figure 13.10, would be two-dimensional. However, you might be wondering: how can an image be a sequence?

We can interpret an image...
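
One natural reading, sketched below under the assumption that we scan the digit row by row, treats each 28x28 image as a sequence of 28 time steps with 28 pixel values each; the encoder below compresses that sequence to a two-dimensional latent vector, mirroring Figure 13.10, although the intermediate layer sizes are illustrative:

from tensorflow.keras import layers, models, datasets

encoder = models.Sequential([
    layers.Input(shape=(28, 28)),      # 28 time steps (rows), 28 features (pixels) per step
    layers.LSTM(64),                   # sequence-to-vector: a 64-dimensional summary of the digit
    layers.Dense(2, name='latent')     # the 2-dimensional latent representation
])

(x_train, _), _ = datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0     # scale pixels to [0, 1]
z = encoder.predict(x_train[:16])
print(z.shape)   # (16, 2)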

Vector-to-sequence models

If you look back at Figure 13.10, the vector-to-sequence model corresponds to the decoder's funnel shape. The general philosophy is that most models can go from large inputs down to rich representations without much trouble. However, it is only recently that the machine learning community has regained momentum in producing sequences from vectors successfully (Goodfellow, I., et al. (2016)).

Think of Figure 13.10 again and the model represented there, which produces a sequence back from an original sequence. In this section, we will focus on the second part of that model, the decoder, and use it as a vector-to-sequence model. Before we go there, however, we will introduce another version of an RNN: the bi-directional LSTM.

Bi-directional LSTM

A Bi-directional LSTM (BiLSTM), simply put, is an LSTM that analyzes a sequence going both forward and backward, as shown in Figure 13.14:

Figure 13.14. A bi-directional LSTM representation
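
In Keras, this amounts to wrapping an LSTM layer in the Bidirectional wrapper; the following is a minimal sketch with arbitrary sizes rather than this chapter's exact model:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(None, 28)),           # a variable-length sequence of 28-dimensional vectors
    layers.Bidirectional(layers.LSTM(64)),    # one LSTM reads forward, another backward; outputs are concatenated
    layers.Dense(2)                           # e.g., a 2-dimensional latent vector, as before
])
model.summary()   # note the 128 (64 + 64) units coming out of the bi-directional layer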

Consider the following examples of sequences...

Sequence-to-sequence models

A Google Brain scientist (Vinyals, O., et al. (2015)) wrote the following:

"Sequences have become first-class citizens in supervised learning thanks to the resurgence of recurrent neural networks. Many complex tasks that require mapping from or to a sequence of observations can now be formulated with the sequence-to-sequence (seq2seq) framework, which employs the chain rule to efficiently represent the joint probability of sequences."

This is astoundingly accurate, and the range of applications has only grown since. Just think about the following sequence-to-sequence project ideas:

  • Document summarization. Input sequence: a document. Output sequence: an abstract.
  • Image super-resolution. Input sequence: a low-resolution image. Output sequence: a high-resolution image.
  • Video subtitles. Input sequence: video. Output sequence: text captions.
  • Machine translation. Input sequence: text in a source language. Output sequence: text in a target language.

These are exciting...
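
To make the framework concrete, the following sketch chains the pieces discussed in this chapter, a sequence-to-vector encoder and a vector-to-sequence decoder, into one sequence-to-sequence autoencoder; the dimensions are illustrative placeholders rather than this chapter's exact architecture:

from tensorflow.keras import layers, models

time_steps, features, latent_dim = 28, 28, 2   # e.g., MNIST rows as a sequence, 2-d latent space

seq2seq = models.Sequential([
    layers.Input(shape=(time_steps, features)),
    layers.Bidirectional(layers.LSTM(64)),                   # encoder: sequence-to-vector
    layers.Dense(latent_dim, name='latent'),                 # bottleneck
    layers.RepeatVector(time_steps),                         # vector-to-sequence: repeat the latent vector per step
    layers.LSTM(64, return_sequences=True),                  # decoder: emit one vector per time step
    layers.TimeDistributed(layers.Dense(features, activation='sigmoid'))
])
seq2seq.compile(optimizer='adam', loss='binary_crossentropy')
seq2seq.summary()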

Ethical implications

With the resurgence of recurrent models and their applicability in capturing temporal information in sequences, there is a risk of learning latent spaces that are not fairly distributed. This risk is higher in unsupervised models that operate on data that has not been properly curated. If you think about it, the model does not care about the relationships it finds; it only cares about minimizing a loss function. Therefore, if it is trained on magazines or newspapers from the 1950s, it may find spaces where the word "women" is close (in terms of Euclidean distance) to home-labor words such as "broom", "dishes", and "cooking", while the word "man" is close to all other labor such as "driving", "teaching", "doctor", and "scientist". This is an example of a bias that has been introduced into the latent space (Shin, S., et al. (2020)).

The risk here...

Summary

This advanced chapter showed you how to create RNNs. You learned about LSTMs and their bi-directional implementation, which is one of the most powerful approaches for sequences that can have distant temporal correlations. You also learned to create an LSTM-based sentiment analysis model for the classification of movie reviews. Finally, you designed an autoencoder to learn a latent space for MNIST using simple and bi-directional LSTMs and used it both as a vector-to-sequence model and as a sequence-to-sequence model.

At this point, you should feel confident explaining the motivation behind memory in RNNs, which is founded in the need for more robust models. You should feel comfortable coding your own recurrent network using Keras/TensorFlow. Furthermore, you should feel confident implementing both supervised and unsupervised recurrent networks.

LSTMs are great at encoding highly correlated spatial information, such as images, audio, or text, just like CNNs. However, both CNNs and LSTMs learn very...

Questions and answers

  1. If both CNNs and LSTMs can model spatially correlated data, what makes LSTMs particularly better?

Nothing in general, other than the fact that LSTMs have memory. But in certain applications, such as NLP, where a sentence is processed sequentially going forward and backward, there are references to particular words at the beginning, middle, and end of the sentence, often several at a time. It is easier for a BiLSTM to model this behavior than it is for a CNN; a CNN may learn to do it, but it may take longer in comparison.

  2. Does adding more recurrent layers make the network better?

No, it can make things worse. It is recommended to keep the design simple, with no more than three recurrent layers in a row in an encoder model, unless you are a scientist experimenting with something new.

  3. What other applications are there for LSTMs?

Audio processing and classification; image denoising; image super-resolution; text summarization...

References

  • Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119).
  • Pennington, J., Socher, R., and Manning, C. D. (October 2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543).
  • Rivas, P., and Zimmermann, M. (December 2019). Empirical Study of Sentence Embeddings for English Sentences Quality Assessment. In 2019 International Conference on Computational Science and Computational Intelligence (CSCI) (pp. 331-336). IEEE.
  • Hochreiter, S., and Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
  • Zhang, Z., Liu, D., Han, J., and Schuller, B. (2017...