Recurrent Neural Networks

This chapter introduces recurrent neural networks, starting with the basic model and moving on to newer recurrent layers with internal memory that learn to remember, or forget, certain patterns found in datasets. We will begin by showing that recurrent networks are powerful at inferring patterns that are temporal or sequential, and then we will introduce an improvement on the traditional paradigm: a model with internal memory that can be applied in both directions in the temporal space.

We will approach the learning task by looking at a sentiment analysis problem as a sequence-to-vector application, and then we will focus on an autoencoder as a vector-to-sequence and sequence-to-sequence model at the same time. By the end of this chapter, you will be able to explain why a long short-term memory model is better than...

Introduction to recurrent neural networks

Recurrent neural networks (RNNs) are based on the early work of Rumelhart (Rumelhart, D. E., et al. (1986)), a psychologist who worked closely with Hinton, whom we have already mentioned several times. The concept is simple, but revolutionary in the area of pattern recognition over sequences of data.

A sequence of data is any piece of data that has high correlation in either time or space. Examples include audio sequences and images.

The concept of recurrence in RNNs can be illustrated as shown in the following diagram. If you think of a dense layer of neural units, these can be stimulated using some input at different time steps. Figures 13.1 (b) and (c) show an RNN with five time steps. We can see in Figures 13.1 (b) and (c) how the input is accessible at the different time steps but, more importantly, how the output of the neural units is also available to the next layer of neurons:

Figure 13.1. Different representations...
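
To connect the diagram to code, the following is a minimal Keras sketch of a recurrent layer processing five time steps; the layer sizes and feature dimensions here are arbitrary placeholders rather than values taken from this chapter's examples:

import numpy as np
from tensorflow.keras import layers, models

time_steps = 5   # five time steps, as in Figure 13.1 (b) and (c)
features = 8     # size of the input vector at each time step (arbitrary choice)

model = models.Sequential([
    layers.Input(shape=(time_steps, features)),
    layers.SimpleRNN(16, activation='tanh'),   # recurrent layer; each step's output feeds the next step
    layers.Dense(1, activation='sigmoid')      # a single decision made from the final recurrent output
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# A random batch just to confirm the expected tensor shapes
x = np.random.rand(4, time_steps, features).astype('float32')
print(model.predict(x).shape)   # (4, 1)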

Long short-term memory models

Initially proposed by Hochreiter and Schmidhuber, long short-term memory (LSTM) models gained traction as an improved version of recurrent models (Hochreiter, S., et al. (1997)). LSTMs promised to alleviate the following problems associated with traditional RNNs:

  • Vanishing gradients
  • Exploding gradients
  • The inability to remember or forget certain aspects of the input sequences

The following diagram shows a very simplified version of an LSTM. In (b), we can see the additional self-loop that is attached to some memory, and in (c), we can observe what the network looks like when unfolded or expanded:

Figure 13.6. Simplified representation of an LSTM

There is much more to the model, but the most essential elements are shown in Figure 13.6. Observe how an LSTM layer receives from the previous time step not only the previous output, but also something called state, which acts as a type of memory. In the diagram, you can see that while the current output and state are available...
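
To see the output and the state as separate quantities, consider the following minimal sketch, which is illustrative rather than one of this chapter's listings; setting return_state=True in Keras exposes both the last output and the cell state:

import numpy as np
from tensorflow.keras import layers

# An LSTM carries a hidden output h and a cell state c from one time step to the next;
# return_state=True returns both alongside the regular output.
lstm = layers.LSTM(32, return_state=True)

x = np.random.rand(2, 5, 8).astype('float32')   # (batch, time steps, features), arbitrary sizes
output, hidden_state, cell_state = lstm(x)

print(output.shape)        # (2, 32) - output at the last time step
print(hidden_state.shape)  # (2, 32) - identical to the output for an LSTM
print(cell_state.shape)    # (2, 32) - the internal memory passed to the next step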

Sequence-to-vector models

In the previous section, you technically saw a sequence-to-vector model: it took a sequence (of numbers representing words) and mapped it to a vector (of one dimension, corresponding to the sentiment of a movie review). However, to appreciate these models further, we will move back to MNIST as the source of input and build a model that takes one MNIST numeral and maps it to a latent vector.
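
As a reminder of what such a sequence-to-vector model looks like in code, here is a minimal Keras sketch; the vocabulary size, sequence length, and layer sizes are assumptions for illustration, not the exact values used in the previous section:

from tensorflow.keras import layers, models

vocab_size = 10000   # assumed vocabulary size
seq_len = 128        # assumed length that reviews are padded/truncated to

model = models.Sequential([
    layers.Input(shape=(seq_len,), dtype='int32'),
    layers.Embedding(vocab_size, 32),        # each word index becomes a 32-dimensional vector
    layers.LSTM(64),                         # the whole sequence is reduced to a 64-dimensional vector
    layers.Dense(1, activation='sigmoid')    # one dimension: the predicted sentiment
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])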

Unsupervised model

Let's work with the autoencoder architecture shown in the following diagram. We have studied autoencoders before, and we will use them again here because, as we learned, they are powerful at finding robust vectorial representations (latent spaces) in an unsupervised manner:

Figure 13.10. LSTM-based autoencoder architecture for MNIST

The goal here is to take an image and find its latent representation, which, in the example of Figure 13.10, would be two-dimensional. However, you might be wondering: how can an image be a sequence?

We can interpret an image...
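
One natural reading, sketched below under the assumption that we scan the digit row by row, treats each 28x28 image as a sequence of 28 time steps with 28 pixel values each; the encoder below compresses that sequence to a two-dimensional latent vector, mirroring Figure 13.10, although the intermediate layer sizes are illustrative:

from tensorflow.keras import layers, models, datasets

encoder = models.Sequential([
    layers.Input(shape=(28, 28)),      # 28 time steps (rows), 28 features (pixels) per step
    layers.LSTM(64),                   # sequence-to-vector: a 64-dimensional summary of the digit
    layers.Dense(2, name='latent')     # the 2-dimensional latent representation
])

(x_train, _), _ = datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0     # scale pixels to [0, 1]
z = encoder.predict(x_train[:16])
print(z.shape)   # (16, 2)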

Vector-to-sequence models

If you look back at Figure 13.10, the vector-to-sequence model corresponds to the decoder's funnel shape. The general philosophy is that most models can go from large inputs down to rich representations without much trouble. However, it is only recently that the machine learning community has regained momentum in producing sequences from vectors successfully (Goodfellow, I., et al. (2016)).

Think of Figure 13.10 again and the model represented there, which produces a sequence back from an original sequence. In this section, we will focus on the second part of that model, the decoder, and use it as a vector-to-sequence model. Before we go there, however, we will introduce another version of an RNN: the bi-directional LSTM.

Bi-directional LSTM

A Bi-directional LSTM (BiLSTM), simply put, is an LSTM that analyzes a sequence going both forward and backward, as shown in Figure 13.14:

Figure 13.14. A bi-directional LSTM representation
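
In Keras, this amounts to wrapping an LSTM layer in the Bidirectional wrapper; the following is a minimal sketch with arbitrary sizes rather than this chapter's exact model:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(None, 28)),           # a variable-length sequence of 28-dimensional vectors
    layers.Bidirectional(layers.LSTM(64)),    # one LSTM reads forward, another backward; outputs are concatenated
    layers.Dense(2)                           # e.g., a 2-dimensional latent vector, as before
])
model.summary()   # note the 128 (64 + 64) units coming out of the bi-directional layer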

Consider the following examples of sequences...

Sequence-to-sequence models

A Google Brain scientist (Vinyals, O., et al. (2015)) wrote the following:

"Sequences have become first-class citizens in supervised learning thanks to the resurgence of recurrent neural networks. Many complex tasks that require mapping from or to a sequence of observations can now be formulated with the sequence-to-sequence (seq2seq) framework, which employs the chain rule to efficiently represent the joint probability of sequences."

This is astoundingly accurate, and the range of applications has only grown since. Just think about the following sequence-to-sequence project ideas:

  • Document summarization. Input sequence: a document. Output sequence: an abstract.
  • Image super-resolution. Input sequence: a low-resolution image. Output sequence: a high-resolution image.
  • Video subtitles. Input sequence: video. Output sequence: text captions.
  • Machine translation. Input sequence: text in a source language. Output sequence: text in a target language.

These are exciting...
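
To make the framework concrete, the following sketch chains the pieces discussed in this chapter, a sequence-to-vector encoder and a vector-to-sequence decoder, into one sequence-to-sequence autoencoder; the dimensions are illustrative placeholders rather than this chapter's exact architecture:

from tensorflow.keras import layers, models

time_steps, features, latent_dim = 28, 28, 2   # e.g., MNIST rows as a sequence, 2-d latent space

seq2seq = models.Sequential([
    layers.Input(shape=(time_steps, features)),
    layers.Bidirectional(layers.LSTM(64)),                   # encoder: sequence-to-vector
    layers.Dense(latent_dim, name='latent'),                 # bottleneck
    layers.RepeatVector(time_steps),                         # vector-to-sequence: repeat the latent vector per step
    layers.LSTM(64, return_sequences=True),                  # decoder: emit one vector per time step
    layers.TimeDistributed(layers.Dense(features, activation='sigmoid'))
])
seq2seq.compile(optimizer='adam', loss='binary_crossentropy')
seq2seq.summary()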

Ethical implications

With the resurgence of recurrent models and their applicability in capturing temporal information in sequences, there is a risk of learning latent spaces that are not fairly distributed. This risk is higher in unsupervised models that operate on data that has not been properly curated. If you think about it, the model does not care about the relationships it finds; it only cares about minimizing a loss function. Therefore, if it is trained on magazines or newspapers from the 1950s, it may find spaces where the word "women" is close (in terms of Euclidean distance) to home-labor words such as "broom", "dishes", and "cooking", while the word "man" is close to all other labor such as "driving", "teaching", "doctor", and "scientist". This is an example of a bias that has been introduced into the latent space (Shin, S., et al. (2020)).

The risk here...

Summary

This advanced chapter showed you how to create RNNs. You learned about LSTMs and their bi-directional implementation, which is one of the most powerful approaches for sequences that can have distant temporal correlations. You also learned to create an LSTM-based sentiment analysis model for the classification of movie reviews. Finally, you designed an autoencoder to learn a latent space for MNIST using simple and bi-directional LSTMs and used it both as a vector-to-sequence model and as a sequence-to-sequence model.

At this point, you should feel confident explaining the motivation behind memory in RNNs, which is founded in the need for more robust models. You should feel comfortable coding your own recurrent network using Keras/TensorFlow. Furthermore, you should feel confident implementing both supervised and unsupervised recurrent networks.

LSTMs are great at encoding highly correlated spatial information, such as images, audio, or text, just like CNNs. However, both CNNs and LSTMs learn very...

Questions and answers

  1. If both CNNs and LSTMs can model spatially correlated data, what makes LSTMs particularly better?

Nothing in general, other than the fact that LSTMs have memory. But in certain applications, such as NLP, where a sentence is processed sequentially going forward and backward, there are references to particular words at the beginning, middle, and end of the sentence, often several at a time. It is easier for a BiLSTM to model this behavior than it is for a CNN; a CNN may learn to do it, but it may take longer in comparison.

  2. Does adding more recurrent layers make the network better?

No, it can make things worse. It is recommended to keep the design simple, with no more than three recurrent layers in a row in an encoder model, unless you are a scientist experimenting with something new.

  3. What other applications are there for LSTMs?

Audio processing and classification; image denoising; image super-resolution; text summarization...

References

  • Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119).
  • Pennington, J., Socher, R., and Manning, C. D. (October 2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543).
  • Rivas, P., and Zimmermann, M. (December 2019). Empirical Study of Sentence Embeddings for English Sentences Quality Assessment. In 2019 International Conference on Computational Science and Computational Intelligence (CSCI) (pp. 331-336). IEEE.
  • Hochreiter, S., and Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
  • Zhang, Z., Liu, D., Han, J., and Schuller, B. (2017...