Understanding Recurrent Neural Networks

A recurrent neural network (RNN) is a neural network designed to process sequential data while remaining aware of the order of that data. Sequential data includes time-series data as well as data that has a sequence but no time component, such as text. The applications of such a network follow from the nature of the data itself. For time-series data, this can be either nowcasting (predictions made for the current time using both past and present data) or forecasting targets. For text data, applications such as speech recognition and machine translation can make use of these networks.
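To make this concrete, here is a minimal sketch (not taken from the book) using PyTorch's built-in nn.RNN module to process a batch of sequences; the batch size, sequence length, and feature sizes are arbitrary illustration values:

import torch
import torch.nn as nn

# A minimal sketch: an RNN layer consuming a batch of sequences.
# The sizes below (batch of 8, sequence length 20, 10 features) are
# arbitrary illustration values, not from the chapter.
rnn = nn.RNN(input_size=10, hidden_size=32, num_layers=1, batch_first=True)

x = torch.randn(8, 20, 10)   # (batch, time steps, features)
outputs, h_n = rnn(x)        # outputs: hidden state at every time step

print(outputs.shape)         # torch.Size([8, 20, 32])
print(h_n.shape)             # torch.Size([1, 8, 32]) - final hidden state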

Research in recurrent neural networks has slowed in the past few years with the advent of architectures, such as transformers, that can capture sequential data while removing recurrent connections completely and achieving better performance. However, RNNs are still used extensively in the real world today to serve as a good...

Technical requirements

This chapter is short and sweet but still covers some practical implementations in the Python programming language to realize the RNN architecture. To complete it, you will need a computer with the PyTorch library installed.

You can find the code files for this chapter on GitHub at https://github.com/PacktPublishing/The-Deep-Learning-Architect-Handbook/tree/main/CHAPTER_4.

Understanding LSTM

LSTM was invented in 1997 but remains a widely adopted neural network. LSTM uses the tanh activation function because it provides nonlinearity while its second derivative can be preserved over a longer sequence, which helps prevent exploding and vanishing gradients. An LSTM layer consists of LSTM cells connected sequentially. Let's take an in-depth look at what an LSTM cell looks like in Figure 4.1.

Figure 4.1 – A visual deep dive into an LSTM cell among a sequence of LSTM cells that forms an LSTM layer


The first LSTM cell on the left depicts the high-level structure of an LSTM cell, the second cell in the middle depicts the medium-level operations, connections, and structure of an LSTM cell, and the third cell on the right is simply another LSTM cell, emphasizing that an LSTM layer is made of multiple LSTM cells sequentially connected to each other. Think of an LSTM cell as containing...
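Independently of the cell internals shown in Figure 4.1, a minimal sketch (not the book's code) of how an LSTM layer reuses one cell at every time step in PyTorch could look like this; the sizes are illustrative only:

import torch
import torch.nn as nn

# A minimal sketch of how an LSTM layer applies the same cell at each
# time step, carrying a hidden state h and a cell state c forward.
cell = nn.LSTMCell(input_size=10, hidden_size=32)

x = torch.randn(20, 8, 10)   # (time steps, batch, features)
h = torch.zeros(8, 32)       # initial hidden state
c = torch.zeros(8, 32)       # initial cell state

for x_t in x:                # unroll over the sequence
    h, c = cell(x_t, (h, c)) # the same weights are reused at every step

print(h.shape, c.shape)      # torch.Size([8, 32]) each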

Understanding GRU

The gated recurrent unit (GRU) was invented in 2014 and builds on the ideas implemented in LSTM. GRU was designed to simplify LSTM and provide a faster, more efficient way of achieving the same goal: adaptively remembering and forgetting based on past and present data. In terms of learning capacity and achievable metric performance, there is no clear silver-bullet winner between the two, and in industry the two RNN units are often benchmarked against each other to determine which provides better performance. Figure 4.4 shows the structure of the GRU.

Figure 4.4 – A low-level depiction of GRU


Figure 4.4 adopts the same weight and bias notation as the LSTM depicted in Figure 4.2. The final lowercase letter in each notation takes one of three values: r for the reset gate, z for the update gate, and h for the weights used to obtain the next hidden state. This means a GRU cell has fewer...
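As a rough check of this (a sketch under the assumption that PyTorch's nn.GRU and nn.LSTM follow the standard three- and four-gate formulations), the parameter counts of a GRU layer and an LSTM layer with identical sizes can be compared directly:

import torch.nn as nn

# A rough illustration (not from the book) of the GRU's smaller parameter
# count: with identical input and hidden sizes, a GRU layer carries three
# gates' worth of weights versus the LSTM's four.
lstm = nn.LSTM(input_size=10, hidden_size=32)
gru = nn.GRU(input_size=10, hidden_size=32)

count = lambda m: sum(p.numel() for p in m.parameters())
print("LSTM parameters:", count(lstm))  # 4 * (10*32 + 32*32 + 2*32) = 5632
print("GRU parameters: ", count(gru))   # 3 * (10*32 + 32*32 + 2*32) = 4224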

Understanding advancements over the standard GRU and LSTM layers

GRU and LSTM are the most widely used RNN methods today, but one might wonder how to push past the boundaries of a standard GRU or a standard LSTM. A good starting intuition is that both layer types accept sequential data, and building a network requires multiple RNN layers. This means it is entirely possible to combine GRU and LSTM layers in the same network, as sketched below. On its own, however, this is not enough to be considered an advancement, since a fully LSTM network or a fully GRU network can just as easily exceed the performance of a combined LSTM and GRU network. Let's dive into another simple improvement you can make on top of these standard RNN layers, called the bidirectional RNN.
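As a minimal sketch of such a mixed network (not the book's code; the layer sizes and the classification head are arbitrary illustration choices), the following model stacks an LSTM layer followed by a GRU layer:

import torch
import torch.nn as nn

# A minimal sketch of mixing layer types: an LSTM layer feeding a GRU
# layer inside one model, with a linear head on the last time step.
class MixedRNN(nn.Module):
    def __init__(self, input_size=10, hidden_size=32, num_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out, _ = self.lstm(x)         # (batch, time, hidden)
        out, _ = self.gru(out)        # the GRU consumes the LSTM's outputs
        return self.head(out[:, -1])  # predict from the last time step

model = MixedRNN()
logits = model(torch.randn(8, 20, 10))
print(logits.shape)                   # torch.Size([8, 5])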

Decoding bidirectional RNN

Both GRU and LSTM rely on the sequential nature of the data. The order of the sequence can be forward, in increasing time steps, and it can also be backward...
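As a rough illustration of how this is typically implemented (an assumption, not the book's code), PyTorch exposes this through the bidirectional flag, which runs the layer over the sequence in both directions and concatenates the two hidden states:

import torch
import torch.nn as nn

# A minimal sketch of a bidirectional LSTM: one pass reads the sequence
# forward, another reads it backward, and their hidden states are
# concatenated, doubling the output feature size. Sizes are illustrative.
bilstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True,
                 bidirectional=True)

x = torch.randn(8, 20, 10)
outputs, (h_n, c_n) = bilstm(x)

print(outputs.shape)  # torch.Size([8, 20, 64]) - both directions concatenated
print(h_n.shape)      # torch.Size([2, 8, 32]) - one final state per direction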

Summary

Recurrent neural networks are a type of neural network that explicitly includes the inductive biases of sequential data in its structure.

A number of RNN variations exist, but all of them maintain the same high-level concept in their overall structure: they provide varying ways to decide which data to learn from and remember, and which data to forget from memory.

However, do note that a more recent architecture called the transformer, which will be introduced in Chapter 6, Understanding Neural Network Transformers, demonstrated that recurrence is not needed to achieve good performance on sequential data.

With that, we are done with RNNs and will dive briefly into the world of autoencoders in the next chapter.
