Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a special family of neural networks designed to cope with sequential data (that is, time-series data), such as stock market prices or sequences of text (for example, variable-length sentences). RNNs maintain a state variable that captures the various patterns present in sequential data, which is what allows them to model such data. In comparison, conventional feed-forward neural networks do not have this ability unless the data is given a feature representation that already captures the important patterns present in the sequence; however, coming up with such feature representations is extremely difficult. Another way for feed-forward models to handle sequential data is to have a separate set of parameters for each position in the time/sequence, so that the parameters assigned to a certain position learn the patterns that occur at that position. This will greatly increase the memory requirement...
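
To make the idea of a shared, evolving state concrete, here is a minimal illustrative sketch (not taken from the book's code; the names W_xh, W_hh, b_h and all numbers are assumptions) of a single recurrent cell in Python: the same small set of weights is reused at every position, and a state vector h carries information forward, so no per-position parameters are needed.

import numpy as np

# Minimal illustrative sketch: a single recurrent cell that reuses the SAME
# weights at every time step, updating a state vector h as it reads a sequence.
input_dim, state_dim = 3, 4
rng = np.random.default_rng(42)
W_xh = rng.normal(size=(input_dim, state_dim)) * 0.1  # input-to-state weights
W_hh = rng.normal(size=(state_dim, state_dim)) * 0.1  # state-to-state (recurrent) weights
b_h = np.zeros(state_dim)

def step(x_t, h_prev):
    # The new state depends on the current input AND the previous state
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

sequence = rng.normal(size=(5, input_dim))  # a toy sequence of 5 time steps
h = np.zeros(state_dim)
for x_t in sequence:
    h = step(x_t, h)  # the same parameters are applied at every position

print(h)  # the final state summarizes the whole sequence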

Understanding RNNs

In this section, we will discuss what an RNN is by starting with a gentle introduction, and then move on to more in-depth technical details. We mentioned earlier that RNNs maintain a state variable that evolves over time as the RNN sees more data, thus giving it the power to model sequential data. In particular, this state variable is updated over time by a set of recurrent connections. The existence of recurrent connections is the main structural difference between an RNN and a feed-forward network. The recurrent connections can be understood as links between a series of memories that the RNN learned in the past, connecting to the current state variable of the RNN. In other words, the recurrent connections update the current state variable with respect to the past memory the RNN has, enabling the RNN to make a prediction based on the current input as well as the previous inputs.
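
In Keras, this behavior is packaged in recurrent layers such as SimpleRNN. The short sketch below (illustrative only; the shapes are arbitrary) shows how return_state exposes the state variable that the recurrent connections have been updating at every time step.

import tensorflow as tf

# Illustrative sketch: a Keras SimpleRNN layer implements this kind of
# recurrent connection. return_state=True exposes the final state variable
# that was updated at every time step.
rnn = tf.keras.layers.SimpleRNN(units=4, return_sequences=True, return_state=True)

batch = tf.random.normal(shape=(2, 5, 3))  # (batch, time steps, features)
outputs, final_state = rnn(batch)

print(outputs.shape)      # (2, 5, 4) - one output per time step
print(final_state.shape)  # (2, 4)    - the state after the last time step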

The term RNN is sometimes used to refer to the family of recurrent models...

Backpropagation Through Time

For training RNNs, a special form of backpropagation, known as Backpropagation Through Time (BPTT), is used. To understand BPTT, however, first we need to understand how BP works. Then we will discuss why BP cannot be directly applied to RNNs, but how BP can be adapted for RNNs, resulting in BPTT. Finally, we will discuss two major problems present in BPTT.
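
As a preview, the hedged sketch below (layer sizes and data are placeholders, not the book's exact code) shows what BPTT amounts to in practice with TensorFlow: the forward pass is recorded over all time steps, and automatic differentiation then propagates the loss gradient backwards through every step when computing gradients for the recurrent weights.

import tensorflow as tf

# Hedged sketch: BPTT means recording the unrolled forward pass over all time
# steps and letting automatic differentiation flow gradients back through them.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10, 3)),   # 10 time steps, 3 features each
    tf.keras.layers.SimpleRNN(8),    # unrolled over the 10 steps
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()

x = tf.random.normal((32, 10, 3))    # (batch, time, features)
y = tf.random.normal((32, 1))

with tf.GradientTape() as tape:
    pred = model(x, training=True)
    loss = loss_fn(y, pred)

# The gradients of the recurrent weights accumulate contributions from every
# time step - this is Backpropagation Through Time.
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))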

How backpropagation works

Backpropagation is the technique used to train a feed-forward neural network. In backpropagation, you do the following (a minimal code sketch of these steps appears after the list):

  1. Calculate a prediction for a given input
  2. Calculate an error, E, of the prediction by comparing it to the actual label of the input (for example, using mean squared error or cross-entropy loss)
  3. Update the weights of the feed-forward network to minimize the loss calculated in step 2, by taking a small step in the opposite direction of the gradient for every weight wij, where wij is the jth weight of the ith layer
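
A minimal sketch of these three steps for a single linear layer trained with mean squared error (all names and numbers are made up for illustration):

import numpy as np

# Minimal sketch of the three steps above for a tiny linear layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 1)) * 0.1         # weights of a tiny feed-forward layer
x = rng.normal(size=(4, 3))               # 4 examples, 3 features each
y = rng.normal(size=(4, 1))               # target labels
lr = 0.1

for _ in range(100):
    pred = x @ W                          # step 1: calculate a prediction
    E = np.mean((pred - y) ** 2)          # step 2: calculate the error E (MSE)
    grad = 2 * x.T @ (pred - y) / len(x)  # dE/dW, obtained via backpropagation
    W -= lr * grad                        # step 3: small step against the gradient

print(E)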

To understand...

Applications of RNNs

So far, we have only talked about one-to-one-mapped RNNs, where the current output depends on the current input as well as the previously observed history of inputs. This means that there exists an output for the sequence of previously observed inputs plus the current input. However, in the real world, there can be situations where there is only one output for a sequence of inputs, a sequence of outputs for a single input, or a sequence of outputs for a sequence of inputs where the sequence sizes differ. In this section, we will look at several different settings of RNN models and the applications they are used in.

One-to-one RNNs

In one-to-one RNNs, the current output depends on the current input as well as the previously observed inputs (see Figure 6.8). Such RNNs are appropriate for problems where each input has an output, but the output depends both on the current input and on the history of inputs that led to it. An example of such a task is stock market...
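
A hedged Keras sketch of this setting (the shapes and the toy regression task are assumptions, not the book's code): setting return_sequences=True makes the RNN emit one output per time step, each computed from the current input plus the state accumulated from all earlier inputs.

import tensorflow as tf

# Hedged sketch of the one-to-one setting: one output per time step.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(30, 1)),                          # e.g. 30 consecutive readings
    tf.keras.layers.SimpleRNN(16, return_sequences=True),
    tf.keras.layers.Dense(1),                               # applied independently at each step
])

prices = tf.random.normal((8, 30, 1))      # a toy batch of 8 price sequences
per_step_outputs = model(prices)
print(per_step_outputs.shape)              # (8, 30, 1): one prediction per input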

Named Entity Recognition with RNNs

Now let’s look at our first task: using an RNN to identify named entities in a text corpus. This task is known as Named Entity Recognition (NER). We will be using a modified version of the well-known CoNLL 2003 (which stands for Conference on Computational Natural Language Learning - 2003) dataset for NER.

CoNLL 2003 is available for multiple languages, and the English data was generated from a Reuters Corpus that contains news stories published between August 1996 and August 1997. The dataset we’ll be using is found at https://github.com/ZihanWangKi/CrossWeigh and is called CoNLLPP. It is a more carefully curated version of the original CoNLL dataset, which contains labeling errors caused by incorrectly understanding the context of a word. For example, in the phrase “Chicago won …” Chicago was identified as a location, whereas it is in fact an organization. This exercise is available in ch06_rnns_for_named_entity_recognition...
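
Before diving into the details, here is a rough, assumption-laden sketch (not the notebook's exact code; vocab_size, n_tags, and max_len are placeholders) of what a minimal RNN-based NER model can look like: it predicts one tag per token in the input sentence.

import tensorflow as tf

# Rough sketch of a minimal RNN-based NER model: one tag per token.
vocab_size, n_tags, max_len = 20000, 9, 40

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len,)),
    tf.keras.layers.Embedding(vocab_size, 64),
    tf.keras.layers.SimpleRNN(64, return_sequences=True),
    tf.keras.layers.Dense(n_tags, activation="softmax"),   # tag distribution per token
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()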

NER with character and token embeddings

Nowadays, recurrent models used to solve the NER task are much more sophisticated than just a single embedding layer and an RNN. They involve more advanced recurrent models such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs). We will set aside the discussion of these advanced models for several upcoming chapters. Here, we will focus on a technique that provides the model with embeddings at multiple scales, enabling it to understand language better. That is, instead of relying only on token embeddings, we also use character embeddings. A token embedding is then generated from the character embeddings by shifting a convolutional window over the characters in the token. Don’t worry if you don’t understand the details yet; the following sections will go into the specifics of the solution. This exercise is available in ch06_rnns_for_named_entity_recognition.ipynb in the Ch06-Recurrent...
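
A hedged sketch of this multi-scale idea (all sizes and layer choices are assumptions for illustration, not the book's exact architecture): each token is represented both by a token embedding and by a character-derived vector obtained by sliding a convolution over the token's character embeddings, and the two are concatenated before being fed to the recurrent layer.

import tensorflow as tf

# Hedged sketch: token embeddings concatenated with character-derived vectors.
max_tokens, max_chars = 40, 12
word_vocab, char_vocab, n_tags = 20000, 100, 9

tok_in = tf.keras.Input(shape=(max_tokens,), dtype="int32", name="tokens")
chr_in = tf.keras.Input(shape=(max_tokens, max_chars), dtype="int32", name="chars")

tok_emb = tf.keras.layers.Embedding(word_vocab, 64)(tok_in)       # (batch, tokens, 64)

chr_emb = tf.keras.layers.Embedding(char_vocab, 16)(chr_in)       # (batch, tokens, chars, 16)
chr_conv = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv1D(32, kernel_size=3, padding="same", activation="relu")
)(chr_emb)                                                        # convolution over each token's characters
chr_vec = tf.keras.layers.TimeDistributed(
    tf.keras.layers.GlobalMaxPooling1D()
)(chr_conv)                                                       # (batch, tokens, 32)

x = tf.keras.layers.Concatenate()([tok_emb, chr_vec])             # multi-scale token features
x = tf.keras.layers.SimpleRNN(64, return_sequences=True)(x)
out = tf.keras.layers.Dense(n_tags, activation="softmax")(x)

model = tf.keras.Model(inputs=[tok_in, chr_in], outputs=out)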

Summary

In this chapter, we looked at RNNs, which are different from conventional feed-forward neural networks and more powerful in terms of solving temporal tasks.

Specifically, we discussed how to arrive at an RNN from a feed-forward neural network type structure.

We assumed a sequence of inputs and outputs, and designed a computational graph that can represent the sequence of inputs and outputs.

This computational graph resulted in a series of copies of functions that we applied to each individual input-output tuple in the sequence. Then, by generalizing this model to any given single time step t in the sequence, we were able to arrive at the basic computational graph of an RNN. We discussed the exact equations and update rules used to calculate the hidden state and the output.
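
For reference, the most common form of these update rules (the chapter's exact notation may differ slightly) is h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) for the hidden state and y_t = softmax(W_hy h_t + b_y) for the output.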

Next we discussed how RNNs are trained with data using BPTT. We examined how we can arrive at BPTT with standard backpropagation as well as why we can’t use standard backpropagation...
