You're reading from  Hands-On Natural Language Processing with PyTorch 1.x

Product type: Book
Published in: Jul 2020
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781789802740
Edition: 1st
Author: Thomas Dop

Thomas Dop is a data scientist at MagicLab, a company that creates leading dating apps, including Bumble and Badoo. He works on a variety of areas within data science, including NLP, deep learning, computer vision, and predictive modeling. He holds an MSc in data science from the University of Amsterdam.

Chapter 7: Text Translation Using Sequence-to-Sequence Neural Networks

In the previous two chapters, we used neural networks to classify text and perform sentiment analysis. Both tasks involve taking an NLP input and predicting a value. In the case of our sentiment analysis model, this was a number between 0 and 1 representing the sentiment of the sentence. In the case of our sentence classification model, the output was a multi-class prediction, assigning the sentence to one of several possible categories. But what if we wish to make not just a single prediction, but to predict a whole sentence? In this chapter, we will build a sequence-to-sequence model that takes a sentence in one language as input and outputs its translation in another language.

We have already explored several types of neural network architectures used for NLP, namely recurrent neural networks in Chapter 5, Recurrent Neural Networks and Sentiment Analysis, and convolutional neural networks...

Technical requirements

Theory of sequence-to-sequence models

Sequence-to-sequence models are very similar to the conventional neural network structures we have seen so far. The main difference is that for a model's output, we expect another sequence, rather than a binary or multi-class prediction. This is particularly useful in tasks such as translation, where we may wish to convert a whole sentence into another language.
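This difference shows up directly in the shapes of the model outputs. The following is a minimal sketch (the dimensions and layer choices here are illustrative, not the chapter's final code) contrasting a classifier, which produces one prediction per sentence, with a sequence-to-sequence model, which produces one vocabulary-sized prediction per output position:

```python
import torch
import torch.nn as nn

batch, seq_len, hid_dim, num_classes, vocab_size = 4, 10, 64, 3, 5000
hidden_states = torch.randn(seq_len, batch, hid_dim)   # RNN outputs for a batch

# Classification (previous chapters): one prediction per sentence,
# taken from the final hidden state
class_logits = nn.Linear(hid_dim, num_classes)(hidden_states[-1])
print(class_logits.shape)     # [batch, num_classes] -> a single prediction each

# Sequence-to-sequence: one vocabulary-sized prediction per output position
token_logits = nn.Linear(hid_dim, vocab_size)(hidden_states)
print(token_logits.shape)     # [seq_len, batch, vocab_size] -> a whole sequence
```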

In the following example, we can see that our English-to-Spanish translation maps word to word:

Figure 7.1 – English to Spanish translation

The first word in our input sentence maps nicely to the first word in our output sentence. If this were the case for all languages, we could simply pass each word in our sentence one by one through our trained model to get an output sentence, and there would be no need for any sequence-to-sequence modeling, as shown here:

Figure 7.2 – English-to-Spanish translation of words

...

Building a sequence-to-sequence model for text translation

To build our sequence-to-sequence model for translation, we will implement the encoder/decoder framework we outlined previously. This will show how the two halves of the model work together: the encoder captures a representation of the source sentence, and the decoder translates that representation into the target language. Before we can do this, we need to obtain our data.
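At a high level, the encoder/decoder framework can be sketched as follows. This is a minimal GRU-based sketch under our own assumptions (class names, layer sizes, and the single-layer GRU are illustrative choices, not the chapter's final implementation):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Embeds source tokens and compresses the sentence into a hidden state."""
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim)

    def forward(self, src):
        # src: [src_len, batch]
        embedded = self.embedding(src)      # [src_len, batch, emb_dim]
        _, hidden = self.rnn(embedded)      # hidden: [1, batch, hid_dim]
        return hidden

class Decoder(nn.Module):
    """Generates target tokens one step at a time from the encoder's state."""
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, token, hidden):
        # token: [batch] -> one decoding step at a time
        embedded = self.embedding(token.unsqueeze(0))  # [1, batch, emb_dim]
        output, hidden = self.rnn(embedded, hidden)
        return self.out(output.squeeze(0)), hidden     # logits: [batch, vocab]

# Wire the two halves together for a single decoding step
enc, dec = Encoder(100, 32, 64), Decoder(120, 32, 64)
src = torch.randint(0, 100, (7, 2))                 # 7 source tokens, batch of 2
hidden = enc(src)                                   # encoder's final state
logits, hidden = dec(torch.zeros(2, dtype=torch.long), hidden)
print(logits.shape)                                 # torch.Size([2, 120])
```

In practice, the decoder loop runs until an end-of-sentence token is produced, feeding each predicted token back in as the next input.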

Preparing the data

By now, we know enough about machine learning to know that for a task like this, we will need a set of training data with corresponding labels. In this case, we will need sentences in one language with the corresponding translations in another language. Fortunately, the Torchtext library that we used in the previous chapter contains a dataset that will allow us to get this.

The Multi30k dataset in Torchtext consists of approximately 30,000 sentences with corresponding translations in multiple languages...
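Torchtext handles tokenization and numericalization for us, but the preprocessing it performs can be sketched independently. The following toy example (the sentence pairs and helper names are our own, not the Multi30k data or the Torchtext API) shows the core steps of building a vocabulary with special tokens and converting sentences to index sequences:

```python
from collections import Counter

# A toy parallel corpus standing in for Multi30k's English/German pairs
pairs = [("two dogs play in the snow", "zwei hunde spielen im schnee"),
         ("a man rides a horse", "ein mann reitet ein pferd")]

def build_vocab(sentences, specials=("<pad>", "<sos>", "<eos>", "<unk>")):
    """Map each token to an integer index, reserving ids for special tokens."""
    counts = Counter(tok for s in sentences for tok in s.split())
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok in sorted(counts):
        vocab[tok] = len(vocab)
    return vocab

en_vocab = build_vocab(src for src, _ in pairs)
de_vocab = build_vocab(trg for _, trg in pairs)

def numericalize(sentence, vocab):
    """Wrap the sentence in <sos>/<eos> markers and convert tokens to ids."""
    unk = vocab["<unk>"]
    return ([vocab["<sos>"]]
            + [vocab.get(t, unk) for t in sentence.split()]
            + [vocab["<eos>"]])

print(numericalize("two dogs play", en_vocab))   # [1, 13, 5, 9, 2]
```

The `<sos>` and `<eos>` markers tell the decoder where a sentence starts and ends, while `<pad>` lets sentences of different lengths share a batch.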

Next steps

While we have shown our sequence-to-sequence model to be effective at performing language translation, the model we trained from scratch is not a perfect translator by any means. This is, in part, due to the relatively small size of our training data. We trained our model on a set of 30,000 English/German sentences. While this might seem very large, in order to train a perfect model, we would require a training set that's several orders of magnitude larger.

In theory, we would require several examples of every word in the English and German languages for our model to truly understand each word's context and meaning. For context, the 30,000 English sentences in our training set consist of just 6,000 unique words. The average vocabulary of an English speaker is said to be between 20,000 and 30,000 words, which gives us an idea of just how many example sentences we would need to train a model that performs perfectly. This is probably why the most accurate translation...
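A figure like the 6,000 unique words above can be measured directly from a tokenized corpus. A minimal sketch with a toy corpus (the sentences here are our own, standing in for the training set):

```python
from collections import Counter

# Toy corpus; running the same count over Multi30k's ~30,000 English
# sentences yields the roughly 6,000 unique tokens mentioned above
sentences = ["a man rides a horse", "a dog runs", "the man walks the dog"]

counts = Counter(tok for s in sentences for tok in s.split())
print(len(counts))             # number of unique tokens in the corpus
print(counts.most_common(2))   # a handful of frequent words dominate
```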

Summary

In this chapter, we covered how to build sequence-to-sequence models from scratch. We learned how to code up our encoder and decoder components individually and how to integrate them into a single model that is able to translate sentences from one language into another.

Although our sequence-to-sequence model, consisting of an encoder and a decoder, is useful for sequence translation, it is no longer state-of-the-art. In recent years, sequence-to-sequence models have been combined with attention mechanisms to achieve state-of-the-art performance.

In the next chapter, we will discuss how attention networks can be used in the context of sequence-to-sequence learning and show how we can use both techniques to build a chatbot.
