Machine Translation with the Transformer

Humans master sequence transduction, transferring a representation to another object. We can easily imagine a mental representation of a sequence. If somebody says, "The flowers in my garden are beautiful," we can easily visualize a garden with flowers in it. We see images of the garden, although we might never have seen that garden. We might even imagine chirping birds and the scent of flowers.

A machine must learn transduction from scratch with numerical representations. Recurrent or convolutional approaches have produced interesting results but have not reached significant BLEU translation evaluation scores. Translating requires the representation of language A transposed into language B.

The transformer model’s self-attention innovation increases the analytic ability of machine intelligence. A sequence in language A is adequately represented before attempting to translate it into language B. Self-attention brings the level of...

Defining machine translation

Vaswani et al. (2017) tackled one of the most difficult NLP problems when designing the Transformer. The human baseline for machine translation seems out of reach for us human-machine intelligence designers. This did not stop Vaswani et al. (2017) from publishing the Transformer’s architecture and achieving state-of-the-art BLEU results.

In this section, we will define machine translation. Machine translation is the process of reproducing human translation by machine transductions and outputs:

Figure 6.1: Machine translation process

The general idea in Figure 6.1 is for the machine to do the following in a few steps:

  • Choose a sentence to translate
  • Learn how words relate to each other with hundreds of millions of parameters
  • Learn the many ways in which words refer to each other
  • Use machine transduction to transfer the learned parameters to new sequences
  • Choose a candidate translation for a word...

Preprocessing a WMT dataset

Vaswani et al. (2017) present the Transformer’s achievements on the WMT 2014 English-to-German translation task and the WMT 2014 English-to-French translation task. The Transformer achieves a state-of-the-art BLEU score. BLEU will be described in the Evaluating machine translation with BLEU section of this chapter.

The WMT 2014 shared task included several European language datasets. One of these datasets was drawn from version 7 of the Europarl corpus. We will be using the French-English dataset from the European Parliament Proceedings Parallel Corpus, 1996-2011 (https://www.statmt.org/europarl/v7/fr-en.tgz).

Once you have downloaded the files and have extracted them, we will preprocess the two parallel files:

  • europarl-v7.fr-en.en
  • europarl-v7.fr-en.fr

We will load, clean, and reduce the size of the corpus.

Let’s start the preprocessing.

Preprocessing the raw data

In this section, we will preprocess...
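As a minimal sketch of the load, clean, and reduce steps described above, the following code reads the two parallel files, applies a simple normalization, and drops sentence pairs containing rare words. It is not the book's listing: the normalization rules, the pair-dropping strategy, and the min_occurrence threshold are illustrative assumptions; only the two file names come from the dataset itself.

import re
from collections import Counter

def load_sentences(path):
    # Read one sentence per line from a Europarl file.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

def clean(sentence):
    # Lowercase and keep only letters, apostrophes, and spaces (a simple normalization choice).
    sentence = sentence.lower()
    sentence = re.sub(r"[^a-zàâäçéèêëîïôöûùüÿœæ' ]+", " ", sentence)
    return re.sub(r"\s+", " ", sentence).strip()

en = [clean(s) for s in load_sentences("europarl-v7.fr-en.en")]
fr = [clean(s) for s in load_sentences("europarl-v7.fr-en.fr")]

# Count word frequencies on the English side of the corpus.
counts = Counter(word for sentence in en for word in sentence.split())

# Keep only the sentence pairs whose English words all occur at least
# min_occurrence times; this reduces the size of the corpus.
min_occurrence = 5  # hypothetical threshold
pairs = [(e, f_) for e, f_ in zip(en, fr)
         if all(counts[w] >= min_occurrence for w in e.split())]

print(f"Kept {len(pairs)} of {len(en)} sentence pairs")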

Evaluating machine translation with BLEU

Papineni et al. (2002) came up with an efficient way to evaluate machine translation. A human baseline was difficult to define. However, they realized that we could obtain effective results if we compared a machine translation, word for word, with one or more human reference translations.

Papineni et al. (2002) named their method the Bilingual Evaluation Understudy (BLEU) score.

In this section, we will use the Natural Language Toolkit (NLTK) to implement BLEU:

http://www.nltk.org/api/nltk.translate.html#nltk.translate.bleu_score.sentence_bleu

We will begin with geometric evaluations.

Geometric evaluations

The BLEU method compares the parts of a candidate sentence to a reference sentence or several reference sentences.

Open BLEU.py, which is in the chapter directory of the GitHub repository of this book.

The program imports the nltk module:

from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.bleu_score import...
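A minimal, self-contained sketch of how sentence_bleu compares a tokenized candidate sentence with one or more tokenized references, with and without smoothing. The example sentences are illustrative, and the calls are standard NLTK usage rather than the book's BLEU.py listing:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# One or more tokenized reference translations and one tokenized candidate.
reference = [["the", "flowers", "in", "my", "garden", "are", "beautiful"]]
candidate = ["the", "flowers", "in", "the", "garden", "are", "beautiful"]

# Default BLEU: geometric mean of the 1-gram to 4-gram precisions.
print(sentence_bleu(reference, candidate))

# When a candidate has no matching higher-order n-grams, the geometric
# mean collapses toward zero; a smoothing function avoids this.
chencherry = SmoothingFunction()
print(sentence_bleu(reference, candidate,
                    smoothing_function=chencherry.method1))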

Translation with Google Translate

Google Translate, https://translate.google.com/, provides a ready-to-use interface for translations. Google is progressively introducing a transformer encoder into its translation algorithms. In the following section, we will implement a transformer model for a translation task with Google Trax.

However, an AI specialist may not be required at all.

If we enter the sentence analyzed in the previous section in Google Translate, Levez-vous svp pour cette minute de silence, we obtain an English translation in real time:

Figure 6.2: Google Translate

The translation is correct.

Does Industry 4.0 still require AI specialists for translation tasks or simply a web interface developer?

Google provides every service required for translations on their Google Translate platform: https://cloud.google.com/translate:

  • A translation API: A web developer can create an interface for a customer
  • A media translation API that...
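As an illustration of the translation API in the list above, here is a minimal sketch that assumes the google-cloud-translate Python client is installed and that Google Cloud credentials are configured in the environment; the calls use the standard v2 Basic client and are not taken from the book:

from google.cloud import translate_v2 as translate

# Create a client and translate the French sentence used above.
client = translate.Client()
result = client.translate("Levez-vous svp pour cette minute de silence",
                          source_language="fr", target_language="en")
print(result["translatedText"])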

Translations with Trax

Google Brain developed Tensor2Tensor (T2T) to make deep learning development easier. T2T is an extension of TensorFlow that contains a library of deep learning models, including many transformer examples.

Although T2T was a good start, Google Brain then produced Trax, an end-to-end deep learning library. Trax contains a transformer model that can be applied to translations. The Google Brain team presently maintains Trax.

This section will focus on the minimum functions needed to initialize and run the English-German translation task described by Vaswani et al. (2017) to illustrate the Transformer’s performance.

We will be using preprocessed English and German datasets to show that the Transformer architecture is language-agnostic.

Open Trax_Translation.ipynb.

We will begin by installing the modules we need.

Installing Trax

Google Brain has made Trax easy to install and run. We will import the basics along with Trax, which can be installed in one...
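As a hedged sketch of this workflow, the following code is adapted from the English-German translation example in the Trax documentation rather than copied from the notebook: it installs Trax, builds the Transformer with the hyperparameters of Vaswani et al. (2017), loads pre-trained English-German weights, and decodes one sentence. The pip command and the example sentence are illustrative assumptions.

!pip install -q trax

import trax

# Build the original Transformer configuration in prediction mode.
model = trax.models.Transformer(
    input_vocab_size=33300,
    d_model=512, d_ff=2048,
    n_heads=8, n_encoder_layers=6, n_decoder_layers=6,
    max_len=2048, mode='predict')

# Load pre-trained English-German weights published with Trax.
model.init_from_file('gs://trax-ml/models/translation/ende_wmt32k.pkl.gz',
                     weights_only=True)

# Tokenize an English sentence with the subword vocabulary the model was trained on.
sentence = 'Machine translation is a difficult sequence transduction task.'
tokenized = list(trax.data.tokenize(iter([sentence]),
                                    vocab_dir='gs://trax-ml/vocabs/',
                                    vocab_file='ende_32k.subword'))[0]

# Decode autoregressively (temperature=0.0 gives greedy decoding) and de-tokenize.
tokenized = tokenized[None, :]  # Add a batch dimension.
tokenized_translation = trax.supervised.decoding.autoregressive_sample(
    model, tokenized, temperature=0.0)
tokenized_translation = tokenized_translation[0][:-1]  # Drop the batch dimension and the EOS token.
translation = trax.data.detokenize(tokenized_translation,
                                   vocab_dir='gs://trax-ml/vocabs/',
                                   vocab_file='ende_32k.subword')
print(translation)

The pre-trained weights and the subword vocabulary are fetched from public Google Cloud Storage buckets the first time these cells run.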

Summary

In this chapter, we went through three additional essential aspects of the original Transformer.

We started by defining machine translation. Human translation sets an extremely high baseline for machines to reach. We saw that English-French and English-German translation involves numerous problems to solve. The Transformer tackled these problems and set state-of-the-art BLEU records to beat.

We then preprocessed a WMT French-English dataset from the European Parliament that required cleaning. We had to transform the datasets into lines and clean the data up. Once that was done, we reduced the dataset’s size by suppressing words that occurred below a frequency threshold.

Machine translation models must be compared using identical evaluation methods. Training a model on a WMT dataset requires BLEU evaluations. We saw that geometric assessments are a good basis for scoring translations, but even modified BLEU has its limits. We thus added a smoothing technique to enhance...

Questions

  1. Machine translation has now exceeded human baselines. (True/False)
  2. Machine translation requires large datasets. (True/False)
  3. There is no need to compare transformer models using the same datasets. (True/False)
  4. BLEU is the French word for blue and is the acronym of an NLP metric. (True/False)
  5. Smoothing techniques enhance BERT. (True/False)
  6. German-English is the same as English-German for machine translation. (True/False)
  7. The original Transformer multi-head attention sub-layer has 2 heads. (True/False)
  8. The original Transformer encoder has 6 layers. (True/False)
  9. The original Transformer encoder has 6 layers but only 2 decoder layers. (True/False)
  10. You can train transformers without decoders. (True/False)

References
