Chapter 8: Neural Machine Translation

In the previous chapter, Chapter 7, Implementing NLP Applications, we introduced several text encoding techniques and used them in three Natural Language Processing (NLP) applications. One of these applications was free text generation. The results showed that a network can learn the structure of a language well enough to generate text in a certain style.

In this chapter, we will build on top of this case study for free text generation and train a neural network to automatically translate sentences from a source language into a target language. To do that, we will use concepts learned from the free text generation network, as well as from the autoencoder introduced in Chapter 5, Autoencoder for Fraud Detection.

We will start by describing the general concept of machine translation, followed by an introduction to the encoder-decoder neural architectures that will be used for neural machine translation. Next, we will discuss all...

Idea of Neural Machine Translation

Automatic translation has long been a popular and challenging task. The flexibility and ambiguity of human language still make it one of the most difficult tasks to automate. The same word or phrase can have different meanings depending on the context, and often there is not just one correct translation but many possible ways to translate the same sentence. So, how can a computer learn to translate text from one language into another? Different approaches have been introduced over the years, all with the same goal: to automatically translate sentences or texts from a source language into a target language.

The development of automatic translation systems started in the early 1970s with Rule-Based Machine Translation (RBMT). Here, automatic translation was implemented through rules and dictionaries hand-crafted by specialized linguists at the lexical, syntactic, and semantic levels of a sentence.

In the 1990s, statistical...

Encoder-Decoder Architecture

In this section, we will first introduce the general concept of an encoder-decoder architecture. Afterward, we will focus on how the encoder is used in neural machine translation. In the last two subsections, we will concentrate on how the decoder is applied during training and deployment.

One of the possible structures for neural machine translation is the encoder-decoder network. In Chapter 5, Autoencoder for Fraud Detection, we introduced the concept of a neural network consisting of an encoder and a decoder component. Remember, in the case of an autoencoder, the task of the encoder component is to extract a dense representation of the input, while the task of the decoder component is to recreate the input based on the dense representation given by the encoder.

In the case of encoder-decoder networks for neural machine translation, the task of the encoder is to extract the context of the sentence in the source language (the input sentence) into...
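To make the idea of a dense context representation concrete, here is a minimal sketch in Keras, the library that also backs KNIME's deep learning integration. This is a hypothetical illustration, not the book's workflow: all dimensions are invented and the encoder is untrained.

    import numpy as np
    from tensorflow.keras.layers import Input, LSTM
    from tensorflow.keras.models import Model

    # Toy dimensions, made up purely for illustration.
    seq_len, vocab_size, state_size = 10, 30, 64

    # An encoder that reads a one-hot-encoded character sequence and keeps
    # only the final hidden and cell states of the LSTM: together, they form
    # the dense representation of the input sentence.
    inputs = Input(shape=(seq_len, vocab_size))
    _, state_h, state_c = LSTM(state_size, return_state=True)(inputs)
    encoder = Model(inputs, [state_h, state_c])

    # Any sentence of this length is compressed into two fixed-size vectors.
    sentence = np.eye(vocab_size)[np.random.randint(0, vocab_size, seq_len)]
    h, c = encoder.predict(sentence[np.newaxis, ...])
    print(h.shape, c.shape)  # (1, 64) each, regardless of the sentence's wording

Whatever the wording of the input sentence, the context handed to the decoder always has the same size, which is what allows the decoder to be wired to the encoder with a fixed architecture.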

Preparing the Data for the Two Languages

In Chapter 7, Implementing NLP Applications, we talked about the advantages and disadvantages of training neural networks at the character level and at the word level. Since we already have some experience with the character level, we decided to train this automatic translation network at the character level as well.

To train a neural machine translation network, we need a dataset of bilingual sentence pairs for the two languages. Datasets for many different language combinations can be downloaded for free from www.manythings.org/anki/. From there, we can download a dataset of short English sentences, commonly used in everyday life, together with their German translations. The dataset consists of two columns only: the original short text in English and the corresponding translation in German.

Figure 8.5 shows you a subset of this dataset to be used as the training set:

Figure 8.5 – Subset of the training set with English and German sentences

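Below is a minimal Python sketch of how such a file could be read and one-hot encoded at the character level; the book itself performs these steps codelessly with KNIME nodes. The file name deu.txt, the column order (English first, German second), and the choice of tab and newline as start and end markers are assumptions. The sketch also builds the three sequences referred to in the next section: the encoder input, the decoder input, and the decoder target, with the target shifted back by one character relative to the decoder input.

    import numpy as np

    # Hypothetical file name; we assume a tab-separated file with the English
    # sentence in the first column and the German translation in the second.
    pairs = []
    with open("deu.txt", encoding="utf-8") as f:
        for line in f:
            cols = line.rstrip("\n").split("\t")
            if len(cols) >= 2:
                pairs.append((cols[0], cols[1]))

    # Wrap each target sentence in start/end markers so the decoder can learn
    # when to begin and when to stop generating; tab and newline are a common
    # choice because they do not occur inside the sentences themselves.
    source_texts = [eng for eng, _ in pairs]
    target_texts = ["\t" + ger + "\n" for _, ger in pairs]

    # One character vocabulary per language.
    source_chars = sorted({c for t in source_texts for c in t})
    target_chars = sorted({c for t in target_texts for c in t})
    src_index = {c: i for i, c in enumerate(source_chars)}
    tgt_index = {c: i for i, c in enumerate(target_chars)}
    max_src = max(len(t) for t in source_texts)
    max_tgt = max(len(t) for t in target_texts)

    # The three one-hot-encoded sequences: encoder input, decoder input, and
    # decoder target, the last being the decoder input shifted by one step.
    enc_in = np.zeros((len(pairs), max_src, len(source_chars)), dtype="float32")
    dec_in = np.zeros((len(pairs), max_tgt, len(target_chars)), dtype="float32")
    dec_out = np.zeros((len(pairs), max_tgt, len(target_chars)), dtype="float32")
    for i, (src, tgt) in enumerate(zip(source_texts, target_texts)):
        for t, ch in enumerate(src):
            enc_in[i, t, src_index[ch]] = 1.0
        for t, ch in enumerate(tgt):
            dec_in[i, t, tgt_index[ch]] = 1.0
            if t > 0:
                dec_out[i, t - 1, tgt_index[ch]] = 1.0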

Building and Training the Encoder-Decoder Architecture

Now that the three sequences are available, we can start defining the network structure within a workflow. In this section, you will learn how to define and train an encoder-decoder structure in KNIME Analytics Platform. Once the network is trained, you will learn how the encoder and decoder can be extracted into two networks. In the last section, we will discuss how the extracted networks can be used in a deployment workflow to translate English sentences into German.

Defining the Network Structure

In the encoder-decoder architecture, we want both the encoder and the decoder to be LSTM networks. The encoder and the decoder have different input sequences: the one-hot-encoded English sentences are the input for the encoder, and the one-hot-encoded German sentences are the input for the decoder. This means that two input layers are needed: one for the encoder and one for the decoder.
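As a minimal sketch of this structure, the following hypothetical Keras code defines the two input layers and the two LSTM components; the state size of 256 and the choice of Adam with categorical cross-entropy are assumptions, not necessarily the book's settings. It reuses the character vocabularies from the data-preparation sketch in the previous section.

    from tensorflow.keras.layers import Input, LSTM, Dense
    from tensorflow.keras.models import Model

    latent_dim = 256  # assumed size of the dense context representation

    # Encoder: an input layer for the one-hot-encoded English sentences and an
    # LSTM whose final hidden and cell states summarize the source sentence.
    encoder_inputs = Input(shape=(None, len(source_chars)))
    _, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

    # Decoder: an input layer for the one-hot-encoded German sentences and an
    # LSTM that starts from the encoder states and emits one output per step;
    # a softmax layer turns each output into a probability over characters.
    decoder_inputs = Input(shape=(None, len(target_chars)))
    decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_seq, _, _ = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
    predictions = Dense(len(target_chars), activation="softmax")(decoder_seq)

    # The joint training network receives both input sequences at once.
    model = Model([encoder_inputs, decoder_inputs], predictions)
    model.compile(optimizer="adam", loss="categorical_crossentropy")

In KNIME Analytics Platform, each layer in this sketch roughly corresponds to a Keras layer node, with the training configuration handled by the Keras Network Learner node.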

The encoder network is made up of two...

Summary

In this chapter, we explored the topic of neural machine translation and trained a network to produce English-to-German translations.

We started with an introduction to automatic machine translation, covering its history from rule-based machine translation to neural machine translation. Next, we introduced encoder-decoder RNN-based architectures, which can be used for neural machine translation. In general, encoder-decoder architectures can be applied to sequence-to-sequence prediction tasks or question-answer systems.

After that, we covered all the steps needed to train and apply a neural machine translation model at the character level, using a simple network structure with only one LSTM layer each for the encoder and the decoder. The joint network, derived from the combination of the encoder and the decoder, was trained using the teacher forcing paradigm.
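In terms of the hypothetical sketches from the earlier sections, teacher forcing corresponds to a training call along these lines (batch size, number of epochs, and validation split are arbitrary assumptions):

    # Teacher forcing: the decoder is fed the true previous character (dec_in)
    # rather than its own last prediction, and learns to predict the next
    # character (dec_out, which is dec_in shifted back by one time step).
    model.fit([enc_in, dec_in], dec_out,
              batch_size=64, epochs=50, validation_split=0.2)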

At the end of the training phase and before deployment, a lambda layer was inserted in the decoder part to...

Questions and Exercises

  1. An encoder-decoder model is a:

    a.) Many-to-one architecture

    b.) Many-to-many architecture

    c.) One-to-many architecture

    d.) CNN architecture

  2. What is the task of the encoder in neural machine translation?

    a.) To encode the characters

    b.) To generate the translation

    c.) To extract a dense representation of the content in the target language

    d.) To extract a dense representation of the content in the source language

  3. What is another application for encoder-decoder LSTM networks?

    a.) Text classification

    b.) Question-answer systems

    c.) Language detection

    d.) Anomaly detection

