In Chapter 7, Understanding Recurrent Networks, we outlined several types of recurrent models, depending on the input-output combinations. One of them is indirect many-to-many, or sequence-to-sequence (seq2seq), where an input sequence is transformed into another, different output sequence, not necessarily of the same length as the input. Machine translation is the most popular type of seq2seq task. The input sequences are the words of a sentence in one language, and the output sequences are the words of the same sentence translated into another language. For example, we can translate the English sequence tourist attraction into the German Touristenattraktion. Not only is the output sentence a different length, but there is no direct correspondence between the elements of the input and output sequences. In particular, one output element...
Introducing seq2seq models
Seq2seq, or encoder-decoder (see Sequence to Sequence Learning with Neural Networks at https://arxiv.org/abs/1409.3215), models use RNNs in a way that's especially suited for solving tasks with indirect many-to-many relationships between the input and the output. A similar model was also proposed in another pioneering paper, Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (go to https://arxiv.org/abs/1406.1078 for more information). The following is a diagram of the seq2seq model. The input sequence [A, B, C, <EOS>] is decoded into the output sequence [W, X, Y, Z, <EOS>]:
The model consists of two parts: an encoder and a decoder. Here's how the inference part works:
- The encoder is an RNN. The original paper uses LSTM, but GRU...
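The inference loop described above can be sketched in NumPy with toy, randomly initialized parameters (a real model would learn them, and the paper uses LSTM cells rather than the plain tanh RNN used here; the names `encode` and `decode` are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, vocab = 8, 5            # toy sizes; token 0 plays the role of <EOS>
EOS = 0

# Randomly initialized toy parameters (learned in a real model).
W_enc = rng.normal(size=(hidden, hidden)); U_enc = rng.normal(size=(hidden, vocab))
W_dec = rng.normal(size=(hidden, hidden)); U_dec = rng.normal(size=(hidden, vocab))
V_out = rng.normal(size=(vocab, hidden))

def one_hot(i):
    v = np.zeros(vocab); v[i] = 1.0
    return v

def encode(tokens):
    """Run a simple tanh RNN over the input; the final hidden state
    is the 'thought vector' that summarizes the whole sequence."""
    h = np.zeros(hidden)
    for t in tokens:
        h = np.tanh(W_enc @ h + U_enc @ one_hot(t))
    return h

def decode(thought, max_len=10):
    """Greedy decoding: starting from the thought vector, feed each
    output token back in as the next input, until <EOS> or max_len."""
    h, token, out = thought, EOS, []
    for _ in range(max_len):
        h = np.tanh(W_dec @ h + U_dec @ one_hot(token))
        token = int(np.argmax(V_out @ h))
        if token == EOS:
            break
        out.append(token)
    return out

thought = encode([1, 2, 3, EOS])   # input sequence [A, B, C, <EOS>]
print(decode(thought))             # untrained weights, so the output is arbitrary
```

Note that the decoder sees the input sequence only through the fixed-size thought vector, which is exactly the bottleneck discussed in the next section.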
Seq2seq with attention
The decoder has to generate the entire output sequence based solely on the thought vector. For this to work, the thought vector has to encode all of the information of the input sequence; however, the encoder is an RNN, and we can expect its hidden state to carry more information about the most recent sequence elements than about the earliest ones. Using LSTM cells and reversing the input helps, but cannot eliminate this information loss entirely. Because of this, the thought vector becomes something of a bottleneck. As a result, the seq2seq model works well for short sentences, but its performance deteriorates for longer ones.
Bahdanau attention
We can solve this problem with the help of the attention mechanism (see Neural Machine...
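A NumPy sketch of the additive (Bahdanau-style) attention score: each encoder state is scored against the previous decoder state, the scores are normalized with softmax, and the context vector is the resulting weighted sum. The parameter names `W1`, `W2`, and `v` are illustrative, and the weights are random rather than learned:

```python
import numpy as np

rng = np.random.default_rng(1)
dec_dim, enc_dim, attn_dim = 4, 4, 6

# Toy parameters (learned jointly with the rest of the model in practice).
W1 = rng.normal(size=(attn_dim, dec_dim))   # projects the decoder state
W2 = rng.normal(size=(attn_dim, enc_dim))   # projects each encoder state
v  = rng.normal(size=(attn_dim,))           # scoring vector

def bahdanau_attention(s_prev, enc_states):
    """Additive attention: score each encoder state h_t against the
    previous decoder state s_prev, softmax the scores into weights,
    and return the weighted sum of encoder states (the context vector)."""
    scores = np.array([v @ np.tanh(W1 @ s_prev + W2 @ h) for h in enc_states])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over input positions
    context = weights @ enc_states          # convex combination of the h_t
    return context, weights

enc_states = rng.normal(size=(5, enc_dim))  # one hidden state per input position
s_prev = rng.normal(size=(dec_dim,))
context, weights = bahdanau_attention(s_prev, enc_states)
print(weights.sum())  # ~1.0: the weights form a distribution over input positions
```

Because the decoder receives a fresh context vector at every step, it is no longer limited to the single fixed-size thought vector.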
Understanding transformers
We spent the better part of this chapter touting the advantages of the attention mechanism. But we have still used attention in the context of RNNs—in that sense, it works as an addition on top of the core recurrent nature of these models. Since attention is so good, is there a way to use it on its own, without the RNN part? It turns out that there is. The paper Attention Is All You Need (https://arxiv.org/abs/1706.03762) introduces a new encoder-decoder architecture called the transformer, which relies solely on the attention mechanism. First, we'll focus our attention on the transformer attention (pun intended) mechanism.
The transformer attention
Before focusing on the entire model, let...
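A minimal NumPy sketch of the scaled dot-product attention at the heart of the transformer, computing Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. The shapes here are arbitrary toy values:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to all keys; the sqrt(d_k) scaling keeps
    the dot products from growing with the key dimension."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(2)
Q = rng.normal(size=(3, 8))   # 3 queries, d_k = 8
K = rng.normal(size=(5, 8))   # 5 keys
V = rng.normal(size=(5, 8))   # 5 values
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)     # (3, 8) (3, 5)
```

Multi-head attention runs several such computations in parallel over different learned projections of Q, K, and V, then concatenates the results.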
Transformer language models
In Chapter 6, Language Modeling, we introduced several different language models (word2vec, GloVe, and fastText) that use the context of a word (its surrounding words) to create word vectors (embeddings). These models share some common properties:
- They are context-free (I know it contradicts the previous statement) because they create a single global word vector for each word, based on all of its occurrences in the training text. For example, lead can have completely different meanings in the phrases lead the way and lead atom, yet the model will try to embed both meanings in the same word vector.
- They are position-free because they don't take into account the order of the contextual words when training for the embedding vectors.
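The context-free property above can be illustrated with a word2vec-style static lookup table (the table here is random and purely for illustration): the word lead gets the same vector in both phrases, regardless of its sense.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy static embedding table: one global vector per word (word2vec-style).
vocab = ["lead", "the", "way", "atom"]
emb = {w: rng.normal(size=4) for w in vocab}

def embed(sentence):
    """Look up a fixed vector for each word, ignoring its context."""
    return [emb[w] for w in sentence.split()]

v1 = embed("lead the way")[0]
v2 = embed("lead atom")[0]
print(np.array_equal(v1, v2))  # True: same vector, despite different senses
```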
In contrast, it's possible to create transformer-based language models, which are both context- and position-dependent...
Summary
In this chapter, we focused on seq2seq models and the attention mechanism. First, we discussed and implemented a regular recurrent encoder-decoder seq2seq model and learned how to complement it with the attention mechanism. Then, we talked about and implemented a purely attention-based type of model called a transformer. We also defined multi-head attention in the context of transformers. Next, we discussed transformer language models (such as BERT, Transformer-XL, and XLNet). Finally, we implemented a simple text-generation example using the transformers library.
This chapter concludes our series of chapters focused on natural language processing. In the next chapter, we'll talk about some new trends in deep learning that aren't fully mature yet but hold great potential for the future.