Attention Mechanisms

In the preceding two chapters, we learned about convolutional neural networks and recurrent neural networks, both of which have been very effective for tasks such as machine translation, image captioning, and object recognition. But we have also seen that they have limitations; in particular, RNNs struggle with long-term dependencies. In this chapter, we will cover attention mechanisms, which have been growing in popularity and have shown impressive results in language- and vision-related tasks.

The following topics will be covered in this chapter:

  • Overview of attention
  • Understanding neural Turing machines
  • Exploring the types of attention
  • Transformers

Let's get started!

Overview of attention

When we go about our lives in the real world, our brains don't observe every detail in our environment at all times; instead, we focus on (or pay greater attention to) information that is relevant to the task at hand. For example, when we are driving, we shift our focus between details that are nearer and others that are further away, and then act on what we observe. Similarly, when we are conversing with others, we usually don't listen carefully to each and every word; we pick up only part of what is spoken and use the relationships between those words to figure out what the other person is saying. Often, when we are reading or listening to someone, we can use a few words to infer what the person is going to say next based on what we have already read or heard.

But why do we need these attention...

Understanding neural Turing machines

The Turing machine (TM) was proposed by Alan Turing in 1936, and it is a mathematical model of computation made up of an infinitely long tape and a head that interacts with the tape by reading symbols, writing symbols, and moving along it. It works by manipulating symbols on the tape according to a predefined set of rules. The tape is made up of an endless number of cells, each of which can contain one of three symbols: 0, 1, or blank (" "). Therefore, this is referred to as a three-symbol Turing machine. However simple it seems, it is capable of simulating any computer algorithm, regardless of complexity. The tape that these computations are done on can be considered the machine's memory, akin to how our modern-day computers have memory. However, the Turing machine differs from modern-day computers as it has limited...
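
To make the read-write-move loop concrete, here is a minimal Python sketch (not from the book) of a three-symbol Turing machine. The rule table, the function names, and the toy program that simply flips every bit on the tape are all hypothetical; a real machine's behavior is determined entirely by such a table of rules.

# A minimal sketch of a three-symbol Turing machine: the head reads the
# current cell, looks up a rule, writes a symbol, moves left or right,
# and switches state until it reaches the halting state.
from collections import defaultdict

BLANK = " "

# Hypothetical rule table: (state, read_symbol) -> (write_symbol, move, next_state)
# This toy program flips every bit on the tape and then halts.
rules = {
    ("flip", "0"): ("1", +1, "flip"),
    ("flip", "1"): ("0", +1, "flip"),
    ("flip", BLANK): (BLANK, +1, "halt"),
}

def run(tape_str, state="flip", max_steps=1000):
    tape = defaultdict(lambda: BLANK, enumerate(tape_str))  # unbounded tape
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        write, move, state = rules[(state, tape[head])]
        tape[head] = write
        head += move
    cells = [tape[i] for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip()

print(run("0110"))  # -> "1001"
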

Exploring the types of attention

Attention has proven so effective in machine translation that it has since been extended to other areas of natural language processing, as well as to statistical learning, speech understanding, object detection and recognition, image captioning, and visual question answering.

The purpose of attention is to estimate how correlated (connected) two or more elements are to one another.
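
To make this idea of correlation concrete, here is a minimal NumPy sketch (not the book's implementation) of the most common formulation, scaled dot-product attention: each query is scored against every key, the scores are normalized with a softmax, and the values are averaged using those weights. The helper names are illustrative.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    # Score each query against every key, normalize, then take the
    # weighted sum of the values.
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)   # (n_q, n_k) similarity matrix
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ values, weights

# Toy example: 3 input elements, each with a 4-dimensional representation.
x = np.random.randn(3, 4)
output, weights = attention(x, x, x)           # Q = K = V = x
print(weights.round(2))                        # how much each element attends to the others

In this toy example the same matrix is used for the queries, keys, and values, which corresponds to the self-attention case listed below.
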

However, there isn't just one kind of attention. There are many types, such as the following:

  • Self-attention: Captures the relationship between different positions of a sequence of inputs
  • Global or soft attention: Focuses on the entire sequence of inputs
  • Local or hard attention: Focuses on only part of the sequence of inputs

Let's take a look at these in more detail.

Self-attention

...

Transformers

For those of you who got excited at the title (transformers), this section sadly has nothing to do with Optimus Prime or Bumblebee. In all seriousness now, we have seen that attention mechanisms work well with architectures such as RNNs and CNNs, but they are also powerful enough to be used on their own, as demonstrated by Vaswani et al. in their 2017 paper Attention Is All You Need.

The transformer model is made entirely out of self-attention mechanisms to perform sequence-to-sequence tasks without the need for any form of recurrent unit. Wait, but how? Let's break down the architecture and find out how this is possible.
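
As an illustration of why no recurrence is needed, here is a minimal NumPy sketch (not the book's implementation) of a single self-attention layer; the projection matrices W_q, W_k, and W_v are random stand-ins for what would be learned parameters. The same input sequence is projected into queries, keys, and values, and every position attends to every other position in a single matrix multiplication.

import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# A sequence of n tokens, each embedded in d_model dimensions (values are arbitrary).
n, d_model, d_k = 5, 8, 4
X = rng.standard_normal((n, d_model))

# Projection matrices (random stand-ins for learned weights) map the same
# input to queries, keys, and values -- this is what makes it *self*-attention.
W_q = rng.standard_normal((d_model, d_k))
W_k = rng.standard_normal((d_model, d_k))
W_v = rng.standard_normal((d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Every position attends to every other position in one matrix multiply,
# so no recurrent unit is needed to relate distant tokens.
weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)   # (n, n) attention weights
Z = weights @ V                                      # new representation of the sequence
print(Z.shape)                                       # (5, 4)

Because the whole sequence is processed at once, distant positions can interact directly rather than through a long chain of recurrent steps.
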

RNNs take in the encoded input and then decode it in order to map it to a target output. The transformer, however, differs here by instead treating the encoding as a set of key-value pairs, (K, V), whose dimension, n, is equal to the length of (the...

Summary

In this chapter, we learned about a hot new area in deep learning known as attention mechanisms. These allow networks to focus on specific parts of the input, which helps them overcome the problem of long-term dependencies. We also learned how attention mechanisms can be used instead of sequential models such as RNNs to produce state-of-the-art results on tasks such as machine translation and sentence generation. They can also be used to focus on relevant parts of images, which is useful for tasks such as visual question answering, where we may want our network to tell us what is happening in a given scene.

In the next chapter, we will learn about generative models.
