Practical Convolutional Neural Networks

Product type: Book
Published: Feb 2018
Publisher: Packt
ISBN-13: 9781788392303
Pages: 218
Edition: 1st
Authors: Mohit Sewak, Md. Rezaul Karim, Pradeep Pujari

Table of Contents (11 chapters)

  • Preface
  • Deep Neural Networks – Overview
  • Introduction to Convolutional Neural Networks
  • Build Your First CNN and Performance Optimization
  • Popular CNN Model Architectures
  • Transfer Learning
  • Autoencoders for CNN
  • Object Detection and Instance Segmentation with CNN
  • GAN: Generating New Images with CNN
  • Attention Mechanism for CNN and Visual Models
  • Other Books You May Enjoy

References


  1. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio, Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, CoRR, arXiv:1502.03044, 2015.
  2. Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, Phil Blunsom, Teaching Machines to Read and Comprehend, CoRR, arXiv:1506.03340, 2015.
  3. Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, Recurrent Models of Visual Attention, CoRR, arXiv:1406.6247, 2014.
  4. Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Tat-Seng Chua, SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning, CoRR, arXiv:1611.05594, 2016.
  5. Kan Chen, Jiang Wang, Liang-Chieh Chen, Haoyuan Gao, Wei Xu, Ram Nevatia, ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering, CoRR, arXiv:1511.05960, 2015.
  6. Wenpeng Yin, Sebastian Ebert, Hinrich Schütze, Attention-Based...

Summary


The attention mechanism is one of the most active topics in deep learning today and sits at the center of much cutting-edge research and many probable future applications. Problems such as image captioning and visual question answering have found strong solutions through this approach. In fact, attention is not limited to visual tasks: it was conceived earlier for neural machine translation and other sophisticated NLP problems. Understanding the attention mechanism is therefore vital to mastering many advanced deep learning techniques.

CNNs are used not only for vision but also, combined with attention, for solving complex NLP problems such as modeling sentence pairs and machine translation. This chapter covered the attention mechanism and its application to some NLP problems, along with image captioning and recurrent models of visual attention. In RAMs, we did not use a CNN; instead, we applied an RNN and attention to reduced...

Types of Attention


There are two types of attention mechanisms. They are as follows:

  • Hard attention
  • Soft attention

Let's now take a look at each one in detail in the following sections.

Hard Attention

In reality, in our recent image-captioning example, several more patches would be selected, but because we trained with the handwritten captions, those would never be weighted higher. The essential thing to understand is how the system decides which pixels (or, more precisely, their CNN representations) to focus on when drawing these high-resolution views of different aspects, and then how it chooses the next pixel at which to repeat the process.

In the preceding example, the points are chosen at random from a distribution, and the process is repeated. Which pixels around each chosen point receive a higher resolution is decided inside the attention network. This type of attention is known as hard attention.
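To make the sampling step concrete, here is a minimal NumPy sketch of a hard-attention glimpse selection. The function name and the toy scores are illustrative assumptions, not code from this book; the point is that hard attention converts scores into a probability distribution and then draws a single location, zeroing out all others:

```python
import numpy as np

def hard_attention_glimpse(scores, rng):
    """Sample one glimpse location from a categorical distribution
    defined by raw attention scores (hard attention)."""
    # Softmax turns the raw scores into a probability distribution.
    exp = np.exp(scores - scores.max())
    probs = exp / exp.sum()
    # Hard attention: draw ONE location; every other location gets
    # exactly zero weight. This discrete sampling step is what makes
    # hard attention non-differentiable.
    idx = rng.choice(len(scores), p=probs)
    one_hot = np.zeros_like(scores)
    one_hot[idx] = 1.0
    return idx, one_hot

rng = np.random.default_rng(0)
scores = np.array([0.1, 2.0, 0.3, 0.5])  # raw scores for 4 candidate regions
idx, mask = hard_attention_glimpse(scores, rng)
```

Because only one location survives (`mask` is one-hot), gradients cannot flow back through the choice itself, which is why training hard attention typically relies on techniques such as reinforcement learning rather than plain backpropagation.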

Hard attention suffers from what is known as the differentiability problem. Let's spend some...

Using attention to improve visual models


As we discovered in the NLP example covered in the earlier section on attention mechanism intuition, attention helped us greatly, both in enabling new use cases that were not feasible with conventional NLP and in vastly improving the performance of existing NLP mechanisms. The same holds for the use of attention in CNNs and visual models.

In Chapter 7, Object Detection and Instance Segmentation with CNN, we discovered how attention-like mechanisms are used as region proposal networks in architectures such as Faster R-CNN and Mask R-CNN to greatly enhance and optimize the proposed regions and to enable the generation of segmentation masks. That corresponds to the first part of the discussion. In this section, we cover the second part, where we use the attention mechanism to improve the performance of our CNNs, even under extreme conditions.
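As a rough illustration of how attention can re-weight a CNN's output, here is a minimal NumPy sketch of soft spatial attention over a feature map. This is an assumption-laden toy (the function name, the dot-product scoring, and the tiny feature map are all invented for illustration), not the region proposal mechanism of Faster R-CNN or Mask R-CNN:

```python
import numpy as np

def soft_spatial_attention(features, query):
    """Re-weight a CNN feature map of shape (H, W, C) by how well each
    spatial location matches a query vector of shape (C,)."""
    h, w, c = features.shape
    flat = features.reshape(-1, c)       # (H*W, C): one row per location
    scores = flat @ query                # alignment score per location
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()            # softmax over ALL locations
    # Weighted sum of location features. Every location contributes a
    # little, so the whole operation stays differentiable, unlike the
    # sampling step in hard attention.
    context = weights @ flat             # (C,) attended context vector
    return context, weights.reshape(h, w)

feats = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)  # toy 2x2x3 map
q = np.ones(3)                                              # toy query
ctx, attn = soft_spatial_attention(feats, q)
```

The returned `attn` map sums to 1 and can be inspected to see where the model "looks", which is one reason soft attention is also a useful visualization tool for CNNs.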

Reasons for sub-optimal performance of visual CNN models

The performance...

