Deep Learning for Beginners

Product type: Book
Published in: Sep 2020
Publisher: Packt
ISBN-13: 9781838640859
Pages: 432
Edition: 1st
Author: Dr. Pablo Rivas

Table of Contents (20 chapters)

Preface
Section 1: Getting Up to Speed
  1. Introduction to Machine Learning
  2. Setup and Introduction to Deep Learning Frameworks
  3. Preparing Data
  4. Learning from Data
  5. Training a Single Neuron
  6. Training Multiple Layers of Neurons
Section 2: Unsupervised Deep Learning
  7. Autoencoders
  8. Deep Autoencoders
  9. Variational Autoencoders
  10. Restricted Boltzmann Machines
Section 3: Supervised Deep Learning
  11. Deep and Wide Neural Networks
  12. Convolutional Neural Networks
  13. Recurrent Neural Networks
  14. Generative Adversarial Networks
  15. Final Remarks on the Future of Deep Learning
Other Books You May Enjoy
Convolutional Neural Networks

This chapter introduces convolutional neural networks, starting with the convolution operation and moving on to assembling layers of convolution operations, with the aim of learning filters that operate over data. The pooling strategy is then introduced to show how such a change can improve the training and performance of a model. The chapter concludes by showing how to visualize the learned filters.

By the end of this chapter, you will be familiar with the motivation behind convolutional neural networks and will know how the convolution operation works in one and two dimensions. When you finish this chapter, you will know how to implement convolution in layers so as to learn filters through gradient descent. Finally, you will have a chance to use many tools that you learned previously, including dropout and batch normalization, but...

Introduction to convolutional neural networks

Previously, in Chapter 11, Deep and Wide Neural Networks, we used a dataset that was very challenging for a general-purpose network. However, convolutional neural networks (CNNs) will prove to be more effective, as you will see. CNNs have been around since the late 1980s (LeCun, Y., et al. (1989)), and they have transformed the world of computer vision and audio processing (Li, Y. D., et al. (2016)). If your smartphone has some kind of AI-based object recognition capability, chances are it uses a CNN architecture for tasks such as:

  • The recognition of objects in images
  • The recognition of a digital fingerprint
  • The recognition of voice commands

CNNs are interesting because they have solved some of the most challenging problems in computer vision, including beating a human being at an image recognition problem known as the ImageNet challenge (Krizhevsky, A., et al. (2012)). If you can think of the most complex object recognition tasks, CNNs should...

Convolution in n-dimensions

The name of CNNs comes from their signature operation: convolution, a mathematical operation that is very common in signal processing. Let's go ahead and discuss the convolution operation.

1-dimension

Let's start with the discrete-time convolution function in one dimension. Suppose that we have input data, $x[n]$, and some weights, $w[n]$; we can define the discrete-time convolution operation between the two as follows:

$$y[n] = (x * w)[n] = \sum_{k=-\infty}^{\infty} x[k] \, w[n-k].$$

In this equation, the convolution operation is denoted by the $*$ symbol. Without complicating things too much, we can say that the filter $w[k]$ is inverted, $w[-k]$, and then shifted, $w[n-k]$. The resulting vector is $y[n]$, which can be interpreted as the filtered version of the input when the filter $w$ is applied.

If we define the input and the filter as two short vectors, $x$ and $w$, then the convolution operation yields a vector $y$ with one entry for each shift $n$, that is, $\mathrm{len}(x) + \mathrm{len}(w) - 1$ entries in total.

Figure 12.1 shows every single step involved in obtaining this result by inverting and shifting the filter and multiplying across the input data:

Figure...
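As a quick check of this procedure, NumPy's `np.convolve` implements exactly the invert-shift-multiply operation described above. The vectors below are illustrative choices, not the values used in Figure 12.1:

```python
import numpy as np

# Illustrative input and filter (not the exact values from Figure 12.1)
x = np.array([1.0, 2.0, 3.0])   # input data x[n]
w = np.array([0.0, 1.0, 0.5])   # filter w[n]

# 'full' mode evaluates every shift n, giving len(x) + len(w) - 1 outputs
y = np.convolve(x, w, mode='full')
print(y)  # [0.  1.  2.5 4.  1.5]

# The same result, computed explicitly by inverting and shifting the filter
y_manual = [sum(x[k] * w[n - k]
                for k in range(len(x)) if 0 <= n - k < len(w))
            for n in range(len(x) + len(w) - 1)]
assert np.allclose(y, y_manual)
```

The `mode='same'` and `mode='valid'` options return only the central or fully overlapping shifts, respectively, which is often what you want in practice.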

Convolutional layers

Convolution has a number of properties that are very interesting in the field of deep learning:

  • It can successfully encode and decode spatial properties of the data.
  • It can be calculated relatively quickly with the latest developments.
  • It can be used to address several computer vision problems.
  • It can be combined with other types of layers for maximum performance.

Keras has wrapper functions for TensorFlow covering the most popular dimensionalities, that is, one, two, and three dimensions: Conv1D, Conv2D, and Conv3D. In this chapter, we will continue to focus on two-dimensional convolutions, but rest assured that if you have understood the concept, you can easily go on to use the others.

Conv2D

The two-dimensional convolution method has the following signature: tensorflow.keras.layers.Conv2D. The most common arguments used in a convolutional layer are the following:

  • filters refers to the number of filters to be learned in this particular layer and affects the dimension...
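To make the role of a single filter concrete, the core computation that one Conv2D filter performs on one channel can be sketched in plain NumPy. Note that Keras, like most deep learning libraries, actually computes cross-correlation (the filter is not inverted); the image and kernel below are illustrative, not taken from the chapter:

```python
import numpy as np

def conv2d_single_filter(image, kernel):
    """'Valid' 2D cross-correlation of one channel with one filter,
    as a Conv2D layer computes it (no filter flipping)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)        # a simple ramp image
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])            # diagonal-difference filter
print(conv2d_single_filter(image, kernel))              # shape (3, 3); every entry is -5.0
```

A Conv2D layer with `filters=n` learns `n` such kernels (each also spanning all input channels) and stacks their outputs along the channel dimension.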

Pooling strategies

You will usually find pooling layers accompanying convolutional layers. Pooling is intended to reduce the number of computations by reducing the dimensionality of the problem. We have a few pooling strategies available to us in Keras, but the most important and popular ones are the following two:

  • AveragePooling2D
  • MaxPooling2D

These also exist for other dimensions, such as 1D. However, in order to understand pooling, we can simply look at the example in the following diagram:

Figure 12.4 - Max pooling example in 2D

In the diagram, you can observe how max pooling looks at individual 2x2 squares, moving two spaces at a time, which leads to a 2x2 result. The whole point of pooling is to find a smaller summary of the data in question. When it comes to neural networks, we often look at the neurons that are excited the most, so it makes sense to take the maximum values as good representatives of larger portions of the data. However, remember that you can also...
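The 2x2, stride-2 max pooling illustrated in Figure 12.4 can be reproduced in a few lines of NumPy; the input values here are illustrative, not those of the figure:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    h, w = x.shape
    # Group the array into non-overlapping 2x2 blocks, then take each block's max
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 1],
              [4, 2, 0, 1],
              [5, 6, 1, 2],
              [7, 8, 3, 9]])
print(max_pool_2x2(x))
# [[4 2]
#  [8 9]]
```

Swapping `.max(axis=(1, 3))` for `.mean(axis=(1, 3))` gives the AveragePooling2D behavior instead.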

Convolutional neural network for CIFAR-10

Having looked at the individual pieces (the convolution operation, pooling, and how to implement convolutional and pooling layers), we have reached the point where we can implement a fully functional CNN. We will now implement the CNN architecture shown in Figure 12.3.

Implementation

We will be implementing the network in Figure 12.3 step by step, broken down into sub-sections.

Loading data

Let's load the CIFAR-10 dataset as follows:

from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import numpy as np

# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
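Before wiring the full Keras model together, it is worth sanity-checking the shapes that one convolution-plus-pooling stage produces on a CIFAR-10-sized input. The sketch below uses random data and a single filter purely for illustration; it is not the Figure 12.3 architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# One CIFAR-10-sized image: 32x32 pixels, 3 channels (random, illustrative)
image = rng.random((32, 32, 3))

# One 3x3 filter spanning all 3 channels, as Conv2D with filters=1 would learn
kernel = rng.standard_normal((3, 3, 3))

# 'Valid' cross-correlation over all channels -> a 30x30 feature map
fmap = np.zeros((30, 30))
for i in range(30):
    for j in range(30):
        fmap[i, j] = np.sum(image[i:i+3, j:j+3, :] * kernel)

# 2x2 max pooling with stride 2 -> a 15x15 summary
pooled = fmap.reshape(15, 2, 15, 2).max(axis=(1, 3))
print(fmap.shape, pooled.shape)  # (30, 30) (15, 15)
```

With `padding='same'` in Keras, the feature map would stay at 32x32 before pooling; the 'valid' shrinkage shown here matches the default `padding='valid'`.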

Summary

This intermediate chapter showed how to create CNNs. You learned about the convolution operation, which is the fundamental concept behind them. You also learned how to create convolutional layers and how to aggregate pooling strategies. You designed a network that learns filters to recognize objects in CIFAR-10, and you learned how to display the learned filters.

At this point, you should feel confident explaining the motivation behind convolutional neural networks rooted in computer vision and signal processing. You should feel comfortable coding the convolution operation in one and two dimensions using NumPy, SciPy, and Keras/TensorFlow. Furthermore, you should feel confident implementing convolution operations in layers and learning filters through gradient descent techniques. If you are asked to show what the network has learned, you should feel prepared to implement a simple visualization method to display the filters learned.

CNNs are great at encoding highly correlated spatial...

Questions and answers

  1. What data summarization strategy discussed in this chapter can reduce the dimensionality of a convolutional model?

Pooling.

  2. Does adding more convolutional layers make the network better?

Not always. It has been shown that more layers can have a positive effect on networks, but there are occasions when there is no gain. You should determine the number of layers, filter sizes, and pooling strategies experimentally.

  3. What other applications are there for CNNs?

Audio processing and classification; image denoising; image super-resolution; text summarization and other text-processing and classification tasks; the encryption of data.

References

  • LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4), 541-551.
  • Li, Y. D., Hao, Z. B., and Lei, H. (2016). Survey of convolutional neural networks. Journal of Computer Applications, 36(9), 2508-2515.
  • Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
  • Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
  • Jain, A. K., and Farrokhnia, F. (1991). Unsupervised texture segmentation using Gabor filters. Pattern recognition, 24(12), 1167...