Convolutional Neural Networks (CNNs) are the most popular and widely used deep neural networks for computer vision problems. They are used in a variety of applications, including image classification, face recognition, document analysis, medical image analysis, action recognition, and natural language processing. In this chapter, we will focus on learning convolutional operations and concepts such as padding and strides to optimize CNNs. The idea behind this chapter is to make you well versed in how a CNN functions, and to teach you techniques such as data augmentation and batch normalization to fine-tune your network and prevent overfitting. We will also briefly discuss how transfer learning can be leveraged to boost model performance.
In this chapter, we will cover the following recipes:
The generic architecture of a CNN comprises convolutional layers followed by fully connected layers. Like other neural networks, a CNN contains input, hidden, and output layers, but it works on data restructured into tensors that preserve the width and height of the image. In a CNN, each unit in one layer is connected only to a spatially relevant region of the previous layer, ensuring that, as the number of layers increases, each neuron retains a local influence tied to its specific location. A CNN may also contain pooling layers along with a few fully connected layers.
The following is an example of a simple CNN with convolution and pooling layers. In this recipe, we will work with convolution layers. We will introduce the concept of pooling layers in the Getting familiar with pooling layers recipe of...
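Such an architecture can be sketched in a few lines with the keras package in R. The layer sizes and input shape below are illustrative choices, not tuned for any particular task:

```r
library(keras)

# A minimal CNN: convolution -> pooling -> fully connected layers.
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%   # 28x28 grayscale input
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # downsample feature maps
  layer_flatten() %>%                             # tensor -> vector
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax") # e.g., 10 output classes

summary(model)
```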
In this recipe, we will learn about two key configuration hyperparameters of a CNN: strides and padding. Strides are used mainly to reduce the size of the output volume. Padding is a complementary technique that lets us preserve the spatial dimensions of the input volume in the output volume, enabling us to extract low-level features efficiently, including those at the borders of the input.
Strides: A stride, in very simple terms, is the step size of the convolution operation. It specifies the amount by which the filter shifts as it convolves over the input. For example, a stride of 1 means the filter shifts one unit at a time over the input matrix.
Strides can be used for multiple purposes, primarily the following:
- To reduce overlap between the regions the filter sees (feature overlapping)
- To achieve smaller spatial dimensionality of the output volume
In the following diagram, you...
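The combined effect of stride and padding on the output size can be computed directly: for an n x n input, an f x f filter, stride s, and padding p, each output dimension is floor((n + 2p - f) / s) + 1. A quick sketch in plain R (no keras needed; `conv_output_size` is our own helper name):

```r
# Output spatial size of a convolution, per dimension:
# floor((n + 2p - f) / s) + 1
conv_output_size <- function(n, f, s = 1, p = 0) {
  floor((n + 2 * p - f) / s) + 1
}

conv_output_size(5, 3)         # stride 1, no padding -> 3
conv_output_size(5, 3, s = 2)  # stride 2 shrinks the output -> 2
conv_output_size(5, 3, p = 1)  # padding 1 preserves the input size -> 5
```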
CNNs use pooling layers to reduce the size of the representation, to speed up computation, and to make feature extraction more robust. A pooling layer is typically stacked on top of a convolutional layer and substantially downsamples the spatial dimensions of its input, which reduces computation in the network and also helps reduce overfitting.
The two most commonly used pooling techniques are:
- Max pooling: This type of pooling downsamples the input by dividing the input matrix into pooling regions and then computing the maximum value of each region.
Here's an example:
- Average pooling: This type of pooling downsamples the input by dividing the input matrix into pooling regions and then computing the average value of each region.
Here's an example:
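Both operations can be sketched in a few lines of plain R. The `pool` function below is our own helper, not part of keras, and assumes non-overlapping regions that divide the input evenly:

```r
# Divide a matrix into non-overlapping size x size regions and
# apply f (max or mean) to each region.
pool <- function(m, size, f) {
  out <- matrix(0, nrow(m) %/% size, ncol(m) %/% size)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      rows <- ((i - 1) * size + 1):(i * size)
      cols <- ((j - 1) * size + 1):(j * size)
      out[i, j] <- f(m[rows, cols])
    }
  }
  out
}

input <- matrix(c(1, 3, 2, 4,
                  5, 6, 7, 8,
                  9, 2, 3, 1,
                  4, 5, 6, 7), nrow = 4, byrow = TRUE)

pool(input, 2, max)   # max pooling: region maxima 6, 8, 9, 7
pool(input, 2, mean)  # average pooling: region means of each 2x2 block
```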
In this recipe, we will learn how...
Transfer learning helps us solve a new problem with fewer examples by using information gained from solving related tasks. It is a technique in which we reuse a model trained on a different dataset to solve a similar but distinct problem. In transfer learning, we extend the learning of a pre-trained model in our network and build a new model to solve a new learning problem. The keras library in R provides many pre-trained models; we will use one such model, VGG16, to train our network.
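As a rough sketch of what this looks like with the keras R package (preprocessing and training code are omitted, and the two-unit output head is an assumption based on a two-class problem such as dogs versus cats):

```r
library(keras)

# Load VGG16 pre-trained on ImageNet, without its classification head.
base <- application_vgg16(weights = "imagenet", include_top = FALSE,
                          input_shape = c(224, 224, 3))
freeze_weights(base)  # keep the pre-trained convolutional weights fixed

# Attach a new head for our own two-class problem.
outputs <- base$output %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 2, activation = "softmax")

model <- keras_model(inputs = base$input, outputs = outputs)

model %>% compile(
  optimizer = "rmsprop",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
```

Freezing the base model means only the newly added dense layers are updated during training, which is what lets a small dataset benefit from features learned on ImageNet.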
We will start by importing the keras library into our environment:
In this example, we will work with a subset of the Dogs versus Cats dataset from...