
You're reading from  Hands-On Mathematics for Deep Learning

Product type: Book
Published in: Jun 2020
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781838647292
Edition: 1st Edition
Author: Jay Dawani

Jay Dawani is a former professional swimmer turned mathematician and computer scientist. He is also a Forbes 30 Under 30 Fellow. At present, he is the Director of Artificial Intelligence at Geometric Energy Corporation (NATO CAGE) and the CEO of Lemurian Labs, a startup he founded that is developing the next generation of autonomy, intelligent process automation, and driver intelligence. Previously, he was the technology and R&D advisor to Spacebit Capital. He has spent the last three years researching at the frontiers of AI, with a focus on reinforcement learning, open-ended learning, deep learning, quantum machine learning, human-machine interaction, multi-agent and complex systems, and artificial general intelligence.

Convolutional Neural Networks

In this chapter, we will cover one of the most popular and widely used deep neural networks—the convolutional neural network (CNN, also known as ConvNet).

It is this class of neural networks that is largely responsible for the incredible feats accomplished in computer vision over the last few years. The breakthrough began with AlexNet, created by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, which outperformed all the other models in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and set off the deep learning revolution.

ConvNets are a very powerful type of neural network for processing data that has a grid-like topology (that is, data in which there is a spatial correlation between neighboring points). They are tremendously useful in a variety of applications, such as facial recognition, self-driving cars, surveillance...

The inspiration behind ConvNets

CNNs are a type of artificial neural network (ANN); they are loosely inspired by the way the human visual cortex processes images, allowing our brains to recognize objects in the world and interact with them, which in turn allows us to do a number of things, such as drive, play sports, read, and watch movies.

It has been found that computations somewhat resembling convolutions take place in our brains. Additionally, the visual cortex contains both simple and complex cells. Simple cells pick up basic features, such as edges and curves, while complex cells respond to the same cues but also exhibit spatial invariance.

Types of data used in ConvNets

CNNs work exceptionally well on visual tasks, such as object classification and object recognition in images and videos, as well as pattern recognition in music, sound clips, and so on. They work effectively in these areas because they exploit the structure of the data to learn from it, which means the structure must be preserved: images, for example, have a fixed spatial arrangement, and if we were to alter it, the image would no longer make sense. This differs from standard ANNs, where the ordering of the features in an input vector does not matter. For this reason, the data fed to a CNN is stored in multidimensional arrays.

In computers, images are either grayscale (black and white) or colored (RGB, or RGB-D when a depth channel is added), and videos are sequences of such images; all of them are made up of pixels. A pixel is the smallest unit of a digitized image that can be displayed on a computer, and it holds values in the range [0, 255]. The...
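
To make the storage format concrete, here is a minimal sketch of how grayscale and RGB images, and video clips, map onto multidimensional arrays of pixel values in [0, 255]. It uses NumPy and arbitrary example shapes, which are illustrative assumptions rather than choices taken from the book:

```python
import numpy as np

# A grayscale image is a 2-D array: (height, width), one intensity per pixel.
gray = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# An RGB image adds a channel axis: (height, width, 3).
rgb = np.random.randint(0, 256, size=(28, 28, 3), dtype=np.uint8)

# A short video clip adds a time axis: (frames, height, width, 3).
video = np.random.randint(0, 256, size=(16, 28, 28, 3), dtype=np.uint8)

print(gray.shape, rgb.shape, video.shape)  # (28, 28) (28, 28, 3) (16, 28, 28, 3)
print(gray.dtype)                          # uint8, so every value lies in [0, 255]
```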

Convolutions and pooling

In Chapter 7, Feedforward Neural Networks, we saw how deep neural networks are built and how weights connect neurons in one layer to neurons in the previous or following layer. The layers in CNNs, however, are connected through a linear operation known as convolution, which is where their name comes from and is what makes the architecture so powerful for images.

Here, we will go over the various kinds of convolution and pooling operations used in practice and what the effect of each is. But first, let's see what convolution actually is.

Two-dimensional convolutions

In mathematics, we write convolutions as follows:
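One standard way of writing it (the notation here is assumed, not taken from the book) uses an input function f and a kernel g; for images, the discrete two-dimensional form, with an input I and a kernel K, is the one actually applied:

\[
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau,
\qquad
(I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m, j - n).
\]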

What this means is that we have a function, f, which is our input and a function...
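
As an illustrative sketch (not the book's code), the discrete two-dimensional operation can be implemented naively in a few lines. Note that deep learning libraries usually compute cross-correlation, which is convolution with the kernel left unflipped:

```python
import numpy as np

def conv2d_valid(image, kernel, flip_kernel=True):
    """Naive 2-D convolution with no padding and stride 1 ('valid' mode)."""
    if flip_kernel:                      # true convolution flips the kernel;
        kernel = kernel[::-1, ::-1]      # CNN libraries typically skip this step
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output element is the sum of an elementwise product between
            # the kernel and the patch of the input it currently covers.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3 x 3 input convolved with a 2 x 2 kernel gives a 2 x 2 output.
x = np.arange(1, 10, dtype=float).reshape(3, 3)
w = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d_valid(x, w, flip_kernel=False))
```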

Working with the ConvNet architecture

Now that we know all the different components that make up a ConvNet, we can put it all together and see how to construct a deep CNN. In this section, we will build a full architecture and observe how forward propagation works and how we decide the depth of the network, the number of kernels to apply, when and why to use pooling, and so on. But before we dive in, let's explore some of the ways in which CNNs differ from feedforward neural networks (FNNs); a small architecture sketch follows the list. The differences are as follows:

  • The neurons in CNNs have local connectivity, which means that each neuron in a successive layer receives input from a small local group of pixels in the image, instead of receiving the entire image, as an FNN neuron would.
  • Neurons within a layer of a CNN share the same weight parameters; the same kernel is applied at every spatial position of the input.
  • The layers in CNNs can be normalized.
  • CNNs are translation invariant, which allows us...
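
The following is a minimal sketch of such an architecture, written with PyTorch purely for illustration (the library, layer sizes, and input shape are assumptions, not the book's choices). It stacks convolution, nonlinearity, and pooling before a fully connected classifier:

```python
import torch
import torch.nn as nn

# A small CNN for, say, 28 x 28 grayscale images and 10 classes (assumed values).
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1),   # 28x28 -> 28x28
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),                                           # 28x28 -> 14x14
    nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1),  # 14x14 -> 14x14
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),                                           # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                                             # class scores
)

x = torch.randn(8, 1, 28, 28)   # a batch of 8 single-channel images
print(model(x).shape)           # torch.Size([8, 10])
```

Here, each convolution preserves the spatial size because of the padding, while each pooling layer halves it; the number of kernels (16, then 32) is an arbitrary choice for the sketch.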

Training and optimization

Now that we've got that sorted, it's time for us to dive into the really fun stuff: how do we train these fantastic architectures? Do we need a completely new algorithm to facilitate training and optimization? No! We can still use backpropagation and gradient descent to calculate the error, differentiate it with respect to the weights of the previous layers, and update those weights to get us as close to the global optimum as possible.

But before we go further, let's go through how backpropagation works in CNNs, particularly with kernels. Let's revisit the example we used earlier on in this chapter, where we convolved a 3 × 3 input with a 2 × 2 kernel, which looked as follows:
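In generic notation (these symbols are illustrative placeholders, not necessarily the book's), the 3 × 3 input X and the 2 × 2 kernel W can be written as:

\[
X = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix},
\qquad
W = \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix},
\]

and a stride-1, no-padding convolution of X with W produces a 2 × 2 output O.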

We expressed each element in the output matrix as follows:
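Using that notation, and assuming the cross-correlation convention that CNN libraries typically use, each element of O is:

\[
o_{ij} = \sum_{m=1}^{2} \sum_{n=1}^{2} x_{i+m-1,\; j+n-1}\, w_{mn}, \qquad i, j \in \{1, 2\},
\]

so, for example, \(o_{11} = x_{11} w_{11} + x_{12} w_{12} + x_{21} w_{21} + x_{22} w_{22}\).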

We should remember from Chapter 7, Feedforward Neural Networks, where we introduced backpropagation, that we...
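
As a sketch of how the kernel gradient falls out of those expressions (a generic derivation, not the book's code): each weight's gradient is the sum, over all output positions, of the upstream gradient multiplied by the input value that weight touched, which is itself a valid convolution of the input with the upstream gradient:

```python
import numpy as np

def kernel_gradient(x, grad_out, kh=2, kw=2):
    """Gradient of the loss w.r.t. a kh x kw kernel for a stride-1, no-padding
    convolution: dL/dw[m, n] = sum_{i, j} grad_out[i, j] * x[i + m, j + n]."""
    grad_w = np.zeros((kh, kw))
    oh, ow = grad_out.shape
    for m in range(kh):
        for n in range(kw):
            # Each kernel weight sees the input patch shifted by (m, n).
            grad_w[m, n] = np.sum(grad_out * x[m:m + oh, n:n + ow])
    return grad_w

x = np.arange(1, 10, dtype=float).reshape(3, 3)   # the 3 x 3 input
grad_out = np.ones((2, 2))                        # assumed upstream gradient dL/dO
print(kernel_gradient(x, grad_out))               # [[12. 16.], [24. 28.]]
```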

Summary

Congratulations! We have just finished learning about a powerful variant of neural networks known as CNNs, which are very effective in tasks relating to computer vision and time-series prediction. We will revisit CNNs later on in this book, but in the meantime, let's move on to the next chapter and learn about recurrent and recursive neural networks.
