
You're reading from Python Deep Learning

Product type: Book
Published in: Apr 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781786464453
Edition: 1st Edition
Authors (4):
Valentino Zocca

Valentino Zocca has a PhD degree and graduated with a Laurea in mathematics from the University of Maryland, USA, and University of Rome, respectively, and spent a semester at the University of Warwick. He started working on high-tech projects of an advanced stereo 3D Earth visualization software with head tracking at Autometric, a company later bought by Boeing. There he developed many mathematical algorithms and predictive models, and using Hadoop he automated several satellite-imagery visualization programs. He has worked as an independent consultant at the U.S. Census Bureau, in the USA and in Italy. Currently, Valentino lives in New York and works as an independent consultant to a large financial company.

Gianmario Spacagna

Gianmario Spacagna is a senior data scientist at Pirelli, processing sensors and telemetry data for internet of things (IoT) and connected-vehicle applications. He works closely with tire mechanics, engineers, and business units to analyze and formulate hybrid, physics-driven, and data-driven automotive models. His main expertise is in building ML systems and end-to-end solutions for data products. He holds a master's degree in telematics from the Polytechnic of Turin, as well as one in software engineering of distributed systems from KTH, Stockholm. Prior to Pirelli, he worked in retail and business banking (Barclays), cyber security (Cisco), predictive marketing (AgilOne), and did some occasional freelancing.

Daniel Slater

Daniel Slater started programming at age 11, developing mods for the id Software game Quake. His obsession led him to become a developer working in the gaming industry on the hit computer game series Championship Manager. He then moved into finance, working on risk- and high-performance messaging systems. He now is a staff engineer working on big data at Skimlinks to understand online user behavior. He spends his spare time training AI to beat computer games. He talks at tech conferences about deep learning and reinforcement learning; and the name of his blog is Daniel Slater's blog. His work in this field has been cited by Google.

Peter Roelants

Peter Roelants holds a master's in computer science with a specialization in AI from KU Leuven. He works on applying deep learning to a variety of problems, such as spectral imaging, speech recognition, text understanding, and document information extraction. He currently works at Onfido as a team leader for the data extraction research team, focusing on data extraction from official documents.


Chapter 5. Image Recognition

Vision is arguably the most important human sense. We rely on our vision to recognize our food, to run away from danger, to recognize our friends and family, and to find our way in familiar surroundings. We rely on our vision, in fact, to read this book and to recognize each and every letter and symbol printed in it. However, image recognition has long been, and in many ways still is, one of the most difficult problems in computer science. It is very hard to teach a computer programmatically how to recognize different objects, because it is difficult to explain to a machine what features make up a specified object. In deep learning, however, as we have seen, the neural network learns by itself which features make up each object, and it is therefore well suited for a task such as image recognition.

In this chapter we will cover the following topics:

  • Similarities between artificial and biological models

  • Intuition and justification...

Similarities between artificial and biological models


Human vision is a complex and heavily structured process. The visual system works by hierarchically understanding reality through the retina, the thalamus, the visual cortex, and the inferior temporal cortex. The input to the retina is a two-dimensional array of color intensities that is sent, through the optic nerve, to the thalamus. The thalamus receives sensory information from all of our senses with the exception of the olfactory system. It forwards the visual information collected from the retina to the primary visual cortex, also known as the striate cortex (V1), which extracts basic information such as lines and movement directions. The information then moves to the V2 region, which is responsible for color interpretation and color constancy under different lighting conditions, then to the V3 and V4 regions, which improve color and form perception. Finally, the information goes down to the Inferior Temporal cortex (IT)...

Intuition and justification


We have already mentioned in Chapter 3, Deep Learning Fundamentals, the paper published in 2012 by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton titled: ImageNet Classification with Deep Convolutional Neural Networks. Though the genesis of convolutional networks may be traced back to the '80s, that was one of the first papers that highlighted the deep importance of convolutional networks in image processing and recognition, and currently almost no deep neural network used for image recognition can work without some convolutional layer.

An important problem that we have seen when working with classical feed-forward networks is that they may overfit, especially when working with medium to large images. This is often because such networks have a very large number of parameters: in classical neural nets, every neuron in a layer is connected to every neuron in the next. When the number of parameters is large, over-fitting is more likely...
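To make the parameter argument concrete, here is a small sketch that counts the weights and biases of a fully connected layer and, for comparison, of a layer made of small shared filters. The layer sizes (400 hidden neurons, 32 filters of size 3 x 3) and the helper names `dense_params` and `conv_params` are illustrative assumptions, not figures taken from this section.

```python
def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def conv_params(n_filters, filter_height, filter_width, in_channels):
    """Weights plus biases of a layer whose filters are shared across the image."""
    return n_filters * (filter_height * filter_width * in_channels) + n_filters


# A 28 x 28 grayscale image flattened into 784 inputs, 400 hidden neurons (assumed size).
dense_small = dense_params(28 * 28, 400)
# A 32 x 32 RGB image flattened into 3072 inputs, same hidden layer.
dense_large = dense_params(32 * 32 * 3, 400)
# 32 filters of size 3 x 3 on a single-channel image (assumed sizes), for comparison.
conv_small = conv_params(32, 3, 3, 1)

print(dense_small, dense_large, conv_small)  # 314000 1229200 320
```

The fully connected count grows with the number of pixels, while the shared-filter count depends only on the filter size and the number of filters.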

Convolutional layers


A convolutional layer (sometimes referred to in the literature as a "filter") is a particular type of neural network layer that manipulates the image to highlight certain features. Before we get into the details, let's introduce a convolutional filter using some code and some examples. This will make the intuition simpler and will make understanding the theory easier. To do this we can use the keras datasets, which make it easy to load the data.

We will import numpy, then the mnist dataset, and matplotlib to show the data:

import numpy 
from keras.datasets import mnist  
import matplotlib.pyplot as plt 
import matplotlib.cm as cm

Let's define our main function, which takes an integer, corresponding to the index of an image in the mnist dataset, and a filter. In this case we will define the blur filter:

def main(image, im_filter):
      im = X_train[image]

Now we define a new image imC, of size (im.width-2, im.height-2):

      width = im.shape[0]       
      height = im.shape[1]
      imC ...
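Since the listing above is cut off, the following is a separate, minimal sketch of the same idea and not the authors' code. It slides a 3 x 3 blur filter over a small image without padding, so the result is two pixels smaller in each dimension. The function name `convolve_valid`, the toy 5 x 5 image, and the use of plain Python lists instead of numpy arrays are all assumptions made for illustration. The filter is not flipped, which makes no difference for a symmetric blur filter.

```python
def convolve_valid(image, im_filter):
    """Slide a square filter over a 2D image (a list of rows) without padding."""
    k = len(im_filter)
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - k + 1):
        out_row = []
        for c in range(cols - k + 1):
            # Weighted sum of the k x k window whose top-left corner is (r, c).
            total = 0.0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * im_filter[i][j]
            out_row.append(total)
        out.append(out_row)
    return out


# A 3 x 3 blur filter: every output pixel is the average of its neighborhood.
blur = [[1 / 9] * 3 for _ in range(3)]

# Toy 5 x 5 image (hypothetical data): a bright square on a dark background.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 0, 0, 0, 0],
]

blurred = convolve_valid(image, blur)
print(len(blurred), len(blurred[0]))  # 3 3
```

The center of the blurred image keeps the full brightness, while the corners are averaged down because their windows overlap the dark border.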

Pooling layers


In the previous section, we derived the formula for the size of each slice in a convolutional layer. As we discussed, one of the advantages of convolutional layers is that they reduce the number of parameters needed, improving performance and reducing over-fitting. After a convolutional operation, another operation is often performed: pooling. The most classical example is called max-pooling, which means creating (2 x 2) grids on each slice and picking the neuron with the maximum activation value in each grid, discarding the rest. Such an operation clearly discards 75% of the neurons, keeping only the neuron that contributes the most in each cell.

Each pooling layer has two parameters, similar to the stride and padding parameters found in convolutional layers: the size of the cell and the stride. One typical choice is a cell size of 2 and a stride of 2, though it is not uncommon to pick a cell size of 3 and a stride of...
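Max-pooling as described in this section can be sketched in a few lines. The function name `max_pool`, its parameter names `size` and `stride`, and the toy 4 x 4 slice are hypothetical choices for illustration. The sketch works on plain Python lists rather than on a real network layer.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the maximum value of each size x size cell, moving by stride."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values of the cell whose top-left corner is (r, c).
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Toy 4 x 4 slice of activations (hypothetical values).
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 2, 8],
]

pooled = max_pool(feature_map)
print(pooled)  # [[4, 5], [6, 8]]
```

With a cell size of 2 and a stride of 2, the 16 input values are reduced to 4, which is the 75% reduction mentioned above.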

Dropout


Another important technique that can be applied after a pooling layer, but can also generally be applied to a fully connected layer, is to randomly "drop" some neurons, together with their input and output connections, during training. In a dropout layer we specify a probability p for neurons to "drop out" stochastically. During each training iteration, each neuron has probability p of being dropped from the network and probability (1-p) of being kept. This ensures that no neuron ends up relying too much on other neurons, and that each neuron "learns" something useful for the network. This has two advantages: it speeds up the training, since we train a smaller network each time, and it also helps prevent over-fitting (see N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, in Journal of Machine Learning Research 15 (2014), 1929-1958, http://www.jmlr.org/papers/volume15/srivastava14a.old...
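The dropping rule described above can be illustrated with a short sketch, under a few assumptions: the function name `dropout` and the toy activations are hypothetical, the layer is represented as a plain list of activation values, and the rescaling of kept activations that libraries usually apply is deliberately left out to keep the example minimal.

```python
import random


def dropout(activations, p, rng):
    """Zero each activation independently with probability p, keep it otherwise."""
    return [0.0 if rng.random() < p else a for a in activations]


# Seeded generator so that the example is reproducible.
rng = random.Random(0)

# Toy layer of 10000 neurons, all with activation 1.0 (hypothetical values).
activations = [1.0] * 10000
dropped = dropout(activations, p=0.5, rng=rng)

# Roughly a fraction (1 - p) of the neurons should survive.
kept_fraction = sum(1 for a in dropped if a != 0.0) / len(dropped)
print(kept_fraction)
```

A new random mask is drawn on every call, so a different subset of neurons is silenced each time the function is applied.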

Convolutional layers in deep learning


When we introduced the idea of deep learning, we discussed how the word "deep" refers not only to the fact that we use many layers in our neural net, but also to the fact that we have a "deeper" learning process. Part of this deeper learning process is the ability of the neural net to learn features autonomously. In the previous section, we defined specific filters to help the network learn specific characteristics. This is not necessarily what we want. As we discussed, the point of deep learning is that the system learns on its own. If we had to teach the network what features or characteristics are important, or how to learn to recognize digits by applying layers such as the edges layer that highlights the general shape of a digit, we would be doing most of the work and possibly constraining the network to learn features that may be relevant to us but not to the network, degrading its performance. The point of deep learning is that the system...

Convolutional layers in Theano


Now that we have the intuition of how convolutional layers work, we are going to implement a simple example of a convolutional layer using Theano.

Let us start by importing the modules that are needed:

import numpy  
import theano  
import matplotlib.pyplot as plt 
import theano.tensor as T
from theano.tensor.nnet import conv
import skimage.data
import matplotlib.cm as cm

Theano works by first creating a symbolic representation of the operations we define. We will later have another example using Keras, which provides a nice interface that makes creating neural networks easier but lacks some of the flexibility one can have by using Theano (or TensorFlow) directly.

We define the variables needed and the neural network operations, by defining the number of feature maps (the depth of the convolutional layer) and the size of the filter, then we symbolically define the input using the Theano tensor class. Theano treats the image channels as a separate dimension...

A convolutional layer example with Keras to recognize digits


In the third chapter, we introduced a simple neural network to classify digits using Keras and we got 94% accuracy. In this chapter, we will work to improve that value above 99% using convolutional networks. Actual values may vary slightly due to variability in initialization.

First of all, we can improve the neural network we had defined by using 400 hidden neurons and running it for 30 epochs; that should already get us up to around 96.5% accuracy:

    hidden_neurons = 400
    epochs = 30

Next we could try scaling the input. Images are composed of pixels, and each pixel has an integer value between 0 and 255. We could make that value a float and scale it between 0 and 1 by adding these four lines of code right after we define our input:

X_train = X_train.astype('float32')     
X_test = X_test.astype('float32')     
X_train /= 255     
X_test /= 255

If we run our network now, we get a poorer accuracy, just above 92%, but we need not...

A convolutional layer example with Keras for cifar10


We can now try to use the same network on the cifar10 dataset. In Chapter 3, Deep Learning Fundamentals, we were getting a low 50% accuracy on test data. To test the new network we have just used for the mnist dataset, we only need to make a couple of small changes to our code. First, we need to load the cifar10 dataset (without doing any re-shaping, so those lines will be deleted):

(X_train, Y_train), (X_test, Y_test) = cifar10.load_data()

And then change the input values for the first convolutional layer:

model.add(Convolution2D(32, (3, 3), input_shape=(32, 32, 3)))

Running this network for 5 epochs will give us around 60% accuracy (up from about 50%) and 66% accuracy after 10 epochs, but then the network starts to overfit and stops improving performance.

Of course the cifar10 images have 32 x 32 x 3 = 3072 input values (32 x 32 pixels in 3 color channels), instead of 28 x 28 = 784, so we may need to add a couple more convolutional layers after the first two:

model.add(Convolution2D...

Pre-training


As we have seen, neural networks, and convolutional networks in particular, work by tuning the weights of the network as if they were coefficients of a large equation in order to get the correct output given a specific input. The tuning happens through back-propagation to move the weights towards the best solution given the chosen neural net architecture. One of the problems is therefore finding the best initialization values for the weights in the neural network. Libraries such as Keras can automatically take care of that. However, this topic is important enough to be worth discussing.

Restricted Boltzmann machines have been used to pre-train the network by using the input as the desired output to make the network automatically learn representations of the input and tune its weights accordingly, and this topic has already been discussed in Chapter 4, Unsupervised Feature Learning.

In addition, there exist many pre-trained networks that offer good results. As we have...

Summary


It should be noted, as may have become clear, that there is no general architecture for a convolutional neural network. However, there are general guidelines. Normally, pooling layers follow convolutional layers, and it is customary to stack two or more successive convolutional layers to detect more complex features, as is done in the VGG-16 neural net example shown earlier. Convolutional networks are very powerful. However, they can be quite resource-heavy (the VGG-16 example above, for example, is relatively complex), and usually require a long training time, which is why the use of a GPU can help speed up performance. Their strength comes from the fact that they do not focus on the entire image; rather, they focus on smaller sub-regions to find interesting features that make up the image in order to be able to find discriminating elements between different inputs. Since convolutional layers are very resource-heavy, we have introduced pooling layers that help reduce the...
