You're reading from Deep Learning with Keras

Product type: Book
Published in: Apr 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781787128422
Edition: 1st

Authors (2): Antonio Gulli, Sujit Pal

Antonio Gulli has a passion for establishing and managing global technological talent for innovation and execution. His core expertise is in cloud computing, deep learning, and search engines. Currently, Antonio works for Google in the Cloud Office of the CTO in Zurich, working on Search, Cloud Infra, Sovereignty, and Conversational AI.

Sujit Pal is a Technology Research Director at Elsevier Labs, an advanced technology group within the Reed-Elsevier Group of companies. His interests include semantic search, natural language processing, machine learning, and deep learning. At Elsevier, he has worked on several initiatives involving search quality measurement and improvement, image classification and duplicate detection, and annotation and ontology development for medical and scientific corpora.

Chapter 3. Deep Learning with ConvNets

In previous chapters, we discussed dense nets, in which each layer is fully connected to its adjacent layers. We applied those dense networks to classify the MNIST handwritten digits dataset. In that context, each pixel in the input image is assigned to a neuron, for a total of 784 (28 x 28 pixels) input neurons. However, this strategy does not leverage the spatial structure and relations within each image. In particular, this piece of code transforms the bitmap representing each written digit into a flat vector, where the spatial locality is gone:

# X_train is 60000 rows of 28x28 values --> reshaped to 60000 x 784
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)

Convolutional neural networks (also called ConvNets) leverage spatial information and are therefore very well suited for classifying images. These nets use an ad hoc architecture inspired by biological data taken from physiological experiments done on the visual cortex...

Deep convolutional neural network — DCNN


A deep convolutional neural network (DCNN) consists of many neural network layers. Two different types of layers, convolutional and pooling, are typically alternated. The depth of each filter increases from left to right in the network, and the last stage is typically made of one or more fully connected layers.
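As a concrete sketch of this structure (the layer and filter sizes here are illustrative, not taken from the book's listings), the alternation can be written in Keras as a Sequential model whose filter depth grows from 32 to 64 before the fully connected stage:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Convolution and pooling layers alternate; the number of filters
# (the "depth") grows from 32 to 64 as we move deeper into the net.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same',
                 input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# The last stage is one or more fully connected layers.
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
```

Each max-pooling step halves the spatial resolution, so the network trades spatial detail for increasingly abstract feature maps.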

There are three key intuitions behind ConvNets:

  • Local receptive fields
  • Shared weights
  • Pooling

Let's review them.

Local receptive fields

If we want to preserve spatial information, it is convenient to represent each image with a matrix of pixels. A simple way to encode the local structure is then to connect a submatrix of adjacent input neurons to one single hidden neuron belonging to the next layer. That single hidden neuron represents one local receptive field. Note that this operation is called convolution, and it gives this type of network its name.

Of course, we can encode more information by having overlapping submatrices. For instance, let...
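To make the receptive-field idea concrete, here is a small NumPy sketch (not from the book): one 3 x 3 kernel slides over a 5 x 5 image, so each output value is computed only from a local submatrix of the input, and the same nine weights are shared across every position:

```python
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)   # a toy 5x5 "image"
kernel = np.ones((3, 3)) / 9.0                     # one shared 3x3 kernel

# 'valid' convolution: the 3x3 field fits in 5 - 3 + 1 = 3 positions per axis
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        patch = image[i:i + 3, j:j + 3]    # the local receptive field
        out[i, j] = np.sum(patch * kernel)  # same weights at every position
```

The 3 x 3 output is smaller than the input because only positions where the kernel fully overlaps the image are kept; this is exactly the `padding='valid'` behavior of the Keras `Conv2D` layer.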

An example of DCNN — LeNet


Yann LeCun proposed (for more information refer to: Convolutional Networks for Images, Speech, and Time-Series, by Y. LeCun and Y. Bengio, in The Handbook of Brain Theory and Neural Networks, 1995) a family of ConvNets named LeNet, trained for recognizing MNIST handwritten characters with robustness to simple geometric transformations and to distortion. The key intuition here is to have lower layers alternate convolution operations with max-pooling operations. The convolution operations are based on carefully chosen local receptive fields with shared weights for multiple feature maps. Higher levels are then fully connected layers based on a traditional MLP, with hidden layers and a softmax output layer.

LeNet code in Keras

To define the LeNet code, we use a 2D convolutional module, which is:

keras.layers.convolutional.Conv2D(filters, kernel_size, padding='valid')

Here, filters is the number of convolution kernels to use (that is, the dimensionality of the output space), kernel_size...
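Putting the pieces together, a LeNet-style network for MNIST can be sketched as follows. This is a hedged sketch: the filter counts (20 and 50), the relu activations, and the size of the dense layer are illustrative choices, not necessarily the book's exact listing:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Two conv/max-pool stages followed by an MLP head with softmax output.
model = Sequential()
model.add(Conv2D(20, kernel_size=(5, 5), padding='same',
                 activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(50, kernel_size=(5, 5), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(500, activation='relu'))
model.add(Dense(10, activation='softmax'))    # one class per digit
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
```

Training then proceeds with the usual `model.fit(X_train, Y_train, ...)` call, with the images reshaped to (28, 28, 1) rather than flattened to 784-element vectors.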

Recognizing CIFAR-10 images with deep learning


The CIFAR-10 dataset contains 60,000 color images of 32 x 32 pixels in 3 channels, divided into 10 classes. Each class contains 6,000 images. The training set contains 50,000 images, while the test set provides 10,000 images. This image, taken from the CIFAR repository (https://www.cs.toronto.edu/~kriz/cifar.html), shows a few random examples from the 10 classes:

The goal is to recognize previously unseen images and assign them to one of the 10 classes. Let us define a suitable deep net.

First of all, we import a number of useful modules, define a few constants, and load the dataset:

from keras.datasets import cifar10
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import SGD, Adam, RMSprop
import matplotlib.pyplot as plt

# CIFAR_10 is a set of 60K images 32x32 pixels on 3 channels...
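The loading step then scales pixel values and one-hot encodes the integer labels. As a small self-contained sketch of those two transformations (using keras.utils.to_categorical, the modern name for the book's np_utils.to_categorical):

```python
import numpy as np
from keras.utils import to_categorical

NB_CLASSES = 10   # CIFAR-10 has 10 classes

# Labels arrive as integer class ids in [0, 9]; the softmax output
# layer expects one-hot vectors instead.
y = np.array([0, 3, 9])
Y = to_categorical(y, NB_CLASSES)

# Pixel values are scaled from [0, 255] down to [0, 1] before training.
X = np.random.randint(0, 256, size=(4, 32, 32, 3)).astype('float32') / 255.0
```

In the real pipeline, `cifar10.load_data()` returns `(X_train, y_train), (X_test, y_test)` arrays that receive exactly this treatment.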

Very deep convolutional networks for large-scale image recognition


In 2014, an interesting contribution for image recognition was presented (for more information refer to: Very Deep Convolutional Networks for Large-Scale Image Recognition, by K. Simonyan and A. Zisserman, 2014). The paper shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. One model in the paper, denoted as D or VGG-16, has 16 weight layers. An implementation in Caffe (http://caffe.berkeleyvision.org/) was used for training the model on the ImageNet ILSVRC-2012 (http://image-net.org/challenges/LSVRC/2012/) dataset, which includes images of 1,000 classes and is split into three sets: training (1.3 million images), validation (50,000 images), and testing (100,000 images). Each image is (224 x 224) on three channels. The model achieves 7.5% top-5 error on ILSVRC-2012-val and 7.4% top-5 error on ILSVRC-2012-test.
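Later Keras versions ship this architecture ready-made in keras.applications (a module newer than the Keras release the book targets). As a sketch, the 16-layer model can be instantiated directly; passing weights='imagenet' instead of weights=None would download the pre-trained ILSVRC weights:

```python
from keras.applications.vgg16 import VGG16

# Build the VGG-16 architecture; weights=None gives random initialization,
# while weights='imagenet' would load the pre-trained ILSVRC weights.
model = VGG16(weights=None, include_top=True,
              input_shape=(224, 224, 3), classes=1000)
```

The model's final softmax layer has 1,000 outputs, one per ImageNet class, matching the (224 x 224 x 3) input described above.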

According to the ImageNet site...

Summary


In this chapter, we learned how to use deep learning ConvNets for recognizing MNIST handwritten characters with high accuracy. We then used the CIFAR-10 dataset to build a deep learning classifier over 10 categories, and the ImageNet dataset to build an accurate classifier over 1,000 categories. In addition, we investigated how to use large deep learning networks such as VGG-16 and very deep networks such as InceptionV3. The chapter concluded with a discussion on transfer learning, which adapts pre-built models trained on large datasets so that they can work well on a new domain.

In the next chapter, we will introduce generative adversarial networks, which are used to produce synthetic data that looks like data generated by humans, and we will present WaveNet, a deep neural network used for reproducing human voice and musical instruments with high quality.
