
You're reading from Python Deep Learning

Product type: Book

Published in: Apr 2017

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781786464453

Edition: 1st Edition

Languages: Python

Tools: TensorFlow, Theano

Concepts: Deep Learning

Authors (4):

Valentino Zocca

Gianmario Spacagna

Daniel Slater

Peter Roelants


Chapter 5. Image Recognition

Vision is arguably the most important human sense. We rely on our vision to recognize our food, to run away from danger, to recognize our friends and family, and to find our way in familiar surroundings. We rely on our vision, in fact, to read this book and to recognize each and every letter and symbol printed in it. However, image recognition has long been (and in many ways still is) one of the most difficult problems in computer science. It is very hard to teach a computer programmatically how to recognize different objects, because it is difficult to explain to a machine what features make up a specified object. In deep learning, however, as we have seen, the neural network learns by itself which features make up each object, and it is therefore well suited for a task such as image recognition.

In this chapter we will cover the following topics:

Similarities between artificial and biological models
Intuition and justification...

Similarities between artificial and biological models

Human vision is a complex and heavily structured process. The visual system works by hierarchically understanding reality through the retina, the thalamus, the visual cortex, and the inferior temporal cortex. The input to the retina is a two-dimensional array of color intensities that is sent, through the optic nerve, to the thalamus. The thalamus receives sensory information from all of our senses with the exception of the olfactory system. It forwards the visual information collected from the retina to the primary visual cortex, also known as the striate cortex or V1, which extracts basic information such as lines and movement directions. The information then moves to the V2 region, which is responsible for color interpretation and color constancy under different lighting conditions, and then to the V3 and V4 regions, which improve color and form perception. Finally, the information goes down to the Inferior Temporal cortex (IT)...

Intuition and justification

We have already mentioned in Chapter 3, Deep Learning Fundamentals, the paper published in 2012 by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton titled: ImageNet Classification with Deep Convolutional Neural Networks. Though the genesis of convolutional networks may be traced back to the '80s, that was one of the first papers that highlighted the importance of convolutional networks in image processing and recognition, and currently almost no deep neural network used for image recognition works without some convolutional layer.

An important problem that we have seen when working with classical feed-forward networks is that they may overfit, especially when working with medium to large images. This is often due to the fact that neural networks have a very large number of parameters: in classical neural nets, every neuron in a layer is connected to every neuron in the next layer. When the number of parameters is large, over-fitting is more likely...
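To make the parameter argument concrete, the following sketch counts the weights and biases in a single fully connected layer and in a single convolutional layer. The layer sizes (400 hidden units, 32 filters of size 3 x 3) are assumptions chosen for illustration, and the helper names are hypothetical.

```python
# Illustrative parameter counts; layer sizes are assumptions, not the book's exact model.

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(filter_h, filter_w, in_channels, n_filters):
    """Weights plus one bias per filter in a convolutional layer."""
    return filter_h * filter_w * in_channels * n_filters + n_filters


mnist_dense = dense_params(28 * 28, 400)      # grayscale 28 x 28 input
cifar_dense = dense_params(32 * 32 * 3, 400)  # color 32 x 32 x 3 input
conv_layer = conv_params(3, 3, 3, 32)         # 32 filters of size 3 x 3 over 3 channels

print(mnist_dense, cifar_dense, conv_layer)
```

Because a convolutional filter is reused at every position of the image, its parameter count does not grow with the image size, whereas the fully connected count grows with every added input value.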

Convolutional layers

A convolutional layer (sometimes referred to in the literature as a "filter") is a particular type of neural network layer that manipulates the image to highlight certain features. Before we get into the details, let's introduce a convolutional filter using some code and some examples. This will make the intuition simpler and will make understanding the theory easier. To do this we can use the keras datasets module, which makes it easy to load the data.

We will import numpy, then the mnist dataset, and matplotlib to show the data:

import numpy 
from keras.datasets import mnist  
import matplotlib.pyplot as plt 
import matplotlib.cm as cm

Let's define our main function, which takes an integer corresponding to an image in the mnist dataset and a filter. In this case, we will define the blur filter:

def main(image, im_filter):
      im = X_train[image]

Now we define a new image imC, of size (im.width-2, im.height-2):

      width = im.shape[0]       
      height = im.shape[1]
      imC ...
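Since the listing above is cut off, here is a minimal, self-contained sketch of the same idea using only numpy. It is not the book's code: the function name apply_filter, the uniform 3 x 3 box-blur values, and the synthetic stand-in image are all assumptions made for illustration. Like most deep learning libraries, it slides the filter without flipping it.

```python
import numpy as np


def apply_filter(im, im_filter):
    """Slide a 3x3 filter over a 2D image with no padding.

    The output is two pixels smaller in each dimension, matching the
    (width - 2, height - 2) size described in the text.
    """
    rows, cols = im.shape
    out = np.zeros((rows - 2, cols - 2))
    for r in range(rows - 2):
        for c in range(cols - 2):
            patch = im[r:r + 3, c:c + 3]
            # Weighted sum of the 3x3 neighbourhood
            out[r, c] = np.sum(patch * im_filter)
    return out


# Assumed blur filter: every neighbour contributes equally (box blur)
blur = np.full((3, 3), 1.0 / 9.0)

# Synthetic 28x28 array standing in for an MNIST digit
image = np.arange(28 * 28, dtype=float).reshape(28, 28)

blurred = apply_filter(image, blur)
print(blurred.shape)
```

With a real digit, X_train[image] from the mnist dataset would take the place of the synthetic array.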

Pooling layers

In the previous section, we derived the formula for the size of each slice in a convolutional layer. As we discussed, one of the advantages of convolutional layers is that they reduce the number of parameters needed, improving performance and reducing over-fitting. After a convolutional operation, another operation is often performed: pooling. The most classical example is called max-pooling, which means creating (2 x 2) grids on each slice and picking the neuron with the maximum activation value in each grid, discarding the rest. It follows immediately that such an operation discards 75% of the neurons, keeping only the neuron that contributes the most in each cell.

There are two parameters for each pooling layer, similar to the stride and padding parameters found in convolutional layers, and they are the size of the cell and the stride. One typical choice is to choose a cell size of 2 and a stride of 2, though it is not uncommon to pick a cell size of 3 and a stride of...
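The max-pooling operation described above can be sketched in a few lines of numpy. The function name max_pool and the small example feature map are assumptions made for illustration; the default cell size and stride of 2 follow the typical choice mentioned in the text.

```python
import numpy as np


def max_pool(feature_map, size=2, stride=2):
    """Keep only the maximum activation in each (size x size) cell."""
    rows, cols = feature_map.shape
    out_rows = (rows - size) // stride + 1
    out_cols = (cols - size) // stride + 1
    out = np.zeros((out_rows, out_cols), dtype=feature_map.dtype)
    for r in range(out_rows):
        for c in range(out_cols):
            window = feature_map[r * stride:r * stride + size,
                                 c * stride:c * stride + size]
            out[r, c] = window.max()
    return out


fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]])

pooled = max_pool(fm)
print(pooled)
```

The 4 x 4 input has 16 values and the pooled output has 4, which is the 75% reduction mentioned above.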

Dropout

Another important technique that can be applied after a pooling layer, but can also generally be applied to a fully connected layer, is to "drop" some neurons and their corresponding input and output connections randomly and periodically. In a dropout layer we specify a probability p for neurons to "drop out" stochastically. During each training period, each neuron has probability p to be dropped out from the network, and a probability (1-p) to be kept. This is to ensure that no neuron ends up relying too much on other neurons, and each neuron "learns" something useful for the network. This has two advantages: it speeds up the training, since we train a smaller network each time, and also helps in preventing over-fitting (see N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, in Journal of Machine Learning Research 15 (2014), 1929-1958, http://www.jmlr.org/papers/volume15/srivastava14a.old...
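As a rough illustration of the mechanism, the sketch below applies a random dropout mask to a vector of activations with numpy. The function name dropout is hypothetical, and the rescaling by 1 / (1 - p) (often called inverted dropout) is an assumption that the text above does not describe; it is a common way to keep the expected activation unchanged during training.

```python
import numpy as np


def dropout(activations, p, rng):
    """Drop each activation with probability p (training-time behaviour only)."""
    # True where the neuron is kept, which happens with probability (1 - p)
    keep_mask = rng.random(activations.shape) >= p
    # Inverted-dropout scaling (an assumption, not from the text):
    # dividing by (1 - p) keeps the expected activation unchanged.
    return activations * keep_mask / (1.0 - p)


rng = np.random.default_rng(0)
layer_output = np.ones(10000)
dropped = dropout(layer_output, p=0.5, rng=rng)

# Fraction of neurons that were dropped, and the mean surviving signal
print(np.mean(dropped == 0), dropped.mean())
```

At test time no neurons are dropped, so with this scaling the layer output can be used unchanged.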

Convolutional layers in deep learning

When we introduced the idea of deep learning, we discussed how the word "deep" refers not only to the fact that we use many layers in our neural net, but also to the fact that we have a "deeper" learning process. Part of this deeper learning process was the ability of the neural net to learn features autonomously. In the previous section, we defined specific filters to help the network learn specific characteristics. This is not necessarily what we want. As we discussed, the point of deep learning is that the system learns on its own, and if we had to teach the network what features or characteristics are important, or how to learn to recognize digits by applying layers such as the edges layer that highlights the general shape of a digit, we would be doing most of the work and possibly constraining the network to learn features that may be relevant to us but not to the network, degrading its performance. The point of Deep Learning is that the system...

Convolutional layers in Theano

Now that we have the intuition of how convolutional layers work, we are going to implement a simple example of a convolutional layer using Theano.

Let us start by importing the modules that are needed:

import numpy  
import theano  
import matplotlib.pyplot as plt 
import theano.tensor as T
from theano.tensor.nnet import conv
import skimage.data
import matplotlib.cm as cm

Theano works by first creating a symbolic representation of the operations we define. We will later have another example using Keras, which provides a nice interface that makes creating neural networks easier, but lacks some of the flexibility one can have by using Theano (or TensorFlow) directly.

We define the variables needed and the neural network operations, by defining the number of feature maps (the depth of the convolutional layer) and the size of the filter, then we symbolically define the input using the Theano tensor class. Theano treats the image channels as a separate dimension...
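To show how the shapes fit together when channels are a separate dimension, here is a numpy-only sketch that does not use Theano at all. It assumes a 4D layout of (batch, channels, rows, columns) for the images and (feature maps, channels, filter rows, filter columns) for the filters. The function name conv_layer_forward is hypothetical, and the sketch illustrates shapes only; it does not reproduce Theano's exact numerical behavior.

```python
import numpy as np


def conv_layer_forward(images, filters):
    """Apply every filter to every image, with no padding and stride 1.

    images:  (batch, channels, rows, cols)
    filters: (n_maps, channels, f_rows, f_cols)
    returns: (batch, n_maps, rows - f_rows + 1, cols - f_cols + 1)
    """
    batch, channels, rows, cols = images.shape
    n_maps, f_channels, f_rows, f_cols = filters.shape
    assert channels == f_channels, "filters must match the image channels"
    out = np.zeros((batch, n_maps, rows - f_rows + 1, cols - f_cols + 1))
    for r in range(out.shape[2]):
        for c in range(out.shape[3]):
            patch = images[:, :, r:r + f_rows, c:c + f_cols]
            # Sum over channels and filter positions for each (image, map) pair
            out[:, :, r, c] = np.tensordot(
                patch, filters, axes=([1, 2, 3], [1, 2, 3]))
    return out


images = np.ones((2, 3, 8, 8))   # two RGB images of 8 x 8 pixels
filters = np.ones((4, 3, 3, 3))  # depth of 4 feature maps, 3 x 3 filters
feature_maps = conv_layer_forward(images, filters)
print(feature_maps.shape)
```

The number of feature maps becomes the channel dimension of the output, which is what lets convolutional layers be stacked.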

A convolutional layer example with Keras to recognize digits

In the third chapter, we introduced a simple neural network to classify digits using Keras and we got 94% accuracy. In this chapter, we will work to improve that value above 99% using convolutional networks. Actual values may vary slightly due to variability in initialization.

First of all, we can start by improving the neural network we had defined by using 400 hidden neurons and running it for 30 epochs; that should already get us up to around 96.5% accuracy:

    hidden_neurons = 400
    epochs = 30

Next we could try scaling the input. Images are composed of pixels, and each pixel has an integer value between 0 and 255. We could make that value a float and scale it between 0 and 1 by adding these four lines of code right after we define our input:

X_train = X_train.astype('float32')     
X_test = X_test.astype('float32')     
X_train /= 255     
X_test /= 255

If we run our network now, we get a poorer accuracy, just above 92%, but we need not...
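The scaling step can be checked in isolation on a tiny array. The sample pixel values below are made up for illustration. Converting to float32 before dividing matters because NumPy will not perform an in-place true division on an integer array.

```python
import numpy as np

# A made-up row of 8-bit pixel values standing in for image data
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

scaled = pixels.astype('float32')  # convert first, as in the four lines above
scaled /= 255                      # now every value lies between 0 and 1

print(scaled.dtype, scaled.min(), scaled.max())
```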

A convolutional layer example with Keras for cifar10

We can now try to use the same network on the cifar10 dataset. In Chapter 3, Deep Learning Fundamentals, we were getting a low 50% accuracy on test data. To test the network we have just used for the mnist dataset, we only need to make a couple of small changes to our code. First, we load the cifar10 dataset (without doing any re-shaping; those lines will be deleted):

(X_train, Y_train), (X_test, Y_test) = cifar10.load_data()

And then change the input values for the first convolutional layer:

model.add(Convolution2D(32, (3, 3), input_shape=(32, 32, 3)))

Running this network for 5 epochs will give us around 60% accuracy (up from about 50%) and 66% accuracy after 10 epochs, but then the network starts to overfit and stops improving performance.

Of course the cifar10 images have 32 x 32 x 3 = 3072 input values (32 x 32 pixels with 3 color channels each), instead of 28 x 28 = 784 values, so we may need to add a couple more convolutional layers, after the first two:

model.add(Convolution2D...
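When stacking more layers on a 32 x 32 input, it helps to track how the spatial size shrinks. The sketch below assumes the standard output-size relation (W - F + 2P) / S + 1 for a convolution and a 2 x 2 pooling cell with stride 2. The layer sequence (two blocks of two 3 x 3 convolutions followed by pooling) is a hypothetical arrangement, not the book's exact model.

```python
def conv_output_size(size, filter_size, stride=1, padding=0):
    """Spatial size after a convolution: (W - F + 2P) / S + 1."""
    return (size - filter_size + 2 * padding) // stride + 1


def pool_output_size(size, cell=2, stride=2):
    """Spatial size after pooling with the given cell size and stride."""
    return (size - cell) // stride + 1


size = 32       # cifar10 images are 32 x 32
trace = [size]  # spatial size after each layer
for _ in range(2):  # two hypothetical conv-conv-pool blocks
    size = conv_output_size(size, 3)
    trace.append(size)
    size = conv_output_size(size, 3)
    trace.append(size)
    size = pool_output_size(size)
    trace.append(size)

print(trace)
```

Under these assumptions the feature maps are only 5 x 5 after the second block, which limits how many further unpadded layers can be added.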

Pre-training

As we have seen, neural networks, and convolutional networks in particular, work by tuning the weights of the network as if they were coefficients of a large equation in order to get the correct output given a specific input. The tuning happens through back-propagation, which moves the weights towards the best solution given the chosen neural net architecture. One of the problems is therefore finding the best initialization values for the weights in the neural network. Libraries such as Keras can automatically take care of that. However, this topic is important enough to be worth discussing.

Restricted Boltzmann machines have been used to pre-train the network: the input is used as the desired output, so that the network automatically learns representations of the input and tunes its weights accordingly. This topic has already been discussed in Chapter 4, Unsupervised Feature Learning.

In addition, there exist many pre-trained networks that offer good results. As we have...

Summary

It should be noted, as may have become clear, that there is no general architecture for a convolutional neural network. However, there are general guidelines. Normally, pooling layers follow convolutional layers, and it is customary to stack two or more successive convolutional layers to detect more complex features, as is done in the VGG-16 neural net example shown earlier. Convolutional networks are very powerful. However, they can be quite resource-heavy (the VGG-16 example above, for instance, is relatively complex), and usually require a long training time, which is why the use of GPUs can help speed up training. Their strength comes from the fact that they do not focus on the entire image; rather, they focus on smaller sub-regions to find interesting features that make up the image, in order to find discriminating elements between different inputs. Since convolutional layers are very resource-heavy, we have introduced pooling layers that help reduce the...


Authors (4)

Valentino Zocca

Valentino Zocca has a PhD degree and graduated with a Laurea in mathematics from the University of Maryland, USA, and University of Rome, respectively, and spent a semester at the University of Warwick. He started working on high-tech projects of an advanced stereo 3D Earth visualization software with head tracking at Autometric, a company later bought by Boeing. There he developed many mathematical algorithms and predictive models, and using Hadoop he automated several satellite-imagery visualization programs. He has worked as an independent consultant at the U.S. Census Bureau, in the USA and in Italy. Currently, Valentino lives in New York and works as an independent consultant to a large financial company.

Gianmario Spacagna

Gianmario Spacagna is a senior data scientist at Pirelli, processing sensors and telemetry data for internet of things (IoT) and connected-vehicle applications. He works closely with tire mechanics, engineers, and business units to analyze and formulate hybrid, physics-driven, and data-driven automotive models. His main expertise is in building ML systems and end-to-end solutions for data products. He holds a master's degree in telematics from the Polytechnic of Turin, as well as one in software engineering of distributed systems from KTH, Stockholm. Prior to Pirelli, he worked in retail and business banking (Barclays), cyber security (Cisco), predictive marketing (AgilOne), and did some occasional freelancing.

Daniel Slater

Daniel Slater started programming at age 11, developing mods for the id Software game Quake. His obsession led him to become a developer working in the gaming industry on the hit computer game series Championship Manager. He then moved into finance, working on risk- and high-performance messaging systems. He now is a staff engineer working on big data at Skimlinks to understand online user behavior. He spends his spare time training AI to beat computer games. He talks at tech conferences about deep learning and reinforcement learning; and the name of his blog is Daniel Slater's blog. His work in this field has been cited by Google.

Peter Roelants

Peter Roelants holds a master's in computer science with a specialization in AI from KU Leuven. He works on applying deep learning to a variety of problems, such as spectral imaging, speech recognition, text understanding, and document information extraction. He currently works at Onfido as a team leader for the data extraction research team, focusing on data extraction from official documents.
