You're reading from Applied Deep Learning with Keras

Product type Book

Published in Apr 2019

ISBN-13 9781838555078

Pages 412 pages

Edition 1st Edition

Languages

Python

Concepts

Deep Learning

Authors (3):

Ritesh Bhagwat

Mahla Abdolahnejad

Matthew Moocarme

Table of Contents (12) Chapters

Applied Deep Learning with Keras

Preface

Introduction to Machine Learning with Keras

Machine Learning versus Deep Learning

Deep Learning with Keras

Evaluate Your Model with Cross-Validation using Keras Wrappers

Improving Model Accuracy

Model Evaluation

Computer Vision with Convolutional Neural Networks

Transfer Learning and Pre-Trained Models

Sequential Modeling with Recurrent Neural Networks

Chapter 7. Computer Vision with Convolutional Neural Networks

Note

Learning Objectives

By the end of this chapter, you will be able to:

Explain computer vision
Explain the architecture of a convolutional neural network
Perform max pooling, flattening, feature mapping, and feature detection
Explain image augmentation
Build image processing applications and classify images

Note

In this chapter, we will learn about the architecture of convolutional neural networks and apply techniques such as max pooling, flattening, feature mapping, and feature detection. We will also learn about image augmentation and how to build image processing applications that classify images.

Introduction

Computer vision is one of the most important concepts in machine learning and artificial intelligence. With smartphones used every day to capture, share, and upload images, the amount of image data being generated is growing rapidly, so demand for experts in computer vision is at an all-time high. Industries such as health care are on the verge of a revolution due to the progress made in medical imaging. This chapter introduces you to computer vision and the various industries in which it is used. You will also learn about Convolutional Neural Networks (CNNs), which are the most widely used neural networks for image processing. Like other neural networks, CNNs are made up of neurons that receive inputs and process them using weighted sums and activation functions. However, unlike a standard ANN, which takes a vector as its input, a CNN takes an image as its input. In this chapter, we will...
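The weighted sum and activation function mentioned above can be sketched in a few lines of plain Python. This is only an illustration: the function names, the choice of ReLU as the activation, and all input, weight, and bias values are assumptions made for this example, not taken from the book.

```python
def relu(x):
    """Rectified linear unit: keep positive values, clamp negatives to zero."""
    return max(0.0, x)


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(weighted_sum)


# Hypothetical values chosen only for illustration.
activated = neuron(inputs=[1.0, 2.0, 3.0], weights=[0.5, 1.0, -0.25], bias=0.5)
suppressed = neuron(inputs=[1.0, 2.0, 3.0], weights=[-0.5, -1.0, 0.25], bias=0.0)

print(activated)   # weighted sum is 2.25, which ReLU passes through
print(suppressed)  # weighted sum is -1.75, which ReLU clamps to 0.0
```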

Computer Vision

To understand computer vision, let's first understand what human vision is. Human vision is the ability of the human eye and brain to see and recognize objects. Computer vision is the process of giving a machine a similar, if not better, ability to see and identify objects in the real world. It is fairly simple for a human to tell whether an animal is a tiger or a lion, but a computer system needs a lot of training to distinguish such objects. Computer vision can also be defined as building mathematical models that mimic the function of the human eye and brain. Basically, it is about training computers to understand and process images and videos.

Computer vision is an integral part of many cutting-edge areas: robotics, health care and medical imaging (X-rays, MRI scans, CT scans, and so on), drones, self-driving cars, sports and recreation, and so on. Almost all businesses need computer vision to run successfully. Imagine the large amount...

Convolutional Neural Networks

When you talk about computer vision, you talk about CNNs in the same breath. A CNN is a class of deep neural network that is mostly used in the field of computer vision and imaging. CNNs are used to identify images, cluster them by similarity, and recognize objects within scenes. A CNN has an input layer, an output layer, and multiple hidden layers. The hidden layers consist of convolutional layers, ReLU layers that act as activation functions, pooling layers, normalization layers, and fully connected layers. On a very simple level, CNNs help to identify images and label them appropriately; for example, a tiger image will be identified as a tiger:

Figure 7.1: A Generalized CNN

An example of a CNN classifying a tiger:

Figure 7.2: CNN classifying a tiger

Architecture of a CNN

The main components of a CNN architecture are as follows:

Input image
Convolutional layer
Pooling layer
Flattening
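To see how these components fit together, it can help to track how the size of the data changes from one stage to the next. The following sketch is plain arithmetic, not Keras code. It assumes a convolution with no padding and a stride of 1, and pooling whose stride equals the pool size; the 64x64 image, 32 filters, 3x3 detector, and 2x2 pool are hypothetical values picked for illustration.

```python
def conv_output_size(size, kernel, stride=1):
    """Side length after a convolution with no padding."""
    return (size - kernel) // stride + 1


def pool_output_size(size, pool, stride=None):
    """Side length after pooling; the stride defaults to the pool size."""
    stride = stride or pool
    return (size - pool) // stride + 1


# Hypothetical sizes: 64x64 input image, 32 filters of 3x3, 2x2 max pooling.
image_side, n_filters = 64, 32

after_conv = conv_output_size(image_side, kernel=3)      # 64 - 3 + 1 = 62
after_pool = pool_output_size(after_conv, pool=2)        # 62 // 2 = 31
flattened_length = after_pool * after_pool * n_filters   # 31 * 31 * 32 = 30752

print(after_conv, after_pool, flattened_length)
```

Under these assumptions, flattening turns the pooled 31x31x32 volume into a single vector of 30,752 values that can be fed to fully connected layers.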

Input Image

An input image forms the first component of a CNN architecture. An image can be of any type: a human, an animal, scenery, a medical X-ray image, and so on. In a simplified view, each image is converted into a mathematical matrix of zeros and ones (real images are stored as pixel intensities, typically from 0 to 255 per color channel). At a very high level, the following figure explains how a computer views an image of the letter T. All the blocks that have a value of one represent the data, while the zeros represent blank space:

Figure 7.3: Matrix for the letter 'T'
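The same idea can be written down directly as a nested list. This is a minimal sketch: the 5x5 grid size is an assumption (the book's figure may use different dimensions), and `render` is a hypothetical helper added only to make the shape visible.

```python
# A hypothetical 5x5 grid for the letter T; ones are data, zeros are blank space.
letter_t = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]


def render(matrix):
    """Draw ones as '#' and zeros as '.' so the shape shows up in a terminal."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


picture = render(letter_t)
print(picture)
```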

Convolutional Layer

The convolutional layer is where the image processing starts. It involves two parts:

Feature detector or filter
Feature map

Feature detector or filter: This is a matrix or pattern that you put on an image to transform it into a feature map:

Figure 7.4: Feature detector

Now, as highlighted, this feature detector is put (superimposed) on the original image and...
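The superimposing step can be sketched in plain Python: the detector slides over the image one cell at a time, and at each position the overlapping values are multiplied and summed to give one cell of the feature map. The 5x5 letter T image, the 3x3 vertical-line detector, and the function name `feature_map` are assumptions made for this example; a stride of 1 and no padding are also assumed. As in deep learning libraries generally, the detector is applied without being flipped.

```python
def feature_map(image, detector):
    """Slide the detector over the image (stride 1, no padding) and, at each
    position, sum the element-wise products of the overlapping cells."""
    k = len(detector)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(
                image[r + i][c + j] * detector[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 5x5 image of the letter T.
image = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Hypothetical detector that responds to vertical lines.
vertical_line = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

result = feature_map(image, vertical_line)
for row in result:
    print(row)
```

The largest values in the resulting 3x3 feature map appear in the middle column, where the detector lines up with the stem of the T.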

Image Augmentation

The word augmentation means the action or process of making or becoming greater in size or amount. Image (or data) augmentation works in a similar manner. It creates many batches of our images and then applies random transformations to random images inside those batches. These transformations can include rotating, shifting, or flipping the images, and so on. By applying them, we get more diverse images inside the batches, and we also end up with much more data than we had originally.
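The idea can be illustrated with a few hand-written transformations on a tiny image stored as a nested list. This is a minimal sketch and not the Keras API: the function names, the 2x2 image, and the fixed random seed are all assumptions made for this example.

```python
import random


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def shift_right(image, fill=0):
    """Shift every row one pixel to the right, padding with `fill`."""
    return [[fill] + row[:-1] for row in image]


def augment(batch, transforms, rng):
    """Apply one randomly chosen transform to each image in the batch."""
    return [rng.choice(transforms)(image) for image in batch]


# Hypothetical 2x2 image; the values stand in for pixel intensities.
image = [
    [1, 2],
    [3, 4],
]

rng = random.Random(0)  # seeded only so the sketch is reproducible
transforms = [flip_horizontal, rotate_90, shift_right]
augmented = augment([image] * 4, transforms, rng)

for variant in augmented:
    print(variant)
```

Each pass over the data can pick different transformations, so a model trained this way rarely sees exactly the same image twice.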

A cylinder can be rotated to different angles and will look different from each one. In the following figure, a single cylinder is seen from five different angles. So, we have effectively created five different images from a single image:

Figure 7.13: Image augmentation of a cylinder

The following is example code for image augmentation; here, the ImageDataGenerator class is used for processing, and the shear_range, zoom_range, and horizontal_flip arguments control the transformations applied to the images:

from...

Summary

In this chapter, we studied why we need computer vision and how it works. We understood why computer vision is one of the hottest fields in machine learning. Then, we worked with convolutional neural networks, their architecture, and how we can build CNNs for real-life applications. We also tried to improve our algorithms by adding more ANN and CNN layers and by trying different activation, optimizer, and loss functions. In the end, we were able to successfully classify new images of cats and dogs. Remember, the images of dogs and cats can be substituted with any other images, such as tigers and deer, or MRI scans of brains with and without a tumor. Any binary image classification problem can be approached in the same way.

In the next chapter, we will study an even more efficient technique for working on computer vision, which is less time-consuming and easier to implement.