Applied Deep Learning with Keras

Product type: Book
Published: Apr 2019
ISBN-13: 9781838555078
Pages: 412
Edition: 1st Edition
Authors (3): Ritesh Bhagwat, Mahla Abdolahnejad, Matthew Moocarme

Chapter 7. Computer Vision with Convolutional Neural Networks


Learning Objectives

By the end of this chapter, you will be able to:

  • Explain computer vision

  • Explain the architecture of a convolutional neural network

  • Perform max pooling, flattening, feature mapping, and feature detection

  • Explain image augmentation

  • Build image processing applications and classify images

Note

In this chapter, we will learn about the architecture of convolutional neural networks and perform techniques such as max pooling, flattening, feature mapping, and feature detection. We will also learn about image augmentation and how to build image processing applications that classify images.

Introduction


Computer vision is one of the most important areas of machine learning and artificial intelligence. With smartphones widely used to capture, share, and upload images every day, the amount of image data being generated is growing rapidly, so the demand for experts in computer vision is at an all-time high. Industries such as health care are on the verge of a revolution due to the progress made in medical imaging. This chapter introduces you to computer vision and the various industries in which it is used. You will also learn about Convolutional Neural Networks (CNNs), which are the most widely used neural networks for image processing. Like other artificial neural networks (ANNs), CNNs are made up of neurons that receive inputs and process them using weighted sums and activation functions. However, unlike fully connected ANNs, which take a flat vector as input, a CNN takes an image as input and keeps its two-dimensional arrangement of pixels. In this chapter, we will...
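To make the vector-versus-image distinction concrete, here is a minimal NumPy sketch. The 64 x 64 RGB size and the channels-last (height, width, channels) layout are assumptions chosen for illustration; channels-last is the Keras default, and the random values merely stand in for pixel intensities.

```python
import numpy as np

# Hypothetical 64x64 RGB image in channels-last layout (height, width, channels);
# random values stand in for pixel intensities.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))

# A fully connected ANN expects a flat vector, so all pixels are laid out in a
# single row and neighbouring pixels are no longer arranged in a grid.
ann_input = image.reshape(-1)
print(ann_input.shape)  # (12288,)

# A CNN takes the image as an array that keeps its rows, columns, and channels,
# with an extra leading batch dimension.
cnn_input = image[np.newaxis, ...]
print(cnn_input.shape)  # (1, 64, 64, 3)
```

The flat vector holds exactly the same numbers (64 x 64 x 3 = 12,288); what changes is whether the layer can exploit which pixels are next to each other.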

Computer Vision


To understand computer vision, let's first understand what human vision is. Human vision is the ability of the human eye and brain to see and recognize objects. Computer vision is the process of giving a machine a similar, if not better, ability to see and identify objects in the real world. It is fairly simple for a human to tell whether an animal is a tiger or a lion, but a computer system needs a lot of training to distinguish such objects reliably. Computer vision can also be defined as building mathematical models that mimic the function of the human eye and brain. In short, it is about training computers to understand and process images and videos.

Computer vision is an integral part of many cutting-edge fields, such as robotics, health care and medicine (X-ray, MRI, and CT scans, and so on), drones, self-driving cars, and sports and recreation. Many businesses rely on computer vision to run successfully. Imagine the large amount...

Convolutional Neural Networks


When you talk about computer vision, you talk about CNNs in the same breath. A CNN is a class of deep neural network that is mostly used in computer vision and image processing. CNNs are used to identify images, cluster them by similarity, and perform object recognition within scenes. A CNN has an input layer, an output layer, and multiple hidden layers. The hidden layers typically include convolutional layers, ReLU activation layers, pooling layers, normalization layers, and fully connected layers. At a very simple level, CNNs help to identify images and label them appropriately; for example, an image of a tiger will be identified as a tiger:

Figure 7.1: A Generalized CNN

An example of a CNN classifying a tiger:

Figure 7.2: CNN classifying a tiger
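The ReLU activation used in a CNN's hidden layers simply replaces negative values with zero. A minimal NumPy sketch follows; the small input matrix is made up for illustration and stands in for the output of a convolutional layer.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: keeps positive values and replaces negatives with 0."""
    return np.maximum(x, 0)

# Made-up values standing in for the output of a convolutional layer.
layer_output = np.array([[-3.0, 1.5],
                         [0.0, -0.5]])
activated = relu(layer_output)
print(activated)
```

Only the positive response (1.5) survives; every negative entry becomes 0.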

Architecture of a CNN


The main components of a CNN architecture are as follows:

  1. Input image

  2. Convolutional layer

  3. Pooling layer

  4. Flattening
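The pooling and flattening components listed above can be sketched in NumPy as follows, assuming a 2 x 2 max-pooling window moved with a stride of 2 (the default pool size of Keras' MaxPooling2D layer). The 4 x 4 feature map is made-up data, and the helper name max_pool is hypothetical.

```python
import numpy as np

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window.

    Rows and columns that do not fill a whole window are dropped.
    """
    h = fmap.shape[0] // size
    w = fmap.shape[1] // size
    trimmed = fmap[:h * size, :w * size]
    # Split into (h, size, w, size) blocks and take the maximum within each block.
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# Made-up 4x4 feature map.
fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 3, 2],
                 [2, 6, 1, 1]])

pooled = max_pool(fmap)  # [[4, 5], [6, 3]]
flat = pooled.flatten()  # [4, 5, 6, 3]: a vector ready for fully connected layers
print(pooled)
print(flat)
```

Pooling shrinks the map by keeping the strongest response in each region, and flattening turns the result into the one-dimensional input that fully connected layers expect.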

Input Image

An input image forms the first component of a CNN architecture. An image can be of any type: a human, an animal, scenery, a medical X-ray image, and so on. Each image is converted into a matrix of numbers, one per pixel; for a simple black-and-white image, these numbers are zeros and ones. At a very high level, the following figure shows how a computer views an image of the letter T. The blocks with a value of one represent the letter, while the zeros represent blank space:

Figure 7.3: Matrix for the letter 'T'
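The matrix idea can be written out directly as a NumPy array. The 5 x 5 grid below is a hypothetical layout of the letter T and may differ from the one in Figure 7.3; real grayscale images usually store intensities from 0 to 255 rather than just 0 and 1.

```python
import numpy as np

# Hypothetical 5x5 binary image of the letter "T":
# 1 marks a filled pixel, 0 marks blank space.
letter_t = np.array([
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
])

print(letter_t.shape)       # (5, 5): rows x columns of pixels
print(int(letter_t.sum()))  # 9 filled pixels
```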

Convolution Layer

The convolution layer is where the image processing starts. A convolution layer involves two components:

  1. Feature detector or filter

  2. Feature map

Feature detector or filter: This is a small matrix, or pattern, that is placed over an image to transform it into a feature map:

Figure 7.4: Feature detector

Now, as highlighted, this feature detector is put (superimposed) on the original image and...
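Superimposing the feature detector on each patch of the image, multiplying element by element, and summing gives one value of the feature map. The sketch below assumes a stride of 1 and no padding; the 5 x 5 letter T and the 3 x 3 vertical-line detector are hypothetical examples, and the function name feature_map is illustrative. Like Keras' Conv2D layer, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def feature_map(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Superimpose the kernel on this patch, multiply element-wise, and sum.
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical 5x5 binary image of the letter "T".
letter_t = np.array([
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
])

# Hypothetical 3x3 detector that responds to vertical lines.
vertical = np.array([
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
])

fmap = feature_map(letter_t, vertical)
print(fmap)
```

The result is [[1, 3, 1], [0, 3, 0], [0, 3, 0]]: the largest values (3) line up with the vertical stroke of the T, which is the pattern this detector is designed to pick out.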

Image Augmentation


The word augmentation means the action or process of making something greater in size or amount. Image (or data) augmentation works in a similar manner: it creates batches of our images and then applies random transformations to random images inside each batch. Transformations can include rotating, shifting, or flipping the images, among others. By applying these transformations, we get more diverse images inside the batches, and effectively much more training data than we had originally.

A cylinder looks different when it is viewed from different angles. In the following figure, a single cylinder is seen from five different angles, so we have effectively created five different images from a single image:

Figure 7.13: Image augmentation of a cylinder
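The random transformations described above can be sketched in NumPy. This is not how Keras implements ImageDataGenerator; it is a simplified illustration assuming square grayscale images and three hand-picked transformations (a horizontal flip, a 90-degree rotation, and a one-pixel shift with zero fill). The function name augment is hypothetical.

```python
import numpy as np

def augment(batch, rng):
    """Apply one randomly chosen transformation to each image in `batch` (N x H x W).

    Illustrative transformations only; assumes square images so that a
    90-degree rotation keeps the same shape.
    """
    out = np.empty_like(batch)
    for k, img in enumerate(batch):
        choice = rng.integers(3)
        if choice == 0:
            out[k] = np.fliplr(img)       # horizontal flip
        elif choice == 1:
            out[k] = np.rot90(img)        # rotate by 90 degrees
        else:
            shifted = np.zeros_like(img)  # shift right by one pixel, fill with zeros
            shifted[:, 1:] = img[:, :-1]
            out[k] = shifted
    return out

rng = np.random.default_rng(42)
batch = rng.random((8, 5, 5))  # 8 made-up 5x5 grayscale images
augmented = augment(batch, rng)
print(batch.shape, augmented.shape)
```

Because each image receives its own random transformation, calling augment again on the same batch generally produces a different set of images, which is where the extra diversity comes from.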

The following is an example of image augmentation code; here, the ImageDataGenerator class is used for processing, and shear_range, zoom_range, and horizontal_flip are used to transform the images:

from...

Summary


In this chapter, we studied why we need computer vision and how it works. We saw why computer vision is one of the hottest fields in machine learning. Then, we worked with convolutional neural networks, their architecture, and how we can build CNNs for real-life applications. We also tried to improve our algorithms by adding more ANN and CNN layers and by trying different activation functions, optimizers, and loss functions. In the end, we were able to successfully classify new images of cats and dogs. Remember, the images of dogs and cats can be substituted with any other images, such as tigers and deer, or MRI scans of brains with and without a tumor. Other binary image-classification problems can be approached in the same way.

In the next chapter, we will study an even more efficient technique for working on computer vision, which is less time-consuming and easier to implement.
