Improving the Image Classifier with CNN

If you've been following the latest news on self-driving cars (SDCs), you will have heard about convolutional neural networks (CNNs, or ConvNets). We use ConvNets to perform a multitude of perception tasks for SDCs. In this chapter, we will take a deeper look at this fascinating architecture and understand its importance. Specifically, you will learn how convolutional layers use cross-correlation, instead of general matrix multiplication, to tailor neural networks to the image input data. We'll also cover the advantages of these models over standard feed-forward neural networks. 

ConvNets have neurons with learnable weights and biases. As in other neural networks, each neuron in a ConvNet receives an input, performs a dot product, and is followed by a non-linearity.

The pixels of raw images of the network...

Images in computer format

We've already read about how images are formatted in computers in Chapter 4, Computer Vision for Self-Driving Cars. Basically, there are three channels: red, green, and blue, popularly known as RGB, and each channel has its own pixel values. So, if we say the size of an image is B x A x 3, this means there are B rows, A columns, and 3 channels. If the image size is 28 x 28 x 3, this means there are 28 rows, 28 columns, and 3 channels.

This is how our computer sees images. For black and white (grayscale) images, there is only one channel.

In the following screenshot, you can see a visual example of a computer viewing an image:

Fig 6.1: Computer viewing an image
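
To make this concrete, here is a minimal sketch (the filename image.png is just an example) that loads an image and prints its dimensions:

import matplotlib.image as mpimg

# Load an image from disk; the filename is only an example
img = mpimg.imread('image.png')

# (rows, columns, channels); for example, (28, 28, 3). Note that a PNG may also carry an alpha channel
print(img.shape)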

In the next section, we will read about why we need CNNs.

The need for CNNs

We need CNNs because standard neural networks do not scale well to image data. In Chapter 4, Computer Vision for Self-Driving Cars, we discussed how images are stored. Suppose we build a simple image classifier that takes color images with a size of 64 x 64 (height x width).

So, the input size for the neural network will be 64 x 64 x 3 = 12,288.

Therefore, our input layer will have 12,288 weights. If we use an image with a size of 128 x 128 x 3, we will have 49,152 weights. If we add hidden layers, the number of weights, and with it the training time, grows rapidly. The CNN doesn't actually reduce the number of weights in the input layer; instead, it finds a representation internally, in its hidden layers, that takes advantage of how images are formed. This way, we can make our neural network much more effective at dealing with image data.
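
As a rough illustration (a sketch, not code from this chapter), the following compares the number of weights in a single fully connected neuron applied to the flattened 64 x 64 x 3 image with a single 3 x 3 convolutional filter applied to the same image:

import tensorflow as tf

# One output neuron connected to every pixel of the flattened 64 x 64 x 3 image
dense_model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=(64 * 64 * 3,))
])

# One 3 x 3 filter applied directly to the 64 x 64 x 3 image
conv_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(1, kernel_size=3, input_shape=(64, 64, 3))
])

print(dense_model.count_params())  # 12289 (12,288 weights + 1 bias)
print(conv_model.count_params())   # 28 (3 * 3 * 3 weights + 1 bias)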

In the next section, we will read about the intuition behind these neural networks.

The intuition behind CNNs

A CNN is a type of feed-forward artificial neural network where the connection between its neurons is inspired by an animal's visual cortex.

The visual cortex is the part of the brain's cerebral cortex that processes visual information:

Fig 6.2: Visual cortex 

The visual cortex is a small region of cells that is sensitive to a specific region of the visual field. For example, some neurons in the visual cortex fire when exposed to vertical edges, some fire when exposed to horizontal edges, and some will fire for diagonal edges. That is the process behind CNNs. 

You can read more about the visual cortex at https://en.wikipedia.org/wiki/Visual_cortex.

We have already studied, in Chapter 2, Dive Deep into Deep Neural Networks, how biological neural networks can be converted into artificial networks.

In the next section, we will study CNNs in-depth.

Introducing CNNs

In the following screenshot, we can see all the layers of a CNN. We will go through each of them in detail:

Fig 6.3: CNN layers

We know that in this neural network, we have an input layer and hidden layers.

The layers of a CNN are as follows:

  • The input layer
  • The convolution layer
  • The ReLU layer
  • The pooling layer
  • The fully connected layer

In the following section, we will learn about 3D layers. 

Why 3D layers?

3D layers allow us to use convolutions to learn the image features. This helps the network decrease its training time, as there will be fewer weights in the deep network.

The three dimensions of an image are as follows:

  • Height
  • Width
  • Depth (RGB)

The 3D layers of an image can be seen in the following screenshot:

Fig 6.4: 3D layers

In the next section of this chapter, we will understand the convolution layer.

Understanding the convolution layer

The convolution layer is the most important part of a CNN, as it is the layer that learns the image features. Before we dive deep into convolutions, we will learn about image features. Image features are the parts of the image that we are most interested in.

Some examples of image features are as follows:

  • Edges
  • Colors
  • Patterns/shapes

Before CNNs, the extraction of features from an image was a tedious process—the feature engineering done for one set of images would not be appropriate for another set of images.

Now, we will see what exactly a convolution is. In simple terms, convolution is a mathematical term to describe the process of combining two functions to produce a third function. The third function, or the output, is called the feature map. Convolution is the action of using a kernel or filter applied to an input image, and the output is the feature map. 
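
As a minimal sketch of this idea (plain NumPy, not code from this chapter), the following slides a 3 x 3 kernel over a small input image and produces the feature map:

import numpy as np

def convolve2d(image, kernel, stride=1):
    # Slide the kernel over the image and sum the element-wise products
    # (this is the cross-correlation used by convolution layers)
    k = kernel.shape[0]
    out_size = (image.shape[0] - k) // stride + 1
    feature_map = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

image = np.random.randint(0, 256, (5, 5))                # a toy 5 x 5 grayscale image
kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])  # a simple vertical-edge filter
print(convolve2d(image, kernel))                         # a 3 x 3 feature map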

The convolution feature is executed by sliding a kernel over the input...

Depth, stride, and padding 

Depth, stride, and padding are the hyperparameters used to control the convolution operation and the size of its output. In the previous section, Understanding the convolution layer, we applied 3 x 3 filters or kernels for the convolution. But the question is, does the filter have to be 3 x 3? How many filters do we need? Do we have to slide the kernel over the image pixel by pixel?

We can also have filters larger than 3 x 3. We control this, as well as the size of the resulting feature maps, by tweaking the following parameters:

  • Kernel size (K x K)
  • Depth
  • Stride
  • Padding

Depth

Depth tells us the number of filters used. It does not relate to the image depth (the number of channels), nor to the number of hidden layers in the CNN. Each filter or kernel learns a different feature map that is activated in the presence of different image features, such as edges, patterns, and colors.

Stride 

Stride refers to the step size we take when we slide the kernel across the input image.

An example of a stride of 1 is shown in the following screenshot:

Fig 6.10: Stride of 1

With a stride of 1, the feature map has 9 values (a 3 x 3 output). Similarly, a stride of 2 looks as follows:

Fig 6.11: Stride of 2

When we have a stride of 2, the feature map has only 4 values, which is equivalent to a 2 x 2 output; the quick shape check after the following list confirms these sizes.

Stride is important because of the following points:

  • The stride controls the size of the convolution layer output.
  • Using larger strides produces less overlap in kernels.
  • Stride is one way to control the spatial size of the output, that is, how much spatial information is passed on to the next layer.
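
The following quick shape check (a sketch; the 5 x 5 input size is an assumption chosen to match the figures) confirms that a 3 x 3 kernel produces a 3 x 3 feature map with a stride of 1 and a 2 x 2 feature map with a stride of 2:

import tensorflow as tf

x = tf.random.uniform((1, 5, 5, 1))  # a batch of one 5 x 5 single-channel input
for stride in (1, 2):
    y = tf.keras.layers.Conv2D(1, kernel_size=3, strides=stride)(x)
    print(stride, y.shape)           # (1, 3, 3, 1) for stride 1, (1, 2, 2, 1) for stride 2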

Zero-padding

Zero-padding is a very simple concept: we add a border of zeros around our input. With a stride of 1, the output feature map in our example is a 3 x 3 matrix. We can see that after applying the convolution, we end up with a tiny output, and this output becomes the input for the next layer. In this way, there is a high chance of losing information at the borders. So, we add a border of zeros around the input, as shown in the following screenshot:

Fig 6.12: Zero-padding

Adding zeros around the border is equivalent to adding a black border around an image. We can also set the padding to 2 if required.
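
In code, zero-padding is a one-liner; here is a minimal NumPy sketch (the 3 x 3 input is just an example):

import numpy as np

feature = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

# Add a border of zeros of width 1 around the input (use pad_width=2 for a padding of 2)
padded = np.pad(feature, pad_width=1, mode='constant', constant_values=0)
print(padded.shape)  # (5, 5)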

Now, we will calculate the output of the convolution mathematically. We have the following parameters:

  • Kernel/filter size, K
  • Depth, D
  • Stride, S
  • Zero-padding, P
  • Input image size, I

To ensure that the filters cover the full input image symmetrically, we'll use the following equation as a sanity check; the configuration is valid if the result of the equation is an integer:

Output size = (I - K + 2P) / S + 1

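A small helper function (a sketch using the parameters just listed) makes this sanity check explicit:

def conv_output_size(I, K, S, P):
    # The configuration is valid only if (I - K + 2 * P) is divisible by S
    assert (I - K + 2 * P) % S == 0, "the filters do not tile the input symmetrically"
    return (I - K + 2 * P) // S + 1

print(conv_output_size(I=5, K=3, S=1, P=0))   # 3
print(conv_output_size(I=5, K=3, S=2, P=0))   # 2
print(conv_output_size(I=28, K=3, S=1, P=1))  # 28; a padding of 1 preserves the input size
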
In the next section, we will...

ReLU

ReLU is the activation function of choice for CNNs. We studied activation functions in Chapter 2, Dive Deep into Deep Neural Networks. As we know, we need to introduce non-linearity into our model, as the convolution operation is linear. So, we apply an activation function to the output of the convolution layer.

The ReLU function simply changes all the negative values to 0, while positive values are unchanged, as shown in the following screenshot:

Fig 6.13: ReLU

An example of ReLU introducing non-linearity in the feature map's output can be seen in the following screenshot:

Fig 6.14: Applying ReLU
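
In code, ReLU is just an element-wise maximum with zero; here is a NumPy sketch with a made-up feature map:

import numpy as np

feature_map = np.array([[-3, 5],
                        [ 2, -1]])
relu_output = np.maximum(0, feature_map)  # negative values become 0, positive values are kept
print(relu_output)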

In the next section, we will learn about fully connected layers.

Fully connected layers

We have already learned about fully connected layers in Chapter 2, Dive Deep into Deep Neural Networks. Having fully connected layers simply means that every node in one layer is connected to every node in the next layer. The output of the fully connected layer is a set of class probabilities, where each class is assigned a probability and all the probabilities sum to 1. The activation function used at the output of this layer is called the softmax function.

The softmax function

The activation function used to produce the per-class probabilities is called the softmax function. It turns the output of the fully connected layer, or the last layer, into probabilities that sum to 1; for example, Panda = 0.04, Cat = 0.91, and Dog = 0.05, which totals 1.
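
Here is a minimal NumPy sketch of the softmax function (the three raw scores are made-up values chosen to roughly reproduce the probabilities above):

import numpy as np

def softmax(logits):
    exp = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return exp / np.sum(exp)

scores = np.array([0.3, 3.4, 0.5])  # raw network outputs for Panda, Cat, and Dog
print(softmax(scores))              # roughly [0.04, 0.91, 0.05], summing to 1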

We can see the values of the softmax function in the following screenshot:

Fig 6.15: Output of softmax

In the next section of this chapter, we will implement a handwritten digit recognition CNN model using the Keras API.

Introduction to handwritten digit recognition

The MNIST dataset is one of the most popular datasets in the field of computer vision. It is a fairly large dataset, consisting of 60,000 training images and 10,000 test images.

We can see a sample of it in the following screenshot:

Fig 6.16: The MNIST dataset
You can find out more about the MNIST dataset at http://yann.lecun.com/exdb/mnist/ and https://en.wikipedia.org/wiki/MNIST_database.

Now, we will learn about the problem statement and implement a CNN using Keras.

Problem and aim

The MNIST dataset was developed for the US postal service to automatically read handwritten postcodes on mail. The aim of our classifier is simple: to take the digits in the format provided and correctly identify the given digits. Digit identification has multiple applications in self-driving cars; one of the important applications is traffic sign detection, for example, detecting the speed limit.

You can see an example in the following screenshot: 

Fig 6.17: Speed limit traffic signs 

We are going to build a traffic sign detector in the next chapter using a German traffic sign dataset.

In the next section, we will start by loading the data.

Loading the data

Loading the data is a simple but obviously integral first step to creating a deep learning model. Fortunately, Keras has some built-in data loaders that are simple to execute. Data is stored in an array:

  1. First, we will import the keras dataset from TensorFlow:
from keras.datasets import mnist
  2. Then, we will create the test and train datasets:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
  3. Now, we will print and check the shape of the x_train data:
print(x_train.shape)
  4. The shape of x_train is as follows:
(60000, 28, 28)

One of the confusing things that newcomers face when using Keras is getting their dataset in the correct shape (dimensionality) required for Keras.

  5. When we first load our dataset into Keras, it comes in the form of 60,000 images of 28 x 28 pixels. Let's inspect this in Python by printing the initial shape, the dimensions, and the number of samples and labels in our training data:
print ("Initial shape &...
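
The print statement above is cut off here; a minimal sketch of such an inspection (the exact wording of the printed strings is an assumption) could look like this:

print("Initial shape or dimensions of x_train:", x_train.shape)  # (60000, 28, 28)
print("Number of samples in our training data:", len(x_train))   # 60000
print("Number of labels in our training data:", len(y_train))    # 60000
print("Dimensions of a single image:", x_train[0].shape)         # (28, 28)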

Reshaping the data

The following code helps us to reshape the Keras input:

img_rows = x_train[0].shape[0]
img_cols = x_train[0].shape[1]

# Add a channel dimension: (60000, 28, 28) -> (60000, 28, 28, 1)
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

# Convert the pixel values to float32 so that we can normalize them later
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

In the next section, we will normalize the data between 0 and 1.

The transformation of data

We're going to look at transformations on the training and test image data. For x_train and x_test, we need to do the following:

  1. Add a fourth dimension, going from (60000, 28, 28) to (60000, 28, 28, 1).
  2. Change it to the float32 data type.
  3. Normalize it between 0 and 1 (by dividing by 255).

In the following code block, we will perform normalization on the data:

x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

The shape of the data doesn't change after normalization:

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples

In the next section, we will perform one-hot encoding on our target variables.

One-hot encoding the output

In this section, we're going to one-hot encode the output data. One-hot encoding converts a categorical variable into a binary vector per category, a format that generally leads to better machine learning predictions and is easier for the computer to interpret.

An example of one-hot encoding can be seen in the following screenshot:

Fig 6.19: One-hot encoding

In the preceding screenshot, we have three products with categorical values of 1, 2, and 3. We can see how the products are represented by one-hot encoding: Product A becomes (1, 0, 0) and Product B becomes (0, 1, 0). Similarly, if we do the same for our digit labels, the digit 5 becomes (0, 0, 0, 0, 0, 1, 0, 0, 0, 0).

The following code will help us to one-hot encode the output:

from keras.utils import np_utils

y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

print ("Number of classes: "...

Building and compiling our model

We have already read about building and compiling models in Chapter 3, Implementing a Deep Learning Model Using Keras. Now let's build a simple CNN. In this section, we will add the layers to be used in our deep learning model:

  1. We will first import the important libraries from Keras:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
  2. Let's design the CNN with the following code. We add our first convolution layer with 32 filters, kernel_size set to (3, 3), and ReLU as our activation function. Then, we add a second convolution layer with 64 filters, again with ReLU activation, followed by a max pooling layer. Next, we add a dropout layer, flatten the output, and add a dense layer with 128 units and our ReLU activation function. Finally, we add one more dropout layer:
model = tf.keras.Sequential()

model.add(tf.keras.layers.Conv2D(10, kernel_size...
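
The listing above is cut off here; the following is a minimal sketch of a model matching the description in step 2 (the filter counts, dropout rates, pool size, and the final 10-unit softmax layer are assumptions based on that description, not the book's exact code):

model = tf.keras.Sequential()

model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu',
                                 input_shape=input_shape))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(10, activation='softmax'))  # one output per digit class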

Compiling the model 

To compile the model, we need to choose a loss function, an optimizer, and the metrics that we're concerned with during fitting:

model.compile(loss='categorical_crossentropy', optimizer='SGD', metrics=['accuracy'])

print(model.summary())

Let's look at the output after compiling the model:

Fig 6.20: Compiling the model

In the next section, we will train the model.

Training the model

We are taking a batch size of 32 and 6 epochs. We can play with these parameters to increase the accuracy.

Write the following code to train the model:

batch_size = 32
epochs = 6

history = model.fit(x_train,
                    y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))



score = model.evaluate(x_test, y_test, verbose=0)
print('Test_loss:', score[0])
print('Test_accuracy:', score[1])

Here are the training results:

Fig 6.21: Model training

The training accuracy of the model is 95.84% and the test loss is 0.069.

Validation versus train loss

We will compare validation to the training loss by plotting a graph. We are going to use matplotlib for this:

import matplotlib.pyplot as plt

history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
epochs = range(1, len(loss_values)+ 1)
line1 = plt.plot(epochs, val_loss_values, label = 'Validation/Test Loss')
line2 = plt.plot(epochs, loss_values, label= 'Training Loss')
plt.setp(line1, linewidth=2.0, marker = '+', markersize=10.0)
plt.setp(line2, linewidth=2.0, marker= '4', markersize=10.0)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.grid(True)
plt.legend()
plt.show()

Let's look at the output:

Fig 6.22: Validation versus training loss plot

The training loss started from 0.6 and ended at 0.16, and the validation loss started from 2.2 and ended at 0.06. Our model performed well, as the loss has decreased to a minimum.

Validation versus test accuracy

In this section, we will plot the training accuracy against the validation accuracy:

import matplotlib.pyplot as plt

history_dict = history.history
# Older Keras versions store these under 'acc'/'val_acc'; newer ones use 'accuracy'/'val_accuracy'
acc_values = history_dict['acc']
val_acc_values = history_dict['val_acc']
epochs = range(1, len(acc_values) + 1)
line1 = plt.plot(epochs, val_acc_values, label = 'Validation/Test Accuracy')
line2 = plt.plot(epochs, acc_values, label= 'Training Accuracy')
plt.setp(line1, linewidth=2.0, marker = '+', markersize=10.0)
plt.setp(line2, linewidth=2.0, marker= '4', markersize=10.0)
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.grid(True)
plt.legend()
plt.show()

Here's our output:

Fig 6.23: Validation versus train accuracy plot

We can see that our training accuracy is around 96% and the test accuracy is around 97.84%; this shows that our model performed well.

Saving the model

We need to save our model so that it can be reused later. 

Here is the code for saving your model:

model.save("./mnist.h5")

model.save will save the model and load_model is used to reload the model:

from keras.models import load_model
model = load_model('./mnist.h5')

In the next section, we will visualize the model architecture.

Visualizing the model architecture

Keras has a great functionality, and in this section, we will use it to visualize the model architecture.

The following code will help you create a visualization of your image:

from keras.utils import plot_model
%matplotlib inline

plot_model(model, to_file='model.png',
           show_shapes=True,
           show_layer_names=True)

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

img = mpimg.imread('model.png')
plt.imshow(img)
plt.show()

The visualized model architecture looks as follows:

Fig 6.24: The model architecture

Now, we will validate the performance using a confusion matrix.

Confusion matrix 

We use a confusion matrix to validate the performance of a classification model on the test data for which the true values are known. 

To view the confusion matrix of the model, execute the following code:

import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = model.predict(x_test)
y_pred = np.argmax(y_pred, axis=1)  # convert the predicted probabilities to class labels
y_test = np.argmax(y_test, axis=1)  # undo the one-hot encoding of the true labels
confusion_matrix = confusion_matrix(y_test, y_pred)
confusion_matrix

The confusion matrix of the model looks as follows:

Fig 6.25: The confusion matrix

You can create the confusion matrix in a more advanced way with the following code:

# Confusion matrix
import itertools
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks...

The accuracy report

In this step, we will check the accuracy report of the model. We will get the following values.

Accuracy: The accuracy is the most important and popular metric for model validation. The ratio of the correctly predicted observation to the total observation is called the accuracy. In general, a high accuracy model is not always preferable, as the accuracy metric only works well with symmetric datasets where values of false positives and false negatives are almost the same.

Now, we will have a look at the formula for accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Here, we have the following: 

  • TP is true positive
  • TN is true negative
  • FP is false positive
  • FN is false negative

Precision: The ratio of correctly predicted positive observations (TP) to the total predicted positive observations (TP + FP) is called precision. This is the formula for precision:

Precision = TP / (TP + FP)

Recall: The ratio of correctly predicted positive observations (TP) to all the observations in an actual class (TP + FN) is...
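
Scikit-learn can compute all of these metrics at once; here is a minimal sketch that reuses y_test and y_pred from the confusion matrix step:

from sklearn.metrics import classification_report

# Precision, recall, and F1 score per digit class, plus the overall accuracy
print(classification_report(y_test, y_pred))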

Summary

In this chapter, we learned about CNNs and the different ways of tweaking them, and also implemented a handwritten digit recognition model using Keras. We also learned about the hyperparameters for image-based problems and different accuracy metrics, such as accuracy, F1 score, precision, and recall. 

In the next chapter, we will implement an image classifier for traffic sign detection. With this project, we will be one step closer to seeing the real-time application of autonomous vehicles.
