In this chapter, you will learn about a topic that has changed the way we think about autonomous driving: Artificial Neural Networks (ANNs). Throughout this chapter, you will learn how these algorithms can be used to build a self-driving car perception stack, and you'll learn about the different components needed to design and train a deep neural network. This chapter will teach you the essentials of ANNs. You will also learn about the building blocks of feedforward neural networks, a very useful basic type of ANN. Specifically, we'll look at the hidden layers of a feedforward neural network. These hidden layers are important because they distinguish how neural networks operate from other Machine Learning (ML) algorithms. We'll begin by looking at the mathematical definition of feedforward...
You're reading from Applied Deep Learning and Computer Vision for Self-Driving Cars
Diving deep into neural networks
Deep learning is a sub-field of ML that is based on ANNs (see Fig 2.1). Deep learning is inspired by the structure and function of the human brain. The concept of deep learning is not new and has existed for a number of years. Its recent popularity and success are due to high-powered processing units, such as GPUs, and the availability of enormous amounts of data. Deep neural networks (DNNs) perform well partly because they can model complex relationships among features in high-dimensional data.
One of the great things about deep learning is that it removes much of the need for manual feature engineering. It replaces costly and inefficient human effort by automating most of the process of extracting features from raw data. Before, we used to extract features ourselves to make ML algorithms...
Introduction to neurons
In this section, we will discuss neurons, which are the basic building blocks of ANNs. In the following photograph, we can see actual real-life neurons as observed through a microscope:
You can find this photograph at https://commons.wikimedia.org/wiki/Neuron#/media/File:Pyramidal_hippocampal_neuron_40x.jpg.
The question now is how we can recreate neurons in ML. We need to, since the whole purpose of deep learning is to mimic the human brain, one of the most powerful tools on the planet. So, the first step toward creating an ANN is to recreate a neuron.
Before creating a neuron in ML, we will examine the depiction of neurons created by Spanish neuroscientist Santiago Ramón y Cajal in 1899.
Nowadays, we have advanced technology that...
Understanding neurons and perceptrons
As discussed in the previous section, Introduction to neurons, ANNs have their basis in biology: we mimic biological neurons with artificial neurons known as perceptrons. The perceptron is a mathematical model of a biological neuron. Later in this section, we will see how we can mimic biological neurons with artificial neurons.
As we know, the biological neuron is a brain cell. Dendrites branch out from the body of the neuron. When electrical signals pass from the dendrites into the cell body, a single output signal leaves through the axon, which connects to other neurons, as shown in the diagram of the generic neurotransmitter system that you can find in the link provided in the Introduction to neurons section. That is the basic idea we have: lots of inputs of electrical signals go through the dendrites, into the body, and then through...
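To make the analogy concrete, here is a minimal perceptron sketch: the inputs stand in for the signals arriving at the dendrites, a weighted sum plus a bias stands in for the cell body, and a step (threshold) function produces the single output that travels along the axon. The hand-picked weights and bias, which make the perceptron behave like a logical AND, are illustrative assumptions and not values from this chapter.

```python
def step(z):
    """Threshold activation (assumed convention): 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a step function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(z)

# Hypothetical hand-picked parameters that implement a logical AND
weights = [1.0, 1.0]
bias = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], weights, bias))
```

Only the input pair (1, 1) pushes the weighted sum above 0, so only that pair fires.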
The workings of ANNs
We have seen how a single neuron, or perceptron, works; now, let's expand this concept to deep learning. The following diagram shows us what multiple perceptrons look like:
In the preceding diagram, we can see various layers of single perceptrons connected to each other through their inputs and outputs. The input layer is violet, the hidden layers are blue and green, and the output layer of the network is represented in red.
The input layer takes in real values from the data, so it receives the actual data as its input. The next layers are the hidden layers, which sit between the input and output layers. If three or more hidden layers are present, the network is considered a deep neural network. The final layer is the output layer, which produces the final estimate of whatever quantity we are trying to predict. As we progress through more layers, the level of...
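The flow of data from the input layer, through the hidden layers, to the output layer is called a forward pass. The following pure-Python sketch assumes a tiny network with 3 inputs, two hidden layers of 4 and 3 neurons, and a single output neuron; the layer sizes, the random weights, the zero biases, and the use of a sigmoid activation everywhere are all illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """Each neuron takes a weighted sum of all inputs plus its bias, then applies sigmoid."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, neuron_weights)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def init_layer(n_inputs, n_neurons, rng):
    """Random weights in [-1, 1] and zero biases (an arbitrary choice for the sketch)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

rng = random.Random(0)
layer_sizes = [3, 4, 3, 1]  # input, hidden 1, hidden 2, output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def forward(x):
    activation = x  # the input layer simply holds the raw data values
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation

output = forward([0.5, -1.2, 3.0])
print(output)  # a single value between 0 and 1
```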
Understanding activation functions
Activation functions are important to neural networks because they introduce non-linearity into a network. Deep learning consists of multiple non-linear transformations, and activation functions are the tools that perform them. Each neuron applies its activation function to its weighted input before sending the resulting signal to the next layer of the network. Thanks to activation functions, a neural network has the power to learn complex features.
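Why non-linearity matters can be checked directly: without an activation function, two stacked linear layers collapse into a single linear layer, so adding depth buys nothing. This sketch uses hypothetical single-neuron layers with arbitrary weights and uses ReLU (covered below) as the example non-linearity.

```python
def linear(x, w, b):
    return w * x + b

# Hypothetical weights for two stacked single-neuron layers
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x):
    """Two layers with no activation function in between."""
    return linear(linear(x, w1, b1), w2, b2)

def two_layers_with_relu(x):
    """The same two layers with a ReLU non-linearity after the first one."""
    return linear(max(0.0, linear(x, w1, b1)), w2, b2)

# Without an activation, the stack is exactly one linear layer in disguise
w_combined, b_combined = w2 * w1, w2 * b1 + b2
for x in (-2.0, 0.0, 1.5, 10.0):
    assert abs(two_linear_layers(x) - linear(x, w_combined, b_combined)) < 1e-12

# With ReLU, the output is flat for x <= -0.5 and sloped elsewhere: no single line fits it
print([two_layers_with_relu(x) for x in (-2.0, -1.0, 0.0, 1.0)])
```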
Deep learning uses many activation functions, including the following:
- The threshold function
- The sigmoid function
- The rectified linear unit (ReLU) function
- The hyperbolic tangent function
In the next section, we will start with one of the simplest activation functions, the threshold activation function.
The threshold function
The threshold function can be seen in the following diagram:
On the x axis, we have the weighted sum of the inputs, and on the y axis, we have the output of the function, which is either 0 or 1. The threshold function is very simple: if the weighted sum is less than 0, the output is 0; otherwise, the output is 1. This works as a yes-or-no function.
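A minimal sketch of the threshold function follows. What happens at exactly 0 is a matter of convention; here it is assumed to output 1.

```python
def threshold(z):
    """Step function: 0 for negative weighted sums, 1 otherwise (z == 0 gives 1 by assumption)."""
    return 0 if z < 0 else 1

weighted_sums = [-2.0, -0.1, 0.0, 0.3, 5.0]
print([threshold(z) for z in weighted_sums])  # [0, 0, 1, 1, 1]
```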
The sigmoid function
The sigmoid function is a very interesting type of function; we can see it in the following diagram:
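The sigmoid function is a logistic function, σ(x) = 1 / (1 + e^(-x)), which squashes any input into the range (0, 1). Large negative inputs produce outputs close to 0, large positive inputs produce outputs close to 1, and an input of 0 produces 0.5. This function is often used in the output layer, especially when you're trying to predict a probability.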
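The following sketch evaluates the sigmoid at a few points to show that negative inputs are mapped to small positive values rather than to exactly 0. It is the textbook formula written directly with Python's `math` module.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))   # 0.5
print(sigmoid(-5.0))  # small but positive, about 0.0067
print(sigmoid(5.0))   # close to 1, about 0.9933
```

This direct version raises `OverflowError` for very negative inputs (below roughly -709), because `math.exp` overflows; deep learning libraries use numerically stable formulations instead.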
The rectified linear unit function
The Rectified Linear Unit (ReLU) function is one of the most popular functions in the field of ANNs. If the input x is less than or equal to 0, the output is 0; for positive inputs, the output equals the input, so it increases linearly as x increases. In other words, ReLU(x) = max(0, x), as the following diagram shows.
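ReLU is simple enough to express in one line, as in this sketch.

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

print([relu(z) for z in [-3.0, -0.5, 0.0, 0.5, 3.0]])  # [0.0, 0.0, 0.0, 0.5, 3.0]
```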
In the next section, we will learn about the hyperbolic tangent activation function.
The hyperbolic tangent activation function
Finally, we have another function, called the Hyperbolic Tangent Activation (tanh) function, which looks as follows:
The tanh function is very similar to the sigmoid function, but its range is (-1, 1). Tanh functions are also S-shaped, like sigmoid functions. The advantage of the tanh function is that strongly positive inputs are mapped close to 1, strongly negative inputs are mapped close to -1, and 0 is mapped to 0, as shown in Fig 2.16.
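The similarity between tanh and the sigmoid is exact: tanh is a sigmoid stretched to the range (-1, 1) and centered on 0, since tanh(z) = 2σ(2z) - 1. This sketch checks that identity numerically with Python's built-in `math.tanh`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# tanh is a rescaled, zero-centered sigmoid: tanh(z) = 2 * sigmoid(2z) - 1
for z in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    assert math.isclose(math.tanh(z), 2 * sigmoid(2 * z) - 1, abs_tol=1e-12)

print(math.tanh(-3.0), math.tanh(0.0), math.tanh(3.0))  # about -0.995, 0.0, about 0.995
```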
In the next section of this chapter, we will learn about the cost function.
The cost function of neural networks
We will now explore how we can evaluate the performance of a neural network by using the cost function, which measures how far the network's predictions are from the expected values. We are going to use the following notation and variables:
- Variable Y to represent the true value
- Variable a to represent the neuron prediction
In terms of the weights and biases, the formula is as follows:
$$z = wX + b, \qquad a = \sigma(z)$$
We pass z, which is the input (X) multiplied by the weight (w) plus the bias (b), into the activation function σ to obtain the neuron prediction a.
There are many types of cost functions, but we are just going to discuss two of them:
- The quadratic cost function
- The cross-entropy function
The first cost function we are going to discuss is the quadratic cost function, which is represented with the following formula:
In the preceding formula, we can see that when the error is high, which means the actual value (Y) is far from the predicted value (a), then the value of the cost function...
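The two cost functions can be sketched as follows, using Y for the true values and a for the predictions. The exact formulas are assumptions, since they are not reproduced in this excerpt: the quadratic cost uses the common convention C = 1/(2n) Σ(Y - a)², and the cross-entropy is the binary form for targets of 0 or 1.

```python
import math

def quadratic_cost(y_true, y_pred):
    """Assumed form: C = 1/(2n) * sum((Y - a)^2)."""
    n = len(y_true)
    return sum((y - a) ** 2 for y, a in zip(y_true, y_pred)) / (2 * n)

def cross_entropy_cost(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy: C = -1/n * sum(Y*ln(a) + (1 - Y)*ln(1 - a))."""
    n = len(y_true)
    total = 0.0
    for y, a in zip(y_true, y_pred):
        a = min(max(a, eps), 1 - eps)  # clip predictions to avoid log(0)
        total += y * math.log(a) + (1 - y) * math.log(1 - a)
    return -total / n

y_true = [1, 0, 1]
good = [0.9, 0.1, 0.8]  # hypothetical predictions close to the targets
bad = [0.2, 0.9, 0.3]   # hypothetical predictions far from the targets
print(quadratic_cost(y_true, good), quadratic_cost(y_true, bad))
print(cross_entropy_cost(y_true, good), cross_entropy_cost(y_true, bad))
```

Both costs grow as the predictions move away from the true values, which is exactly the property an optimizer relies on.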
Optimizers
Optimizers define how a neural network learns. They update the values of the parameters during training so that the loss function reaches its lowest value.
Gradient descent is an optimization algorithm for finding the minimum of a function, such as a cost function. This is useful to us because we want to minimize the cost function. To find a local minimum, we take steps proportional to the negative of the gradient.
Let's go through a very simple example in one dimension, shown in the following plot:
On the y axis, we have the cost (the result of the cost function), and on the x axis, we have the particular weight we are trying to choose (starting from a randomly chosen weight). The weight that minimizes the cost function lies at the bottom of the parabola, so our goal is to move the weight toward that point. Finding the minimum is really...
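The one-dimensional example can be sketched in a few lines. The parabola (w - 3)² is a hypothetical cost chosen so that its minimum sits at w = 3, and the starting weight of -4.0 stands in for a randomly chosen initial weight.

```python
def cost(w):
    """Hypothetical one-dimensional cost: a parabola with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the cost with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=100):
    for _ in range(steps):
        w -= learning_rate * gradient(w)  # step in the direction of the negative gradient
    return w

w_start = -4.0  # stand-in for a randomly chosen initial weight
w_best = gradient_descent(w_start)
print(w_best, cost(w_best))  # w_best is very close to 3.0 and the cost is close to 0
```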
Understanding hyperparameters
Hyperparameters serve a similar purpose to the various tone knobs on a guitar that are used to get the best sound. They are settings that you can tune to control the behavior of an ML algorithm.
A vital aspect of any deep learning solution is the selection of hyperparameters. Most deep learning models have specific hyperparameters that control various aspects of the model, such as memory usage or execution cost. However, it is possible to define additional hyperparameters to help an algorithm adapt to a scenario or problem statement. To get the best performance from a particular model, data science practitioners typically spend lots of time tuning hyperparameters, as they play such an important role in deep learning model development.
Hyperparameters can be broadly classified into two categories:
- Model training-specific hyperparameters
- Network architecture-specific hyperparameters
In the following sections, we will cover model training-specific hyperparameters...
Model training-specific hyperparameters
Model training-specific hyperparameters play an important role in model training. These are hyperparameters that live outside the model but have a direct influence on it. We will discuss the following hyperparameters:
- Learning rate
- Batch size
- Number of epochs
Let's start with the learning rate.
Learning rate
The learning rate is arguably the most important hyperparameter: it controls how large each update to the model's weights is, and therefore how quickly the model learns.
A learning rate that is too low increases the training time of the model, as it takes longer to incrementally change the weights of the network to reach an optimal state. On the other hand, although a large learning rate helps the model adjust to the data quickly, it can cause the model to overshoot the minima. A good starting value for the learning rate for most models is 0.001; in the following diagram, you can see that a low learning rate requires many updates before reaching the minimum point:
In contrast, an optimal learning rate reaches the minimum point swiftly, requiring fewer updates to get near the minimum. Here, we can see a diagram with a decent learning rate:
A high learning rate causes drastic updates that lead...
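The three regimes can be reproduced on the same hypothetical parabola (w - 3)² used in the gradient descent sketch. The three learning rates below are illustrative picks: 0.001 is the suggested starting value, 0.1 happens to suit this particular toy cost, and 1.1 is deliberately too large.

```python
def gradient(w):
    return 2.0 * (w - 3.0)  # derivative of the hypothetical cost (w - 3)^2

def steps_to_converge(learning_rate, w=-4.0, tol=1e-3, max_steps=10_000):
    """Count updates until |w - 3| < tol; return None if it diverges or never gets there."""
    for step in range(max_steps):
        if abs(w - 3.0) < tol:
            return step
        w -= learning_rate * gradient(w)
        if abs(w) > 1e6:  # the updates overshoot further each time
            return None
    return None

for lr in (0.001, 0.1, 1.1):
    print(lr, steps_to_converge(lr))
```

The low rate needs thousands of updates, the moderate rate needs a few dozen, and the high rate overshoots the minimum by a growing margin on every step and never converges.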
Batch size
Another non-trivial hyperparameter that has a huge influence on the training accuracy, time, and resource requirements is batch size. Basically, batch size determines the number of data points that are sent to the ML algorithm in a single iteration during training.
Although a very large batch size can speed up computation considerably, in practice it has been observed to cause a significant degradation in the quality of the model, as measured by its ability to generalize. A larger batch size also requires more memory during training.
Although a smaller batch size increases the training time, it often yields a better model than a larger batch size. This can be attributed to the fact that smaller batch sizes introduce more noise into the gradient estimates, which helps training converge to flat minimizers.
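The following sketch shows how batch size determines the number of weight updates per pass over the data: the dataset is shuffled and cut into consecutive mini-batches. The 1,000-element list is a stand-in for real training examples, and the helper name `make_batches` is hypothetical.

```python
import random

def make_batches(data, batch_size, rng):
    """Shuffle the data and split it into consecutive mini-batches (the last one may be smaller)."""
    indices = list(range(len(data)))
    rng.shuffle(indices)
    return [
        [data[i] for i in indices[start:start + batch_size]]
        for start in range(0, len(data), batch_size)
    ]

data = list(range(1000))  # stand-in for 1,000 training examples
rng = random.Random(42)
for batch_size in (8, 32, 256):
    batches = make_batches(data, batch_size, rng)
    print(batch_size, len(batches))  # number of weight updates per pass over the data
```

Smaller batches mean more, noisier updates for the same amount of data; larger batches mean fewer updates, each computed from more examples and needing more memory.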
Number of epochs
An epoch is one complete cycle of training: the whole dataset is passed forward and backward through the neural network exactly once. The number of epochs is therefore the number of times the model sees the entire training set, which makes it an easy way to track training progress while monitoring the training or validation error. Since the whole dataset is usually too large to feed to the network at once, we divide each epoch into many smaller batches.
A common technique for choosing the number of epochs is to use the early stopping Keras callback, which stops the training process if the training/validation error has not improved in the past 10 to 20 epochs.
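In Keras, early stopping is provided by `tf.keras.callbacks.EarlyStopping`, whose `patience` argument sets how many epochs without improvement are tolerated. The pure-Python sketch below only mimics the basic idea so it can run without TensorFlow. The helper `early_stopping_epoch` and the loss values are hypothetical and are not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the 0-based epoch at which training would stop, or None if it never triggers.

    Training stops once the validation loss has failed to beat its best value
    for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation losses: they improve for 5 epochs, then plateau
losses = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.46] * 20
print(early_stopping_epoch(losses, patience=10))  # 14
```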
Network architecture-specific hyperparameters
The hyperparameters that directly deal with the architecture of the deep learning model are called network architecture-specific hyperparameters. The different types of network-specific hyperparameters are as follows:
- Number of hidden layers
- Regularization
- Activation functions as hyperparameters
In the following sections, we will see how network architecture-specific hyperparameters work.
Regularization
Regularization makes slight changes to the learning algorithm so that the model generalizes better, and its strength is controlled by a hyperparameter. This also improves the performance of the model on unseen data.
In ML, regularization penalizes the coefficients. In deep learning, regularization penalizes the weight matrices of the nodes.
We are going to discuss two types of regularization, as follows:
- L1 and L2 regularization
- Dropout
We will start with L1 and L2 regularization.
L1 and L2 regularization
The most common types of regularization are L1 and L2. Both change the overall cost function by adding an extra term, called the regularization term. Adding this term pushes the values in the weight matrices down, based on the assumption that a neural network with smaller weight matrices leads to a simpler model.
L1 and L2 differ in the form of this regularization term. The formula for L1 regularization is as follows:
In the preceding formula, the regularization parameter is represented by lambda (λ). Here, we penalize the absolute values of the weights.
The formula for L2 regularization is as follows:
In the preceding formula, the regularization parameter is again represented by lambda (λ). L2 regularization is also called weight decay because it forces the weights to decay toward 0.
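Both penalties are easy to compute directly. This sketch assumes the plain forms loss + λΣ|w| (L1) and loss + λΣw² (L2); some texts scale λ by an extra factor such as 1/(2m), which changes only the effective strength. The last function shows where the name weight decay comes from: a gradient step on the L2 term alone multiplies each weight by a constant factor below 1. All values are illustrative.

```python
def l1_penalty(weights, lam):
    """L1 term: lambda times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 term: lambda times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_cost(data_loss, weights, lam, kind="l2"):
    """Original loss plus the chosen regularization term."""
    penalty = l1_penalty(weights, lam) if kind == "l1" else l2_penalty(weights, lam)
    return data_loss + penalty

def l2_decay_step(w, learning_rate, lam):
    """Gradient step on the L2 term alone: d(lam * w^2)/dw = 2 * lam * w,
    so each step multiplies w by (1 - 2 * learning_rate * lam)."""
    return w - learning_rate * 2 * lam * w

weights = [0.5, -1.5, 2.0]
print(regularized_cost(0.3, weights, lam=0.01, kind="l1"))  # 0.3 + 0.01 * 4.0
print(regularized_cost(0.3, weights, lam=0.01, kind="l2"))  # 0.3 + 0.01 * 6.5

w = 2.0
for _ in range(100):
    w = l2_decay_step(w, learning_rate=0.1, lam=0.5)
print(w)  # the weight has decayed toward 0
```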
Dropout
Dropout is a regularization technique that is used to improve the generalizing power of a network and prevent it from overfitting. Generally, a dropout value of 0.2 to 0.5 is used, with 0.2 being a good starting point. In practice, we try several values and compare the model's performance.
A dropout value that is too low has a negligible impact. However, if the value is too high for the network, the network under-learns the features during model training. Using dropout on a larger, wider network is likely to give better performance, because it gives the model a greater opportunity to learn independent representations.
An example of dropout can be seen in the following diagram, which shows a few of the neurons dropped from the network.
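The following sketch implements inverted dropout: during training, each activation is zeroed with probability `rate`, and the survivors are scaled by 1 / (1 - rate) so that the expected output stays the same; at inference time, nothing is dropped. The Keras `Dropout` layer documentation describes the same scaling, but this function is an illustrative sketch, not the Keras code.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate`, scale survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
layer_output = [1.0] * 10_000  # stand-in for the activations of a wide layer
dropped = dropout(layer_output, rate=0.2, rng=rng)
print(sum(1 for a in dropped if a == 0.0) / len(dropped))  # roughly 0.2 of the neurons dropped
print(sum(dropped) / len(dropped))  # roughly 1.0: the scaling preserves the expected value
```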
In the next section, we will learn about activation functions as hyperparameters.
Activation functions as hyperparameters
Activation functions, which are less commonly known as transfer functions, are used to enable the model to learn nonlinear prediction boundaries. Different activation functions behave differently and should be chosen carefully based on the deep learning task at hand. We have already discussed different types of activation functions in an earlier section of this chapter, Understanding activation functions.
In the next section, we will learn about the popular deep learning APIs—TensorFlow and Keras.
TensorFlow versus Keras
Primarily, there are two levels of abstraction for deep learning frameworks:
- Firstly, there is the lower level, where frameworks such as TensorFlow, Theano, and PyTorch sit. It is at this level where neural network elements such as convolutions and other generalized matrix operations are carried out.
- Then, there is a higher level, where frameworks such as Keras are present. Here, primitives from the lower levels are utilized to create neural network layers and models. User-friendly APIs for training and saving models are also implemented here.
Since they sit at different levels of abstraction, you cannot directly compare Keras and TensorFlow. TensorFlow, while used for deep learning, is not a dedicated deep learning library and is used for a wide array of other applications as well. Keras, however, is a library developed from the ground up specifically for deep learning. It has very well-designed APIs...
Summary
In this chapter, we learned how to convert biological neurons into artificial neurons, how ANNs work, and about various hyperparameters. We also covered an overview of two deep learning APIs: TensorFlow and Keras. This chapter has provided a foundation for deep learning. Now, you are ready to start implementing a deep learning model, which is the next step toward designing your own deep learning model for autonomous cars.
In the next chapter, we are going to implement a deep learning model using Keras.