A Deeper Dive into Neural Networks

In this chapter, we will take a more in-depth look at neural networks. We will start by building a perceptron, then learn about activation functions, and finally train our first perceptron.

In this chapter, we will cover the following topics:

  • From the biological to the artificial neuron – the perceptron
  • Building a perceptron
  • Learning through errors
  • Training a perceptron
  • Backpropagation
  • Scaling the perceptron
  • A single layered network

From the biological to the artificial neuron – the perceptron

Now that we have briefly familiarized ourselves with some insights into the nature of data processing, it's about time we see how the artificial cousins of our own biological neurons work. We start with a creation of Frank Rosenblatt, dating back to the 1950s, which he called the Perceptron (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.335.3398&rep=rep1&type=pdf). Essentially, you can think of the perceptron as a single neuron in an artificial neural network (ANN). Understanding how a single perceptron propagates information forward will serve as an excellent stepping stone to understanding the more state-of-the-art networks that we will face in later chapters.

Building a perceptron

For now, we will define a perceptron using six specific mathematical representations that together demonstrate its learning mechanism: the inputs, the weights, the bias term, the summation, the activation function, and the output. Each of these is elaborated on in the sections that follow.

Input

Remember how a biological neuron takes in electrical impulses through its dendrites? Well, the perceptron behaves in a similar fashion, except that it ingests numbers instead of electricity. Essentially, it takes in feature inputs, as shown in the preceding diagram. This particular perceptron has only three input channels: x1, x2, and x3. These feature inputs (x1, x2, and x3) can be any independent variable...
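To make this concrete, here is a minimal sketch (with purely illustrative values, not data from the book) of how such a three-channel input could be represented as a feature vector in Python:

```python
import numpy as np

# A hypothetical observation with three feature inputs (x1, x2, x3),
# for example temperature, humidity, and wind speed.
x = np.array([0.7, 0.2, 0.5])
print(x.shape)  # (3,) -- one value per input channel
```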

Learning through errors

All we essentially do to our input data is compute a dot product, add a bias term, pass the result through a non-linear activation function, and then compare our prediction to the real output value, taking a step in the direction of the actual output. This is the general architecture of an artificial neuron. You will soon see how this structure, configured repetitively, gives rise to some of the more complex neural networks around.
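The following is a minimal sketch of that forward pass with illustrative numbers; the particular weights, bias, and the choice of a sigmoid activation are assumptions made for the example, not the book's exact setup:

```python
import numpy as np

def sigmoid(z):
    # A commonly used non-linear activation function (assumed here).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.7, 0.2, 0.5])   # feature inputs (illustrative)
w = np.array([0.1, -0.4, 0.3])  # one weight per input (illustrative)
b = 0.05                        # bias term

z = np.dot(w, x) + b            # dot product plus the bias term
y_hat = sigmoid(z)              # non-linear activation gives the prediction
y = 1.0                         # the actual output value (ground truth)

error = y - y_hat               # how far the prediction lies from the truth
print(z, y_hat, error)
```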

We converge towards ideal parametric values, one step in the right direction at a time, through a method known as the backward propagation of errors, or backpropagation for short. But to propagate errors backwards, we need a metric to assess how well we are doing with respect to our goal. We define this metric as a loss, and calculate it using a loss function. This function attempts to incorporate the residual difference between what our...
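As a concrete, simplified illustration, a squared-error loss is one common way to quantify this residual difference; the function and the values below are assumptions chosen for readability rather than the book's specific loss:

```python
def squared_error_loss(y_true, y_pred):
    # One simple choice of loss function: half the squared residual.
    return 0.5 * (y_true - y_pred) ** 2

# A prediction close to the target yields a small loss...
print(squared_error_loss(1.0, 0.9))  # ~0.005
# ...while a prediction far from the target yields a large one.
print(squared_error_loss(1.0, 0.1))  # ~0.405
```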

Training a perceptron

So far, we have a clear grasp of how data actually propagates through our perceptron. We also briefly saw how the errors of our model can be propagated backwards. We use a loss function to compute a loss value at each training iteration. This loss value tells us how far our model's predictions lie from the actual ground truth. But what then?

Quantifying loss

Since the loss value gives us an indication of the difference between our predicted and actual outputs, it stands to reason that if our loss value is high, then there is a big difference between our model's predictions and the actual output. Conversely, a low loss value indicates that our model is closing the distance between the predicted...

Backpropagation

If you are more mathematically oriented, you may be wondering how exactly we descend our gradient iteratively. Well, as you know, we start by initializing our model with random weights, feed in some data, compute the dot product, add our bias, and pass the result through our activation function to get a predicted output. We use this predicted output and the actual output to estimate the errors in our model's representations, using the loss function. Now here comes the calculus. What we can do now is differentiate our loss function, J(θ), with respect to the weights of our model (θ). This process essentially lets us compare how changes in our model's weights affect the changes in our model's loss. The result of this differentiation gives us the gradient of our J(θ) function at the current model weight (θ) along with the direction of...
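The sketch below writes this gradient step out explicitly for a single sigmoid neuron trained with a squared-error loss; the learning rate, initial weights, and loss choice are illustrative assumptions rather than the book's exact implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=3)   # randomly initialized weights (theta)
b = 0.0                  # bias term
lr = 0.1                 # learning rate: the size of each step

x = np.array([0.7, 0.2, 0.5])  # illustrative inputs
y = 1.0                        # actual output

for _ in range(100):
    z = np.dot(w, x) + b
    y_hat = sigmoid(z)                        # forward pass
    # Chain rule: dJ/dz = dJ/dy_hat * dy_hat/dz for J = 0.5 * (y - y_hat)**2
    dJ_dz = (y_hat - y) * y_hat * (1.0 - y_hat)
    w -= lr * dJ_dz * x                       # step against the gradient
    b -= lr * dJ_dz
```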

Scaling the perceptron

So far, we have seen how a single neuron may learn to represent a pattern as it is trained. Now, let's say we want to leverage the learning mechanism of an additional neuron, in parallel. With two perceptron units in our model, each unit may learn to represent a different pattern in our data. Hence, if we wanted to scale the previous perceptron just a little bit by adding another neuron, we may get a structure with two fully connected layers of neurons, as shown in the following diagram:

Note here that the feature weights, as well as the additional fictional input we will have per neuron to represent our bias, have both disappeared. To simplify our representation, we have instead denoted both the scalar dot product, and our bias term, as a single symbol.

We choose to represent this mathematical function as the letter z. The value of z is then...
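As a small illustrative sketch (with assumed values), when two neurons sit in parallel, each neuron gets its own weight vector and bias, and z collects the dot product plus the bias for each one:

```python
import numpy as np

x = np.array([0.7, 0.2, 0.5])          # shared feature inputs (illustrative)
W = np.array([[0.1, -0.4, 0.3],        # weights of the first neuron
              [0.2,  0.1, -0.5]])      # weights of the second neuron
b = np.array([0.05, -0.1])             # one bias term per neuron

# z packs the dot product and the bias term into a single symbol per neuron
z = W @ x + b                          # shape (2,): one z value per neuron
print(z)
```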

A single layered network

Right, so now we have seen how to leverage two of our perceptron units in parallel, enabling each individual unit to learn a different underlying pattern that is possibly present in the data we feed it. We naturally want to connect these neurons to output neurons, which fire to indicate the presence of a specific output class. In our sunny-rainy day classification example, we have two output classes (sunny or rainy), hence a predictive network tasked with solving this problem will have two output neurons. These neurons will be supported by the learning of the neurons from the previous layer and, ideally, will represent features that are informative for predicting either a rainy or a sunny day. Mathematically speaking, all that is simply happening here is the forward propagation of our transformed input features, followed by the backward propagation of the...
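In Keras, such a single-layered network for the sunny-rainy example could be sketched roughly as follows; the layer size, softmax activation, and loss choice here are illustrative assumptions rather than the book's exact code:

```python
from keras.models import Sequential
from keras.layers import Dense

# A minimal sketch: one fully connected layer mapping three weather
# features to two output neurons, one per class (sunny or rainy).
model = Sequential()
model.add(Dense(2, activation='softmax', input_shape=(3,)))

# Categorical cross-entropy is one common loss for one-hot labels; SGD
# carries out the backward propagation of errors described above.
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```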

Summary

Now that we have achieved a comprehensive understanding of neural learning systems, we can start getting our hands dirty. We will soon implement our first neural network, test it out for a classic classification task, and practically face many of the concepts we have covered here. In doing so, we will cover a detailed overview of the exact nature of loss optimization, and the evaluation metrics of neural networks.
