Training a neural network

The process of building a neural network using a given dataset is called training a neural network. Let's look into the anatomy of a typical neural network. When we talk about training a neural network, we are talking about calculating the best values for the weights. The training is done iteratively by using a set of examples in the form of training data. The examples in the training data have the expected values of the output for different combinations of input values. The training process for neural networks is different from the way traditional models are trained (which were discussed in Chapter 7, Traditional Supervised Learning Algorithms).

Understanding the anatomy of a neural network

Let's see what a neural network consists of:

  • Layers: Layers are the core building blocks of a neural network. Each layer is a data-processing module that acts as a filter. It takes one or more inputs, processes them in a certain way, and then produces one or more outputs. Each time data passes through a layer, it goes through a processing phase and reveals patterns that are relevant to the business question we are trying to answer.
  • Loss function: A loss function provides the feedback signal that is used in the various iterations of the learning process. The loss function provides the deviation for a single example.
  • Cost function: The cost function is the loss function on a complete set of examples.
  • Optimizer: An optimizer determines how the feedback signal provided by the loss function will be interpreted.
  • Input data: Input data is the data that is used to train the neural network... A minimal sketch tying these building blocks together appears right after this list.
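To see how these building blocks fit together, here is a minimal NumPy sketch (our own illustration, not code from the book): a single sigmoid layer, a squared-error loss, a cost averaged over the examples, and plain gradient descent as the optimizer. All names and the synthetic dataset are invented for the example.

import numpy as np

rng = np.random.default_rng(0)

# Layer: a single dense layer (weights W, bias b) with a sigmoid activation
W = rng.normal(size=(3, 1))          # 3 input features -> 1 output
b = np.zeros(1)

def forward(X):
    return 1 / (1 + np.exp(-(X @ W + b)))

def loss(y_pred, y):
    # Loss function: squared-error deviation for a single example
    return (y_pred - y) ** 2

def cost(X, y):
    # Cost function: the loss averaged over the complete set of examples
    return loss(forward(X), y).mean()

# Input data: a synthetic training set with known expected outputs
X = rng.normal(size=(100, 3))
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

# Optimizer: plain gradient descent on the weights
learning_rate = 0.5
for _ in range(200):
    y_pred = forward(X)
    grad_z = 2 * (y_pred - y) * y_pred * (1 - y_pred)  # chain rule through the sigmoid
    W -= learning_rate * (X.T @ grad_z) / len(X)
    b -= learning_rate * grad_z.mean(axis=0)

print(round(float(cost(X, y)), 4))   # the cost shrinks as training proceeds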

Defining Gradient Descent

The purpose of training a neural network model is to find the right values for the weights. We start training a neural network with random or default values for the weights. Then, we iteratively use an optimizer algorithm, such as gradient descent, to change the weights in such a way that our predictions improve. The starting point of a gradient descent algorithm is the random values of the weights that need to be optimized as we iterate through the algorithm. In each subsequent iteration, the algorithm proceeds by changing the values of the weights in such a way that the cost is minimized. The following diagram explains the logic of the gradient descent algorithm:

Figure 8.6: Gradient descent algorithm

In the preceding diagram, the input is the feature vector X. The actual value of the target variable is Y, and the predicted value of the target variable is Y'. We determine the deviation of the actual value from the...
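To make this loop concrete, here is a minimal sketch of gradient descent on a toy, one-weight cost function; the cost J(w) = (w - 3)^2 and the learning rate are invented purely for illustration:

import numpy as np

# Toy cost J(w) = (w - 3)**2, whose minimum lies at w = 3
rng = np.random.default_rng(42)
w = float(rng.normal())             # random starting value for the weight
learning_rate = 0.1

for _ in range(50):
    gradient = 2 * (w - 3)          # dJ/dw
    w -= learning_rate * gradient   # step against the gradient to cut the cost

print(round(w, 4))                  # converges toward 3.0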

Activation Functions

An activation function formulates how the inputs to a particular neuron will be processed to generate an output. As shown in the following diagram, each of the neurons in a neural network has an activation function that determines how inputs will be processed:

Figure 8.9: Activation Function

In the preceding diagram, we can see that the results generated by an activation function are passed on to the output. The activation function sets the criteria for how the values of the inputs are to be interpreted to generate an output. For exactly the same input values, different activation functions will produce different outputs. Understanding how to select the right activation function is important when using neural networks to solve problems. Let's now look into these activation functions one by one.

Step Function

The simplest possible activation function is the threshold function. The output of the threshold function is binary: 0 or 1. It generates 1 as the output if the weighted sum of the inputs is greater than zero, and 0 otherwise. This behavior is shown in the following diagram:

Figure 8.10: Step Function

Note that as soon as there are any signs of life detected in the weighted sum of inputs, the output (y) becomes 1. This makes the threshold activation function very sensitive: it is vulnerable to being wrongly triggered by even the slightest signal in the input, such as a glitch or some noise.
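In code, this behavior amounts to a one-line comparison (a minimal sketch; the function name and the explicit zero threshold are our own):

def step_function(weighted_sum, threshold=0):
    # Fires (returns 1) as soon as the weighted sum of inputs crosses the threshold
    return 1 if weighted_sum > threshold else 0

print(step_function(0.0001), step_function(-5))   # 1 0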

Sigmoid

The sigmoid function can be thought of as an improvement of the threshold function. Here, we have control over the sensitivity of the activation function:

Figure 8.11: Sigmoid Activation Function

The sigmoid function, y, is defined as follows:

y = \frac{1}{1 + e^{-z}}

It can be implemented in Python as follows:

import numpy as np

def sigmoidFunction(z):
    return 1 / (1 + np.exp(-z))

Note that by reducing the sensitivity of the activation function, we make glitches in the input less disruptive. Note also that the output of the sigmoid activation function is confined between 0 and 1, approaching these two binary extremes smoothly rather than jumping between them.
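As a quick illustration of this reduced sensitivity (the sample inputs are arbitrary), small perturbations around zero now shift the output only slightly instead of flipping it:

# Usage example for sigmoidFunction, defined above
for z in (-4.0, -0.1, 0.0, 0.1, 4.0):
    print(z, round(float(sigmoidFunction(z)), 3))
# prints 0.018, 0.475, 0.5, 0.525, and 0.982, respectively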

Rectified linear unit (ReLU)

The output of the first two activation functions presented in this chapter was confined between 0 and 1. That means that they take a set of input variables and squash them into that narrow range. ReLU is an activation function that takes a set of input variables as input and converts them into a single continuous output. In neural networks, ReLU is the most popular activation function and is usually used in the hidden layers, where we do not want to convert continuous variables into categorical variables. The following diagram summarizes the ReLU activation function:

Figure 8.12: Rectified linear unit

Note that when x ≤ 0, y = 0. This means that any signal from the input that is zero or less than zero is translated into a zero output:

y = \begin{cases} 0 & \text{for } x \le 0 \\ x & \text{for } x > 0 \end{cases}

As soon as x becomes greater than zero, the output is x. The ReLU function is one of the most used activation functions in neural networks. It can...
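A minimal NumPy implementation matching this piecewise definition (the function name is our own):

import numpy as np

def relu(x):
    # Zero for x <= 0, identity for x > 0
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.5])))   # [0.  0.  3.5]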

Leaky ReLU

In ReLU, a negative value for x results in a zero value for y. This means that some information is lost in the process, which makes training cycles longer, especially at the start of training. The Leaky ReLU activation function resolves this issue. The following applies for Leaky ReLU:

y = \begin{cases} \beta x & \text{for } x < 0 \\ x & \text{for } x \ge 0 \end{cases}

This is shown in the following diagram:

Figure 8.13: Leaky ReLU

Here, β is a parameter with a value less than one. It can be implemented in Python as follows:

def leakyReLU(x, beta=0.01):
    # Pass positive inputs through; scale negative inputs by beta
    if x < 0:
        return beta * x
    else:
        return x

There are three ways of specifying the value for β:

  • We can specify a default value of β.
  • We can make β a parameter in our neural network and let the network decide the value (this is called parametric ReLU).
  • We can make β a random value (this is called randomized ReLU). A short sketch of the first and third options follows this list.
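Here is a small, illustrative sketch of the first and third options, reusing the leakyReLU function defined above; the sampling range for the randomized variant is an arbitrary choice, and parametric ReLU is only indicated in a comment because it requires β to be learned during training:

import random

# Option 1: rely on the default beta
print(leakyReLU(-10))                     # -0.1

# Option 3: randomized ReLU draws beta at random (range chosen arbitrarily)
beta = random.uniform(0.01, 0.1)
print(leakyReLU(-10, beta=beta))          # some value between -1.0 and -0.1

# Option 2 (parametric ReLU) would instead treat beta as a trainable
# parameter, updated by the optimizer along with the other weights.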

Hyperbolic tangent (tanh)

The tanh function is similar to the sigmoid function, but it has the ability to give a negative signal as well. The following diagram illustrates this:

Figure 8.14: Hyperbolic tangent

The function y is defined as follows:

y = \tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}

It can be implemented by the following Python code:

import numpy as np

def tanh(x):
    # Equivalent to np.tanh(x); output ranges from -1 to 1
    numerator = 1 - np.exp(-2 * x)
    denominator = 1 + np.exp(-2 * x)
    return numerator / denominator
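As a quick sanity check (our own addition), the hand-rolled version above agrees with NumPy's built-in np.tanh:

x = np.linspace(-3, 3, 7)
print(np.allclose(tanh(x), np.tanh(x)))   # True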

Now let's look at the softmax function.

Softmax

Sometimes we need more than two levels for the output of the activation function. Softmax is an activation function that provides us with more than two levels for the output. It is best suited to multiclass classification problems. Let's assume that we have n classes, with input values that map to the classes as follows:

x = {x(1), x(2), ..., x(n)}

Softmax operates on probability theory. The output probability of the i-th class of the softmax is calculated as follows:

\text{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}
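A common, numerically stable implementation of this formula looks as follows (a sketch in the spirit of the chapter's other snippets; the function name and sample scores are our own):

import numpy as np

def softmax(x):
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the resulting probabilities
    exps = np.exp(x - np.max(x))
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])    # raw outputs for n = 3 classes
probs = softmax(scores)
print(probs, probs.sum())             # three probabilities summing to 1.0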

For binary classifiers, the activation function in the final layer will be sigmoid, and for multiclass classifiers it will be softmax.
