2. Building Blocks of Neural Networks
This chapter introduces the main building blocks of neural networks and explains the three main neural network architectures in use today. It also covers the importance of data preparation before training any artificial intelligence model and, finally, walks through the process of solving a regression problem. By the end of this chapter, you will have a firm grasp of the learning process of different network architectures and their different applications.
Introduction
In the previous chapter, we explained why deep learning has become so popular and introduced PyTorch as one of the most popular libraries for developing deep learning solutions. Although the main syntax for building a neural network using PyTorch was explained there, in this chapter, we will further explore the concept of neural networks.
Although the theory behind neural networks was developed several decades ago, when the concept evolved from the notion of the perceptron, different architectures have been created more recently to solve different data problems. This is, in part, due to the different data formats found in real-life data problems, such as text, audio, and images.
The purpose of this chapter is to dive into the topic of neural networks and their main advantages and disadvantages so that you can understand when and how to use them. Then, we will explain the building blocks of the most popular neural network architectures: artificial...
Introduction to Neural Networks
Neural networks learn from training data, rather than being programmed to solve a particular task by following a set of rules. This learning process can follow one of the following methodologies:
- Supervised learning: This is the simplest form of learning, as it is based on a labeled dataset, where the neural network finds patterns that explain the relationship between the features and the target. The iterations during the learning process aim to minimize the difference between the predicted value and the ground truth. One example of this is classifying a plant based on the attributes of its leaves.
- Unsupervised learning: In contrast to the preceding methodology, unsupervised learning consists of training a model with unlabeled data (meaning that there is no target value). The purpose of this is to arrive at a better understanding of the input data. In general, networks take input data, encode it, and then reconstruct the content from the encoded...
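As a concrete illustration of the supervised case, the plant-classification example mentioned above can be sketched with scikit-learn's classic Iris dataset, in which plants are classified from measurements of their leaves and petals. The model choice (logistic regression) and the split parameters are assumptions made purely for illustration:

```python
# A minimal supervised-learning sketch: a labeled dataset, a model that
# finds patterns relating features to the target, and an accuracy check
# comparing predictions against the ground truth.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)            # features and labels (targets)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)    # iteratively minimizes prediction error
model.fit(X_train, y_train)                  # learns from the labeled training data
accuracy = model.score(X_test, y_test)       # predictions vs. ground truth
```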
Data Preparation
The first step in the development of any deep learning model – after gathering the data, of course – should be preparing the data. This is crucial for understanding the data at hand and correctly outlining the scope of the project.
Many data scientists fail to do this, which results in models that perform poorly, or that are even useless because they do not address the data problem in the first place.
The process of preparing the data can be divided into three main tasks:
- Understanding the data and dealing with any potential issues
- Rescaling the features to make sure no bias is introduced by mistake
- Splitting the data to be able to measure performance accurately
All three tasks will be further explained in the next section.
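The second and third tasks above can be sketched as follows, assuming scikit-learn's preprocessing utilities and a hypothetical random feature matrix standing in for real data:

```python
# A minimal sketch of rescaling and splitting. The data here is
# synthetic; in practice X and y would come from your dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(100, 3))  # hypothetical features
y = rng.integers(0, 2, size=100)                 # hypothetical targets

# Split first, so the test set cannot leak into the scaling statistics
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Rescale to zero mean / unit variance to avoid introducing scale bias;
# the statistics are computed from the training data only
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```

Note that fitting the scaler on the training split only (and merely transforming the test split) is what keeps the performance measurement honest.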
Note
All of the tasks we explained previously are pretty much the same when applying any machine learning algorithm, considering that they refer to the techniques that are required to prepare...
Building a Deep Neural Network
Building a neural network can, in general terms, be achieved in one of three ways: at a very simple level, using libraries such as scikit-learn (not suitable for deep learning) that perform all the math for you but offer little flexibility; at a very complex level, by coding every single step of the training process from scratch; or by using a more robust framework, which allows great flexibility.
PyTorch was built with input from many developers in the field and has the advantage of allowing both approaches in the same place. As we mentioned previously, it has a neural network module that was built to allow easy, predefined implementations of simple architectures using the sequential container, while at the same time allowing the creation of custom modules that introduce flexibility to the process of building very complex architectures.
In this section, we will discuss the use of the sequential container for developing deep neural networks in...
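As a minimal sketch of what such a sequential definition looks like, consider the following; the layer sizes (10 inputs, two hidden layers of 25 units, 1 output) are arbitrary assumptions, not a prescribed architecture:

```python
# A deep network defined with PyTorch's sequential container:
# each module's output feeds the next module in order.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 25),  # input layer -> first hidden layer
    nn.ReLU(),          # non-linear activation
    nn.Linear(25, 25),  # first hidden -> second hidden layer
    nn.ReLU(),
    nn.Linear(25, 1),   # output layer (e.g., for regression)
)

x = torch.randn(4, 10)  # a batch of 4 examples with 10 features each
output = model(x)       # forward pass produces one value per example
```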
Summary
The theory that gave birth to neural networks was developed decades ago by Frank Rosenblatt. It started with the definition of the perceptron, a unit inspired by the human neuron, which takes data as input and performs a transformation on it. The theory behind the perceptron consisted of assigning weights to the input data and performing a calculation whose end result would be one of two possible outcomes.
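That weighted calculation can be sketched as follows; the input values and weights here are arbitrary illustrative numbers, not learned ones:

```python
# A single perceptron: a weighted sum of the inputs plus a bias,
# passed through a step function to yield one of two outcomes (0 or 1).
import torch

def perceptron(x, w, b):
    # weighted sum followed by a hard threshold (step function)
    return (x @ w + b > 0).float()

x = torch.tensor([1.0, 0.5, -1.0])  # input features
w = torch.tensor([0.8, -0.2, 0.3])  # weights (assumed values)
b = torch.tensor(0.1)               # bias term

fires = perceptron(x, w, b)         # weighted sum is 0.5 > 0, so it fires
```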
The most widely known form of neural network is created from a succession of perceptrons stacked together in layers, where the output from one layer of perceptrons is the input to the following one.
The typical learning process for a neural network was also explained. It involves three main steps: forward propagation, the calculation of the loss function, and backpropagation.
The end goal of this procedure is to minimize the loss function by updating the weights and biases that accompany each of...
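Those three steps can be sketched as a minimal PyTorch training loop; the data, model, loss, and learning rate here are hypothetical placeholders chosen for illustration:

```python
# The learning loop: forward propagation, loss calculation, and
# backpropagation, followed by a weight/bias update.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(20, 3)  # hypothetical inputs
y = torch.randn(20, 1)  # hypothetical regression targets

model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

initial_loss = loss_fn(model(X), y).item()
for epoch in range(100):
    pred = model(X)              # 1) forward propagation
    loss = loss_fn(pred, y)      # 2) calculate the loss
    optimizer.zero_grad()
    loss.backward()              # 3) backpropagation (compute gradients)
    optimizer.step()             # update the weights and biases
final_loss = loss_fn(model(X), y).item()
```

Repeating the loop drives the loss down, which is exactly the goal of updating the weights and biases on each iteration.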