The Nuts and Bolts of Neural Networks

In this chapter, we'll discuss some of the intricacies of neural networks (NNs), the cornerstone of deep learning (DL). We'll talk about their mathematical apparatus, structure, and training. Our main goal is to provide you with a systematic understanding of NNs. Often, we approach them from a computer science perspective: as a machine learning (ML) algorithm (or even a special entity) composed of a number of different steps/components. We gain our intuition by thinking in terms of neurons, layers, and so on (at least that's how I thought about them when I first learned about this field). This is a perfectly valid way to do things, and we can still do impressive things at this level of understanding. Perhaps this is not the correct approach, though.

NNs have solid mathematical foundations and if we approach them from this point of view, we...

The mathematical apparatus of NNs

In the next few sections, we'll discuss the mathematical branches related to NNs. Once we've done this, we'll connect them to NNs themselves.

Linear algebra

Linear algebra deals with linear equations, such as $a_1x_1 + a_2x_2 + \dots + a_nx_n + b = 0$, and linear transformations (or linear functions) and their representations, such as matrices and vectors.

Linear algebra identifies the following mathematical objects:

  • Scalars: A single number.
  • Vectors: A one-dimensional array of numbers (or components). Each component of the array has an index. In the literature, we will see vectors denoted either with a superscript arrow ($\vec{x}$) or in bold ($\mathbf{x}$). The following is an example of a vector (see the short code sketch below): $\mathbf{x} = [x_1, x_2, \dots, x_n]$
Throughout this book, we'll mostly...
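To make these objects concrete, here's a minimal sketch in NumPy (an illustrative choice of library, not necessarily the one used elsewhere in the book) showing a scalar, a vector, and a linear transformation represented as a matrix:

import numpy as np

# A scalar: a single number
s = 2.5

# A vector: a one-dimensional array of numbers; each component has an index
x = np.array([1.0, 2.0, 3.0])
print(x[0])  # access a component by its index -> 1.0

# A linear transformation can be represented as a matrix;
# applying it to a vector is a matrix-vector product
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])
y = A @ x  # maps the 3-component vector x to a 2-component vector
print(y)   # [7. 9.]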

A short introduction to NNs

A NN is a function (let's denote it with f) that tries to approximate another target function, g. We can describe this relationship with the following equation:

$f(\mathbf{x}, \theta) \approx g(\mathbf{x})$

Here, $\mathbf{x}$ is the input data and θ are the NN parameters (weights). The goal is to find the parameters, θ, that best approximate g. This generic definition applies to both regression (approximating the exact value of g) and classification (assigning the input to one of multiple possible classes) tasks. Alternatively, the NN function can be denoted as $f_{\theta}(\mathbf{x})$.
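As a toy illustration of this definition (a sketch, not an example from the book), suppose the target g is a simple scalar function and our "network" has a single parameter, f(x, θ) = θx. Approximating g then amounts to picking a good θ:

import numpy as np

def g(x):
    # The (normally unknown) target function we want to approximate
    return 3.0 * x

def f(x, theta):
    # A one-parameter "network": f(x, theta) = theta * x
    return theta * x

x = np.linspace(-1.0, 1.0, 5)
for theta in (1.0, 2.0, 3.0):
    error = np.mean((f(x, theta) - g(x)) ** 2)
    print(f"theta={theta}: mean squared error={error:.3f}")
# The approximation error shrinks as theta approaches 3.0,
# where f matches g exactly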

We'll start our discussion with the smallest building block of the NN: the neuron.

Neurons

The preceding definition is a bird's-eye view of a NN. Now, let...

Training NNs

In this section, we'll define training a NN as the process of adjusting its parameters (weights), θ, in a way that minimizes the cost function, J(θ). The cost function is some performance measurement over a training set consisting of multiple samples, each represented as a vector. Each vector has an associated label (this is supervised learning). Most commonly, the cost function measures the difference between the network output and the label.
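For example, a common cost function for regression is the mean squared error (MSE) between the network outputs and the labels. A minimal sketch (the function name mse_cost is illustrative, not from the book):

import numpy as np

def mse_cost(outputs, labels):
    # Mean squared error: average the squared differences between
    # the network outputs and the labels over all training samples
    return np.mean((outputs - labels) ** 2)

outputs = np.array([0.9, 0.2, 0.8])  # network predictions for three samples
labels = np.array([1.0, 0.0, 1.0])   # the associated labels
print(mse_cost(outputs, labels))     # approximately 0.03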

We'll start this section with a short recap of the gradient descent optimization algorithm. If you're already familiar with it, you can skip this.

Gradient descent

For the purposes of this section, we'll use a NN with a single regression output and mean...
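As a preview of the idea, here's a minimal gradient-descent sketch for a single-output regression with an MSE cost, assuming the one-parameter model f(x, θ) = θx from earlier (the model, learning rate, and step count are illustrative choices, not necessarily the book's):

import numpy as np

# Training set: inputs x with labels y produced by the target g(x) = 3x
x = np.array([0.5, 1.0, 1.5, 2.0])
y = 3.0 * x

theta = 0.0          # initial parameter value
learning_rate = 0.1

for step in range(50):
    outputs = theta * x                # forward pass: f(x, theta)
    # Gradient of the MSE cost J(theta) = mean((theta * x - y) ** 2)
    # with respect to theta: dJ/dtheta = mean(2 * (theta * x - y) * x)
    grad = np.mean(2.0 * (outputs - y) * x)
    theta -= learning_rate * grad      # step against the gradient

print(theta)  # converges toward 3.0, the slope of the target function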

Summary

We started this chapter with a tutorial on the mathematical apparatus that forms the foundation of NNs. Then, we recapped NNs and their architecture. Along the way, we tried to explicitly connect the mathematical concepts with the various components of NNs. We paid special attention to the various types of activation functions. Finally, we took a comprehensive look at the NN training process. We discussed gradient descent, cost functions, backpropagation, weight initialization, and SGD optimization techniques.

In the next chapter, we'll discuss the intricacies of convolutional networks and their applications in the computer vision domain.
