Training Multiple Layers of Neurons

Previously, in Chapter 5, Training a Single Neuron, we explored a model involving a single neuron and the concept of the perceptron. A limitation of the perceptron model is that, at best, it can only produce linear solutions, that is, separating hyperplanes in a multi-dimensional space. However, this limitation is easily overcome by using multiple neurons and multiple layers of neurons to produce highly complex non-linear solutions for separable and non-separable problems. This chapter introduces you to the first challenges of deep learning using the Multi-Layer Perceptron (MLP): error minimization with a gradient descent technique, followed by hyperparameter optimization experiments to determine trustworthy accuracy.

The following topics will be covered in this chapter:

  • The MLP model
  • Minimizing the error
  • Finding the best hyperparameters
...

The MLP model

We have previously seen, in Chapter 5, Training a Single Neuron, that Rosenblatt's perceptron model is simple and powerful for some problems (Rosenblatt, F. 1958). However, for more complicated and highly non-linear problems, Rosenblatt gave relatively little attention to models that connected many more neurons in different architectures, including deeper models (Tappert, C. 2019).

Years later, in the 1990s, Prof. Geoffrey Hinton, recipient of the 2018 Turing Award, continued working on connecting larger numbers of neurons together, since this is more brain-like than a single neuron (Hinton, G. 1990). Most people today know this type of approach as connectionism. The main idea is to connect neurons in ways that resemble the connections in the brain. One of the first successful models was the MLP, which uses a supervised gradient descent-based learning algorithm that learns to approximate a function, y = f(x), using labeled data pairs, (x, y).
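Before discussing the figure, it may help to see what this computation looks like in code. The following is a minimal NumPy sketch, not the book's code, of a forward pass through an MLP with one hidden layer of sigmoid neurons; the layer sizes, random weights, and toy input are assumptions chosen only for illustration (they mirror the 2-3-2 Keras model shown later in this chapter):

import numpy as np

def sigmoid(z):
    # Logistic activation; each neuron squashes its weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes chosen only for illustration; the weights are random, untrained
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # input -> hidden layer
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)   # hidden -> output layer

def forward(X):
    # Every layer applies an affine transformation followed by a
    # non-linearity; stacking such layers is what lets the MLP
    # approximate non-linear functions y = f(x)
    H = sigmoid(X @ W1 + b1)
    return sigmoid(H @ W2 + b2)

X = np.array([[0.5, -1.2]])   # a single toy sample with 2 features
print(forward(X))             # untrained prediction, one score per class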

Figure 6.1 depicts an MLP with one layer of multiple neurons...

Minimizing the error

Learning from data with an MLP has been one of its major challenges since its conception. As we pointed out before, two of the major problems with neural networks were the computational tractability of deeper models and the lack of stable learning algorithms that would converge to a reasonable minimum. One of the major breakthroughs in machine learning, and what paved the way for deep learning, was the development of the learning algorithm based on backpropagation. Many scientists independently derived and applied forms of backpropagation in the 1960s; however, most of the credit has been given to Prof. G. E. Hinton and his group (Rumelhart, D. E., et al. 1986). In the next few paragraphs, we will go over this algorithm, whose sole purpose is to minimize the error caused by incorrect predictions made during training.
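Before turning to the dataset, the following is a minimal from-scratch sketch of the idea behind backpropagation with gradient descent. It uses an XOR-like toy problem, a 2-3-2 sigmoid network, and a mean squared error loss; the data, learning rate, and number of epochs are illustrative assumptions, not the book's spirals example or its implementation:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny labeled dataset (assumed, for illustration): 4 points, 2 classes
# encoded as one-hot targets; the XOR pattern is not linearly separable
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)
lr = 1.0   # learning rate (an assumed value)

for epoch in range(5000):
    # Forward pass: keep the intermediate activations for reuse below
    H = sigmoid(X @ W1 + b1)          # hidden activations
    P = sigmoid(H @ W2 + b2)          # predictions
    loss = np.mean((P - Y) ** 2)      # mean squared error

    # Backward pass: apply the chain rule layer by layer, from the loss
    # back to the first layer's weights (hence "backpropagation")
    dP = 2 * (P - Y) / Y.size         # dLoss/dP
    dZ2 = dP * P * (1 - P)            # through the output sigmoid
    dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
    dH = dZ2 @ W2.T
    dZ1 = dH * H * (1 - H)            # through the hidden sigmoid
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)

    # Gradient descent: step each parameter against its gradient
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(loss)   # should be small if training converged for this initialization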

To begin, we will describe the dataset, which is called spirals. This is a widely known benchmark dataset that has two classes that are separable, yet highly...
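The rest of this section is not reproduced here, but a common way to generate such a two-class spiral dataset is sketched below; the generator, the number of points, and the noise level are assumptions for illustration, not necessarily the ones used in the book:

import numpy as np

def make_spirals(n=500, noise=0.2, seed=0):
    # Two interleaved spirals, one per class, with labels in {0, 1};
    # the radius grows with the angle, so the classes wrap around each
    # other and cannot be separated by a straight line
    rng = np.random.default_rng(seed)
    theta = np.sqrt(rng.uniform(size=n)) * 3 * np.pi
    x0 = np.c_[theta * np.cos(theta), theta * np.sin(theta)]
    x1 = np.c_[theta * np.cos(theta + np.pi), theta * np.sin(theta + np.pi)]
    X = np.vstack([x0, x1]) + rng.normal(scale=noise, size=(2 * n, 2))
    y = np.hstack([np.zeros(n), np.ones(n)]).astype(int)
    return X, y

X, y = make_spirals()
print(X.shape, y.shape)   # (1000, 2) (1000,)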

Finding the best hyperparameters

There is a simpler way of coding what we implemented in the previous section: using Keras. We can rely on the fact that its backprop is implemented correctly and tuned for numerical stability, and that it offers a richer set of features and algorithms that can improve the learning process. Before we begin optimizing the hyperparameters of the MLP, we should show the equivalent implementation in Keras. The following code should reproduce the same model, almost the same loss function, and almost the same backprop methodology:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Same architecture as before: 2 inputs, a hidden layer of 3 sigmoid
# neurons, and 2 sigmoid outputs (one per class)
mlp = Sequential()
mlp.add(Dense(3, input_dim=2, activation='sigmoid'))
mlp.add(Dense(2, activation='sigmoid'))

# Mean squared error loss minimized with stochastic gradient descent
mlp.compile(loss='mean_squared_error',
            optimizer='sgd',
            metrics=['accuracy'])

# This assumes that you still have X, y from earlier
# when we called...
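The remainder of this section is not shown here, but the sketch below illustrates how training might then be invoked and how a single hyperparameter, the number of hidden neurons, could be compared. It assumes X and y from the spirals example are still in memory; the epoch count, batch size, and candidate layer sizes are illustrative assumptions rather than the book's settings:

from tensorflow.keras.utils import to_categorical

# One-hot encode the integer labels to match the two output neurons
Y = to_categorical(y, num_classes=2)

# Train the model defined above (epochs and batch size are assumed values)
mlp.fit(X, Y, epochs=100, batch_size=32, verbose=0)
loss, acc = mlp.evaluate(X, Y, verbose=0)
print(acc)   # accuracy on the training data itself

# A minimal (assumed) hyperparameter comparison: try a few hidden-layer
# sizes and keep the one with the best accuracy. In practice, a held-out
# validation split would be used instead of the training data
best = None
for hidden in (3, 8, 16):
    model = Sequential([Dense(hidden, input_dim=2, activation='sigmoid'),
                        Dense(2, activation='sigmoid')])
    model.compile(loss='mean_squared_error', optimizer='sgd',
                  metrics=['accuracy'])
    model.fit(X, Y, epochs=100, batch_size=32, verbose=0)
    _, candidate_acc = model.evaluate(X, Y, verbose=0)
    if best is None or candidate_acc > best[1]:
        best = (hidden, candidate_acc)
print(best)   # (hidden neurons, accuracy) of the best configuration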

Summary

This intermediate-introductory chapter showed the design of an MLP and the paradigms surrounding its functionality. We covered the theoretical framework behind its elements and we had a full discussion and treatment of the widely known backprop mechanism to perform gradient descent on a loss function. Understanding the backprop algorithm is key for further chapters since some models are designed specifically to overcome some potential difficulties with backprop. You should feel confident that what you have learned about backprop will serve you well in knowing what deep learning is all about. This backprop algorithm, among other things, is what makes deep learning an exciting area. Now, you should be able to understand and design your own MLP with different layers and different neurons. Furthermore, you should feel confident in changing some of its parameters, although we will cover more of this in the further reading.

Chapter 7, Autoencoders, will continue with an architecture...

Questions and answers

  1. Why is the MLP better than the perceptron model?

Its larger number of neurons and layers of neurons gives the MLP an advantage over the perceptron: it can model non-linear problems and solve much more complicated pattern recognition tasks.

  2. Why is backpropagation so important to know about?

Because it is the algorithm that allows neural networks to learn efficiently from data, and it is what makes training them feasible in the era of big data.

  3. Does the MLP always converge?

Yes and no. It does always converge to a local minimum in terms of the loss function; however, it is not guaranteed to converge to a global minimum since, in practice, most loss functions are non-convex and non-smooth.

  4. Why should we try to optimize the hyperparameters of our models?

Because anyone can train a simple neural network; however, not everyone knows what things to change to make it better. The success of your model depends heavily on you trying different things and proving to yourself (and others) that your model is the best that it can be. This is what will make you a better...

References

  • Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386.
  • Tappert, C. C. (2019). Who is the Father of Deep Learning? Symposium on Artificial Intelligence.
  • Hinton, G. E. (1990). Connectionist learning procedures. Machine learning. Morgan Kaufmann, 555-610.
  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
  • Florez, O. U. (2017). One LEGO at a time: Explaining the Math of How Neural Networks Learn. Online: https://omar-florez.github.io/scratch_mlp/.
  • Amari, S. I. (1993). Backpropagation and stochastic gradient descent method. Neurocomputing, 5(4-5), 185-196.