Training a Single Neuron

After revising the concepts around learning from data, we will now pay close attention to an algorithm that trains one of the most fundamental neural-based models: the perceptron. We will look at the steps required for the algorithm to function and at its stopping conditions. This chapter presents the perceptron model as the first model of a neuron, one that aims to learn from data in a simple manner. The perceptron model is key to understanding basic and advanced neural models that learn from data. In this chapter, we will also cover the problems and considerations associated with non-linearly separable data.

Upon completion of the chapter, you should feel comfortable discussing the perceptron model and applying its learning algorithm. You will be able to implement the algorithm over both linearly and non-linearly separable data.

Specifically, this chapter covers the following topics:

  • The perceptron model
  • The perceptron learning algorithm
  • A perceptron over non-linearly separable data

The perceptron model

Back in Chapter 1, Introduction to Machine Learning, we briefly introduced the basic model of a neuron and the perceptron learning algorithm (PLA). In this chapter, we will revisit and expand on that concept and show how it can be coded in Python. We will begin with the basic definition.

The visual concept

The perceptron is an information processing unit inspired by the human neuron, originally conceived by F. Rosenblatt and depicted in Figure 5.1 (Rosenblatt, F. (1958)). In the model, the input is represented by the vector $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$, the activation of the neuron is given by the function $z(\cdot)$, and its output is $y$. The parameters of the neuron are the weights $\mathbf{w} = [w_1, w_2, \ldots, w_n]^T$ and the bias $b$:

Figure 5.1 – The basic model of a perceptron

The trainable parameters of a perceptron are $\mathbf{w}$ and $b$, and they are unknown. Thus, we can use input training data to determine these parameters using the PLA. From Figure 5.1, $x_1$ multiplies $w_1$, then $x_2$ multiplies $w_2$, and $b$ is multiplied by 1; all these products are added and then passed through the activation function $z(\cdot)$, so that the output is $y = z(w_1 x_1 + w_2 x_2 + b)$.
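As a quick numeric illustration (the values here are made up for demonstration, not taken from the book), a two-input perceptron with a sign activation computes its output as follows:

import numpy as np

w = np.array([0.5, -1.0])      # example weights w1, w2 (arbitrary values)
b = 0.2                        # example bias (arbitrary value)
x = np.array([1.0, 2.0])       # example input x1, x2
y = np.sign(np.dot(w, x) + b)  # sign(0.5*1 - 1.0*2 + 0.2) = sign(-1.3)
print(y)                       # prints -1.0, that is, the negative class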

The perceptron learning algorithm

The perceptron learning algorithm (PLA) is the following:

Input: A binary class dataset $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, with labels $y_i \in \{-1, +1\}$

  • Initialize $\mathbf{w}$ to zeros, and the iteration counter $t = 0$
  • While there are any incorrectly classified examples:
      • Pick an incorrectly classified example, call it $\mathbf{x}^*$, whose true label is $y^*$
      • Update $\mathbf{w}$ as follows: $\mathbf{w}_{t+1} = \mathbf{w}_t + y^{*}\mathbf{x}^{*}$
      • Increase the iteration counter, $t \leftarrow t + 1$, and repeat

Return: $\mathbf{w}$

Now, let's see how this takes form in Python.

PLA in Python

Here is an implementation in Python that we will discuss part by part; some of it has already been covered in earlier chapters:

import random
import numpy as np
from sklearn.datasets import make_classification

N = 100  # number of samples to generate
random.seed(a=7)  # fix the seed for reproducibility

X, y = make_classification(n_samples=N, n_features=2, n_classes=2,
                           n_informative=2, n_redundant=0, n_repeated=0,
                           n_clusters_per_class=1, class_sep=1.2,
                           random_state=5)

y[y==0] = -1  # relabel the classes as {-1, +1}

X_train = np.append(np.ones((N,1)), X, 1)  # add a column of ones for the bias

# initialize the weights to zeros
w = np.zeros(X_train.shape[1])
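To complete the picture, here is a minimal sketch of the PLA update loop consistent with the algorithm above; the variable names and the random choice among misclassified examples are our own assumptions, not necessarily the book's exact listing, and the loop assumes the data is linearly separable (otherwise it would not terminate):

it = 0
while True:
    predictions = np.sign(X_train.dot(w))          # current predictions
    misclassified = np.where(predictions != y)[0]  # indices of errors
    if len(misclassified) == 0:                    # converged: no errors left
        break
    i = np.random.choice(misclassified)            # pick one at random
    w = w + y[i] * X_train[i]                      # PLA update: w <- w + y*x
    it += 1
print('PLA converged after', it, 'updates')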

A perceptron over non-linearly separable data

As we have discussed before, a perceptron will find a solution in finite time if the data is linearly separable. However, how many iterations it takes to find a solution depends on how close the groups are to each other in the feature space.

Convergence is when the learning algorithm finds a solution or reaches a steady state that is acceptable to the designer of the learning model.

The following paragraphs will deal with convergence on different types of data: linearly separable and non-linearly separable.

Convergence on linearly separable data

For the particular dataset that we have been studying in this chapter, the separation between the two groups of data is a parameter that can be varied (with real data, this separation is usually not under our control). The parameter is class_sep, and it takes a real number; for example:

X, y = make_classification(..., class_sep=2.0, ...)

This allows us to study how many iterations it takes, on average, for the perceptron algorithm to converge.
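A sketch of such an experiment is shown below. The pla_iterations helper is our own illustrative wrapper (not from the book) that runs PLA updates until convergence or an iteration cap, averaged over several random datasets per separation value:

import numpy as np
from sklearn.datasets import make_classification

def pla_iterations(X, y, max_it=10000):
    # count PLA updates until no example is misclassified (or give up)
    Xb = np.append(np.ones((len(X), 1)), X, 1)  # column of ones for the bias
    w = np.zeros(Xb.shape[1])
    for it in range(max_it):
        mis = np.where(np.sign(Xb.dot(w)) != y)[0]
        if len(mis) == 0:
            return it                    # converged
        i = np.random.choice(mis)
        w = w + y[i] * Xb[i]             # PLA update
    return max_it                        # may not converge if not separable

for sep in [0.5, 1.0, 1.5, 2.0]:
    runs = []
    for seed in range(10):               # average over 10 random datasets
        X, y = make_classification(n_samples=100, n_features=2, n_classes=2,
                                   n_informative=2, n_redundant=0,
                                   n_repeated=0, n_clusters_per_class=1,
                                   class_sep=sep, random_state=seed)
        y[y == 0] = -1
        runs.append(pla_iterations(X, y))
    print(f'class_sep={sep}: average iterations = {np.mean(runs):.1f}')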

Summary

This chapter presented an overview of the classic perceptron model. We covered the theoretical model and its implementation in Python for both linearly and non-linearly separable datasets. At this point, you should feel confident that you know enough about the perceptron that you can implement it yourself. You should be able to recognize the perceptron model in the context of a neuron. Also, you should now be able to implement a pocket algorithm and early termination strategies in a perceptron, or any other learning algorithm in general.

Since the perceptron is the most essential element that paved the way for deep neural networks, after we have covered it here, the next step is to go to Chapter 6, Training Multiple Layers of Neurons. In that chapter, you will be exposed to the challenges of deep learning using the multi-layer perceptron algorithm, such as gradient descent techniques for error minimization, and hyperparameter optimization to achieve generalization. But before...

Questions and answers

  1. What is the relationship between the separability of the data and the number of iterations of the PLA?

The number of iterations can grow without bound as the data groups get closer to one another in the feature space.

  2. Will the PLA always converge?

Not always, only for linearly separable data.

  3. Can the PLA converge on non-linearly separable data?

No. However, you can find an acceptable solution by modifying it with the pocket algorithm, for example (see the sketch after these questions).

  4. Why is the perceptron important?

Because it is one of the most fundamental learning strategies, one that helped conceive the very possibility of machine learning. Without the perceptron, it could have taken longer for the scientific community to realize the potential of computer-based automatic learning algorithms.
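As mentioned in the third answer above, the pocket algorithm (Muselli, M. (1997)) keeps the best weights found so far while running ordinary PLA updates. The following is a minimal sketch of that idea in our own illustrative code, not the book's listing:

import numpy as np

def pocket_pla(X, y, max_it=1000):
    # run PLA updates, but 'pocket' the weights with the fewest errors so far
    Xb = np.append(np.ones((len(X), 1)), X, 1)  # column of ones for the bias
    w = np.zeros(Xb.shape[1])
    best_w, best_err = w.copy(), len(y) + 1
    for _ in range(max_it):
        mis = np.where(np.sign(Xb.dot(w)) != y)[0]
        if len(mis) < best_err:                # better than pocketed weights?
            best_w, best_err = w.copy(), len(mis)
        if len(mis) == 0:
            break                              # perfectly separated
        i = np.random.choice(mis)
        w = w + y[i] * Xb[i]                   # standard PLA update
    return best_w                              # best weights seen, not the last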

References

  • Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6), 386.
  • Muselli, M. (1997). On convergence properties of the pocket algorithm. IEEE Transactions on Neural Networks, 8(3), 623-629.