Chapter 12. Training Artificial Neural Networks for Image Recognition

As you may know, deep learning is getting a lot of press and is without any doubt the hottest topic in the machine learning field. Deep learning can be understood as a set of algorithms that were developed to train artificial neural networks with many layers efficiently. In this chapter, you will learn the basic concepts of artificial neural networks so that you will be well equipped to further explore the most exciting areas of research in the machine learning field, as well as the advanced Python-based deep learning libraries that are currently being developed.

The topics that we will cover are as follows:

  • Getting a conceptual understanding of multi-layer neural networks

  • Training neural networks for image classification

  • Implementing the powerful backpropagation algorithm

  • Debugging neural network implementations

Modeling complex functions with artificial neural networks


At the beginning of this book, we started our journey through machine learning algorithms with artificial neurons in Chapter 2, Training Machine Learning Algorithms for Classification. Artificial neurons represent the building blocks of the multi-layer artificial neural networks that we are going to discuss in this chapter. The basic concept behind artificial neural networks was built upon hypotheses and models of how the human brain works to solve complex problems. Although artificial neural networks have gained a lot of popularity in recent years, early studies of neural networks go back to the 1940s, when Warren McCulloch and Walter Pitts first described how neurons could work. However, in the decades that followed the first implementation of the McCulloch-Pitts neuron model, Rosenblatt's perceptron in the 1950s, many researchers and machine learning practitioners slowly began to lose interest in neural networks since no one...

Classifying handwritten digits


In the previous section, we covered a lot of the theory around neural networks, which can be a little bit overwhelming if you are new to this topic. Before we continue with the discussion of the algorithm for learning the weights of the MLP model, backpropagation, let's take a short break from the theory and see a neural network in action.

Note

Neural network theory can be quite complex, thus I want to recommend two additional resources that cover some of the concepts that we discuss in this chapter in more detail:

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd Edition. Springer, 2009.

C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

In this section, we will train our first multi-layer neural network to classify handwritten digits from the popular MNIST dataset (short for Modified National Institute of Standards and Technology database), which was constructed by Yann LeCun et al...
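The chapter's code reads the raw MNIST files directly. As a minimal sketch of that idea (assuming the four ubyte files have already been downloaded from Yann LeCun's site and gunzipped into a local directory; the function name `load_mnist` and the paths are illustrative, not the book's exact listing):

```python
import os
import struct
import numpy as np

def load_mnist(path, kind='train'):
    """Load MNIST images and labels from the raw, unzipped ubyte files."""
    labels_path = os.path.join(path, '%s-labels-idx1-ubyte' % kind)
    images_path = os.path.join(path, '%s-images-idx3-ubyte' % kind)

    with open(labels_path, 'rb') as lbpath:
        # The label file header stores a magic number and the item count
        magic, n = struct.unpack('>II', lbpath.read(8))
        labels = np.fromfile(lbpath, dtype=np.uint8)

    with open(images_path, 'rb') as imgpath:
        # The image file header additionally stores the row/column dimensions
        magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
        images = np.fromfile(imgpath,
                             dtype=np.uint8).reshape(len(labels), rows * cols)

    return images, labels

# Usage sketch:
# X_train, y_train = load_mnist('mnist/', kind='train')
# X_test, y_test = load_mnist('mnist/', kind='t10k')
```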

Training an artificial neural network


Now that we have seen a neural network in action and have gained a basic understanding of how it works by looking over the code, let's dig a little bit deeper into some of the concepts, such as the logistic cost function and the backpropagation algorithm that we implemented to learn the weights.

Computing the logistic cost function

The logistic cost function that we implemented as the _get_cost method is actually pretty simple to follow since it is the same cost function that we described in the logistic regression section in Chapter 3, A Tour of Machine Learning Classifiers Using Scikit-learn.

$$J(\mathbf{w}) = -\sum_{i=1}^{n} \left[ y^{(i)} \log\left(a^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - a^{(i)}\right) \right]$$

Here, $a^{(i)}$ is the sigmoid activation of the $i$th unit in one of the layers, which we compute in the forward propagation step:

$$a^{(i)} = \phi\left(z^{(i)}\right)$$

Now, let's add a regularization term, which allows us to reduce the degree of overfitting. As you will recall from earlier chapters, the L2 and L1 regularization terms are defined as follows (remember that we don't regularize the bias units):

$$L2: \quad \frac{\lambda}{2} \lVert \mathbf{w} \rVert_2^2 = \frac{\lambda}{2} \sum_{j=1}^{m} w_j^2$$

$$L1: \quad \lambda \lVert \mathbf{w} \rVert_1 = \lambda \sum_{j=1}^{m} \left| w_j \right|$$
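As a minimal sketch of how these formulas translate into code (this is not the chapter's `_get_cost` method verbatim; `y_enc`, `output`, `w1`, and `w2` are illustrative names for the one-hot encoded labels, the output-layer activations, and the two weight matrices):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) activation: phi(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(y_enc, output, w1, w2, l2=0.0):
    """Logistic cost with an L2 penalty.

    y_enc  -- one-hot encoded class labels
    output -- sigmoid activations a(i) of the output layer, same shape
    w1, w2 -- weight matrices; the first column holds the bias units,
              which are excluded from the penalty
    l2     -- regularization strength lambda
    """
    term1 = -y_enc * np.log(output)
    term2 = (1.0 - y_enc) * np.log(1.0 - output)
    cost = np.sum(term1 - term2)
    # L2 term: (lambda / 2) * sum of squared non-bias weights
    l2_term = (l2 / 2.0) * (np.sum(w1[:, 1:] ** 2) + np.sum(w2[:, 1:] ** 2))
    return cost + l2_term
```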

Although...

Developing your intuition for backpropagation


Although backpropagation was rediscovered and popularized almost 30 years ago, it remains one of the most widely used algorithms for training artificial neural networks efficiently. In this section, we'll see a more intuitive summary and the bigger picture of how this fascinating algorithm works.

In essence, backpropagation is just a very computationally efficient approach to compute the derivatives of a complex cost function. Our goal is to use those derivatives to learn the weight coefficients for parameterizing a multi-layer artificial neural network. The challenge in the parameterization of neural networks is that we are typically dealing with a very large number of weight coefficients in a high-dimensional feature space. In contrast to other cost functions that we have seen in previous chapters, the error surface of a neural network cost function is not convex or smooth. There are many bumps in this high-dimensional cost surface (local...
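To make the chain-rule idea concrete, here is a deliberately tiny example (hypothetical scalar weights, not the chapter's MLP): one forward pass through two nested sigmoid units, then a backward pass that reuses the stored activations instead of recomputing them for every weight:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "network": a = sigmoid(w2 * sigmoid(w1 * x)) with squared-error cost
x, y = 1.5, 1.0        # input and target (illustrative values)
w1, w2 = 0.4, -0.6     # hypothetical scalar weights

# Forward pass -- store the intermediate activations
h = sigmoid(w1 * x)            # hidden activation
a = sigmoid(w2 * h)            # output activation
cost = 0.5 * (a - y) ** 2

# Backward pass -- the chain rule reuses h and a computed above
delta_out = (a - y) * a * (1.0 - a)    # dCost / d(net input of output)
grad_w2 = delta_out * h                # dCost / dw2
delta_hid = delta_out * w2 * h * (1.0 - h)
grad_w1 = delta_hid * x                # dCost / dw1
```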

Debugging neural networks with gradient checking


Implementations of artificial neural networks can be quite complex, and it is always a good idea to manually check that we have implemented backpropagation correctly. In this section, we will talk about a simple procedure called gradient checking, which is essentially a comparison between our analytical gradients in the network and numerical gradients. Gradient checking is not specific to feedforward neural networks but can be applied to any other neural network architecture that uses gradient-based optimization. Even if you are planning to implement simpler algorithms using gradient-based optimization, such as linear regression, logistic regression, and support vector machines, it is generally not a bad idea to check whether the gradients are computed correctly.

In the previous sections, we defined a cost function $J(\mathbf{W})$, where $\mathbf{W}$ is the matrix of the weight coefficients of an artificial network. Note that $\mathbf{W}$ is, roughly speaking, a "stacked" matrix consisting...
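A minimal sketch of the numerical side of this comparison, using the symmetric difference quotient $\left(J(w + \epsilon) - J(w - \epsilon)\right) / (2\epsilon)$ (here `cost_fn` is a hypothetical stand-in for the network's cost function, evaluated with one weight perturbed at a time):

```python
import numpy as np

def numerical_gradient(cost_fn, w, epsilon=1e-5):
    """Approximate dJ/dw element-wise via (J(w+eps) - J(w-eps)) / (2 eps)."""
    grad = np.zeros_like(w)
    it = np.nditer(w, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = w[idx]
        w[idx] = orig + epsilon
        cost_plus = cost_fn(w)
        w[idx] = orig - epsilon
        cost_minus = cost_fn(w)
        w[idx] = orig                      # restore the original weight
        grad[idx] = (cost_plus - cost_minus) / (2.0 * epsilon)
        it.iternext()
    return grad

def relative_error(analytic, numeric):
    """Norm of the difference over the sum of norms; values below roughly
    1e-7 usually indicate a correct backpropagation implementation."""
    return (np.linalg.norm(analytic - numeric) /
            (np.linalg.norm(analytic) + np.linalg.norm(numeric)))
```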

Convergence in neural networks


You might be wondering why we did not use regular gradient descent but mini-batch learning to train our neural network for the handwritten digit classification. You may recall our discussion on stochastic gradient descent, which we used to implement online learning. In online learning, we compute the gradient based on a single training example at a time to perform the weight update. Although this is a stochastic approach, it often leads to very accurate solutions with much faster convergence than regular gradient descent. Mini-batch learning is a special form of stochastic gradient descent where we compute the gradient based on a subset of $k$ training samples with $1 < k < n$. Mini-batch learning has the advantage over online learning that we can make use of our vectorized implementations to improve computational efficiency. At the same time, we can update the weights much faster than in regular gradient descent. Intuitively, you can think of mini-batch learning as predicting...
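A minimal sketch of one epoch of mini-batch updates (illustrative names throughout; `compute_gradient` stands in for a vectorized backpropagation over the chosen batch):

```python
import numpy as np

def minibatch_indices(n_samples, batch_size, shuffle=True, seed=None):
    """Yield index arrays splitting one epoch into mini-batches of size k,
    with 1 < k < n."""
    rng = np.random.RandomState(seed)
    idx = np.arange(n_samples)
    if shuffle:
        rng.shuffle(idx)   # stochastic element: new sample order every epoch
    for start in range(0, n_samples, batch_size):
        yield idx[start:start + batch_size]

# Usage sketch -- one epoch of weight updates:
# for batch in minibatch_indices(X_train.shape[0], batch_size=50):
#     grad = compute_gradient(X_train[batch], y_train[batch], w)  # vectorized
#     w -= eta * grad
```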

Other neural network architectures


In this chapter, we discussed one of the most popular feedforward neural network representations, the multi-layer perceptron. Neural networks are currently one of the most active research topics in the machine learning field, and there are many other neural network architectures that are well beyond the scope of this book. If you are interested in learning more about neural networks and algorithms for deep learning, I recommend reading the following introduction and overview: Y. Bengio. Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009. Yoshua Bengio's book is currently freely available at http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf.

Although neural networks really are a topic for another book, let's take at least a brief look at two other popular architectures, convolutional neural networks and recurrent neural networks.

Convolutional Neural Networks

Convolutional Neural Networks (CNNs or ConvNets) gained...

A few last words about neural network implementation


You might be wondering why we went through all of this theory just to implement a simple multi-layer artificial neural network that can classify handwritten digits instead of using an open source Python machine learning library. One reason is that, at the time of writing this book, scikit-learn does not have an MLP implementation. More importantly, we (machine learning practitioners) should have at least a basic understanding of the algorithms that we are using in order to apply machine learning techniques appropriately and successfully.

Now that we know how feedforward neural networks work, we are ready to explore more sophisticated Python libraries built on top of NumPy, such as Theano (http://deeplearning.net/software/theano/), which allows us to construct neural networks more efficiently. We will see this in Chapter 13, Parallelizing Neural Network Training with Theano. Over the last couple of years, Theano has gained a lot of popularity among...

Summary


In this chapter, you have learned about the most important concepts behind multi-layer artificial neural networks, which are currently the hottest topic in machine learning research. In Chapter 2, Training Machine Learning Algorithms for Classification, we started our journey with simple single-layer neural network structures, and now we have connected multiple neurons to a powerful neural network architecture to solve complex problems such as handwritten digit recognition. We demystified the popular backpropagation algorithm, which is one of the building blocks of many neural network models that are used in deep learning. After learning about the backpropagation algorithm, we were able to update the weights of such a complex neural network. We also added useful modifications, such as mini-batch learning and an adaptive learning rate, that allow us to train a neural network more efficiently.
