Using Keras to Solve Multiclass Classification Problems

In this chapter, we will use Keras and TensorFlow to take on a 10-class multiclass classification problem with lots of independent variables. As before, we will talk about the pros and cons of using deep learning for this problem; however, you won't find many cons. Lastly, we will spend a good amount of time talking about methods to control overfitting.

We will cover the following topics in this chapter:

  • Multiclass classification and deep neural networks
  • Case study – handwritten digit classification
  • Building a multiclass classifier in Keras
  • Controlling variance with dropout
  • Controlling variance with regularization

Multiclass classification and deep neural networks

Here it is! We've finally gotten to the fun stuff! In this chapter, we will be creating a deep neural network that can classify an observation into multiple classes, and this is one of those places where neural networks really do well. Let's talk just a bit more about the benefit of deep neural networks for this class of problems.

Just so we're all talking about the same thing, let's define multiclass classification before we begin. Imagine we had a classifier that took the weight of a piece of fruit as input and predicted which fruit it was from that weight. The output would be exactly one class from a set of classes (apple, banana, mango, and so on). That's multiclass classification, not to be confused with multilabel classification, which is the situation where a model predicts whether or not each label in a set of labels applies to the same observation.
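
To make the distinction concrete, here is a small, hypothetical illustration (not from the book's code) of how the targets differ: a multiclass target is one-hot encoded, while a multilabel target can have any number of positive entries.

import numpy as np

# Multiclass: exactly one class is true, so the target is one-hot encoded.
classes = ['apple', 'banana', 'mango']
multiclass_target = np.array([0, 1, 0])   # this observation is a banana

# Multilabel (for contrast): each label is an independent yes/no,
# so any number of entries can be 1 at the same time.
labels = ['contains seeds', 'is citrus', 'is ripe']
multilabel_target = np.array([1, 0, 1])   # has seeds and is ripe, not citrus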

Case study – handwritten digit classification

We will be using a multiclass classification network to recognize the corresponding class of a handwritten digit. As before, you can find the complete code for this chapter in the book's Git repository, under Chapter05, if you'd like to follow along.

Problem definition

The MNIST dataset has become an almost canonical neural network dataset. It consists of 60,000 training images and 10,000 test images of handwritten digits, belonging to 10 classes that represent their respective digits (0, 1, 2, ..., 9). Because this dataset has become so common, many deep learning frameworks come with an MNIST loading method built into the API. Both TensorFlow and Keras have one, and we will be using the Keras MNIST loader here.

Building a multiclass classifier in Keras

Since we now have a well-defined problem, we can start to code it. As we mentioned earlier, we have to make a few transformations to our inputs and outputs this time. I'll show you those here as we're building the network.

Loading MNIST

Luckily for us, a function that retrieves the MNIST data and loads it for us is built right into Keras. All we need to do is import keras.datasets.mnist and use its load_data() method, as shown in the following code:

from keras.datasets import mnist

(train_X, train_y), (test_X, test_y) = mnist.load_data()

The shape of train_X is 60,000 x 28 x 28. As we explained in the Model inputs and outputs section, we will need to flatten each 28 x 28 matrix into a 784-element vector.
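
As a rough sketch of those transformations (the variable names follow the load_data() call above; the exact preprocessing in the book's repository may differ slightly), we can flatten and rescale the images and one-hot encode the labels like this:

from keras.utils import to_categorical

# Flatten each 28 x 28 image into a 784-element vector and rescale
# pixel intensities from [0, 255] to [0, 1].
train_X = train_X.reshape(-1, 784).astype('float32') / 255.0
test_X = test_X.reshape(-1, 784).astype('float32') / 255.0

# One-hot encode the integer labels (0-9) into 10-element vectors,
# matching a 10-unit softmax output layer.
train_y = to_categorical(train_y, num_classes=10)
test_y = to_categorical(test_y, num_classes=10)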

Controlling variance with dropout

One really great way to reduce overfitting in deep neural networks is to employ a technique called dropout. Dropout does exactly what it says: it drops neurons out of a hidden layer. Here's how it works.

For every minibatch, we will randomly choose to turn off nodes in each hidden layer. Imagine we had some hidden layer where we had implemented dropout, and we chose the drop probability to be 0.5. That means that, for every minibatch, for every neuron, we flip a coin to decide whether we use that neuron. In doing so, you'd probably randomly turn off about half of the neurons in that hidden layer.

If we do this over and over again, it's like we're training many smaller networks. The model weights remain relatively small, and each smaller network is less likely to overfit the data. It also forces each neuron to be less dependent on the other neurons around it, since any of them might be dropped at any time.
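
In Keras, dropout is added as its own layer. The following is a minimal sketch rather than the book's exact network: the hidden layer sizes and the optimizer are illustrative assumptions, and Dropout(0.5) drops each neuron of the preceding layer with probability 0.5 during training only (Keras disables dropout automatically at prediction time).

from keras.layers import Dense, Dropout
from keras.models import Sequential

# A 784-input classifier with dropout after each hidden layer.
model = Sequential([
    Dense(512, activation='relu', input_shape=(784,)),
    Dropout(0.5),   # each neuron in the previous layer is dropped with p=0.5
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])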

Controlling variance with regularization

Regularization is another way to control overfitting, one that penalizes individual weights in the model as they grow larger. If you're familiar with linear models such as linear and logistic regression, it's exactly the same technique applied at the neuron level. Two flavors of regularization, called L1 and L2, can be used to regularize neural networks; however, because it is more computationally efficient, L2 regularization is almost always the one used in neural networks.

First, we need to regularize our cost function. If we imagine C0, categorical cross-entropy, as the original cost function, then the regularized cost function would be as follows:

$$C = C_0 + \frac{\lambda}{2n}\sum_{w} w^2$$

Here, n is the number of training examples, the sum runs over all the weights in the network, and λ (lambda) is a regularization parameter that can be increased or decreased to change the amount of regularization applied. This regularization parameter penalizes large values for weights...
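
In Keras, L2 regularization is attached per layer through the kernel_regularizer argument, which adds the weight penalty to the loss for that layer. The sketch below is illustrative rather than the book's exact code; the layer sizes and the 0.0001 value for the regularization parameter are assumptions you would tune for your own problem.

from keras.layers import Dense
from keras.models import Sequential
from keras.regularizers import l2

# The same classifier with an L2 penalty on the weights of each Dense layer.
model = Sequential([
    Dense(512, activation='relu', input_shape=(784,),
          kernel_regularizer=l2(0.0001)),
    Dense(256, activation='relu', kernel_regularizer=l2(0.0001)),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])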

Summary

In this chapter, we've really started to see just how powerful a deep neural network can be when doing multiclass classification. We covered the softmax function in detail and then we built and trained a network to classify handwritten digits into their 10 respective classes.

Finally, when we noticed that our model was overfitting, we attempted to use both dropout and L2 regularization to reduce the variance of the model.

By now, you've seen that deep neural networks require lots of choices: choices about architecture, learning rate, and even regularization rates. We will spend the next chapter learning how to optimize these choices.
