Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Test Driven Machine Learning

You're reading from  Test Driven Machine Learning

Product type Book
Published in Nov 2015
Publisher
ISBN-13 9781784399085
Pages 190 pages
Edition 1st Edition
Languages

Table of Contents (16) Chapters

Test-Driven Machine Learning
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1. Introducing Test-Driven Machine Learning 2. Perceptively Testing a Perceptron 3. Exploring the Unknown with Multi-armed Bandits 4. Predicting Values with Regression 5. Making Decisions Black and White with Logistic Regression 6. You're So Naïve, Bayes 7. Optimizing by Choosing a New Algorithm 8. Exploring scikit-learn Test First 9. Bringing It All Together Index

Quantifying the classification models


To make sure that we're on the same page, let's start by looking at an example of an ROC curve and the AUC score. The scikit-learn documentation has an example code to build an ROC curve and calculate AUC, which you can find at http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html.

This ROC curve was built by running a classifier over the famous iris dataset. It shows us the true positive rate (y-axis) that we can get if we allow a given amount of false positive rate (x-axis). For example, if we were good with a 50 percent false positive rate, we would expect to see somewhere around a 90 percent true positive rate. Also, notice that the AUC percentage is 80 percent. Keeping in mind that a perfect classifier would score 100 percent, this seems pretty great. The dashed line in the chart represents a terrible and completely random (read non-predictive) model. An ideal model would be one that is pulled to the upper left-hand corner of the chart as much as possible. You can see in this chart that the model is somewhere between the two, which is pretty good. Whether or not that is acceptable depends on the problem that is being solved. How so?

Well, what if our classifier is attempting to identify the customers who would respond well to an advertisement? Every customer that we show it to who doesn't respond well to it has some chance of never doing business with us again. Let's say (though it's quite extreme) that the cost is so high that we need to eliminate all the false positives. Well, judging from our previous curve, this would mean we would only identify 10-15 percent of the true positives that exist. In this example, the little bit of performance boost is making more money, and so it's working quite well for our situation.

Imagine there's a one in 10,000 chance that if we incorrectly show a specific ad to someone, they'll sue us and it will cost us on average $25,000. Now, what does a good model look like? Here's a chart that I've created from the same previous ROC data, but with the following new set of parameters:

The maximum profit occurs right around a 1.9 percent false positive rate. As you can see, there is a huge drop off after that, even though this classifier works pretty well. For the purpose of this chapter, we can worry about writing the code for such thing as we progress. For now, it's fine to just have this gain chart. We'll get into guiding our process with these kind of results in future chapters.

You have been reading a chapter from
Test Driven Machine Learning
Published in: Nov 2015 Publisher: ISBN-13: 9781784399085
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €14.99/month. Cancel anytime}