Test Driven Machine Learning

Product type: Book
Published: Nov 2015
ISBN-13: 9781784399085
Pages: 190
Edition: 1st

Table of Contents (16 chapters)

Test-Driven Machine Learning
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1. Introducing Test-Driven Machine Learning
2. Perceptively Testing a Perceptron
3. Exploring the Unknown with Multi-armed Bandits
4. Predicting Values with Regression
5. Making Decisions Black and White with Logistic Regression
6. You're So Naïve, Bayes
7. Optimizing by Choosing a New Algorithm
8. Exploring scikit-learn Test First
9. Bringing It All Together
Index

Chapter 6. You're So Naïve, Bayes

We've all seen examples of Naïve Bayes classifiers being used to classify text. The applications include spam detection, sentiment analysis, and more. In this chapter, we're going to take a road less traveled: we'll build a Naïve Bayes classifier that can take in continuous inputs and classify them. Specifically, we'll build a Gaussian Naïve Bayes classifier to classify which state a person is from, based on the person's height, weight, and BMI.

This chapter works a bit differently from the previous ones. Here, we'll develop an N-class Gaussian Naïve Bayes classifier to fit our use case (the data at hand). In the next chapter, we'll pull in some of this data to train with, and then analyze the quality of our model to see how we did. In the previous chapters, we used generated data so that we could make sure the classifiers we built were operating according to their assumptions. In this chapter, we'll spend...

Gaussian classification by hand


Since the Gaussian Naïve Bayes classifier is less common, let's discuss it a bit more before diving in. The Gaussian Naïve Bayes algorithm takes in continuous values and assumes that the input variables are mutually independent and that each one follows a Gaussian (or normal) distribution. It may not be obvious how a probability follows from this, so let's look at a concrete example.
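Under this assumption, the likelihood of an observation for each class is just the Gaussian probability density evaluated at that observation, using that class's mean and variance. As a quick standalone sketch (not the book's code), the density can be computed like this:

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of x under a Gaussian with the given mean and variance."""
    coefficient = 1.0 / math.sqrt(2 * math.pi * variance)
    exponent = -((x - mean) ** 2) / (2 * variance)
    return coefficient * math.exp(exponent)
```

Whichever class model assigns the higher density (weighted by the class priors, if they differ) wins the classification.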

Let's say that I give you five weights from the female test subjects and five weights from the male test subjects. Next, I give you a weight from a test subject of unknown gender and ask you to guess whether it belongs to a man or a woman. Using a Gaussian classifier, we can approach this problem by first fitting an underlying Gaussian model to the female observations and another to the male observations (two models in total). A Gaussian model is fully specified by a mean and a variance. Let's step through this with some numbers.

Let's assume that the following data is provided:

  • The weight of five random...
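The procedure described above can be sketched in a few lines. The weights below are hypothetical stand-ins (the book's actual numbers are elided from this excerpt); the logic is what matters: fit a mean and variance per class, then pick the class whose Gaussian assigns the new observation the higher density.

```python
import math
import statistics

# Hypothetical training weights (kg) -- stand-ins for the book's elided data.
female_weights = [58.0, 62.5, 60.1, 55.3, 64.2]
male_weights = [78.4, 84.0, 81.2, 90.5, 76.8]

def fit_gaussian(samples):
    # A Gaussian model is fully specified by a mean and a variance.
    return statistics.mean(samples), statistics.variance(samples)

def likelihood(x, mean, variance):
    # Gaussian probability density of x under the fitted model.
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

female_model = fit_gaussian(female_weights)
male_model = fit_gaussian(male_weights)

# Classify an unseen weight by comparing class likelihoods.
unknown_weight = 59.0
guess = ('female' if likelihood(unknown_weight, *female_model)
         > likelihood(unknown_weight, *male_model) else 'male')
```

With equal class priors (five samples each), comparing raw likelihoods is equivalent to comparing posteriors.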

Beginning the development


We start with a standard simplistic test that will get the basic wiring in place for our classifier. First, the test:

import NaiveBayes

def no_observations_test():
    classifier = NaiveBayes.Classifier()
    classification = classifier.classify(observation=23.2)
    assert classification is None, "Should not classify observations without training examples."

And then the code:

class Classifier:
    def classify(self, observation):
        # No body yet; implicitly returns None, which satisfies the test.
        pass

As the next step toward a solution, let's try the case where we've only observed data from a single class:

def given_an_observation_for_a_single_class_test():
    classifier = NaiveBayes.Classifier()
    classifier.train(classification='a class', observation=0)
    classification = classifier.classify(observation=23.2)
    assert classification == 'a class', "Should always classify as given class if there is only one."

A very simple solution is to store a single classification, overwriting it every time we train:

class Classifier...
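The book's code is truncated at this point; one plausible minimal implementation that passes both tests above (a sketch, not necessarily the author's exact code) is:

```python
class Classifier:
    def __init__(self):
        # No classification until we've been trained on something.
        self._classification = None

    def train(self, classification, observation):
        # Naively remember the most recent classification we were trained on.
        self._classification = classification

    def classify(self, observation):
        # Returns None before any training, satisfying the first test.
        return self._classification
```

This is obviously too naive to survive later tests with multiple classes, which is exactly the point: each test drives the next small generalization.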

Summary


In this chapter, we built up a Gaussian Naïve Bayes classifier and ran into our first examples of truly necessary refactoring. We also saw how needing to make enormous code changes for a single test can be the result of trying to test too many concepts at once, and how backing up and rethinking test design can ultimately lead to better, more elegantly designed software.

In the next chapter, we'll apply this classifier to real data and see what it looks like to compare how different classifiers perform on the same data.
