Chapter 6. You're So Naïve, Bayes
We've all seen examples of Naïve Bayes classifiers used to classify text; the applications include spam detection, sentiment analysis, and more. In this chapter, we're going to take a road less traveled: we'll build a Naïve Bayes classifier that can take in continuous inputs and classify them. Specifically, we'll build a Gaussian Naïve Bayes classifier that predicts which state a person is from, based on the person's height, weight, and BMI.
This chapter will work a bit differently from the previous ones. Here, we'll develop an N-class Gaussian Naïve Bayes classifier to fit our use case (the data at hand). In the next chapter, we'll pull in some of this data to train with, and then we'll analyze the quality of our model to see how we did. In the previous chapters, we used generated data so that we could make sure that the classifiers we built were operating according to their assumptions. In this chapter, we'll spend...
Gaussian classification by hand
Since the Gaussian Naïve Bayes classifier is less common, let's discuss it a bit more before diving in. The Gaussian Naïve Bayes algorithm takes in continuous values and assumes that they are all independent and that each variable follows a Gaussian (or Normal) distribution. It may not be obvious how a probability follows from this, so let's look at a concrete example.
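Concretely, for each class we estimate a mean and variance per feature, score an observation with the Normal density N(x; μ, σ²) = exp(−(x − μ)²/2σ²)/√(2πσ²) for each feature, and multiply the per-feature densities together (that product is where the "naïve" independence assumption comes in). A minimal sketch of that calculation follows; the function names here are my own, not the chapter's code:

```python
import math

def gaussian_pdf(x, mean, variance):
    # Normal density N(x; mean, variance), evaluated at x
    return (math.exp(-(x - mean) ** 2 / (2.0 * variance))
            / math.sqrt(2.0 * math.pi * variance))

def class_score(observation, prior, means, variances):
    # Naive Bayes score for one class: the class prior times the
    # product of per-feature densities. The features are assumed
    # to be independent given the class.
    score = prior
    for x, mean, variance in zip(observation, means, variances):
        score *= gaussian_pdf(x, mean, variance)
    return score
```

To classify, we compute this score once per class and pick the class with the largest score; since we only compare scores, we can skip the usual normalizing denominator of Bayes' theorem.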
Let's say that I give you five weights from the female test subjects and five weights from the male test subjects. Next, I give you a weight from a test subject of unknown gender, and have you guess whether it came from a man or a woman. Using a Gaussian classifier, we can approach this problem by first defining an underlying Gaussian model for both the female and the male observations (two models in total). A Gaussian model is specified using a mean and a variance. Let's step through this with some numbers.
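The whole procedure fits in a few lines. The weights below are made-up, illustrative numbers, not the chapter's dataset, but they show every step: fit a mean and variance per class, evaluate both densities at the unknown weight, and take the larger:

```python
import math

# Hypothetical weights in kilograms -- illustrative only.
female_weights = [55.0, 60.0, 58.0, 62.0, 65.0]
male_weights = [75.0, 80.0, 78.0, 85.0, 82.0]

def mean_and_variance(samples):
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, variance

def gaussian_pdf(x, mean, variance):
    return (math.exp(-(x - mean) ** 2 / (2.0 * variance))
            / math.sqrt(2.0 * math.pi * variance))

# Fit one Gaussian model per class.
female_mean, female_variance = mean_and_variance(female_weights)
male_mean, male_variance = mean_and_variance(male_weights)

# Score the unknown subject's weight under each model and pick the
# class with the higher density (equal priors assumed).
unknown_weight = 72.0
female_density = gaussian_pdf(unknown_weight, female_mean, female_variance)
male_density = gaussian_pdf(unknown_weight, male_mean, male_variance)
guess = 'male' if male_density > female_density else 'female'
```

With these numbers the female model is Normal(60, 11.6) and the male model is Normal(80, 11.6), so a weight of 72 kg lands closer to the male mean and the male density wins.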
Let's assume that the following data is provided:
Beginning the development
We start with the standard simplistic test that serves to get the basic wiring in place for our classifier. First, the test:
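The original listing isn't reproduced here, so the following is just one plausible shape for that first wiring test: it only verifies that an untrained classifier can be constructed and declines to classify anything. A small class stub is included so the sketch runs on its own; the name `NaiveBayes` is an assumption, not necessarily the book's:

```python
import unittest

class NaiveBayes(object):
    # Stub so this sketch is self-contained; in the chapter, the
    # real class would live in its own module.
    def classify(self, observation):
        return None

class NaiveBayesTest(unittest.TestCase):
    def test_untrained_classifier_returns_nothing(self):
        classifier = NaiveBayes()
        # Height, weight, BMI -- made-up observation values.
        self.assertIsNone(classifier.classify([180.0, 75.0, 23.1]))
```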
And then the code:
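Again, this is a hedged sketch rather than the book's exact listing: the simplest code that could satisfy such a wiring test is an empty classifier that refuses to classify until it has been trained.

```python
class NaiveBayes(object):
    # The simplest thing that could possibly work: an empty
    # classifier that knows nothing yet. (Names are assumptions,
    # not the book's exact listing.)
    def __init__(self):
        self._classifications = {}

    def classify(self, observation):
        # No training data yet, so there is nothing to say.
        return None
```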
As the next step to approach a solution, let's try the case where we've only observed the data from a single class:
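A sketch of what such a test might look like, assuming a `train`/`classify` API (both names are my assumption); a minimal stand-in class is included so the sketch runs on its own:

```python
import unittest

class NaiveBayes(object):
    # Minimal stand-in so the test sketch is self-contained; the
    # real implementation is developed step by step in the chapter.
    def __init__(self):
        self._classification = None

    def train(self, observation, classification):
        self._classification = classification

    def classify(self, observation):
        return self._classification

class NaiveBayesTest(unittest.TestCase):
    def test_classifies_with_the_only_class_it_has_seen(self):
        classifier = NaiveBayes()
        classifier.train([170.0, 60.0, 20.8], 'a classification')
        # Having seen only one class, the classifier should answer
        # with that class for any observation.
        self.assertEqual('a classification',
                         classifier.classify([182.0, 77.0, 23.2]))
```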
A very simple solution is to just set a single classification that gets set every time we train something:
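That "fake it" step might look like the following (again a sketch under the assumed `train`/`classify` API): remember whatever classification training last handed us, and answer with it unconditionally.

```python
class NaiveBayes(object):
    def __init__(self):
        self._classification = None

    def train(self, observation, classification):
        # Deliberately naive: keep only the most recent
        # classification we were trained on...
        self._classification = classification

    def classify(self, observation):
        # ...and return it no matter what we're asked about.
        # Later tests will force us to generalize this.
        return self._classification
```

This obviously can't survive a second class, which is exactly the point: the next test we write will have to introduce a second classification and drive out a real implementation.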
In this chapter, we built up a Gaussian Naïve Bayes classifier and ran into our first examples of truly necessary refactoring. We also saw how needing to make enormous changes in the code for a single test is sometimes the result of trying to test too many concepts at once, and how backing up and rethinking test design can ultimately lead to a better, more elegantly designed piece of software.
In the next chapter, we'll apply this classifier to real data and see what it looks like to compare how different classifiers perform on the same data.