Packt+ | Advance your knowledge in tech

You're reading from Test Driven Machine Learning

Product type Book

Published in Nov 2015

Publisher

ISBN-13 9781784399085

Pages 190 pages

Edition 1st Edition

Languages

Python

Concepts

Machine Learning

Table of Contents (16) Chapters

Test-Driven Machine Learning

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

1. Introducing Test-Driven Machine Learning

2. Perceptively Testing a Perceptron

3. Exploring the Unknown with Multi-armed Bandits

4. Predicting Values with Regression

5. Making Decisions Black and White with Logistic Regression

6. You're So Naïve, Bayes

7. Optimizing by Choosing a New Algorithm

8. Exploring scikit-learn Test First

9. Bringing It All Together

Index

The TDD cycle

The TDD cycle consists of writing a small function in the code that attempts to do something that we haven't programmed yet. These small test methods will have three main sections: the first section is where we set up our objects or test data; the second section is where we invoke the code that we're testing; and the last section is where we validate that what happened is what we thought would happen. You will write all sorts of lazy code to get your tests to pass. If you are doing it right, then someone who is watching you should be appalled at your laziness and tiny steps. After the test goes green, you have an opportunity to refactor your code to your heart's content. In this context, "refactor" refers to changing how your code is written, but not changing how it behaves.

Let's examine more deeply the three steps of TDD: Red, Green, and Refactor.

Red

First, create a failing test. Of course, this implies that you know what failure looks like in order to write the test. At the highest level in machine learning, this might be a baseline test where baseline is a "better than random" test. It might even be "predicts random things", or even simpler "always predicts the same thing". Is this terrible? Perhaps, it is to some who are enamored with the elegance and artistic beauty of his/her code. Is it a good place to start though? Absolutely. A common issue that I have seen in machine learning is spending so much time up front, implementing The One True Algorithm that hardly anything ever gets done. Getting to outperform pure randomness, though, is a useful change that can start making your business money as soon as it's deployed.

Green

After you have established a failing test, you can start working to get it green. If you start with a very high-level test, you may find that it helps to conceptually break that test up into multiple failing tests that are lower-level concerns. I'll dive deeper into this later on in this chapter, but for now, just know that if you want to get your test passing as soon as possible, lie, cheat, and steal to get there. I promise that cheating actually makes your software's test suite that much stronger. Resist the urge to write the software in an ideal fashion. Just slap something together. You will be able to fix the issues in the next step.

Refactor

You got your test to pass through all manner of hackery. Now you get to refactor your code. Note that it is not to be interpreted loosely. Refactor specifically means to change your software without affecting its behavior. If you add the if clauses, or any other special handling, you are no longer refactoring. Next, you write the software without tests. One way you will know for sure that you are no longer refactoring is if you've broken previously passing tests. If this happens, we back up our changes until our tests pass again. It may not be obvious, but this isn't all that it takes for you to know that you haven't changed behavior. Read Refactoring: Improving the Design of Existing Code, Martin Fowler for you to understand how much you should really care for refactoring. In his illustration in this book, refactoring code becomes a set of forms and movements, not unlike karate katas.

This is a lot of general theory, but what does a test actually look like? How does this process flow in a real problem?

You're reading from Test Driven Machine Learning

Table of Contents (16) Chapters

The TDD cycle

Red

Green

Refactor

Personalised recommendations for you