Python: Unit Testing with Doctest

by Daniel Arbuckle | September 2010 | Beginner's Guides Open Source

In this article by Daniel Arbuckle, author of Python Testing, we shall:

  • Discuss in detail what Unit testing is
  • Talk about the ways in which Unit testing helps various stages of development
  • Work with examples that illustrate Unit testing and its advantages

So, let's get on with it!


What is Unit testing, and what is it not?

The title of this section begs another question: "Why do I care?" One answer is that Unit testing is a best practice that has been evolving toward its current form over most of the time that programming has existed. Another answer is that the core principles of Unit testing are just good sense; it might actually be a little embarrassing to our community as a whole that it took us so long to recognize them.

Alright, so what is Unit testing? In its most fundamental form, Unit testing can be defined as testing the smallest meaningful pieces of code (such pieces are called units), in such a way that each piece's success or failure depends only on itself. For the most part, we've been following this principle already.

There's a reason for each part of this definition: we test the smallest meaningful pieces of code because, when a test fails, we want that failure to tell us where the problem is as specifically as possible. We make each test independent because we don't want one test to make another succeed when it should have failed, or fail when it should have succeeded. When tests aren't independent, you can't trust them to tell you what you need to know.

Traditionally, automated testing is associated with Unit testing. Automated testing makes it fast and easy to run unit tests, which tend to be amenable to automation. We'll certainly make heavy use of automated testing with doctest and later with tools such as unittest and Nose as well.

Any test that involves more than one unit is automatically not a unit test. That matters because the results of such tests tend to be confusing. The effects of the different units get tangled together, with the end result that not only do you not know where the problem is (is the mistake in this piece of code, or is it just responding correctly to bad input from some other piece of code?), you're also often unsure exactly what the problem is: this output is wrong, but how does each unit contribute to the error? Empirical scientists must perform experiments that check only one hypothesis at a time, whether the subject at hand is chemistry, physics, or the behavior of a body of program code.

Time for action – identifying units

Imagine that you're responsible for testing the following code:

class testable:
    def method1(self, number):
        number += 4
        number **= 0.5
        number *= 7
        return number

    def method2(self, number):
        return ((number * 2) ** 1.27) * 0.3

    def method3(self, number):
        return self.method1(number) + self.method2(number)

    def method4(self):
        return 1.713 * self.method3(id(self))
  1. In this example, what are the units? Is the whole class a single unit, or is each method a separate unit? How about each statement, or each expression? Keep in mind that the definition of a unit is somewhat subjective (although never bigger than a single class), and make your own decision.
  2. Think about what you chose. What would the consequences have been if you chose otherwise? For example, if you chose to think of each method as a unit, what would be different if you chose to treat the whole class as a unit?
  3. Consider method4. Its result depends on all of the other methods working correctly. On top of that, it depends on something that changes from one test run to another, the unique ID of the self object. Is it even possible to treat method4 as a unit in a self-contained test? If we could change anything except method4, what would we have to change to enable method4 to run in a self-contained test and produce a predictable result?

What just happened?

By answering those three questions, you thought about some of the deeper aspects of unit testing.

The question of what constitutes a unit is fundamental to how you organize your tests. The capabilities of the language affect this choice. C++ and Java make it difficult or impossible to treat methods as units, for example, so in those languages each class is usually treated as a single unit. C, on the other hand, doesn't support classes as language features at all, so the obvious choice of unit is the function. Python is flexible enough that either classes or methods could be considered units, and of course it has stand-alone functions as well, which are also natural to think of as units. Python can't easily handle individual statements within a function or method as units, because they don't exist as separate objects when the test runs. They're all lumped together into a single code object that's part of the function.
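To see that lumping concretely, here's a quick illustrative sketch (the function body is borrowed from method1 above; sample is our own name):

```python
# Every statement in a function body is compiled into one shared
# code object; individual statements never exist as separate
# objects, so they can't be isolated as units at test time.
def sample(number):
    number += 4
    number **= 0.5
    return number * 7

print(sample.__code__.co_name)   # sample -- one code object for all three statements
print(sample(0))                 # 14.0
```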

The consequences of your choice of unit are far-reaching. The smaller the units are, the more useful the tests tend to be, because they narrow down the location and nature of bugs more quickly. For example, one of the consequences of choosing to treat the testable class as a single unit is that tests of the class will fail if there is a mistake in any of the methods. That tells you that there's a mistake in testable, but not (for example) that it's in method2. On the other hand, there is a certain amount of rigmarole involved in treating method4 and its like as units. Even so, I recommend using methods and functions as units most of the time, because it pays off in the long run.

In answering the third question, you probably discovered that the functions id and self.method3 would need to have different definitions: definitions that produced a predictable result, and did so without invoking code in any of the other units. In Python, replacing real functions with such stand-ins is fairly easy to do in an ad hoc manner.
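For example, a stand-in can be bound directly to the instance; the name fake_method3 and the constant it returns are our own illustrative choices:

```python
# Ad hoc stand-in: replace method3 on one instance so that method4's
# result no longer depends on the other units or on the value of id(self).
class testable:
    def method1(self, number):
        number += 4
        number **= 0.5
        number *= 7
        return number

    def method2(self, number):
        return ((number * 2) ** 1.27) * 0.3

    def method3(self, number):
        return self.method1(number) + self.method2(number)

    def method4(self):
        return 1.713 * self.method3(id(self))

obj = testable()
calls = []

def fake_method3(number):        # records its input, returns a constant
    calls.append(number)
    return 1.0

obj.method3 = fake_method3       # only this one instance is affected
print(obj.method4())             # 1.713, on every single run
```

Because the stand-in returns a known constant, method4's result is predictable even though id(self) still changes from run to run; the recorded calls list also lets us check what method4 passed along.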

Unit testing throughout the development process

We'll walk through the development of a single class, treating it with all the dignity of a real project. We'll be strictly careful to integrate unit testing into every phase of the project. This may seem silly at times, but just play along. There's a lot to learn from the experience.

The example we'll be working with is a PID controller. The basic idea is that a PID controller is a feedback loop for controlling some piece of real-world hardware. It takes input from a sensor that can measure some property of the hardware, and generates a control signal that adjusts that property toward some desired state. The position of a robot arm in a factory might be controlled by a PID controller.

If you want to know more about PID controllers, the Internet is rife with information. The Wikipedia entry is a good place to start: http://en.wikipedia.org/wiki/PID_controller.

Design phase

Our notional client comes to us with the following (rather sparse) specification:

We want a class that implements a PID controller for a single variable. The measurement, setpoint, and output should all be real numbers.

We need to be able to adjust the setpoint at runtime, but we want it to have a memory, so that we can easily return to the previous setpoint.

Time for action – unit testing during design

Time to make that specification a bit more formal—and complete—by writing unit tests that describe the desired behavior.

  1. We need to write a test that describes the PID constructor. After checking our references, we determine that a PID controller is defined by three gains and a setpoint. The controller has three components: proportional, integral, and derivative (hence the name PID). Each gain is a number that determines how much effect one of the three components of the controller has on the final result. The setpoint determines what the goal of the controller is; in other words, where it's trying to move the controlled variable. Looking at all that, we decide that the constructor should just store the gains and the setpoint, along with initializing some internal state that we know we'll need due to reading up on the workings of a PID controller:

    >>> import pid
    >>> controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0)
    >>> controller.gains
    (0.5, 0.5, 0.5)
    >>> controller.setpoint
    [0.0]
    >>> controller.previous_time is None
    True
    >>> controller.previous_error
    0.0
    >>> controller.integrated_error
    0.0

  2. We need to write tests that describe measurement processing. This is the controller in action, taking a measured value as its input and producing a control signal that should smoothly move the measured variable to the setpoint. For this to work correctly, we need to be able to control what the controller sees as the current time. After that, we plug our test input values into the math that defines a PID controller, along with the gains, to figure out what the correct outputs would be:

    >>> import time
    >>> real_time = time.time
    >>> time.time = (float(x) for x in xrange(1, 1000)).next
    >>> pid = reload(pid)
    >>> controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0)
    >>> controller.measure(12)
    -6.0
    >>> controller.measure(6)
    -3.0
    >>> controller.measure(3)
    -4.5
    >>> controller.measure(-1.5)
    -0.75
    >>> controller.measure(-2.25)
    -1.125
    >>> time.time = real_time

  3. We need to write tests that describe setpoint handling. Our client asked for a setpoint stack, so we write tests that check such stack behavior. Writing code that uses this stack behavior brings to our attention the fact that a PID controller with no setpoint is not a meaningful entity, so we add a test that checks that the PID class rejects that situation by raising an exception.

    >>> pid = reload(pid)
    >>> controller = pid.PID(P = 0.5, I = 0.5, D = 0.5, setpoint = 0)
    >>> controller.push_setpoint(7)
    >>> controller.setpoint
    [0.0, 7.0]
    >>> controller.push_setpoint(8.5)
    >>> controller.setpoint
    [0.0, 7.0, 8.5]
    >>> controller.pop_setpoint()
    8.5
    >>> controller.setpoint
    [0.0, 7.0]
    >>> controller.pop_setpoint()
    7.0
    >>> controller.setpoint
    [0.0]
    >>> controller.pop_setpoint()
    Traceback (most recent call last):
    ValueError: PID controller must have a setpoint
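For reference, the expected outputs in the measurement-processing tests of step 2 come straight from the PID mathematics: with all three gains at 0.5, a setpoint of 0, and one-second time steps, each response is P * err + I * integrated_error + D * (err - previous_error) / delta, where err = setpoint - measurement, and the integral and derivative terms only apply from the second measurement onward. This sketch reproduces that arithmetic by hand (it is only the math, not the pid module itself):

```python
# Reproduce the expected doctest outputs by applying the PID formula
# by hand; the fake clock in the tests ticks 1.0 second per call.
P = I = D = 0.5
setpoint = 0.0

def expected_outputs(measurements):
    results = []
    integrated = 0.0
    previous_err = None
    for value in measurements:
        err = setpoint - value
        result = P * err                # proportional term
        if previous_err is not None:    # no I or D term on the first call
            integrated += err * 1.0
            result += I * integrated    # integral term
            result += D * (err - previous_err) / 1.0   # derivative term
        previous_err = err
        results.append(result)
    return results

print(expected_outputs([12, 6, 3, -1.5, -2.25]))
# [-6.0, -3.0, -4.5, -0.75, -1.125]
```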

What just happened?

Our clients gave us a pretty good initial specification, but it left a lot of details to assumption. By writing these tests, we've codified exactly what our goal is. Writing the tests forced us to make our assumptions explicit. Additionally, we've gotten a chance to use the object, which gives us an understanding of it that would otherwise be hard to get at this stage.

Normally we'd place the doctests in the same file as the specification, and in fact that's what you'll find in the book's code archive. In the book format, we used the specification text as the description for each step of the example.

You could ask how many tests we should write for each piece of the specification. After all, each test is for certain specific input values, so when code passes it, all it proves is that the code produces the right results for that specific input. The code could conceivably do something entirely wrong, and still pass the test. The fact is that it's usually a safe assumption that the code you'll be testing was supposed to do the right thing, and so a single test for each specified property fairly well distinguishes between working and non-working code. Add to that tests for any boundaries specified—for "The X input may be between the values 1 and 7, inclusive" you might add tests for X values of 0.9 and 7.1 to make sure they weren't accepted—and you're doing fine.
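As a sketch of that boundary advice (the set_x function and its range are hypothetical, echoing the quoted specification):

```python
# Hypothetical unit enforcing "X may be between 1 and 7, inclusive",
# plus checks that just-out-of-range values are rejected.
def set_x(x):
    if not 1 <= x <= 7:
        raise ValueError('X must be between 1 and 7, inclusive')
    return x

assert set_x(1) == 1 and set_x(7) == 7     # the boundaries themselves pass
for bad in (0.9, 7.1):                     # just outside each boundary
    try:
        set_x(bad)
    except ValueError:
        pass
    else:
        raise AssertionError('%r should have been rejected' % bad)
```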

There were a couple of tricks we pulled to make the tests repeatable and independent. In every test after the first, we called the reload function on the pid module, to reload it from the disk. That has the effect of resetting anything that might have changed in the module, and causes it to re-import any modules that it depends on. That latter effect is particularly important, since in the tests of measure, we replaced time.time with a dummy function. We want to be sure that the pid module uses the dummy time function, so we reload the pid module. If the real time function is used instead of the dummy, the test won't be useful, because there will be only one time in all of history at which it would succeed. Tests need to be repeatable.

The dummy time function is created by making an iterator that counts through the integers from 1 to 999 (as floating point values), and binding time.time to that iterator's next method. Once we were done with the time-dependent tests, we restored the original time.time.
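The listings here use Python 2 idioms (xrange, and the iterator's next method, both gone in Python 3). Under Python 3, the same dummy-clock trick might be written as:

```python
# Python 3 version of the dummy-clock trick: every call to time.time
# returns the next value in a predictable sequence.
import itertools
import time

real_time = time.time                  # save the real clock
fake_clock = itertools.count(1.0)      # 1.0, 2.0, 3.0, ...
time.time = lambda: next(fake_clock)   # install the dummy

assert time.time() == 1.0
assert time.time() == 2.0

time.time = real_time                  # always restore the real clock
```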

Right now, we have tests for a module that doesn't exist. That's good! Writing the tests was easier than writing the module will be, and it gives us a stepping stone toward getting the module right, quickly and easily. As a general rule, you always want to have tests ready before the code that they test is written.
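When the specification lives in a text file, doctest can run it directly via doctest.testfile. A self-contained sketch (it writes a toy specification to a temporary file, since the real one lives in the book's code archive):

```python
# Run a doctest-bearing text file and report the results.
import doctest
import os
import tempfile

spec = """\
A toy specification with one embedded test:

>>> 2 + 2
4
"""
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write(spec)

results = doctest.testfile(f.name, module_relative=False)
os.remove(f.name)
print(results)   # TestResults(failed=0, attempted=1)
```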

Have a go hero

Try this a few times on your own: Describe some program or module that you'd enjoy having access to in real life, using normal language. Then go back through it and try writing tests, describing the program or module. Keep an eye out for places where writing the test makes you aware of ambiguities in your prior description, or makes you realize that there's a better way to do something.

Python Testing: Beginner's Guide (published January 2010)

Development phase

With tests in hand, we're ready to write some code. The tests will act as a guide to us, a specification that actively tells us when we get something wrong.

Time for action – unit testing during development

  1. The first step is to run the tests. Of course, we have a pretty good idea of what's going to happen; they're all going to fail. Still, it's useful to know exactly what the failures are, because those are the things that we need to address by writing code.

    [screenshot: doctest's output, reporting each failing test]

    There are many more failing tests after that, but you get the idea.

  2. Taking our cue from the tests, and our references on PID controllers, we write the pid.py module:

    from time import time

    class PID:
        def __init__(self, P, I, D, setpoint):
            self.gains = (float(P), float(I), float(D))
            self.setpoint = [float(setpoint)]
            self.previous_time = None
            self.previous_error = 0.0
            self.integrated_error = 0.0

        def push_setpoint(self, target):
            self.setpoint.append(float(target))

        def pop_setpoint(self):
            if len(self.setpoint) > 1:
                return self.setpoint.pop()
            raise ValueError('PID controller must have a setpoint')

        def measure(self, value):
            now = time()
            P, I, D = self.gains
            err = value - self.setpoint[-1]
            result = P * err
            if self.previous_time is not None:
                delta = now - self.previous_time
                self.integrated_error += err * delta
                result += I * self.integrated_error
                result += D * (err - self.previous_error) / delta
            self.previous_error = err
            self.previous_time = now
            return result

  3. Next we run the tests again. We're hoping that they will all pass, but unfortunately the measure method seems to have some sort of bug.

    [screenshot: doctest's output, showing the failing measure tests]

    There are several more reports showing similar things (five tests in total should fail). The measure method is working backwards, returning positive numbers when it should be returning negative ones, and vice versa.

  4. We know we need to look for a sign error in the measure method, so we don't have too much trouble finding and fixing the bug. The measured value should be subtracted from the setpoint, not the other way around, on the fourth line of the measure method:
    err = self.setpoint[-1] - value

    After fixing that, we find that all the tests pass.

What just happened?

We used our tests to tell us what needed to be done and when our code was finished. Our first run of the tests gave us a list of things that needed to be written; a to-do list, of sorts. After we wrote some code, we ran the tests again to see if it was doing what we expected, which gave us a new to-do list. We kept alternating between running the tests and writing code until all the tests passed. When all the tests pass, either we're done, or we need to write more tests.

Whenever we find a bug that isn't already caught by a test, the right thing to do is to add a test that catches it, and then to fix it. That way, you not only have a fixed bug, you have a test that covers some aspect of the program that wasn't tested before. That test may well catch other bugs in the future, or tell you if you accidentally re-introduced the original bug.

This "test a little, code a little" style of programming is called Test-Driven Development, and you'll find that it's very productive.

Notice that the pattern in the way the tests failed was immediately apparent. There's no guarantee that this will always be the case, of course, but it's quite common. Combined with the ability to narrow your attention to the specific units that are having problems, debugging is usually a snap.

Another thing to think about is test isolation. The methods of the PID class make use of variables stored in self, which means that in order for the tests to be isolated, we have to make sure that none of the changes to self variables made by any method propagate to any other method. We did that by just reloading the pid module and making a new instance of the PID class for each test. As long as the test (and the code being tested) doesn't invoke any other methods on self, that's all that we need.

Feedback phase

So, we have a PID controller, and it passes all the tests. We're feeling pretty good. Time to brave the lions, and show it to the client!

Luckily for us, for the most part they like it. They do have a few requests, though: They want us to let them optionally specify the current time as a parameter to measure, instead of just using time.time to figure it out. They also want us to change the signature of the constructor so that it takes an initial measurement and optional time as parameters. Finally, they want us to rename the measure function to calculate_response, because they think that more clearly describes what it does.

Time for action – unit testing during feedback

So, how are we going to deal with this? The program passes all the tests, but the tests no longer reflect the requirements.

  1. Add the initial parameter to the constructor test, and update the expected results.
  2. Add a second constructor test, which tests the optional time parameter that is now expected to be part of the constructor.
  3. Change the measure method's name to calculate_response in all tests.
  4. Add the initial constructor parameter in the calculate_response test – while we're doing that, we notice that this is going to change the way the calculate_response function behaves. We contact the client for clarification, and they decide it's okay, so we update the expectations to match what we calculate should happen after the change.
  5. Add a second calculate_response test, which checks its behavior when the optional time parameter is supplied.
  6. After making all those changes, our specification/test file looks like the following. Lines that have been changed or added are formatted differently, to help you spot them more easily.

    We want a class that implements a PID controller for a single
    variable. The measurement, setpoint, and output should all
    be real numbers. The constructor should accept an initial
    measurement value in addition to the gains and setpoint.
    >>> import time
    >>> real_time = time.time
    >>> time.time = (float(x) for x in xrange(1, 1000)).next
    >>> import pid
    >>> controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0,
    ... initial=12)
    >>> controller.gains
    (0.5, 0.5, 0.5)
    >>> controller.setpoint
    [0.0]
    >>> controller.previous_time
    1.0
    >>> controller.previous_error
    -12.0
    >>> controller.integrated_error
    0.0
    >>> time.time = real_time
    The constructor should also optionally accept a parameter
    specifying when the initial measurement was taken.
    >>> pid = reload(pid)
    >>> controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=1,
    ... initial=12, when=43)
    >>> controller.gains
    (0.5, 0.5, 0.5)
    >>> controller.setpoint
    [1.0]
    >>> controller.previous_time
    43.0
    >>> controller.previous_error
    -11.0
    >>> controller.integrated_error
    0.0
    >>> real_time = time.time
    >>> time.time = (float(x) for x in xrange(1, 1000)).next
    >>> pid = reload(pid)
    >>> controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0,
    ... initial=12)
    >>> controller.calculate_response(6)
    -3.0
    >>> controller.calculate_response(3)
    -4.5
    >>> controller.calculate_response(-1.5)
    -0.75
    >>> controller.calculate_response(-2.25)
    -1.125
    >>> time.time = real_time
    The calculate_response method should be willing to accept a
    parameter specifying at what time the call is happening.
    >>> pid = reload(pid)
    >>> controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0,
    ... initial=12, when=1)
    >>> controller.calculate_response(6, 2)
    -3.0
    >>> controller.calculate_response(3, 3)
    -4.5
    >>> controller.calculate_response(-1.5, 4)
    -0.75
    >>> controller.calculate_response(-2.25, 5)
    -1.125
    We need to be able to adjust the setpoint at runtime, but we
    want it to have a memory, so that we can easily return to the
    previous setpoint.
    >>> pid = reload(pid)
    >>> controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0,
    ... initial=12)
    >>> controller.push_setpoint(7)
    >>> controller.setpoint
    [0.0, 7.0]
    >>> controller.push_setpoint(8.5)
    >>> controller.setpoint
    [0.0, 7.0, 8.5]
    >>> controller.pop_setpoint()
    8.5
    >>> controller.setpoint
    [0.0, 7.0]
    >>> controller.pop_setpoint()
    7.0
    >>> controller.setpoint
    [0.0]
    >>> controller.pop_setpoint()
    Traceback (most recent call last):
    ValueError: PID controller must have a setpoint

What just happened?

Our tests didn't match the requirements any more, so they had to change.

Well and good, but we don't want them to change too much, because our collection of tests helps us avoid regressions in our code. Regressions are changes that cause something that used to work to stop working. One of the best ways to avoid them is to avoid deleting tests. If you still have tests in place that check for every desired behavior and every bug fixed, then if you introduce a regression you'll find out about it immediately.

That's one reason why we added new tests to check the behavior when the optional time parameters are supplied. The other reason is that if we added those parameters to the existing tests, we wouldn't have any tests of what happens when you don't use those parameters. We always want to check every code path through each unit.

Sometimes, a test just isn't right any more. For example, tests that make use of the measure method are just plain wrong, and need to be updated to call calculate_response instead. When we change these tests, though, we still change them as little as possible. After all, we don't want the test to stop checking for old behavior that's still correct, and we don't want to introduce a bug in the test itself.

The addition of the initial parameter to the constructor is a big deal. It not only changes the way the constructor should behave, it also changes the way the calculate_response (née measure) method should behave in a rather dramatic way. Since this is a change in the correct behavior (a fact we didn't realize until the tests pointed it out, which in turn let us confirm the correct behavior with our clients before writing any code), we have no choice but to go through the tests and recalculate the expected outputs. However, doing all that work has a benefit over and above the future ability to check that the function is working correctly: it makes it much easier to comprehend how the function should work when we actually write it.
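Reconstructing from the updated tests, the revised pid.py might look something like this sketch (our own reading of the tests, not the book's final listing; the setpoint stack methods are unchanged and omitted for brevity):

```python
# A sketch of pid.py after the feedback round: the constructor takes an
# initial measurement and an optional time, and calculate_response
# accepts an optional time parameter.
import time

class PID:
    def __init__(self, P, I, D, setpoint, initial, when=None):
        self.gains = (float(P), float(I), float(D))
        self.setpoint = [float(setpoint)]
        if when is None:
            when = time.time()
        self.previous_time = float(when)
        self.previous_error = float(setpoint) - float(initial)
        self.integrated_error = 0.0

    def calculate_response(self, value, when=None):
        if when is None:
            when = time.time()
        P, I, D = self.gains
        err = self.setpoint[-1] - value
        delta = when - self.previous_time
        self.integrated_error += err * delta
        result = P * err
        result += I * self.integrated_error
        result += D * (err - self.previous_error) / delta
        self.previous_error = err
        self.previous_time = when
        return result

controller = PID(P=0.5, I=0.5, D=0.5, setpoint=0, initial=12, when=1)
print([controller.calculate_response(v, t)
       for t, v in enumerate([6, 3, -1.5, -2.25], start=2)])
# [-3.0, -4.5, -0.75, -1.125]
```

Note that with an initial measurement supplied, the very first calculate_response call already has a previous time and error to work from, which is exactly the behavioral change the tests forced us to notice.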

Summary

We learned a lot in this article about Unit testing and Test-Driven Development, which are best-practice disciplines for quickly building reliable programs.

Specifically, we covered the definition of Unit testing, how unit testing can help during each stage of the development process, what it feels like to use unit testing to drive development, and how it can make the process quicker and more pleasant.


About the Author :


Daniel Arbuckle

Daniel Arbuckle holds a Ph.D. in Computer Science from the University of Southern California. While at USC, he performed original research in the Interaction Lab (part of the Center for Robotics and Embedded Systems) and the Laboratory for Molecular Robotics (now part of the Nanotechnology Research Laboratory). His work has been published in peer-reviewed journals and in the proceedings of international conferences.
