Embedding Doctests in Python Docstrings

Exclusive offer: get 50% off this eBook here
Python Testing: Beginner's Guide

Python Testing: Beginner's Guide — Save 50%

An easy and convenient approach to testing your powerful Python projects

$23.99    $12.00
by Daniel Arbuckle | January 2010 | Open Source

In this article by Daniel Arbuckle, we shall:

  • Write doctests embedded in Python docstrings
  • Consider real-world specification for the AVL tree

Doctests aren't confined to simple text files. You can put doctests into Python's docstrings.

Why would you want to do that? There are a couple of reasons. First of all, docstrings are an important part of the usability of Python code (but only if they tell the truth). If the behavior of a function, method, or module changes and the docstring doesn't get updated, then the docstring becomes misinformation, and a hindrance rather than a help. If the docstring contains a couple of doctest examples, then the out-of-date docstrings can be located automatically. Another reason for placing doctest examples into docstrings is simply that it can be very convenient. This practice keeps the tests, documentation and code all in the same place, where it can all be located easily.

If the docstring becomes home to too many tests, this can destroy its utility as documentation. This should be avoided; if you find yourself with so many tests in the docstrings that they aren't useful as a quick reference, move most of them to a separate file.

Time for action – embedding a doctest in a docstring

We'll embed a test right inside the Python source file that it tests, by placing it inside a docstring.

  1. Create a file called test.py with the following contents:
      def testable(x):
      r"""
      The `testable` function returns the square root of its
      parameter, or 3, whichever is larger.
      >>> testable(7)
      3.0
      >>> testable(16)
      4.0
      >>> testable(9)
      3.0
      >>> testable(10) == 10 ** 0.5
      True
      """
      if x < 9:
      return 3.0
      return x ** 0.5
  2. At the command prompt, change to the directory where you saved test.py and then run the tests by typing:
      $ python -m doctest test.py
  3. As mentioned earlier before, if you have an older version of Python, this isn't going to work for you. Instead, you need to type python -c "__import__('doctest').testmod(__import__('test'))"

  4. If everything worked, you shouldn't see anything at all. If you want some confirmation that doctest is doing something, turn on verbose reporting by changing the command to:
      python -m doctest -v test.py
  5. For older versions of Python, instead use python -c "__import__('doctest').testmod(__import__('test'), verbose=True)"

What just happened

You put the doctest right inside the docstring of the function it was testing. This is a good place for tests that also show a user how to do something. It's not a good place for detailed, low-level tests (the above example, which was quite detailed for illustrative purposes, is skirting the edge of being too detailed), because docstrings need to serve as API documentation. You can see the reason for this just by looking back at the example, where the doctests take up most of the room in the docstring, without telling the readers any more than they would have learned from a single test.

Any test that will serve as good API documentation is a good candidate for including in the docstrings.

Notice the use of a raw string for the docstring (denoted by the r character before the first triple-quote). Using raw strings for your docstrings is a good habit to get into, because you usually don't want escape sequences—e.g. n for newline—to be interpreted by the Python interpreter. You want them to be treated as text, so that they are correctly passed on to doctest.

Doctest directives

Embedded doctests can accept exactly the same directives as doctests in text files can, using exactly the same syntax. Because of this, all of the doctest directives that we discussed before can also be used to aff ect the way embedded doctests are evaluated.

Execution scope

Doctests embedded in docstrings have a somewhat different execution scope than doctests in text files do. Instead of having a single scope for all of the tests in the file, doctest creates a single scope for each docstring. All of the tests that share a docstring, also share an execution scope, but they're isolated from tests in other docstrings.

The separation of each docstring into its own execution scope often means that we don't need to put much thought into isolating doctests, when they're embedded in docstrings. That is fortunate, since docstrings are primarily intended for documentation, and the tricks needed to isolate the tests might obscure the meaning.

Putting it in practice: an AVL tree

We'll walk step-by-step through the process of using doctest to create a testable specification for a data structure called an AVL Tree. An AVL tree is a way to organize key-value pairs, so that they can be quickly located by key. In other words, it's a lot like Python's built-in dictionary type. The name AVL references the initials of the people who invented this data structure.

As its name suggests, an AVL tree organizes the keys that are stored in it into a tree structure, with each key having up to two child keys—one child key that is less than the parent key by comparison, and one that is more. In the following picture, the key Elephant has two child keys, Goose has one, and Aardvark and Frog both have none.

Python Testing: Beginner's Guide

The AVL tree is special, because it keeps one side of the tree from getting much taller than the other, which means that users can expect it to perform reliably and efficiently no matter what. In the previous image, an AVL tree would reorganize to stay balanced if Frog gained a child.

We'll write tests for an AVL tree implementation here, rather than writing the implementation itself. Therefore, we'll elaborate over the details of how an AVL tree works, in favor of looking at what it should do when it works right

If you want to know more about AVL Trees, you will find many good references on the Internet. Wikipedia's entry on the subject is a good place to start with:http://en.wikipedia.org/wiki/AVL_tree.

We'll start with a plain language specification, and then interject tests between the paragraphs.

You don't have to actually type all of this into a text file; it is here for you to read and to think about.

English specification

The first step is to describe what the desired result should be, in normal language. This might be something that you do for yourself, or something that somebody else does for you. If you're working for somebody, hopefully you and your employer can sit down together and work this part out.

In this case, there's not much to work out, because AVL Trees have been fully described for decades. Even so, the description here isn't quite like one you'd find anywhere else. This capacity for ambiguity is exactly the reason why a plain language specification isn't good enough. We need an unambiguous specification, and that's exactly what the tests in a doctest file can give us.

The following text goes in a file called AVL.txt, (which you can find in its final form in the accompanying code archive. At this stage of the process, the file contains only the normal language specification.):

An AVL Tree consists of a collection of nodes organized in a binary
tree structure. Each node has left and right children, each of which
may be either None or another tree node. Each node has a key, which
must be comparable via the less-than operator. Each node has a value.
Each node also has a height number, measuring how far the node is from
being a leaf of the tree -- a node with height 0 is a leaf.

The binary tree structure is maintained in ordered form, meaning that
of a node's two children, the left child has a key that compares
less than the node's key and the right child has a key that compares
greater than the node's key.

The binary tree structure is maintained in a balanced form, meaning
that for any given node, the heights of its children are either the
same or only differ by 1.

The node constructor takes either a pair of parameters representing
a key and a value, or a dict object representing the key-value pairs
with which to initialize a new tree.

The following methods target the node on which they are called, and
can be considered part of the internal mechanism of the tree:

Each node has a recalculate_height method, which correctly sets the
height number.

Each node has a make_deletable method, which exchanges the positions
of the node and one of its leaf descendants, such that the the tree
ordering of the nodes remains correct.

Each node has rotate_clockwise and rotate_counterclockwise methods.
Rotate_clockwise takes the node's right child and places it where
the node was, making the node into the left child of its own former
child. Other nodes in the vicinity are moved so as to maintain
the tree ordering. The opposite operation is performed by rotate_
counterclockwise.

Each node has a locate method, taking a key as a parameter, which
searches the node and its descendants for a node with the specified
key, and either returns that node or raises a KeyError.

The following methods target the whole tree rooted at the current
node. The intent is that they will be called on the root node:

Each node has a get method taking a key as a parameter, which locates
the value associated with the specified key and returns it, or raises
KeyError if the key is not associated with any value in the tree.

Each node has a set method taking a key and a value as parameters, and
associating the key and value within the tree.
Each node has a remove method taking a key as a parameter, and
removing the key and its associated value from the tree. It raises
KeyError if no values was associated with that key.

Node data

The first three paragraphs of the specification describe the member variables of a AVL tree node, and tell us what the valid values for the variables are. They also tell us how tree height should be measured and define what a balanced tree means. It's our job now to take up those ideas, and encode them into tests that the computer can eventually use to check our code.

We could check these specifications by creating a node and then testing the values, but that would really just be a test of the constructor. It's important to test the constructor, but what we really want to do is to incorporate checks that the node variables are left in a valid state into our tests of each member function.

To that end, we'll define a function that our tests can call to check that the state of a node is valid. We'll define that function just after the third paragraph:

Notice that this test is written as if the AVL tree implementation already existed. It tries to import an avl_tree module containing an AVL class, and it tries to use the AVL class is specific ways. Of course, at the moment there is no avl_tree module, so the test will fail. That's as it should be. All that the failure means is that, when the ti me comes to implement the tree, we should do so in a module called avl_tree, with contents that function as our test assumes. Part of the benefit of testing like this is being able to test-drive your code before you even write it.

>>> from avl_tree import AVL
>>> def valid_state(node):
... if node is None:
... return
... if node.left is not None:
... assert isinstance(node.left, AVL)
... assert node.left.key < node.key
... left_height = node.left.height + 1
... else:
... left_height = 0
...
... if node.right is not None:
... assert isinstance(node.right, AVL)
... assert node.right.key > node.key
... right_height = node.right.height + 1
... else:
... right_height = 0
...
... assert abs(left_height - right_height) < 2
... node.key < node.key
... node.value
>>> def valid_tree(node):
... if node is None:
... return
... valid_state(node)
... valid_tree(node.left)
... valid_tree(node.right)

Notice that we didn't actually call those functions yet. They aren't tests, per se, but tools that we'll use to simplify writing tests. We define them here, rather than in the Python module that we're going to test, because they aren't conceptually part of the tested code, and because anyone who reads the tests will need to be able to see what the helper functions do.

Constructor

The fourth paragraph describes the constructor for an AVL node: The node constructor takes either a pair of parameters representing a key and a value, or a dict object representing the key-value pairs with which to initialize a new tree.

The constructor has two possible modes of operation:

  • it can either create a single initialized node
  • or it can create and initialize a whole tree of nodes. The test for the single node mode is easy:
      >>> valid_state(AVL(2, 'Testing is fun'))

The other mode of the constructor is a problem, because it is almost certain that it will be implemented by creating an initial tree node and then calling its set method to add the rest of the nodes. Why is that a problem? Because we don't want to test the set method here: this test should be focused entirely on whether the constructor works correctly, when everything it depends on works.

In other words, the tests should be able to assume that everything outside of the specific chunk of code being tested works correctly.

However, that's not always a valid assumption. So, how can we write tests for things that call on code outside of what's being tested?

There is a solution for this problem. For now, we'll just leave the second mode of operation of the constructor untested.

Python Testing: Beginner's Guide An easy and convenient approach to testing your powerful Python projects
Published: January 2010
eBook Price: $23.99
Book Price: $39.99
See more
Select your format and quantity:

Recalculate height

The recalculate_height method is described in the fifth paragraph.

To test it, we'll need a tree for it to operate on, and we don't want to use the second mode of the constructor to create it. After all, that mode isn't tested at all yet, and even if it were, we want this test to be independent of it. We would prefer to make the test entirely independent of the constructor, but in this case we need to make a small exception to the rule(since it's difficult to create an object without calling its constructor in some way).

What we'll do is define a function that builds a specific tree and returns it. This function will be useful in several of our later tests as well. Using this function, testing recalculate_height will be easy.

>>> def make_test_tree():
... root = AVL(7, 'seven')
... root.height = 2
... root.left = AVL(3, 'three')
... root.left.height = 1
... root.left.right = AVL(4, 'four')
... root.right = AVL(10, 'ten')
... return root
>>> tree = make_test_tree()
>>> tree.height = 0
>>> tree.recalculate_height()
>>> tree.height
2

The make_test_tree function builds a tree by manually constructing each part of it and hooking it together into a structure that looks like this:

Python Testing: Beginner's Guide

Make deletable

You can't delete a node that has children, because that would leave the node's children disconnected from the rest of the tree. If we delete the Elephant node from the bottom of the tree, what do we do about Aardvark, Goose, and Frog? If we delete Goose, how do we find Frog afterwards?

Python Testing: Beginner's Guide

The way around that is to have the node swap places with it's largest leaf descendant on the left side (or its smallest leaf descendant on the right side, but we'll not do it that way).

We'll test this by using the same make_test_tree function that we defined before to create a new tree to work on, and then checking that make_deletable swaps correctly:

Each node has a make_deletable method, which exchanges the positions
of the node and one of its leaf descendants, such that the the tree
ordering of the nodes remains correct.
>>> tree = make_test_tree()
>>> target = tree.make_deletable()
>>> (tree.value, tree.height)
('four', 2)
>>> (target.value, target.height)
('seven', 0)

Something to notice here is that the make_deletable function isn't supposed to delete the node that it's called on. It's supposed to move that node into a position where it could be safely deleted. It must do this reorganization of the tree, without violating any of the constraints that define an AVL tree structure.

Rotation

The two rotate functions perform a somewhat tricky manipulation of the links in a tree. You probably found the plain language description of what they do, a bit confusing. This is one of those times when a little bit of code makes a whole lot more sense than any number of sentences.

While tree rotation is usually defined in terms of rearranging the links between nodes in the tree, we'll check whether it worked by looking at the values (rather than by looking directly at the left and right links). This allows the implementation to swap the contents of nodes—rather than the nodes themselves—when it wishes. After all, it's not important to the specification which operation happens, so we shouldn't rule out a perfectly reasonable implementation choice.

The two rotate functions perform a somewhat tricky manipulation of the links in a tree. You probably found the plain language description of what they do, a bit confusing. This is one of those times when a little bit of code makes a whole lot more sense than any number of sentences.

While tree rotation is usually defined in terms of rearranging the links between nodes in the tree, we'll check whether it worked by looking at the values (rather than by looking directly at the left and right links). This allows the implementation to swap the contents of nodes—rather than the nodes themselves—when it wishes. After all, it's not important to the specification which operation happens, so we shouldn't rule out a perfectly reasonable implementation choice.

The first part of the test code for rotation just creates a tree and verifies that it looks like we expect it to:

>>> tree = make_test_tree()
>>> tree.value
'seven'
>>> tree.left.value
'three'

Once we have a tree to work with, we try a rotation operation and check that the result still looks like it should:

>>> tree.rotate_counterclockwise()
>>> tree.value
'three'
>>> tree.left
None
>>> tree.right.value
'seven'
>>> tree.right.left.value
'four'
>>> tree.right.right.value
'ten'
>>> tree.right.left.value
'four'
>>> tree.left is None
True

Finally, we rotate back in the other directi on, and check that the final result is the same as the original tree, as we expect it to be:

>>> tree.rotate_clockwise()
>>> tree.value
'seven'
>>> tree.left.value
'three'
>>> tree.left.right.value
'four'
>>> tree.right.value
'ten'
>>> tree.right.left is None
True
>>> tree.left.left is None
True

Locating a node

The locate method is expected to return a node, or raise a KeyError exception, depending on whether the key exists in the tree or not. We'll use our specially built tree again, so that we know exactly what the tree's structure looks like.

>>> tree = make_test_tree()
>>> tree.locate(4).value
'four'
>>> tree.locate(17) # doctest: +ELLIPSIS
Traceback (most recent call last):
KeyError: …

The locate method is intended to facilitate insertion, deletion, and lookup of values based on their keys, but it's not a high-level interface. It returns a node object, because it's easy to implement the higher-level operations, if you have a function the finds the right node for you.

Testing the rest of the specification

Like the second mode of the constructor, testing the rest of the specification involves testing code that depends on things outside of itself.

Summary

Here, we took a real-world specification for the AVL tree, and examined how to formalize it as a set of doctests, so that we could use it to automatically check the correctness of an implementation.

Specifically, we covered how to write doctests in Python docstrings, and what it feels like to use doctest to turn a specification into tests.

If you have read this article you may be interested to view :

Python Testing: Beginner's Guide An easy and convenient approach to testing your powerful Python projects
Published: January 2010
eBook Price: $23.99
Book Price: $39.99
See more
Select your format and quantity:

About the Author :


Daniel Arbuckle

Daniel Arbuckle holds a Ph.D. in Computer Science from the University of Southern California. While at USC, he performed original research in the Interaction Lab (part of the Center for Robotics and Embedded Systems) and the Laboratory for Molecular Robotics (now part of the Nanotechnology Research Laboratory). His work has been published in peer-reviewed journals and in the proceedings of international conferences.

Books From Packt

Joomla! with Flash
Joomla! with Flash

WordPress 2.8 Theme Design
WordPress 2.8 Theme Design

Plone 3 for Education
Plone 3 for Education

ADempiere 3.4 ERP Solutions
ADempiere 3.4 ERP Solutions

Spring Persistence with Hibernate
Spring Persistence with Hibernate

Building Telephony Systems with OpenSIPS 1.6
Building Telephony Systems with OpenSIPS 1.6

Expert Python Programming
Expert Python Programming

Funambol Mobile Open Source
Funambol Mobile Open Source

No votes yet

Post new comment

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
w
q
r
m
m
P
Enter the code without spaces and pay attention to upper/lower case.
Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software