Documenting Your Python Project: Part 2

Expert Python Programming

Best practices for designing, coding, and distributing your Python software

by Tarek Ziadé | May 2009 | Open Source

This is the second article in a two-part series by Tarek Ziadé. The series is all about documentation: it gives tips on technical writing and on how Python projects should be documented. The first part covered the seven golden rules of technical writing and a reStructuredText primer. In this part, you will learn how to build the documentation.

Building the Documentation

An easier way to guide your readers and your writers is to provide each one of them with helpers and guidelines, as we have learned in the previous section of this article.

From a writer's point of view, this is done by having a set of reusable templates together with a guide that describes how and when to use them in a project. It is called a documentation portfolio.

From a reader's point of view, browsing the documentation painlessly and learning to find information efficiently is made possible by building a document landscape.

Building the Portfolio

There are many kinds of documents a software project can have, from low-level documents that refer directly to the code, to design papers that provide a high-level overview of the application.

For instance, Scott Ambler defines an extensive list of document types in his book Agile Modeling (http://www.agilemodeling.com/essays/agileArchitecture.htm). He builds a portfolio from early specifications to operations documents. Even project management documents are covered, so all documentation needs are met with a standardized set of templates.

Since a complete portfolio is tightly related to the methodologies used to build the software, this article will only focus on a common subset that you can complete with your specific needs. Building an efficient portfolio takes a long time, as it captures your working habits.

A common set of documents in software projects can be classified in three categories:

  • Design: All documents that provide architectural information, and low-level design information, such as class diagrams, or database diagrams
  • Usage: Documents on how to use the software; this can be in the shape of a cookbook and tutorials, or a module-level help
  • Operations: Provide guidelines on how to deploy, upgrade, or operate the software

Design

The purpose of design documentation is to describe how the software works and how the code is organized. It is used by developers to understand the system but is also a good entry point for people who are trying to understand how the application works.

The different kinds of design documents a software project can have are:

  • Architecture overview
  • Database models
  • Class diagrams with dependencies and hierarchy relations
  • User interface wireframes
  • Infrastructure description

Mostly, these documents are composed of some diagrams and a minimum amount of text. The conventions used for the diagrams are very specific to the team and the project, and this is perfectly fine as long as they are consistent.

UML provides thirteen diagrams that cover most aspects of a software design. The class diagram is probably the most used one, but UML as a whole can describe every aspect of a piece of software. See http://en.wikipedia.org/wiki/Unified_Modeling_Language#Diagrams.

Teams rarely follow a specific modeling language such as UML in full; they usually build their own conventions from their shared experience. They pick up good practices from UML or other modeling languages, and create their own recipes.

For instance, for architecture overview diagrams, some designers just draw boxes and arrows on a whiteboard without following any particular design rules and take a picture of it. Others work with simple drawing programs such as Dia (http://www.gnome.org/projects/dia) or Microsoft Visio (proprietary and not free), since that is enough to understand the design.

Database model diagrams depend on the kind of database you are using. There are complete data modeling software applications that provide drawing tools to automatically generate tables and their relations. But this is overkill in Python most of the time. If you are using an ORM such as SQLAlchemy (for instance), simple boxes with lists of fields, together with table relations are enough to describe your mappings before you start to write them.

Class diagrams are often simplified UML class diagrams: There is no need in Python to specify the protected members of a class, for instance. So the tools used for an architectural overview diagram fit this need too.

User interface diagrams depend on whether you are writing a web or a desktop application. Web applications often describe only the center of the screen, since the header, footer, left, and right panels are common to all pages. Many web developers just draw those screens by hand and capture them with a camera or a scanner. Others create prototypes in HTML and take screenshots. For desktop applications, screenshots of prototype screens, or annotated mock-ups made with tools such as GIMP or Photoshop, are the most common approach.

Infrastructure overview diagrams are like architecture diagrams, but they focus on how the software interacts with third-party elements, such as mail servers, databases, or any kind of data streams.

Common Template

The important point when creating such documents is to make sure the target readership is perfectly known, and the content scope is limited. So a generic template for design documents can provide a light structure with a little advice for the writer.

Such a structure can include:

  • Title
  • Author
  • Tags (keywords)
  • Description (abstract)
  • Target (Who should read this?)
  • Content (with diagrams)
  • References to other documents

The content should be three or four screens (a 1024x768 average screen) at the most, to be sure to limit the scope. If it gets bigger, it should be split into several documents or summarized.

The template also provides the author's name and a list of tags to track the document's evolution and ease its classification. This will be covered later in the article.

Paster is the right tool to use to provide templates for documentation. pbp.skels implements the design template described, and can be used exactly like code generation. A target folder is provided and a few questions are answered:

$ paster create -t pbp_design_doc design
Selected and implied templates:
pbp.skels#pbp_design_doc A Design document
Variables:
egg: design
package: design
project: design
Enter title ['Title']: Database specifications for atomisator.db
Enter short_name ['recipe']: mappers
Enter author (Author name) ['John Doe']: Tarek
Enter keywords ['tag1 tag2']: database mapping sql
Creating template pbp_design_doc
Creating directory ./design
Copying +short_name+.txt_tmpl to ./design/mappers.txt

The result can then be completed:

=========================================
Database specifications for atomisator.db
=========================================

:Author: Tarek
:Tags: database mapping sql
:abstract:
    Write here a small abstract about your design document.

.. contents ::

Who should read this ?
::::::::::::::::::::::

Explain here who is the target readership.

Content
:::::::

Write your document here. Do not hesitate to split it in several
sections.

References
::::::::::

Put here references, and links to other documents.
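
The substitution step that Paster performs can be pictured with the standard library alone. The sketch below is only an illustration and is not how pbp.skels is implemented: DESIGN_TEMPLATE and render_design_doc are hypothetical names, the template body is shortened, and string.Template stands in for Paster's own template engine.

```python
from string import Template

# Hypothetical, shortened stand-in for the pbp_design_doc template body.
DESIGN_TEMPLATE = Template("""\
$bar
$title
$bar

:Author: $author
:Tags: $keywords

.. contents ::

Who should read this ?
::::::::::::::::::::::

Explain here who is the target readership.
""")


def render_design_doc(title, author, keywords):
    """Fill in the template; the title bars are sized to the title."""
    return DESIGN_TEMPLATE.substitute(
        bar="=" * len(title),
        title=title,
        author=author,
        keywords=keywords,
    )


if __name__ == "__main__":
    print(render_design_doc(
        "Database specifications for atomisator.db",
        "Tarek",
        "database mapping sql",
    ))
```

Computing the bars from the title length keeps the reStructuredText title valid whatever title the writer types.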

Usage

Usage documentation describes how a particular part of the software works. This documentation can describe low-level parts, such as how a function works, but also high-level parts, such as the command-line arguments for calling the program. This is the most important part of the documentation in framework applications, since the target readership is mainly the developers who are going to reuse the code.

The three main kinds of documents are:

  • Recipe: A short document that explains how to do something. This kind of document targets one readership and focuses on one specific topic.
  • Tutorial: A step-by-step document that explains how to use a feature of the software. This document can refer to recipes, and each instance is intended for one readership.
  • Module helper: A low-level document that explains what a module contains. This document could be shown (for instance) when you call the help built-in over a module.

Recipe

A recipe answers a very specific problem and provides a solution to resolve it.

For example, ActiveState provides a Python Cookbook online (a cookbook is a collection of recipes), where developers can describe how to do something in Python (http://aspn.activestate.com/ASPN/Python/Cookbook).

These recipes must be short and are structured like this:

  • Title
  • Submitter
  • Last updated
  • Version
  • Category
  • Description
  • Source (the source code)
  • Discussion (the text explaining the code)
  • Comments (from the web)

Often, they are one screen long and do not go into great detail. This structure fits a software project's needs well and can be adapted into a generic structure, where the target readership is added and the category is replaced by tags:

  • Title (short sentence)
  • Author
  • Tags (keywords)
  • Who should read this?
  • Prerequisites (other documents to read, for example)
  • Problem (a short description)
  • Solution (the main text, one or two screens)
  • References (links to other documents)

The date and version are not useful here, since we will see later that the documentation is managed like source code in the project.

As with the design template, pbp.skels provides a pbp_recipe_doc template that can be used to generate this structure:

$ paster create -t pbp_recipe_doc recipes
Selected and implied templates:
pbp.skels#pbp_recipe_doc A recipe
Variables:
egg: recipes
package: recipes
project: recipes
Enter title (use a short question): How to use atomisator.db
Enter short_name ['recipe'] : atomisator-db
Enter author (Author name) ['John Doe']: Tarek
Enter keywords ['tag1 tag2']: atomisator db
Creating template pbp_recipe_doc
Creating directory ./recipes
Copying +short_name+.txt_tmpl to ./recipes/atomisator-db.txt

The result can then be completed by the writer:

========================
How to use atomisator.db
========================

:Author: Tarek
:Tags: atomisator db

.. contents ::

Who should read this ?
::::::::::::::::::::::

Explain here who is the target readership.

Prerequisites
:::::::::::::

Put here the prerequisites for people to follow this recipe.

Problem
:::::::

Explain here the problem resolved in a few sentences.

Solution
::::::::

Put here the solution.

References
::::::::::

Put here references, and links to other recipes.

Tutorial

A tutorial differs from a recipe in its purpose. It is not intended to resolve an isolated problem, but rather describes how to use a feature of the application step by step. This can be longer than a recipe and can concern many parts of the application. For example, Django provides a list of tutorials on its website. Writing your first Django App, part 1 (http://www.djangoproject.com/documentation/tutorial01) explains in ten screens how to build an application with Django.

A structure for such a document can be:

  • Title (short sentence)
  • Author
  • Tags (words)
  • Description (abstract)
  • Who should read this?
  • Prerequisites (other documents to read, for example)
  • Tutorial (the main text)
  • References (links to other documents)

The pbp_tutorial_doc template is provided in pbp.skels as well with this structure, which is similar to the design template.

Module Helper

The last template that can be added in our collection is the module helper template. A module helper refers to a single module and provides a description of its contents, together with usage examples.

Some tools can automatically build such documents by extracting the docstrings and computing module help using pydoc, like Epydoc (http://epydoc.sourceforge.net). So it is possible to generate extensive documentation based on API introspection. This kind of documentation is often provided in Python frameworks. For instance, Plone provides a server at http://api.plone.org that keeps an up-to-date collection of module helpers.
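
To make the introspection idea concrete, here is a minimal sketch that builds a small module helper from docstrings using only the standard library. The module_summary function is a hypothetical helper written for this illustration, and the json module is just a convenient target; dedicated tools such as Epydoc do far more than this.

```python
import inspect
import json  # any importable module works; json is only an example
import pydoc


def module_summary(module):
    """Return a short reStructuredText summary built from docstrings."""
    name = module.__name__
    bar = "=" * len(name)
    lines = [bar, name, bar, ""]
    # inspect.getdoc returns None when the module has no docstring.
    doc = inspect.getdoc(module) or "(no module docstring)"
    lines.append(doc.splitlines()[0])
    lines.append("")
    # List public functions with the first line of their docstring.
    for attr, obj in sorted(vars(module).items()):
        if attr.startswith("_") or not inspect.isfunction(obj):
            continue
        first = (inspect.getdoc(obj) or "").splitlines()[:1]
        lines.append("- %s: %s" % (attr, first[0] if first else ""))
    return "\n".join(lines)


if __name__ == "__main__":
    print(module_summary(json))
    # pydoc can also render the full help text as a plain string.
    full_help = pydoc.render_doc(json, renderer=pydoc.plaintext)
    print(len(full_help.splitlines()), "lines of pydoc help")
```

The summary lists every public function without judging which ones deserve documentation, which is exactly the lack of selection discussed next.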

The main problems with this approach are:

  • There is no smart selection performed over the modules that are really interesting to document.
  • The code can be obfuscated by the documentation.

Furthermore, module documentation provides examples that sometimes refer to several parts of the module, and are hard to split between the docstrings of the functions and classes. The module docstring could be used for that purpose by writing a text at the top of the module. But this results in a hybrid file composed of a block of text followed by a block of code. This is rather obfuscating when the code represents less than 50% of the total length. If you are the author, this is perfectly fine. But when people try to read the code (not the documentation), they will have to skip over the docstrings.

Another approach is to separate the text into its own file. A manual selection can then be made to decide which Python modules will have their own module helper file. The documents can then be separated from the code base and allowed to live their own life, as we will see in the next part. This is how Python is documented.

Many developers will disagree that separating documentation from code is better than using docstrings. The separation approach requires the documentation process to be fully integrated in the development cycle; otherwise the documentation will quickly become obsolete. The docstrings approach solves this problem by keeping the code and its usage examples close together, but it doesn't bring the text to a higher level: a document that can be used as part of the plain documentation.

The template for Module Helper is really simple, as it contains just a little metadata before the content is written. The target is not defined since it is the developers who wish to use the module:

  • Title (module name)
  • Author
  • Tags (words)
  • Content

Operations

Operations documents are used to describe how the software can be operated.

For instance:

  • Installation and deployment documents
  • Administration documents
  • "Frequently Asked Questions" documents that help the users when a failure occurs
  • Documents that explain how people can ask for help or provide feedback

These documents are very specific, but they can probably use the tutorial template defined in the earlier section.

Make Your Own Portfolio

Keep in mind the light but sufficient approach for project documentation: each document added should have a clearly defined target readership and should fill a real need. Documents that don't add real value should not be written.

Building the Landscape

The document portfolio built in the previous section provides a structure at document level, but does not provide a way to group and organize the documents into the documentation the readers will have. This is what Andreas Rüping calls a document landscape, referring to the mental map the readers use when they browse documentation. He concluded that the best way to organize documents is to build a logical tree.

In other words, the different kinds of documents composing the portfolio need to find a place to live within a tree of directories. This place must be obvious to the writers when they create the document and to the readers when they are looking for it.

Index pages at each level are a great help when browsing documentation, since they can guide both writers and readers.

Building a document landscape is done in two steps:

  • Building a tree for the producers (the writers)
  • Building a tree for the consumers (the readers), on top of the producers' tree

This distinction between producers and consumers is important since they access the documents in different places and different formats.

Producer's Layout

From a producer's point of view, each document is processed exactly like a Python module. It should be stored in the version control system and worked on like code. Writers do not care about the final appearance of their prose or where it is published. They just want to make sure that the document they are writing is the single source of truth on the topic covered.

Storing reStructuredText files in a folder tree, in the version control system together with the software code, is a convenient way to build the documentation landscape for producers.

The simplest way to organize the tree is to group documents by nature:

$ cd atomisator
$ find docs
docs
docs/source
docs/source/design
docs/source/operations
docs/source/usage
docs/source/usage/cookbook
docs/source/usage/modules
docs/source/usage/tutorial

Notice that the tree is located in a source folder because the docs folder will be used as a root folder to set up a special tool in the next section.
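
If the same layout is reused across projects, it can be created by a short script instead of by hand. This is a minimal sketch: the folder names come from the listing above, while the create_docs_tree function is a hypothetical helper and not part of any tool mentioned in this article.

```python
from pathlib import Path
import tempfile

# Leaf folders from the listing above, relative to the project root.
DOC_FOLDERS = [
    "docs/source/design",
    "docs/source/operations",
    "docs/source/usage/cookbook",
    "docs/source/usage/modules",
    "docs/source/usage/tutorial",
]


def create_docs_tree(project_root):
    """Create the documentation folders and return their paths."""
    created = []
    for folder in DOC_FOLDERS:
        path = Path(project_root) / folder
        # parents=True also creates docs, docs/source and docs/source/usage.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # A temporary directory stands in for the atomisator checkout.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_docs_tree(tmp):
            print(path.relative_to(tmp))
```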

From there, an index.txt file can be added at each level (besides the root), explaining what kind of documents the folder contains, or summarizing what each sub-folder contains. These index files can define a listing of the documents they contain. For instance, the operations folder can contain a list of the operations documents available:

==========
Operations
==========

This section contains operations documents:

- How to install and run Atomisator
- How to install and manage a PostgreSQL database
  for Atomisator

So that people do not forget to update them, we can have lists generated automatically.
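
As a rough picture of what generating such a list involves, the sketch below scans a folder and rebuilds the listing from the title of each document. The function names and the wording of the generated page are assumptions made for this example; the toctree directive of Sphinx, described later in this article, provides this kind of listing without any custom script.

```python
from pathlib import Path
import tempfile


def document_title(path):
    """Return the first line that is neither blank nor a row of '=' signs."""
    for line in path.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if stripped and set(stripped) != {"="}:
            return stripped
    return path.stem  # fall back to the file name


def build_index(folder, heading):
    """Build the text of an index page listing the other .txt files."""
    bar = "=" * len(heading)
    lines = [bar, heading, bar, "",
             "This section contains the following documents:", ""]
    for path in sorted(Path(folder).glob("*.txt")):
        if path.name != "index.txt":
            lines.append("- " + document_title(path))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Demonstration in a temporary folder with one sample document.
    title = "How to install and run Atomisator"
    bar = "=" * len(title)
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "install.txt").write_text(
            "\n".join([bar, title, bar, ""]), encoding="utf-8")
        print(build_index(tmp, "Operations"))
```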

Consumer's Layout

From a consumer's point of view, it is important to work out the index files and to present the whole documentation in a format that is easy to read and looks good. Web pages are the best pick and are easy to generate from reStructuredText files.

Sphinx is a set of scripts and docutils extensions that can be used to generate an HTML structure from our text tree. This tool is used (for instance) to build the Python documentation, and many projects now use it for their own documentation. Among its built-in features, it produces a convenient browsing system, together with a light but sufficient client-side JavaScript search engine. It also uses Pygments to render code examples with syntax highlighting.

Sphinx can be easily configured to stick with the document landscape defined in the earlier section.

To install it, just call easy_install:

$ sudo easy_install-2.5 Sphinx
Searching for Sphinx
Reading http://cheeseshop.python.org/pypi/Sphinx/
...
Finished processing dependencies for Sphinx

This installs a few scripts such as sphinx-quickstart. This script generates a configuration script together with a Makefile, which can be used to build the web documentation every time it is needed. Let's run it in the docs folder and answer its questions:

$ sphinx-quickstart
Welcome to the Sphinx quickstart utility.
Enter the root path for documentation.
> Root path for the documentation [.]:
> Separate source and build directories (y/n) [n]: y
> Name prefix for templates and static dir [.]:
> Project name: Atomisator
> Author name(s): Tarek Ziadé
> Project version: 0.1.0
> Project release [0.1.0]:
> Source file suffix [.rst]: .txt
> Name of your master document (without suffix) [index]:
> Create Makefile? (y/n) [y]: y
Finished: An initial directory structure has been created.

You should now populate your master file ./source/index.txt and create other documentation source files. Use the sphinx-build.py script to build the docs, like so:

make <builder>

This adds to the source folder a conf.py file, which contains the configuration defined through the answers, and an index.txt master document, together with a Makefile in docs.
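
Since conf.py is an ordinary Python module, the answers given above end up as plain assignments. The fragment below is a sketch of what those settings could look like, not a copy of the generated file: the real file is longer and its exact content depends on the Sphinx version, and the two path values are an assumption derived from the "." name prefix chosen during the quickstart.

```python
# conf.py (sketch): settings that mirror the sphinx-quickstart answers.

project = "Atomisator"

# Short version and full release string, both answered as 0.1.0.
version = "0.1.0"
release = "0.1.0"

# Documents use the .txt suffix instead of the default .rst.
source_suffix = ".txt"

# Name of the master document, without its suffix (source/index.txt).
master_doc = "index"

# Assumption: folders named after the "." prefix chosen for the
# templates and static directories.
templates_path = [".templates"]
html_static_path = [".static"]
```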

Running make html will then generate a tree in build:

$ make html
mkdir -p build/html build/doctrees
sphinx-build.py -b html -d build/doctrees -D latex_paper_size= source build/html
Sphinx v0.1.61611, building html
trying to load pickled env... done
building [html]: targets for 0 source files that are out of date
updating environment: 0 added, 0 changed, 0 removed
creating index...
writing output... index
finishing...
writing additional files...
copying static files...
dumping search index...
build succeeded.
Build finished. The HTML pages are in build/html.

The documentation will then be available in build/html, starting at index.html.


Besides the HTML versions of the documents, the tool also builds automatic pages such as a module list and an index. Sphinx provides a few docutils extensions to drive these features. The main ones are:

  • A directive that builds a table of contents
  • A marker that can be used to register a document as a module helper
  • A marker to add an element in the index

Working on the Index Pages

Sphinx provides a toctree directive that can be used to inject a table of contents in a document, with links to other documents. Each line must be a file with its relative path, starting from the current document. Glob-style names can also be provided to add several files that match the expression.

For example, the index file in the cookbook folder, which we have previously defined in the producer's landscape, can look like this:

========
Cookbook
========

Welcome to the CookBook.

Available recipes:

.. toctree::
   :glob:

   *

With this syntax, the HTML page will display a list of all reStructuredText documents available in the cookbook folder. This directive can be used in all index files to build a browseable documentation.

Registering Module Helpers

For module helpers, a marker can be added so that each one is automatically listed and available in the module index page:

=======
session
=======

.. module:: db.session

The module session...

Notice that the db prefix here can be used to avoid module collision. Sphinx will use it as a module category and will group all modules that start with db. in this category.
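
The grouping itself is easy to picture: everything before the first dot becomes the category. The sketch below only illustrates that idea and is not Sphinx's own code; group_by_prefix is a hypothetical function, and apart from db.session the module names are invented for the example.

```python
from collections import defaultdict


def group_by_prefix(module_names):
    """Group dotted module names by their first component."""
    groups = defaultdict(list)
    for name in sorted(module_names):
        # "db.session" -> prefix "db"; a name without a dot is its own prefix.
        prefix = name.partition(".")[0]
        groups[prefix].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Only db.session appears in the article; the others are made up.
    names = ["db.session", "db.mappers", "feed.reader", "parser.rss"]
    for prefix, members in group_by_prefix(names).items():
        print(prefix, "->", ", ".join(members))
```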

For Atomisator, the db, feed, main, and parser prefixes can be used in order to group the entries, as shown in the figure:

[Figure: module index page with entries grouped by prefix]

In your documentation, you can use this feature when you have a lot of modules.

Adding Index Markers

Another option can be used to fill the index page by linking the document to an entry:

=======
session
=======

.. module:: db.session

.. index::
   Database Access
   Session

The module session...

Two new entries, Database Access and Session, will be added in the index page.

Cross-references

Finally, Sphinx provides an inline markup to set cross-references. For instance, a link to a module can be done like this:

:mod:`db.session`

Here, :mod: is the module marker's prefix and `db.session` is the name of the module to be linked to (as registered previously). Keep in mind that :mod:, like the previous elements, is specific markup that Sphinx adds to reStructuredText.

Sphinx provides a lot more features that you can discover on its website. For instance, the autodoc feature is a great option to automatically extract your docstrings to build the documentation. See http://sphinx.pocoo.org.

Summary

This series explained in detail how to:

  • Use a few rules for efficient writing
  • Use reStructuredText, the Pythonista's LaTeX
  • Build a document portfolio and landscape
  • Use Sphinx to generate nice web documentation

The hardest thing to do when documenting a project is to keep it accurate and up to date. Making the documentation part of the code repository makes it a lot easier. From there, every time a developer changes a module, he or she should change the corresponding documentation as well.

This can be quite difficult in big projects, and adding a list of related documents in the header of the modules can help in that case.

A complementary approach to make sure the documentation is always accurate is to combine the documentation with tests through doctests.
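
For readers who have not used doctests, here is a minimal sketch of the idea. The slugify function is a hypothetical helper invented for this example; the point is that the usage examples in its docstring are executed by the doctest module, so they cannot silently drift away from the code.

```python
import doctest


def slugify(title):
    """Turn a document title into a file name stem.

    >>> slugify("How to use atomisator.db")
    'how-to-use-atomisator-db'
    >>> slugify("  Operations  ")
    'operations'
    """
    # Replace every non-alphanumeric character with a space, then
    # join the remaining words with dashes.
    cleaned = "".join(c if c.isalnum() else " " for c in title.lower())
    return "-".join(cleaned.split())


if __name__ == "__main__":
    # Runs every example found in this module's docstrings and
    # reports only the ones that fail.
    doctest.testmod()
```

If the behavior of the function changes without the docstring being updated, the run reports the mismatch, which flags the outdated documentation.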


About the Author


Tarek Ziadé

Tarek Ziadé is CTO at Ingeniweb in Paris, working on Python, Zope, and Plone technology and on Quality Assurance. He has been involved for 5 years in the Zope community and has contributed to the Zope code itself.

Tarek has also created Afpy, the French Python User Group, and has written two books in French about Python. He has given numerous talks and tutorials at French and international events like Solutions Linux, PyCon, OSCON, and EuroPython.
