Packaging a Python Project using doit

January 2010

This article doesn't attempt to reproduce the doit documentation; instead, it explains how doit can be used to solve a specific problem in a practical way. For a complete introduction to doit and a description of all its features, please refer to the project documentation. Knowledge of Debian packaging or bazaar isn't required to follow the discussion, but it would be helpful.


When working on a project's source code, a developer usually needs to perform repetitive administrative tasks to compile, test and distribute it. In general, those tasks are pretty similar from project to project, although the details may vary greatly depending on the application type, target platform, software development cycle, etc. As a consequence, writing custom scripts that automate them becomes part of maintaining the source code.

Because this is a very common problem, many task automation tools have been created. make is one of the best known among them and is often used as the reference against which similar tools are compared. As the reader probably knows, make automates small tasks by defining, in a file (a makefile), a series of rules, each with a target file, multiple dependency files and a set of commands. To reach a given target, make checks that all the dependency files are present and runs the commands that generate the target file only when that file is missing or outdated. During this process, other rules might need to be evaluated to fulfill the required dependencies.
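The up-to-date check that make performs can be pictured with a few lines of python. This is only a minimal sketch based on file modification times, not make's actual implementation, and the file names in the demo are hypothetical:

```python
import os
import tempfile

def is_outdated(target, dependencies):
    """Return True if target must be rebuilt, following make's basic rule."""
    # A missing target always has to be generated.
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # The target is outdated if any dependency is newer than it.
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)

# Demo with throwaway files; the names are hypothetical.
workdir = tempfile.mkdtemp()
dependency = os.path.join(workdir, "main.c")
target = os.path.join(workdir, "main.o")

open(dependency, "w").close()
missing_target = is_outdated(target, [dependency])   # target not built yet

open(target, "w").close()
os.utime(dependency, (1000, 1000))
os.utime(target, (2000, 2000))
fresh_target = is_outdated(target, [dependency])     # target newer than dependency

os.utime(dependency, (3000, 3000))
stale_target = is_outdated(target, [dependency])     # dependency changed afterwards
```

Only the first and the last checks report that the commands would have to run again.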

Although this approach may look simple, it has been really successful in many projects for years. However, since it tries to solve a general problem, it doesn't fit perfectly in every situation. This has led to the creation of similar tools that attempt to address some of the drawbacks of make:

  • The makefile format forces the developer to learn a new mini-language.
  • Rules are statically defined.
  • Just one target file per rule is allowed.

With the advent of dynamic programming languages, a new generation of make-like tools that solved those issues was designed. For example, rake did a really good job of providing a familiar environment for ruby developers who wanted to use an advanced task automation tool without having to learn anything new other than an API.

For python developers, several of these tools are currently available, each designed with different goals. One that I find particularly interesting is doit, because it doesn't have any of the make problems listed above. In particular:

  • It's really simple to use because it uses python itself as the language to write the configuration statements needed.
  • Tasks, the equivalent to make's rules, may have as many targets as needed, which makes things simpler when the execution of a command entails the creation of multiple files.
  • Tasks themselves aren't defined in the configuration; task generators are. This is really flexible when dependencies and/or targets depend on variables that need to be evaluated at run time.
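As a rough illustration of the last two points, the following sketch shows a task generator whose dependencies are computed when the function is called and which declares two targets. The task, the shell commands and the file names are hypothetical, and the dictionary keys follow the listings in this article (they may differ in other doit versions):

```python
import glob

def task_archive():
    """Hypothetical task: bundle the .txt files found in the current directory."""
    # Evaluated at run time, when doit calls the generator.
    sources = sorted(glob.glob("*.txt"))
    return {
        "actions": ["tar czf notes.tar.gz %s" % " ".join(sources),
                    "md5sum notes.tar.gz > notes.md5"],
        "dependencies": sources,
        # Two target files produced by a single task.
        "targets": ["notes.tar.gz", "notes.md5"],
    }

task = task_archive()
```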

The problem

Let's imagine that we are working on checkbox-editor, a simple python project hosted in Launchpad that provides an easy GTK interface to write test cases for checkbox. The way the application is delivered to users is by means of .deb packages for the latest Ubuntu distributions in a Personal Package Archive or PPA, so we'd like to be able to:

  • Package the application at any time.
  • Install the package locally for testing.
  • Upload the package automatically to a PPA.

Fortunately, the project's trunk branch already has the configuration files needed to generate a .deb package using the usual set of tools, so we're going to focus on the process of writing the file needed to generate and upload the desired packages. Of course, since we don't like to waste our time, we only want to generate the files needed for packaging when necessary; that is to say, we're going to follow make's approach of generating target files only when they aren't up-to-date.


In this section, a file that contains the task generators required to automate the package generation with doit will be created step by step. Just as a makefile holds all the rules for make, doit reads its task generators from a file whose default name is dodo.py. Of course, another file name can be used by passing an argument to doit, but we'll stick to the usual name in this example.

In the code snippets displayed in the following sections, some global variables are used, mainly to get the names of some files. For now, just assume that they're available in the task generator functions. The code that calculates the values of those variables will be shown at the end of the article.


There are two different classes of packages: source and binary ones.

Binary packages are the ones that are compiled for a specific architecture and that are expected to be installed directly on the destination hardware without any problem. This is the type of package that we need to generate to accomplish the goal of installing a package locally for testing purposes. Hence, two of the tasks that we need to automate are the generation of the binary package and its installation.

Source packages are useful to distribute the source code of an application in a platform-independent way, so that anyone can take a look at the code, fix it or compile it for another architecture if needed. This is also the package that must be uploaded to a Launchpad PPA, since Launchpad takes care of compiling it for different architectures and publishing the resulting binary packages. Consequently, two more tasks that should be automated are the generation of a source package and its upload to the Launchpad PPA.

Before any package is generated, we also need to create a copy of the source code with the latest changes. This isn't strictly required, but it's advisable since the package generation process creates some temporary files. The diagram of the tasks that have just been identified is the following:

[Figure: Tasks that should be automated]
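The relationships between the five tasks described above can also be written down as a small dependency graph. The sketch below is only an illustration of the order in which the tasks would have to run; it isn't how doit resolves dependencies internally:

```python
# Each task maps to the tasks it depends on (taken from the description above).
TASK_DEPENDENCIES = {
    "code": [],
    "source": ["code"],
    "binary": ["code"],
    "upload": ["source"],
    "install": ["binary"],
}

def execution_order(task, graph=TASK_DEPENDENCIES, done=None):
    """Return the tasks that must run, in order, to complete `task`."""
    if done is None:
        done = []
    # Dependencies are visited first, so they end up earlier in the list.
    for dependency in graph[task]:
        execution_order(dependency, graph, done)
    if task not in done:
        done.append(task)
    return done

upload_order = execution_order("upload")
install_order = execution_order("install")
```

Here upload_order is code, source, upload, and install_order is code, binary, install.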


The first task before any package generation is copying the source code to a new directory (for example, pkg), to keep the development directory clean from the temporary files created during the packaging process.

The code that implements this task is as follows:

 1 def task_code():
 2     """
 3     Create package directory and copy source files
 4     (bzr branch not used to take into account uncommitted changes)
 5     """
 6     def copy_file(from_file, to_file):
 7         dir = os.path.dirname(to_file)
 8         if dir and not os.path.isdir(dir):
 9             os.makedirs(dir)
10         print from_file, '=>', to_file
11         shutil.copyfile(from_file, to_file)
12         shutil.copystat(from_file, to_file)
13         return True
14
15     yield {'name': 'clean',
16            'actions': None,
17            'clean': ['rm -rf pkg']}
18
19     for target, dependency in zip(PKG_FILES, SRC_FILES):
20         yield {'name': dependency,
21                'actions': [(copy_file, (dependency, target))],
22                'dependencies': [dependency],
23                'targets': [target]}

where the following principles have been applied:

  • With doit, you don't write tasks directly; you write task generators that return dictionary objects (lines 15-17 and 20-23), which doit uses internally to create the real Task objects. This makes it possible to use variables, such as PKG_FILES and SRC_FILES, which are calculated when the module is imported.


  • doit task dictionaries are easy to understand for a python developer. They must contain at least the following:

    targets: Files generated as a result of the task invocation.

    actions: Commands (either python functions with arguments or strings to be passed to a shell) to be executed when any of the target files is missing or outdated.

    dependencies: Files that must exist or other task names that must be satisfied before executing the task actions.

  • A big task can be split into multiple subtasks using python's yield statement. Keep in mind that we're using task generators so the python generator concept maps smoothly into doit.
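One way to picture how the dictionaries yielded by a generator turn into named subtasks is the sketch below. It only mimics the task:subtask labels that doit prints (shown later in this article); it isn't doit's own code, and the setup.py file name is hypothetical:

```python
def task_code():
    """Hypothetical two-file version of the generator shown above."""
    for name in ["bin/checkbox-editor", "setup.py"]:
        yield {"name": name, "actions": None}

def subtask_names(generator):
    """Build 'task:subtask' labels from a task generator."""
    # The task name is the function name without the 'task_' prefix.
    basename = generator.__name__[len("task_"):]
    return ["%s:%s" % (basename, task["name"]) for task in generator()]

names = subtask_names(task_code)
```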


In particular:

  • In lines 15-17, an example of a dummy task is shown. It doesn't perform any action (since the actions value is None) when the code task is invoked, but it contains the statements needed to clean the files created by this task; in this case, removing the pkg directory where the packages will be created.
  • In lines 20-23, the original task is decomposed into multiple subtasks. The reason for this is to follow the requirement to generate target files only when needed. If we used a single task to copy all source files, every update to a single source file would entail copying all those files to the pkg directory. With subtasks, however, only one file is copied, since only one subtask is outdated. As general advice, if you're unsure about when to split a task, just update one of the dependencies, run the desired task and check whether some of the work wasn't really needed.
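The copy_file helper in the listing is written for python 2 (note the print statement). A minimal sketch of the same helper for python 3, exercised on a throwaway directory with hypothetical file names, might look like this:

```python
import os
import shutil
import tempfile

def copy_file(from_file, to_file):
    """Copy one file, creating the destination directory if needed."""
    directory = os.path.dirname(to_file)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    print(from_file, "=>", to_file)
    shutil.copyfile(from_file, to_file)   # file contents
    shutil.copystat(from_file, to_file)   # permission bits and timestamps
    return True                           # same return value as the listing

# Demo in a temporary directory; the paths are hypothetical.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "setup.py")
with open(source, "w") as handle:
    handle.write("# placeholder\n")
target = os.path.join(workdir, "pkg", "demo-0.1", "setup.py")
copied = copy_file(source, target)
```

Because each subtask calls the helper for a single pair of files, a change in one source file only triggers one copy.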

If you have already downloaded the code for the project (or branched it with bzr branch) and have a copy of the complete dodo.py file (see the complete code section), you can execute the code task this way:

$ doit code
. code:bin/checkbox-editor
. code:checkbox_editor/

where part of the output has been suppressed. As you can see, when a task is executed, a dot and the name of the task are printed. If you try to execute the same task again, you'll obtain the following:

$ doit code
-- code:bin/checkbox-editor
-- code:checkbox_editor/

where the double dash means that the task has been skipped because it's already up-to-date.

From here you can experiment by modifying dependencies, removing targets or running doit clean <task_name> to see when a task is executed and when it's skipped.


Once a copy of the source code is available under the pkg directory, the creation of a source package is easy with the debuild command:

 1 def task_source():
 2     """
 3     Create source package from source distribution
 4     """
 5     return {'actions': ['cd pkg/%s && debuild -S' % FULL_NAME],
 6             'dependencies': PKG_FILES,
 7             'targets': SRC_PKG_FILES,
 8             'clean': True}


  • The source task depends on the files generated by the code task (line 6); that is to say, any change in the source files causes both code and source to be executed the next time source is requested.
  • Multiple targets (line 7) are declared in a single task dictionary, which easily overcomes one of the limitations of make listed earlier.
  • clean (line 8) is set to True to state that cleaning the effects of this task only requires removing its target files.
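As a rough picture of what setting clean to True amounts to, the sketch below removes whatever target files of a task dictionary still exist. It's an approximation for illustration, not doit's actual implementation, and the file names are hypothetical:

```python
import os
import tempfile

def remove_targets(task):
    """Delete every target file of a task dictionary that still exists."""
    removed = []
    for target in task.get("targets", []):
        if os.path.isfile(target):
            os.remove(target)
            removed.append(target)
    return removed

# Demo: one target exists, the other was never generated.
workdir = tempfile.mkdtemp()
existing = os.path.join(workdir, "demo_0.1.dsc")
open(existing, "w").close()
missing = os.path.join(workdir, "demo_0.1.tar.gz")
removed = remove_targets({"targets": [existing, missing], "clean": True})
```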


The upload task could be implemented using the dput command as follows:

 1 def task_upload():
 2     """
 3     Upload source package to PPA
 4     """
 5     return {'actions': ['cd pkg && dput %s %s_%s*_source.changes' % (PPA, NAME, VERSION)],
 6             'dependencies': SRC_PKG_FILES}

where the dependencies of the task are the targets of the previous task. This is a usual pattern when writing dodo.py files.
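The chaining pattern can be checked mechanically: every dependency of the downstream task should be a target of the upstream one. The helper below is a hypothetical illustration (it isn't part of doit), and the file names and the my-ppa host are placeholders:

```python
def is_chained(upstream, downstream):
    """True if every dependency of `downstream` is a target of `upstream`."""
    return set(downstream["dependencies"]) <= set(upstream["targets"])

# Hypothetical file names standing in for SRC_PKG_FILES.
src_pkg_files = ["pkg/demo_0.1.dsc", "pkg/demo_0.1_source.changes"]

source = {"actions": ["cd pkg/demo-0.1 && debuild -S"],
          "targets": src_pkg_files}
upload = {"actions": ["cd pkg && dput my-ppa demo_0.1*_source.changes"],
          "dependencies": src_pkg_files}

chained = is_chained(source, upload)
```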


Similarly to the source task, the binary task generates package files based on the files generated by the code task, as shown below:

 1 def task_binary():
 2     """
 3     Create binary package from the source distribution
 4     """
 5     return {'actions': ['cd pkg/%s && debuild -b -uc -us' % FULL_NAME],
 6             'dependencies': PKG_FILES,
 7             'targets': BIN_PKG_FILES,
 8             'clean': True}

The only difference is in the parameters passed to the debuild command, which must request a binary package instead of a source one.
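Since the two dictionaries share the same structure, they could be produced by a single helper. The debuild_task function below is a hypothetical refactoring, not something the original file contains, and the package and file names are placeholders:

```python
def debuild_task(options, targets, full_name, pkg_files):
    """Build a task dictionary that runs debuild with the given options."""
    return {"actions": ["cd pkg/%s && debuild %s" % (full_name, options)],
            "dependencies": pkg_files,
            "targets": targets,
            "clean": True}

# Hypothetical values standing in for FULL_NAME, PKG_FILES and the targets.
full_name = "demo-0.1"
pkg_files = ["pkg/demo-0.1/setup.py"]

source = debuild_task("-S", ["pkg/demo_0.1.dsc"], full_name, pkg_files)
binary = debuild_task("-b -uc -us", ["pkg/demo_0.1_all.deb"], full_name, pkg_files)
```

Whether this is worth it is a matter of taste; with only two tasks, the explicit dictionaries in the listings are arguably easier to read.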


This task installs the generated binary package locally using the debi command.

 1 def task_install():
 2     """
 3     Install binary package locally
 4     """
 5     return {'actions': ['cd pkg/%s; gksudo debi' % FULL_NAME],
 6             'dependencies': BIN_PKG_FILES}

Once again, the dependencies for this task are the targets for the previous task in the diagram that was displayed in the identification section.

Complete code

The complete source code for the module in charge of packaging the source code as needed is the following:

  1 """
  2 doit file to generate source and binary packages automatically
  3 """
  4 import os, shutil, re, subprocess
  5 import platform
  6
  7 def changelog_version(changelog="debian/changelog"):
  8     version = "dev"
  9     if os.path.isfile(changelog):
 10         head=open(changelog).readline()
 11         match = re.compile(r".*\((.*)\).*").match(head)
 12         if match:
 13             version = match.group(1)
 14
 15     return version
 16
 17
 18 def get_pkg_files(package_name):
 19     src_files = subprocess.Popen("bzr ls -R -V",
 20                                  stdout = subprocess.PIPE,
 21                                  shell = True).communicate()[0].splitlines()
 22     src_files = [src_file for src_file in src_files
 23                  if os.path.isfile(src_file)]
 24
 25     pkg_files = [os.path.join("pkg", package_name, src_file)
 26                  for src_file in src_files]
 27
 28     return src_files, pkg_files
 29
 30 # Project specific configuration
 31 NAME = "checkbox-editor"
 32 PPA = "oem-community-qa"
 33
 34 # Global variables
 35 VERSION = changelog_version()
 36 FULL_NAME = "%s-%s" % (NAME, VERSION)
 37 SRC_FILES, PKG_FILES = get_pkg_files(FULL_NAME)
 38 machine2pkg = {'x86_64': 'amd64',
 39                'i686': 'i386'}
 40 BIN_PKG_FILES = ['pkg/%s_%s_%s.changes' % (NAME, VERSION,
 41                                            machine2pkg[platform.machine()]),
 42                  'pkg/' % (NAME, VERSION,
 43                            machine2pkg[platform.machine()]),
 44                  'pkg/%s_%s_all.deb' % (NAME, VERSION),]
 45 SRC_PKG_FILES = ["pkg/%s_%s.dsc" % (NAME, VERSION),
 46                  "pkg/%s_%s_source.changes" % (NAME, VERSION),
 47                  "pkg/" % (NAME, VERSION),
 48                  "pkg/%s_%s.tar.gz" % (NAME, VERSION),]
 49
 50 # Doit configuration
 51 DEFAULT_TASKS = ['source', 'binary']
 52
 53 def task_code():
 54     """
 55     Create package directory and copy source files
 56     (bzr branch not used to take into account uncommitted changes)
 57     """
 58     def copy_file(from_file, to_file):
 59         dir = os.path.dirname(to_file)
 60         if dir and not os.path.isdir(dir):
 61             os.makedirs(dir)
 62         print from_file, '=>', to_file
 63         shutil.copyfile(from_file, to_file)
 64         shutil.copystat(from_file, to_file)
 65         return True
 66
 67     yield {'name': 'clean',
 68            'actions': None,
 69            'clean': ['rm -rf pkg']}
 70
 71     for target, dependency in zip(PKG_FILES, SRC_FILES):
 72         yield {'name': dependency,
 73                'actions': [(copy_file, (dependency, target))],
 74                'dependencies': [dependency],
 75                'targets': [target]}
 76
 77
 78 def task_source():
 79     """
 80     Create source package from source distribution
 81     """
 82     return {'actions': ['cd pkg/%s && debuild -S' % FULL_NAME],
 83             'dependencies': PKG_FILES,
 84             'targets': SRC_PKG_FILES,
 85             'clean': True}
 86
 87
 88 def task_upload():
 89     """
 90     Upload source package to PPA
 91     """
 92     return {'actions': ['cd pkg && dput %s %s_%s*_source.changes' % (PPA, NAME, VERSION)],
 93             'dependencies': SRC_PKG_FILES}
 94
 95
 96 def task_binary():
 97     """
 98     Create binary package from the source distribution
 99     """
100     return {'actions': ['cd pkg/%s && debuild -b -uc -us' % FULL_NAME],
101             'dependencies': PKG_FILES,
102             'targets': BIN_PKG_FILES,
103             'clean': True}
104
105
106 def task_install():
107     """
108     Install binary package locally
109     """
110     return {'actions': ['cd pkg/%s; gksudo debi' % FULL_NAME],
111             'dependencies': BIN_PKG_FILES}


  • Some initialization code is needed to get the version number of the package (lines 7-15 and 35) and the list of source files under version control (lines 18-28 and 37).
  • Default tasks (line 51) are defined for the case in which no task name is passed as an argument.
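The initialization helpers can be tried in isolation without bzr or a real debian/changelog. The following python 3 sketch extracts the version from a changelog header line and maps source files to their location under pkg; the function names, the header line and the setup.py file name are hypothetical examples, not taken from the project:

```python
import os
import re

def parse_version(header, default="dev"):
    """Return the text between the first pair of parentheses, or a default."""
    match = re.search(r"\(([^)]*)\)", header)
    return match.group(1) if match else default

def package_paths(package_name, src_files):
    """Map each source file to its copy under pkg/<package_name>/."""
    return [os.path.join("pkg", package_name, src_file)
            for src_file in src_files]

# Hypothetical first line of a debian/changelog file.
header = "checkbox-editor (0.1-0ubuntu1) karmic; urgency=low"
version = parse_version(header)
paths = package_paths("checkbox-editor-%s" % version, ["setup.py"])
```

When the header contains no parenthesized version, parse_version falls back to "dev", mirroring the default used by changelog_version in the listing.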


This article has shown:

  • How doit can be used as a make-like tool to automate tasks using a python-friendly syntax.
  • How this tool solves some of the drawbacks of make.
  • How a repetitive task, such as the packaging example, can be automated without too much complexity.

You've been reading an excerpt of:

Python Testing: Beginner's Guide
