One of the great difficulties in modern software development is dependency management. Generally, a dependency of a software project is a library or component that is required for the project to function correctly. In the case of a Flask application (and, more generally, a Python application), most dependencies consist of specially organized and annotated source files. Once created, these packages of source files may then be included in other projects, and so on. For some, this chain of dependencies can become an unmanageable mess, where the slightest alteration to any of the libraries in the chain can cause a cascade of incompatibilities that brings further development to a screeching halt. In the Python world, as you may already know, the fundamental unit of a reusable set of source files is the Python module (a file that contains definitions and statements). Once you've created a module on your local filesystem and ensured that it is in your system's PYTHONPATH, including it in a newly created project is as simple as specifying the import, as follows:
import the_custom_module
Here, the_custom_module.py is a file that exists somewhere in the $PYTHONPATH of the system executing the program.
Note
The $PYTHONPATH can include paths to compressed archives (.zip files) in addition to normal file paths.
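For example (the paths and module name here are purely illustrative), you could extend $PYTHONPATH to include both a plain directory and a zip archive before starting the interpreter:
$ export PYTHONPATH="$HOME/code/libs:$HOME/code/archives/utils.zip:$PYTHONPATH"
$ python -c "import the_custom_module"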
This is not where the story ends, of course. While modules littering your local filesystem might be convenient at first, what happens when you want to share some of the code that you've written with others? Usually, this would entail emailing or Dropboxing the files in question; however, this is an obviously cumbersome and error-prone solution. Thankfully, this is a problem that has been considered at length, and some progress has been made in alleviating the most common issues. The most significant of these advances are the subject of this chapter: the following techniques for creating reusable, isolated packages of code, and how they can be leveraged to ease the development of a Flask application:
Python packaging with pip and setuptools
Encapsulation of virtual environments with virtualenv
The solution presented by the various Python packaging paradigms and libraries is far from perfect; one sure way to start an argument with a passionate Python developer is to proclaim that the packaging problem has been solved! We still have a long way to go for that, but headway is being made in incremental steps with improvements to setuptools and the various other libraries used in building, maintaining, and distributing reusable Python code.
In this chapter, when we refer to a package, what we will actually be talking about would be succinctly described as a distribution—a bundle of software to be installed from a remote source—and not a collection of modules in a folder structure that utilizes the __init__.py convention in order to delineate the folders containing the modules that we want to be importable.
When a developer wants to make their code more widely available, one of the first steps will be to create a setuptools-compatible package.
Most modern Python distributions come with setuptools already installed. If it is not present on your system of choice, obtaining it is relatively simple, and additional instructions are available in the official documentation:
wget https://bootstrap.pypa.io/ez_setup.py -O - | python
After setuptools is installed, the basic requirement to create a compatible package is the creation of a setup.py file at the root of your project. The primary content of this file should be the invocation of a setup() function with a few mandatory (and many optional) arguments, as follows:
from setuptools import setup

setup(
    name="My Great Project",
    version="0.0.1",
    author="Jane Doe",
    author_email="jane@example.com",
    description="A brief summary of the project.",
    license="BSD",
    keywords="example tutorial flask",
    url="http://example.com/my-great-project",
    packages=['foobar', 'tests'],
    long_description="A much longer project description.",
    classifiers=[
        "Development Status :: 3 - Alpha",
        "Topic :: Utilities",
        "License :: OSI Approved :: BSD License",
    ],
)
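With a setup.py file like this in place, the package can be built and installed locally. For example (assuming the commands are run from the project root), the following will produce a source archive under dist/ and install the package into the active environment:
$ python setup.py sdist
$ pip install .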
Once the package has been created, most developers will choose to upload their newly minted package to PyPI—the official source of nearly all Python packages—using the built-in tools that are provided by setuptools itself. While the use of this particular public PyPI repository is not a requirement (it's even possible to set up your own personal package index), most Python developers will expect to find their packages here.
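As an illustration of how such an upload might look, setuptools itself exposes register and upload commands (the exact workflow and credentials configuration will depend on your setuptools version and the index you target):
$ python setup.py register
$ python setup.py sdist upload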
This brings us to one more essential piece of the puzzle—the pip Python package installer. If you have Python 2.7.9 or greater installed, then pip will already be present. Some distributions might have it preinstalled for you, or it might be present in a system-level package. For a Debian-like distribution of Linux, it may be installed via the following command:
apt-get install python-pip
Similarly, other Linux-based distributions will have their own recommended package managers. If you'd rather install pip manually, it is a simple matter of fetching the bootstrap script and running it with the Python interpreter:
$ curl -o get-pip.py https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py
Pip is a tool for installing Python packages (and is itself a Python package). While it is not the only player in the game, pip is by far the most widely used.
Note
The predecessor to pip is easy_install, which has largely been replaced in the Python community by the former. The easy_install module suffered from some relatively major problems, such as allowing partially completed installations, the inability to uninstall a package without requiring the user to manually delete the related .egg files, and console output that lacked the useful success and error messages that allow the developer to determine the best course of action in case something goes wrong.
One can invoke pip in the command line to install, say, a scientific computing package on the local filesystem:
$ pip install numpy
The preceding command will query the default PyPI index for a package named numpy and download the latest version to a special place in your system, usually /usr/local/lib/pythonX.Y/site-packages (where X and Y are the major and minor version numbers of the Python interpreter that pip points to). This operation may require root privileges and would thus require sudo or a similar mechanism to complete.
One of the many benefits of virtual environments, which we will explore shortly, is that they generally avoid the privilege escalation requirement that can plague system-level changes to installed packages.
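As a brief aside, pip can also install packages into a per-user location (typically under ~/.local) with the --user flag, which avoids the need for root privileges when you are not working inside a virtual environment:
$ pip install --user numpy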
Once the installation completes successfully, you can import the numpy package into new modules and use any of the functionality that it exposes:
import numpy

x = numpy.array([1, 2, 3])
total = numpy.sum(x)
print(total)  # prints 6
Once we have this package (or any other, for that matter) installed, there's nothing stopping us from fetching additional packages in the usual way. Moreover, we can install multiple packages at the same time by providing their names as additional arguments to the install command:
$ pip install scipy pandas # etc.
New developers might be tempted to install every interesting package that they come across. In doing so, they might realize that this quickly degrades into a Kafkaesque situation where previously installed packages may cease to function and newly installed packages may behave unpredictably, if they manage to get installed successfully at all. The problem with the preceding approach, as some of you may have guessed, is that of conflicting package dependencies. Say, for example, we have package A installed; it depends on version 1 of package Q and version 1 of package R. Package B depends on version 2 of package R (where versions 1 and 2 are not API-compatible). Pip will happily install package B for you, which will upgrade package R to version 2. This will, at best, make package A completely unusable or, at worst, make it behave in undocumented and unpredictable ways.
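One way to make such conflicts explicit, using the hypothetical packages from the preceding example, is to pin the versions you depend on with pip's version specifiers rather than always accepting the latest release:
$ pip install "packageR>=1,<2"   # request only releases that remain API-compatible with version 1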
The Python ecosystem has come up with a solution to the basic issues that arise from what is colloquially referred to as dependency hell. While far from perfect, it allows developers to sidestep many of the simplest package version dependency conflicts that can arise in web application development.
The virtualenv tool, a similar implementation of which is now included in Python 3.3 and later as the venv module, is essential to ensure that you minimize your chances of ending up in dependency hell. The following quote is from the introduction to the official virtualenv documentation:
It creates an environment that has its own installation directories, that doesn't share libraries with other virtualenv environments (and optionally doesn't access the globally installed libraries either).
More concisely, virtualenv allows you to create isolated environments for each one of your Python applications (or any Python code).
Note
The virtualenv tool does not, however, help you manage the dependencies of C-based Python extensions. For example, if you install the lxml package from pip, it will require that you have the correct libxml2 and libxslt system libraries and headers (which it will link against). The virtualenv tool will not help you isolate these system-level libraries.
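On a Debian-like system, for instance, you would typically install the corresponding development headers at the system level before asking pip to build lxml (the package names below are Debian's and may differ on other distributions):
$ sudo apt-get install libxml2-dev libxslt1-dev python-dev
$ pip install lxml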
First, we need to make sure that we have the virtualenv tool installed on our local system. This is a simple matter of fetching it from the PyPI repository:
$ pip install virtualenv
Note
For obvious reasons, this package should be installed outside any virtual environments that may already exist.
Creating a new virtual environment is straightforward. The following command will create a new folder at the specified path that will contain the necessary structure and scripts, including a full copy of your default Python binary:
$ virtualenv <path/to/env/directory>
If we want to create an environment that lives at ~/envs/testing, we will first ensure that the parent directory exists and then invoke the following commands:
$ mkdir -p ~/envs
$ virtualenv ~/envs/testing
In Python 3.3+, a mostly API-compatible version of the virtualenv tool was added to the standard library. The module is named venv; however, the script that allows you to create a virtual environment is named pyvenv and can be invoked in a similar way to the previously discussed virtualenv tool, as follows:
$ mkdir -p ~/envs
$ pyvenv ~/envs/testing
Creating a virtual environment does not automatically activate it. Once the environment is created, we need to activate it so that any modifications to the Python environment (for example, installing packages) will occur in the isolated environment instead of our system global one. By default, the activation of a virtual environment will alter the prompt string ($PS1) of the currently active user so that it displays the name of the sourced virtual environment:
$ source ~/envs/testing/bin/activate
(testing) $ # Command prompt modified to display current virtualenv
The command is the same for Python 3.3+:
$ source ~/envs/testing/bin/activate
(testing) $ # Command prompt modified to display current virtualenv
When you run the above command, the following series of steps occurs:
Deactivates any already activated environment.
Prepends your $PATH variable with the location of the virtualenv bin/ directory, for example, ~/envs/testing/bin:$PATH.
Unsets $PYTHONHOME if it exists.
Modifies your interactive shell prompt so that it includes the name of the currently active virtualenv.
As a result of the $PATH environment variable manipulations, the Python and pip binaries (and whatever other binaries were installed via pip) that are invoked via the shell where the environment was activated will be the ones contained in ~/envs/testing/bin.
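You can confirm which interpreter will actually run with the which command; inside an activated environment, it should point at the copy inside that environment rather than the system-wide binary:
$ source ~/envs/testing/bin/activate
(testing) $ which python   # should print a path inside ~/envs/testing/bin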
We can easily add packages to a virtual environment by simply activating it and then invoking pip in the following way:
$ source ~/envs/testing/bin/activate
(testing)$ pip install numpy
This will install the numpy package to the testing environment, and only the testing environment. Your global system packages will be unaffected, as will any other existing environments.
Uninstalling a pip package is straightforward as well:
$ source ~/envs/testing/bin/activate
(testing)$ pip uninstall numpy
This will remove the numpy package from the testing environment only.
Here is one relatively major place where Python package management falls short: uninstalling a package does not uninstall its dependencies. For example, if you install package A and it installs dependent packages B and C, uninstalling package A at a later time will not uninstall B and C.
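Using the hypothetical packages A, B, and C from above, a quick way to spot what has been left behind is to inspect the environment with pip freeze and then remove the orphaned dependencies explicitly:
(testing)$ pip uninstall A
(testing)$ pip freeze            # B and C will still be listed
(testing)$ pip uninstall B C     # remove the leftover dependencies by hand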
A tool that I use frequently is virtualenvwrapper, which is a very small set of smart defaults and command aliases that makes working with virtual environments more intuitive. Let's install it to our global system now:
$ pip install virtualenvwrapper
Next, you'll want to add the following lines to the end of your shell startup file. This is most likely ~/.bashrc, but in case you've changed your default shell to something else, such as zsh, it could be different (for example, ~/.zshrc):
export WORKON_HOME=$HOME/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh
The first line in the preceding code block indicates that new virtual environments created with virtualenvwrapper should be stored in $HOME/.virtualenvs. You can modify this as you see fit, but I generally leave it as a good default. I find that keeping all my virtual environments in the same hidden folder in my home directory reduces the amount of clutter in individual projects and makes it a bit more difficult to mistakenly add a whole virtual environment to version control.
Note
Adding an entire virtual environment to version control might seem like a good idea, but things are never as simple as they seem. The moment someone running a slightly (or completely) different operating system decides to download your project, which includes a full virtualenv folder that may contain packages with C modules compiled against your own architecture, they're going to have a hard time getting things to work.
Instead, a common pattern that is supported by pip and used by many developers is to freeze the current state of the installed packages in a virtual environment and save it to a requirements.txt file:
(testing) $ pip freeze > requirements.txt
This file may then be added to a version control system (VCS). As the intent of the file is to declare which dependencies are required for the application, and not to provide them or indicate how they should be constructed, users of your project are then free to obtain the required packages in any way they choose. Generally, they will install them via pip, which can handle a requirements file just fine:
(testing) $ pip install -r requirements.txt
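The requirements file itself is just a plain-text list of package specifiers, one per line; a (purely illustrative) frozen environment might look like this:
Flask==0.10.1
itsdangerous==0.24
Jinja2==2.7.3
numpy==1.9.2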
The second line of the shell startup snippet shown earlier adds a few convenient aliases to your current shell environment for creating, activating, switching between, and removing environments:
mkvirtualenv test: This will create an environment named test and activate it automatically.
mktmpenv test: This will create a temporary environment named test and activate it automatically. This environment will be destroyed once you invoke the deactivate script.
workon app: This will switch you to the app environment (already created).
workon (alias lsvirtualenv): When you don't specify an environment, this will print all the existing environments that are available.
deactivate: This will disable the currently active environment, if any.
rmvirtualenv app: This will completely remove the app environment.
We'll use the following command to create an environment to install our application packages:
$ mkvirtualenv app1
This will create a blank app1 environment and activate it. You should see an (app1) tag in your shell prompt.
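From here, installing our application's dependencies works exactly as before, except that everything now lands in the isolated app1 environment; for example:
(app1) $ pip install flask
(app1) $ pip freeze > requirements.txt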
Note
If you are using a shell other than Bash or ZSH, this environment tag may or may not appear. The way in which this works is that the script used to activate the virtual environment also modifies your current prompt string (the PS1 environment variable) so that it indicates the currently active virtualenv. As a result, there is a chance that this may not work if you're using a very special or non-standard shell configuration.
In this chapter, we looked at one of the most fundamental problems that any non-trivial Python application faces: library dependency management. Thankfully, the Python ecosystem has developed the widely adopted virtualenv tool for solving the most common subset of dependency problems that developers may encounter.
Additionally, we looked at a tool, virtualenvwrapper, that abstracts away some of the most common operations one would perform with virtualenv. While we listed some of the functionality that this package provides, the list of things that virtualenvwrapper can do is much more extensive. We only presented the very basics here, but learning more about what this tool can do is indispensable if you work with Python virtual environments all day long.