Python Data Visualization Cookbook

By Igor Milovanović

About this book

Today, data visualization is a hot topic as a direct result of the vast amount of data created every second. Transforming that data into information is a complex task for data visualization professionals, who, at the same time, try to understand the data and objectively transfer that understanding to others. This book is a set of practical recipes that strive to help the reader get a firm grasp of the area of data visualization using Python and its popular visualization and data libraries.

Python Data Visualization Cookbook will progress the reader from the point of installing and setting up a Python environment for data manipulation and visualization all the way to 3D animations using Python libraries. Readers will benefit from over 60 precise and reproducible recipes that guide the reader towards a better understanding of data concepts and the building blocks for subsequent and sometimes more advanced concepts.

Python Data Visualization Cookbook starts by showing you how to set up matplotlib and the related libraries that are required for most parts of the book, before moving on to discuss some of the lesser-used diagrams and charts such as Gantt charts or Sankey diagrams. Over the course of the book, we go from simple plots and charts to more advanced ones, thoroughly explaining why we used them and how not to use them. As we go through the book, we will also discuss 3D diagrams. We will take a brief look at animations, just to show you what it takes to go into that area. Maps are irreplaceable for displaying geo-spatial data, so we also show you how to build them. In the last chapter, we show you how to incorporate matplotlib into different environments, such as the LaTeX writing system, and how to create Gantt charts using Python.

This book will help those who already know how to program in Python to explore a new field – one of data visualization. As this book is all about recipes that explain how to do something, code samples are abundant, and they are followed by visual diagrams and charts to help you understand the logic and compare your own results with what is explained in the book.

Publication date:
November 2013
Publisher
Packt
Pages
280
ISBN
9781782163367

 

Chapter 1. Preparing Your Working Environment

In this chapter, we will cover the following recipes:

  • Installing matplotlib, NumPy, and SciPy

  • Installing virtualenv and virtualenvwrapper

  • Installing matplotlib on Mac OS X

  • Installing matplotlib on Windows

  • Installing Python Imaging Library (PIL) for image processing

  • Installing a requests module

  • Customizing matplotlib's parameters in code

  • Customizing matplotlib's parameters per project

 

Introduction


This chapter introduces the essential tools and shows how to install and configure them. This work is necessary and forms a common base for the rest of the book. If you have never used Python for data processing, image processing, and visualization, we advise you not to skip this chapter. Even if you do skip it, you can always return to it when you need to install a supporting tool or verify which version you need for the current solution.

 

Installing matplotlib, NumPy, and SciPy


This recipe describes several ways of installing matplotlib and its required dependencies under Linux.

Getting ready

We assume that you already have Linux (preferably Debian/Ubuntu or RedHat/SciLinux) with Python installed on it. Python usually comes preinstalled on the mentioned Linux distributions; if not, it is easy to install through standard means. We assume that Python 2.7 or later is installed on your workstation.

Note

Almost all of the code should work with Python 3.3 and later, but because most operating systems still ship Python 2.7 (some even Python 2.6), we decided to write the code for Python 2.7. The differences are small, mainly in the versions of packages and in some code (for example, xrange should be replaced with range in Python 3.3+).
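The xrange/range difference mentioned in the note can be bridged with a small shim. The following is only a sketch; the name lazy_range is made up for this illustration and is not used elsewhere in the book.

```python
# Pick a lazy integer range that works on both Python 2 and Python 3.
try:
    lazy_range = xrange  # Python 2: xrange is the lazy variant
except NameError:
    lazy_range = range   # Python 3: range is already lazy

squares = [i * i for i in lazy_range(5)]
print(squares)
```

Code written against lazy_range should then run unchanged on either interpreter line.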

We also assume that you know how to use your OS package manager in order to install software packages and know how to use a terminal.

Build requirements must be satisfied before matplotlib can be built.

matplotlib requires NumPy, libpng, and freetype as build dependencies. To build matplotlib from source, NumPy must already be installed. Here's how to do it:

Install NumPy (at least 1.4, or 1.5 or later if you want to use it with Python 3) from http://www.numpy.org/.

Note

NumPy provides data structures and mathematical functions for working with large datasets. Python's default data structures, such as tuples, lists, or dictionaries, are great for insertions, deletions, and concatenation. NumPy's data structures support "vectorized" operations, which apply to whole arrays at once and are efficient both to write and to execute. They are implemented with Big Data in mind and rely on C implementations that keep execution time low.
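To make the idea of a "vectorized" operation concrete, here is a minimal sketch that doubles a sequence of numbers twice: once with a plain Python loop and once with a single NumPy expression. The variable names are illustrative only.

```python
import numpy as np

values = list(range(1000))

# Plain Python: an explicit loop over a list.
doubled_list = [2 * v for v in values]

# NumPy: one vectorized expression over the whole array.
arr = np.array(values)
doubled_arr = 2 * arr

print(doubled_arr[:5])
```

Both approaches produce the same numbers; the NumPy version hands the loop over to compiled code instead of running it in the interpreter.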

SciPy, built on top of NumPy, is the de facto standard scientific and numeric toolkit for Python. It comprises a great selection of special functions and algorithms, most of them actually implemented in C and Fortran and coming from the well-known Netlib repository (see http://www.netlib.org).

Perform the following steps for installing NumPy:

  1. Install the python-numpy package:

    $ sudo apt-get install python-numpy
    
  2. Check the installed version:

    $ python -c 'import numpy; print numpy.__version__'
    
  3. Install the required libraries:

    • libpng 1.2: PNG files support (requires zlib)

    • freetype 1.4+: True type font support

        $ sudo apt-get build-dep python-matplotlib
       

    If you are using RedHat or a variation of this distribution (Fedora, SciLinux, or CentOS), you can use yum to perform the same installation:

        $ su -c 'yum-builddep python-matplotlib'
        

How to do it...

There are many ways to install matplotlib and its dependencies: from source, from precompiled binaries, from the OS package manager, and with prepackaged Python distributions that have matplotlib built in.

Probably the easiest way is to use your distribution's package manager. For Ubuntu, that should be:

# in your terminal, type:
$ sudo apt-get install python-numpy python-matplotlib python-scipy

If you want to be on the bleeding edge, the best option is to install from source. This path comprises a few steps: get the source, satisfy the build requirements, and then configure, compile, and install.

Download the latest source from the code host www.github.com, then build and install it, by following these steps:

$ cd ~/Downloads/
$ wget https://github.com/downloads/matplotlib/matplotlib/matplotlib-1.2.0.tar.gz
$ tar xzf matplotlib-1.2.0.tar.gz
$ cd matplotlib-1.2.0
$ python setup.py build
$ sudo python setup.py install

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

How it works...

We use standard Python Distribution Utilities, known as Distutils, to install matplotlib from source code. This procedure requires us to previously install dependencies, as we already explained in the Getting ready section of this recipe. The dependencies are installed using the standard Linux packaging tools.
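Whichever installation route you take, it can be useful to confirm which versions ended up on the import path. The helper below is a sketch rather than part of the recipe; report_versions is a hypothetical name, and the "unknown" fallback is an assumption for modules that do not define __version__.

```python
import importlib


def report_versions(module_names):
    """Return {name: version string, or None if the module is missing}."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Not every module defines __version__, so fall back to "unknown".
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in sorted(report_versions(["numpy", "scipy", "matplotlib"]).items()):
    print("%s: %s" % (name, version if version else "not installed"))
```

A missing package is reported instead of raising an error, so the script can be run before and after installation to compare the results.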

There's more...

There are more optional packages that you might want to install depending on what your data visualization projects are about.

No matter what project you are working on, we recommend installing IPython, an interactive Python shell that supports PyLab mode, in which matplotlib and related packages such as NumPy and SciPy are already imported and ready to play with. Please refer to IPython's official site for instructions on installing and using it; the process is very straightforward.

 

Installing virtualenv and virtualenvwrapper


If you are working on many projects simultaneously, or even just switching between them frequently, you'll find that having everything installed system-wide is not the best option. It can cause problems later on different systems (such as production) where you want to run your software. Deployment is not a good time to find out that you are missing a certain package or have versioning conflicts between packages that are already installed on the production system; hence, virtualenv.

virtualenv is an open source project started by Ian Bicking that enables a developer to isolate working environments per project, for easier maintenance of different package versions.

For example, you may have inherited a legacy Django website based on Django 1.1 and Python 2.3, while at the same time you are working on a new project that must be written in Python 2.6. This is my usual case: having more than one required Python version (and related packages), depending on the project I am working on.

virtualenv enables me to easily switch between environments and to reproduce the same set of packages if I need to switch to another machine or to deploy software to a production server (or to a client's workstation).

Getting ready

To install virtualenv, you must have a working installation of Python and pip. Pip is a tool for installing and managing Python packages, and it is a replacement for easy_install. We will use pip throughout most of this book for package management. Pip is easy to install; as root, execute the following line in your terminal:

# easy_install pip

virtualenv by itself is really useful, but virtualenvwrapper makes it easier to use and to organize many virtual environments. See all the features at http://virtualenvwrapper.readthedocs.org/en/latest/#features.

How to do it...

By performing the following steps you can install the virtualenv and virtualenvwrapper tools:

  1. Install virtualenv and virtualenvwrapper:

    $ sudo pip install virtualenv
    $ sudo pip install virtualenvwrapper
    # Create a folder to hold all our virtual environments and export the path to it.
    # virtualenvwrapper looks for environments in the directory named by WORKON_HOME.
    $ export WORKON_HOME=~/.virtualenvs
    $ mkdir -p $WORKON_HOME
    # We source (i.e. execute) the shell script to activate the wrappers
    $ source /usr/local/bin/virtualenvwrapper.sh
    # And create our first virtual environment
    $ mkvirtualenv virt1
    
  2. You can now install your favorite package inside virt1:

    (virt1)user1:~$ pip install matplotlib
    
  3. You will probably want to add the following line to your ~/.bashrc file:

    source /usr/local/bin/virtualenvwrapper.sh

A few useful and frequently used commands are as follows:

  • mkvirtualenv ENV: This creates virtual environment with name ENV and activates it

  • workon ENV: This activates the previously created ENV

  • deactivate: This gets us out of the current virtual environment
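After running workon or deactivate, it is not always obvious which environment the interpreter is using. The following sketch checks this from inside Python; in_virtualenv is a hypothetical helper name, and the check relies on sys.real_prefix (set by virtualenv on Python 2) and sys.base_prefix (available on Python 3).

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # virtualenv on Python 2 sets sys.real_prefix; on Python 3 the original
    # installation is recorded in sys.base_prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("virtual environment active: %s" % in_virtualenv())
print("packages are installed under: %s" % sys.prefix)
```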

 

Installing matplotlib on Mac OS X


The easiest way to get matplotlib on Mac OS X is to use a prepackaged Python distribution such as Enthought Python Distribution (EPD). Just go to the EPD site, then download and install the latest stable version for your OS.

In case you are not satisfied with EPD or cannot use it for other reasons such as versions distributed with it, there is a manual (read: harder) way of installing Python, matplotlib, and its dependencies.

Getting ready

We will use the Homebrew project, which eases the installation of software that Apple did not install on your OS, including Python and matplotlib. Under the hood, Homebrew is a set of Ruby scripts that use Git to automate download and installation. Following these instructions should get the installation working. First, we will install Homebrew, and then Python, followed by tools such as virtualenv, then the dependencies for matplotlib (NumPy and SciPy), and finally matplotlib. Hold on, here we go.

How to do it...

  1. In your Terminal paste and execute the following command:

    ruby <(curl -fsSkL raw.github.com/mxcl/homebrew/go)
    

    After the command finishes, try running brew update or brew doctor to verify that the installation is working properly.

  2. Next, add the Homebrew directory to your system path, so that the packages you install using Homebrew take priority over other versions. Open ~/.bash_profile (or /Users/[your-user-name]/.bash_profile) and add the following line to the end of the file:

    export PATH=/usr/local/bin:$PATH
    
  3. You will need to restart the terminal so that it picks up the new path. Installing Python is as easy as firing up another one-liner:

    brew install python --framework --universal
    

    This will also install any prerequisites required by Python.

  4. Now, you need to update your path (add to the same line):

    export PATH=/usr/local/share/python:/usr/local/bin:$PATH
    
  5. To verify that the installation worked, type python --version at the command line. You should see 2.7.3 as the version number in the response.

  6. You should have pip installed by now. In case it is not installed, use easy_install to add pip:

    $ easy_install pip
    
  7. Now, it's easy to install any required package; for example, virtualenv and virtualenvwrapper are useful:

    pip install virtualenv
    pip install virtualenvwrapper
    
  8. Next, install NumPy and SciPy, which must be in place before we install what we really wanted all along, matplotlib:

    pip install numpy
    brew install gfortran
    pip install scipy
    

    Note

    Mountain Lion users will need to install the development version of SciPy (0.11) by executing the following line:

    pip install -e git+https://github.com/scipy/scipy#egg=scipy-dev
    
  9. Verify that everything is working. Call Python and execute the following commands:

    import numpy
    print numpy.__version__
    import scipy
    print scipy.__version__
    quit()
    
  10. Install matplotlib:

    pip install matplotlib
    
 

Installing matplotlib on Windows


In this recipe, we will demonstrate how to install Python and matplotlib on Windows. We assume that Python was not previously installed.

Getting ready

There are two ways of installing matplotlib on Windows. The easier way is to install a prepackaged Python environment such as EPD, Anaconda, or Python(x,y). This is the suggested way to install Python, especially for beginners.

The second way is to install everything using precompiled binaries of matplotlib and its required dependencies. This is more difficult because you have to be careful about the versions of NumPy and SciPy you are installing, as not every version is compatible with the latest matplotlib binaries. The advantage is that you can even compile your own particular versions of matplotlib or any other library so as to have the latest features, even if they are not yet provided by the authors.

How to do it...

The suggested way, installing a free or commercial Python scientific distribution, is as easy as following the steps provided on the project's website.

If you just want to start using matplotlib and don't want to be bothered with Python versions and dependencies, you may want to consider using the Enthought Python Distribution (EPD). EPD contains prepackaged libraries required to work with matplotlib and all the required dependencies (SciPy, NumPy, IPython, and more).

As usual, we download the Windows installer (*.exe), which installs all the code we need to start using matplotlib and to run all the recipes from this book.

There is also a free scientific project, Python(x,y) (http://code.google.com/p/pythonxy/), for 32-bit Windows systems that comes with all dependencies resolved and is an easy (and free!) way of installing matplotlib on Windows. Because Python(x,y) is compatible with Python module installers, it can be easily extended with other Python libraries. No Python installation should be present on the system before installing Python(x,y).

Let me briefly explain how we would install matplotlib using precompiled Python, NumPy, SciPy, and matplotlib binaries. First, we download and install standard Python using the official MSI installer for our platform (x86 or x86-64). After that, we download the official binaries for NumPy and SciPy and install them. Once we are sure that NumPy and SciPy are properly installed, we download the latest stable release binary for matplotlib and install it by following the official instructions.

There's more...

Note that many examples are not included in the Windows installer. If you want to try the demos, download the matplotlib source and look in the examples subdirectory.

 

Installing Python Imaging Library (PIL) for image processing


Python Imaging Library (PIL) enables image processing using Python. It has extensive file format support and offers powerful image processing capabilities.

Some popular features of PIL are fast access to data, point operations, filtering, image resizing, rotation, and arbitrary affine transforms. For example, the histogram method allows us to get statistics about the images.

PIL can also be used for other purposes, such as batch processing, image archiving, creating thumbnails, conversion between image formats, and printing images.

PIL reads a large number of formats, while write support is (intentionally) restricted to the most commonly used interchange and presentation formats.

How to do it...

The easiest and recommended way is to use your platform's package manager. For Debian/Ubuntu, use the following commands:

$ sudo apt-get build-dep python-imaging
$ sudo pip install http://effbot.org/downloads/Imaging-1.1.7.tar.gz

How it works...

This way we satisfy all build dependencies using the apt-get system but still install the latest stable release of PIL. Older versions of Ubuntu usually don't provide the latest releases.

On RedHat/SciLinux:

# yum install python-imaging
# yum install freetype-devel
# pip install PIL

There's more...

There is a good online handbook, specifically, for PIL. You can read it at http://www.pythonware.com/library/pil/handbook/index.htm, or download the PDF version from http://www.pythonware.com/media/data/pil-handbook.pdf.

There is also a PIL fork, Pillow, whose main aim is to fix installation issues. Pillow can be found at http://pypi.python.org/pypi/Pillow and it is easy to install.

On Windows, PIL can also be installed using a binary installation file. Install PIL in your Python site-packages by executing .exe from http://www.pythonware.com/products/pil/.

Now, if you want to use PIL in a virtual environment, manually copy the PIL.pth file and the PIL directory from C:\Python27\Lib\site-packages to your virtualenv site-packages directory.
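Once PIL (or its fork, Pillow) is installed, a few of the features listed at the start of this recipe can be tried without any image file at hand. The sketch below assumes the library is importable as the PIL package and builds a solid-color test image in memory; the image size and color are arbitrary choices for the illustration.

```python
from PIL import Image  # provided by PIL and by its fork, Pillow

# Build a small solid-red test image in memory so no file is needed.
img = Image.new("RGB", (64, 64), (255, 0, 0))

# histogram() returns 256 counts per band, concatenated: R, then G, then B.
hist = img.histogram()
print("pixels with red value 255: %d" % hist[255])

# Resize to a smaller copy and rotate that copy by 90 degrees.
small = img.resize((32, 32))
rotated = small.rotate(90)
print(small.size, rotated.size)
```

Both resize() and rotate() return new images and leave the original untouched.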

 

Installing a requests module


Most of the data that we need now is available over HTTP or a similar protocol, so we need something to fetch it. The Python library requests makes that job easy.

Even though Python comes with the urllib2 module for working with remote resources and supporting HTTP capabilities, it requires a lot of work to get basic tasks done.

The requests module brings a new API that makes the use of web services seamless and pain free. A lot of the HTTP 1.1 machinery is hidden away and exposed only if you need it to behave differently from the default.

How to do it...

Using pip is the best way to install requests. Use the following command:

$ pip install requests

That's it. This can also be done inside your virtualenv if you don't need requests for every project or want to support different requests versions for each project.

To get you started quickly, here's a small example of how to use requests:

import requests
r = requests.get('http://github.com/timeline.json')
print r.content

How it works...

We sent a GET HTTP request to a URI at www.github.com that returns a JSON-formatted timeline of activity on GitHub (you can see the HTML version of that timeline at https://github.com/timeline). After the response is successfully read, the r object contains the content and other properties of the response (response code, cookies set, header metadata, and even the request we sent in order to get this response).
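The response properties listed above can be gathered in one place. The following is a sketch only: summarize_response and fetch_summary are hypothetical helper names, and the attributes used (status_code, headers, cookies, request, content) are those of the response object returned by requests.get().

```python
def summarize_response(response):
    """Collect the response properties described above into a plain dict."""
    return {
        "status": response.status_code,                        # response code
        "content_type": response.headers.get("content-type"),  # header metadata
        "cookie_names": list(response.cookies.keys()),         # cookies set
        "request_method": response.request.method,             # the request we sent
        "body_length": len(response.content),                  # raw body size
    }


def fetch_summary(url):
    """Send a GET request and summarize the response (needs network access)."""
    import requests  # imported here so the helper above stays dependency-free

    return summarize_response(requests.get(url))
```

For example, fetch_summary('http://github.com/timeline.json') would summarize the response to the request shown earlier, provided the URL is still reachable.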

 

Customizing matplotlib's parameters in code


The library we will use the most throughout this book is matplotlib; it provides the plotting capabilities. Default values for most properties are already set inside the configuration file for matplotlib, called the .rc file. This recipe describes how to modify matplotlib properties from our application code.

Getting ready

As we already said, matplotlib configuration is read from a configuration file. This file provides a place to set up permanent default values for certain matplotlib properties, well, for almost everything in matplotlib.

How to do it...

There are two ways to change parameters during code execution: using the dictionary of parameters (rcParams) or calling the matplotlib.rc() command. The former lets us set values directly in rcParams, or load an already existing dictionary into it, while the latter is a function call that takes a group name and a set of keyword arguments.

If we want to undo the dynamically changed parameters, we can call matplotlib.rcdefaults() to restore the standard matplotlib settings.

The following two code samples illustrate previously explained behaviors:

Example for matplotlib.rcParams:

import matplotlib as mpl
mpl.rcParams['lines.linewidth'] = 2
mpl.rcParams['lines.color'] = 'r'

Example for the matplotlib.rc() call:

import matplotlib as mpl
mpl.rc('lines', linewidth=2, color='r')

Both examples are semantically the same. In each, the first setting defines that all subsequent plots will have lines with a line width of 2 points. The second setting defines that the color of every line following this statement will be red, unless we override it by local settings. See the following example:

import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 1.0, 0.01)

s = np.sin(2 * np.pi * t)
# make line red
plt.rcParams['lines.color'] = 'r'
plt.plot(t,s)

c = np.cos(2 * np.pi * t)
# make line thick
plt.rcParams['lines.linewidth'] = '3'
plt.plot(t,c)

plt.show()

How it works...

First, we import matplotlib.pyplot and NumPy to allow us to draw sine and cosine graphs. Before plotting the first graph, we explicitly set line color to red using plt.rcParams['lines.color'] = 'r'.

Next, we go to the second graph (cosine function), and explicitly set line width to 3 points using plt.rcParams['lines.linewidth'] = '3'.

If we want to reset the settings to their defaults, we should call matplotlib.rcdefaults().
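The round trip of changing a setting and then restoring the defaults can be checked without drawing anything. This is a minimal sketch; the variable names are illustrative, and it only inspects the values stored in rcParams.

```python
import matplotlib as mpl

# Change a setting through the rcParams dictionary...
mpl.rcParams['lines.linewidth'] = 2
width_from_dict = mpl.rcParams['lines.linewidth']

# ...and through the rc() call; both write to the same rcParams store.
mpl.rc('lines', linewidth=3)
width_from_rc = mpl.rcParams['lines.linewidth']

# rcdefaults() discards the runtime changes and restores the built-in defaults.
mpl.rcdefaults()
width_after_reset = mpl.rcParams['lines.linewidth']

print(width_from_dict, width_from_rc, width_after_reset)
```

Note that rcdefaults() resets every setting at once, not just the ones changed here.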

 

Customizing matplotlib's parameters per project


This recipe explains where the various configuration files that matplotlib uses are located, and why we would want to use one or the other. We also explain what these configuration files contain.

Getting ready

If you don't want to configure matplotlib as the first step in your code every time you use it (as we did in the previous recipe), this recipe will explain how to have different default configurations of matplotlib for different projects. This way your code will not be cluttered with configuration data and, moreover, you can easily share configuration templates with your co-workers or even among other projects.

How to do it...

If you have a working project that always uses the same settings for certain parameters in matplotlib, you probably don't want to set them every time you add new graph code. Instead, what you want is a permanent file, outside of your code, which sets defaults for matplotlib parameters.

matplotlib supports this via its matplotlibrc configuration file that contains most of the changeable properties of matplotlib.

How it works...

There are three different places where this file can reside and its location defines its usage. They are:

  • Current working directory: This is where your code runs from. This is the place to customize matplotlib just for the current directory, which might contain your current project code. The file is named matplotlibrc.

  • Per user .matplotlib/matplotlibrc: This is usually in the user's $HOME directory (under Windows, this is your Documents and Settings directory). You can find out where your configuration directory is using the matplotlib.get_configdir() command, as shown in the one-liner after this list.

  • Per installation configuration file: This is usually in your Python site-packages. This is a system-wide configuration, but it will get overwritten every time you reinstall matplotlib, so it is better to use the per user configuration file for more persistent customizations. The best use I have found for it so far is as a default template if I mess up my user's configuration file or if I need a fresh configuration to customize for a different project.

The following one-liner will print the location of your configuration directory and can be run from shell.

$ python -c 'import matplotlib as mpl; print mpl.get_configdir()'
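The same lookup can be done from a script. The sketch below also calls matplotlib.matplotlib_fname(), which is not mentioned in the recipe text; it reports the matplotlibrc file that matplotlib actually loaded, which helps when several of the locations above contain one.

```python
import matplotlib as mpl

config_dir = mpl.get_configdir()       # per-user configuration directory
active_file = mpl.matplotlib_fname()   # matplotlibrc file actually in use

print("configuration directory: %s" % config_dir)
print("active matplotlibrc:     %s" % active_file)
```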

The configuration file contains settings for:

  • axes: Deals with face and edge color, tick sizes, and grid display.

  • backend: Sets the target output, for example, TkAgg or GTKAgg.

  • figure: Deals with dpi, edge color, figure size, and subplot settings.

  • font: Deals with font families, font size, and style settings.

  • grid: Deals with grid color and line settings.

  • legend: Specifies how legends and the text inside them will be displayed.

  • lines: Deals with line (color, style, width, and so on) and marker settings.

  • patch: Patches are graphical objects that fill 2D space, such as polygons and circles; sets linewidth, color, antialiasing, and so on.

  • savefig: Holds separate settings for saved figures, for example, to render files with a white background.

  • text: Deals with text color, how to interpret text (plain versus LaTeX markup), and similar.

  • verbose: Controls how much information matplotlib gives during runtime: silent, helpful, debug, or debug-annoying.

  • xtick and ytick: Set the color, size, direction, and labelsize for major and minor ticks on the x and y axes.
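The groups listed above correspond to the prefixes of the keys in matplotlib.rcParams, so they can be explored from Python. The following is a sketch; the groups dictionary is just a local helper, and the exact set of keys depends on the installed matplotlib version.

```python
from collections import defaultdict

import matplotlib as mpl

# Group every rc setting by the part of its name before the first dot.
groups = defaultdict(list)
for key in mpl.rcParams:
    groups[key.split('.')[0]].append(key)

# Show the first few settings in some of the groups described above.
for name in ('axes', 'figure', 'lines', 'xtick'):
    print("%s: %s" % (name, ", ".join(sorted(groups[name])[:3])))
```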

There's more...

If you are interested in more details for every mentioned setting (and some that we did not mention here), the best place to go is the website of the matplotlib project, where there is up-to-date API documentation. If that doesn't help, the user and development lists are always good places to leave questions. See the back of this book for useful online resources.

About the Author

  • Igor Milovanović

    Igor Milovanović is an experienced developer with a strong background in Linux systems and an education in software engineering. He is skilled in building scalable, data-driven, distributed, software-rich systems.

    An evangelist for high-quality systems design, he has a strong interest in software architecture and development methodologies. Igor is always committed to advocating methodologies that promote high-quality software, such as test-driven development, one-step builds, and continuous integration.

    He also possesses solid knowledge of product development. With field experience and official training, he is capable of transferring knowledge and communication flow from business to developers and vice versa.

    Igor is most grateful to his girlfriend for letting him spend hours on work instead of with her and for being an avid listener to his endless book monologues. He thanks his brother for being his strongest supporter. He is also thankful to his parents for letting him develop in various ways to become the person he is today.
