Over the past 12 years of its existence, matplotlib has made its way into the classrooms, labs, and hearts of the scientific computing world. With Python's rise in popularity for serious professional and academic work, matplotlib has taken a respected seat beside long-standing giants such as Mathematica by Wolfram Research and MathWorks' MATLAB products. As such, we feel that the time is ripe for an advanced text on matplotlib that guides its more sophisticated users into new territory by not only allowing them to become experts in their own right, but also providing a clear path that will help them apply their new knowledge in a number of environments.
As a part of a master class series by Packt Publishing, this book focuses almost entirely on a select few of the most requested advanced topics in the world of matplotlib, which includes everything from matplotlib internals to high-performance computing environments. In order to best support this, we want to make sure that our readers have a chance to prepare for the material of this book, so we will start off gently.
The topics covered in this chapter include the following:
A brief historical overview of matplotlib
What's new in matplotlib
Who is an advanced, beginner, or an intermediate matplotlib user
The software dependencies for many of the book's examples
An overview of Python 3
An overview of the coding style used in this book
References for installation-related instructions
A refresher on IPython Notebooks
A teaser of a complicated plot in matplotlib
Additional resources to obtain advanced beginner and intermediate matplotlib knowledge
The open source project that we now know as matplotlib had its inception at the beginning of the millennium when John Hunter and his colleagues were conducting epilepsy research using proprietary data analysis software. They migrated to MATLAB as it was more flexible and less expensive. However, it was not designed to handle the data formats and diverse data sources that they had to contend with on a daily basis.
It was with this realization that John Hunter created the first version of matplotlib—a GTK+ visualization tool for electroencephalography and electrocorticography analysis. Having been built in Python, adding support for new features as the team needed them was a straightforward task. Before long, this led to the idea of providing a similar interactive command mode to generate plots on the fly, as MATLAB does.
One of the oldest sources available for matplotlib code online is the GitHub repository. The first commit in this repository was with regard to migration from Subversion to Git, though the original repository was CVS. This commit was authored in May 2003, though this repository records a CHANGELOG file whose first entry was made in December 2002. By the time this book goes into publication, matplotlib will have celebrated its 13th birthday.
If you've read the preface, then you know who this book is for—developers with intermediate or advanced knowledge of matplotlib as well as the motivated beginners. But who are they exactly? What do such users know?
Answers to such questions are fairly open-ended, but we can offer the following guidelines. The intermediate matplotlib user should have anything from limited knowledge to passing experience with the following:
Installation of matplotlib in multiple environments
Creation of basic to moderately complicated matplotlib plots
Basic matplotlib APIs, styling, backends, and customizations
Using matplotlib objects, subplots, and overlays
Advanced third-party tools such as Seaborn, Pandas, ggplot, distributed IPython, and StarCluster
Most or all of the following books: Matplotlib for Python Developers, Sandro Tosi, Packt Publishing, and matplotlib Plotting Cookbook, Alexandre Devert, Packt Publishing
This book assumes that you have previous experience with matplotlib and that it has been installed on your preferred development platform. If you need a refresher on the steps to accomplish that, the first chapter of Sandro Tosi's excellent book, Matplotlib for Python Developers, provides instructions to install matplotlib and its dependencies.
In addition to matplotlib, you will need a recent installation of IPython to run many of the examples and exercises provided. For help in getting started with IPython, there are many great resources available on the project's site. Cyrille Rossant has authored Learning IPython for Interactive Computing and Data Visualization, Packt Publishing, which is a great resource as well.
In the course of this book, we will install, configure, and use additional open source libraries and frameworks. We will cover the setup of these as we get to them, but all the programs in this book will require you to have the following installed on your machine:
Git
GNU make
GNU Compiler Collection (gcc)
Your operating system's package manager should have a package that installs common developer tools—these tools should be installed as well, and may provide most of the tools automatically.
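Before going further, it can be handy to confirm that these tools are actually on your PATH. The following is a minimal sketch using only the Python standard library; the function name `find_missing_tools` and the executable names `git`, `make`, and `gcc` are assumptions made for illustration (on some systems GNU make is installed as `gmake`, for instance).

```python
import shutil

# Executable names assumed for the three tools listed above.
REQUIRED_TOOLS = ("git", "make", "gcc")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools were found.")
```

If anything is reported as missing, your operating system's developer tools package is the first place to look.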
All the examples in this book will be implemented using a recent release of Python, version 3.4.2. Many of the examples will not work with the older versions of Python, so please note this carefully. In particular, the setup of virtual environments uses a feature that is new in Python 3.4.2, and some examples use the new type annotations. At the time of writing this book, the latest version of Ubuntu ships with Python 3.4.2.
matplotlib, NumPy, IPython, and the other libraries will be installed for you by the setup scripts provided in the code repositories for each chapter. For the sake of clarity, though, we will mention the versions used for some of these here:
matplotlib 1.4.3
NumPy 1.9.2
SciPy 0.15.1
IPython 3.1.0 (also known as Jupyter)
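To compare what is installed in your own environment against the versions listed above, something along the following lines may help. This is only a sketch: it relies on `importlib.metadata`, which entered the standard library in Python 3.8 and is therefore newer than the Python 3.4.2 used in this book, and the helper names `installed_version` and `version_report` are hypothetical.

```python
import sys
from importlib import metadata  # standard library since Python 3.8

# Versions quoted in the list above, keyed by distribution name.
BOOK_VERSIONS = {
    "matplotlib": "1.4.3",
    "numpy": "1.9.2",
    "scipy": "0.15.1",
    "ipython": "3.1.0",
}


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(expected=BOOK_VERSIONS):
    """Map each package name to a (book version, installed version) pair."""
    return {name: (wanted, installed_version(name))
            for name, wanted in expected.items()}


print("Python", ".".join(str(part) for part in sys.version_info[:3]))
for name, (wanted, found) in sorted(version_report().items()):
    print("{}: book used {}, found {}".format(name, wanted, found))
```

A mismatch is not necessarily a problem, but it is a useful first thing to check when an example behaves differently from the text.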
On this note, it's probably good to discuss Python 3 briefly as there has been continued debate on the choice between the two most recent versions of the programming language (the other being the 2.7.x series). Python 3 represents a massive community-wide effort to adopt better coding practices as well as improvements in the maintenance of long-lived libraries, frameworks, and applications. The primary impetus and ongoing strength of this effort, though, is a general overhaul of the mechanisms underlying Python itself. This will ultimately allow the Python programming language greater maintainability and longevity in the coming years, not to mention better support for the ongoing performance enhancements.
In case you are new to Python 3, the following table, which compares some of the major syntactical differences between Python 2 and Python 3, has been provided:
Syntactical Differences | Python 2 | Python 3 |
---|---|---|
Division with floats | | |
Division with truncation | | |
Longs | | |
Not equal | | |
The unicode function | | |
Raw unicode | | |
Printing | | |
Raw user input | | |
User input | | |
Formatting | | |
Representation | | |
Function application | | |
Filter | | |
Map | | |
Zip | | |
Range | | |
Reduce | | |
Iteration | | |
The execute file | | |
Exceptions | | |
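As a hedged illustration of the Python 3 side of several of the rows above, the following sketch shows how these operations are commonly written. It reflects standard Python 3 behavior rather than the table's own entries, and all variable names are chosen purely for illustration.

```python
from functools import reduce  # reduce() is no longer a builtin in Python 3

# Division: "/" always performs true division; "//" floors the result.
true_division = 15 / 4
floor_division = 15 // 4

# Longs: there is a single int type with arbitrary precision.
big = 2 ** 64

# Raw strings are unicode by default, so a plain r"" prefix is enough.
raw_pattern = r"\d+"

# Formatting with str.format() and representation with repr().
formatted = "{} of {}".format(3, 10)
representation = repr("abc")


def add(a, b):
    return a + b


# Function application: unpack arguments with * instead of apply().
applied = add(*(1, 2))

# filter(), map(), and zip() return lazy iterators; range() is lazy too.
evens = list(filter(lambda n: n % 2 == 0, range(6)))
squares = list(map(lambda n: n * n, range(4)))
pairs = list(zip("ab", (1, 2)))
total = reduce(add, range(5))

# Iteration: use the next() builtin rather than an iterator's .next() method.
first = next(iter([10, 20]))

# Exceptions: the "as" keyword binds the exception instance.
try:
    int("not a number")
except ValueError as error:
    message = str(error)

# Printing is a function call, with keyword arguments such as sep.
print(true_division, floor_division, sep=", ")
```

User input is left out because it blocks on the keyboard: in Python 3, `input()` returns the typed text as a string, which is what `raw_input()` did in Python 2.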
The coding style used throughout this book and in the example code conforms to the standards laid out in PEP 8, with one exception. When entering code into an IPython Notebook or providing modules that will be displayed in the notebook, we will not use two lines to separate what would be module-level blocks of code. We will just use one line. This is done to save screen space.
Something that might strike you as different in our code is the use of an extraordinary feature of Python 3—function annotations. The work for this was done in PEP 3107 and was added in the first release of Python 3. The use of types and static analysis in programming, though new to Python, is a boon to the world of software. It saves time in development of a program by catching bugs before they even arise as well as streamlining unit tests. The benefit of this in our particular case, with regard to the examples in this book, is quick, intuitive code clarification. When you look at the functions, you will instantly know what is being passed and returned.
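As a small, hypothetical illustration of function annotations (the function `scale` is invented here and does not come from the book's repositories), consider:

```python
def scale(values: list, factor: float) -> list:
    """Multiply every value in the list by the given factor."""
    return [value * factor for value in values]


# Annotations are stored on the function object and can be inspected.
print(scale.__annotations__)

# Python itself does not enforce annotations at run time: passing an int
# where a float is annotated still works.
print(scale([1, 2, 3], 2))
```

The annotations document the intended types at a glance, and external static analysis tools can make use of them, but the interpreter does not check them when the function is called.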
Finally, there is one best practice that we adhere to that is not widely adopted in the Python programming community—functions and methods are kept small in all of our code. If more than one logical thing is happening in a function, we break it into multiple functions and compose as needed. This keeps the code clean and clear, making examples much easier to read. It also makes it much easier to write unit tests without some of the excessive parameterization or awkward, large functions and methods that are often required in unit tests. We hope that this leaves a positive, long-lasting impression on you so that this practice receives wider adoption.
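A minimal sketch of this practice might look like the following. The function names and the comma-separated input format are assumptions made for illustration only.

```python
def parse_line(line: str) -> list:
    """Turn a comma-separated line of numbers into a list of floats."""
    return [float(field) for field in line.split(",")]


def mean(values: list) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def line_mean(line: str) -> float:
    """Compose the two small functions above into one operation."""
    return mean(parse_line(line))


print(line_mean("2, 4, 9"))
```

Each function does one logical thing, so each can be tested on its own with a single, simple assertion, and the composed function needs very little additional testing.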
Given that this is a book on an advanced topic and the target audience will have installed matplotlib and the related dependencies more than once (most likely many times), detailed instructions will not be provided here. Two excellent books on matplotlib that cover this topic in their respective first chapters are Matplotlib for Python Developers and matplotlib Plotting Cookbook.
That being said, each chapter will have its own Git repository with scripts to install dependencies and set up Python's virtual environments. These scripts are a great resource, and reading them should provide additional details to those who seek to know more about installing matplotlib and the related libraries in Python virtual environments.
Python virtual environments are the recommended way of working with Python projects. They keep your system, Python, and default libraries safe from disruption. We will continue this tradition in this book, but you are welcome to transcend tradition and utilize the matplotlib library and the provided code in whatever way you see fit.
Using the native venv Python environment management package, each project may define its own versions of dependent libraries, including those of matplotlib and IPython. The sample code for this book does just that, listing the dependencies in one or more requirements.txt files.
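To make the idea concrete, here is a rough sketch of what such a setup amounts to, written with the standard library's `venv` module. It is not the book's actual setup script: the helper names, the `.venv` directory name, and the contents of the requirements file (pinned to the versions mentioned earlier in this chapter) are assumptions for illustration. The sketch works in a temporary directory and skips installing pip so that it needs no network access.

```python
import tempfile
import venv
from pathlib import Path

# Hypothetical pins, based on the versions mentioned in this chapter.
PINNED = [
    "matplotlib==1.4.3",
    "numpy==1.9.2",
    "scipy==0.15.1",
    "ipython==3.1.0",
]


def write_requirements(directory: Path, requirements: list) -> Path:
    """Write one requirement per line to requirements.txt in the directory."""
    path = directory / "requirements.txt"
    path.write_text("\n".join(requirements) + "\n")
    return path


def create_environment(directory: Path) -> Path:
    """Create a virtual environment named .venv inside the directory."""
    env_dir = directory / ".venv"
    venv.create(str(env_dir), with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    requirements_file = write_requirements(project, PINNED)
    env_dir = create_environment(project)
    print(requirements_file.read_text())
    print("Environment created:", (env_dir / "pyvenv.cfg").is_file())
```

In a real project, you would create the environment with pip available and then install the listed dependencies into it from the requirements.txt file.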
With the addition of the nbagg IPython Notebook backend to matplotlib in version 1.4, users can now work with plots in a browser very much like they've been able to do in the GTK and Qt apps on the desktop. We will take full advantage of this new feature.
In the IPython examples of this book, most of the notebooks will start off with the following:
    In [1]: import matplotlib
            matplotlib.use('nbagg')
    In [2]: %matplotlib inline
    In [3]: import matplotlib.pyplot as plt
This configures our notebooks to use matplotlib in the way that we need. The example in the following section starts off with just those commands.
Tip
Downloading the example code
Each chapter in Mastering matplotlib provides instructions on obtaining the example code and notebook from GitHub. A master list has been provided at https://github.com/masteringmatplotlib/notebooks. You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
A final note about IPython: the project has recently changed its name to Jupyter in an effort to embrace the language-agnostic growth that the project and community have experienced, as well as the architectural changes that will make adding new language backends much easier. The user experience will not change (except for the better), but you will notice a different name and logo when you open the chapter notebooks for this book.
To give a taste of what's to come, let's start up a matplotlib IPython Notebook and look at an example. You will need to download the example from a GitHub repository first:
    $ git clone https://github.com/masteringmatplotlib/preview.git
    $ cd preview
You only need to do the following in order to bootstrap an environment with all the notebook dependencies and start up the notebook server:
$ make
This will do several things for you automatically, some of which are as follows:
Clone a support repository holding various include files
Create a Python virtual environment
Install matplotlib and other scientific computing Python modules into this virtual environment
Start an IPython Notebook server that runs on localhost
Open a browser window and load the preview notebook in it
In this browser window, you can run the code yourself by selecting each code section and hitting the Shift and Enter keys to execute it. Let's go through an example.
As mentioned above, our notebooks will all start with the following, as does this preview notebook:
    In [1]: import matplotlib
            matplotlib.use('nbagg')
            %matplotlib inline
    In [2]: import matplotlib.pyplot as plt
            import seaborn as sns
            import numpy as np
            from scipy import stats
            import pandas as pd
These commands do the following:
Set up the interactive backend for plotting
Allow us to evaluate images in-line, as opposed to doing the same in a pop-up window
Provide the standard alias to the matplotlib.pyplot subpackage and import other packages that we will need
Our first preview example will take a look at the Seaborn package, an open source third-party library for data visualization and attractive statistical graphs. Seaborn depends upon not only matplotlib, but also NumPy and SciPy (among others). These were already installed for you when you ran make (pulled from the requirements.txt file).
We'll cover Seaborn palettes in more detail later in the book, so the following command is just a sample. Let's use a predefined palette with a moderate color saturation level:
    In [3]: sns.set_palette("BuPu_d", desat=0.6)
            sns.set_context("notebook", font_scale=2.0)
Next, we'll generate two sets of random data (with a random seed of our choosing), one for the x axis and the other for the y axis. We're then going to plot the overlap of these distributions in a hex plot. Here are the commands for the same:
    In [4]: np.random.seed(42424242)
    In [5]: x = stats.gamma(5).rvs(420)
            y = stats.gamma(13).rvs(420)
    In [6]: with sns.axes_style("white"):
                sns.jointplot(x, y, kind="hex", size=16);
The generated graph is as follows:

In the second preview, we will use Pandas to graph a matrix of scatter plots whose diagonal will be the statistical graphs representing the kernel density estimation. We're going to go easy on the details for now; this is just to whet your appetite for more!
Pandas is a statistical data analysis library for Python that provides high-performance data structures, allowing one to carry out an entire scientific computing workflow in Python (as opposed to having to switch to something like R or Fortran for parts of it).
Let's take the seven columns (inclusive) from the baseball.csv data file between Runs (r) and Stolen Bases (sb) for players between the years of 1871 and 2007 and look at them at the same time in one graph:
    In [7]: baseball = pd.read_csv("../data/baseball.csv")
    In [8]: plt.style.use('../styles/custom.mplstyle')
            data = pd.scatter_matrix(
                baseball.loc[:,'r':'sb'],
                figsize=(16,10))
The generated graph is as follows:

Command 8 will take a few seconds longer than our previous plot since it's crunching a lot of data.
For now, the plot may look like something only a sabermetrician could read, but by the end of this book, complex graph matrices will be only one of many advanced topics in matplotlib that will have you reaching for new heights.
One last teaser before we close out the chapter—you may have noticed that the plots for the baseball data took a while to generate. Imagine doing 1,000 of these. Or 1,000,000. Traditionally, that's a showstopper for matplotlib projects, but in the latter half of this book, we will cover material that will not only show you how to overcome that limit, but also offer you several options to make it happen.
It's going to be a wild ride.
In this chapter, you got to learn a little more about matplotlib's origins and the latest features that were released at the time of writing this book. You've seen the software that we're going to use, including the version of the Python programming language that we've chosen. Furthermore, we've given you a peek into the future of this book (and matplotlib) with a custom IPython Notebook, which highlights the Seaborn and Pandas projects.
In the next couple of chapters, we're going to focus on matplotlib's internals. In particular, Chapter 2, The matplotlib Architecture will cover the architecture of the project, giving you an insight into how it all works together.