Mastering matplotlib

4 (6 reviews total)
By Duncan M. McGreggor
    Advance your knowledge in tech with a Packt subscription

  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Getting Up to Speed

About this book

matplotlib is a Python plotting library that provides a large feature set for a multitude of platforms. Given the depth of the library's legacy and the variety of related open source projects, gaining expert knowledge can be a time-consuming and often confusing process.

You'll begin your exciting journey learning about the skills that are necessary in leading technical teams for a visualization project or to become a matplotlib contributor.

Supported by highly-detailed IPython Notebooks, this book takes you through the conceptual components underlying the library and then provides a detailed overview of its APIs. From there, you will learn about event handling and how to code for interactive plots.

Next you will move on to customization techniques, local configuration of matplotib, and then deployments in Cloud environments. The adventure culminates in an exploration of big data visualization and matplotlib clustering.

Publication date:
June 2015
Publisher
Packt
Pages
292
ISBN
9781783987542

 

Chapter 1. Getting Up to Speed

Over the past 12 years of its existence, matplotlib has made its way into the classrooms, labs, and hearts of the scientific computing world. With Python's rise in popularity for serious professional and academic work, matplotlib has taken a respected seat beside long-standing giants such as Mathematica by Wolfram Research and MathWorks' MATLAB products. As such, we feel that the time is ripe for an advanced text on matplotlib that guides its more sophisticated users into new territory by not only allowing them to become experts in their own right, but also providing a clear path that will help them apply their new knowledge in a number of environments.

As a part of a master class series by Packt Publishing, this book focuses almost entirely on a select few of the most requested advanced topics in the world of matplotlib, which includes everything from matplotlib internals to high-performance computing environments. In order to best support this, we want to make sure that our readers have a chance to prepare for the material of this book, so we will start off gently.

The topics covered in this chapter include the following:

  • A brief historical overview of matplotlib

  • What's new in matplotlib

  • Who is an advanced, beginner, or an intermediate matplotlib user

  • The software dependencies for many of the book's examples

  • An overview of Python 3

  • An overview of the coding style used in this book

  • References for installation-related instructions

  • A refresher on IPython Notebooks

  • A teaser of a complicated plot in matplotlib

  • Additional resources to obtain advanced beginner and intermediate matplotlib knowledge

 

A brief historical overview of matplotlib


The open source project that we now know as matplotlib had its inception at the beginning of the millennium when John Hunter and his colleagues were conducting epilepsy research using proprietary data analysis software. They migrated to MATLAB as it was more flexible and less expensive. However, it was not designed to handle the data formats and diverse data sources that they had to contend with on a daily basis.

It was with this realization that John Hunter created the first version of matplotlib—a GTK+ visualization tool for electroencephalography and electrocorticography analysis. Having been built in Python, adding support for new features as the team needed them was a straightforward task. Before long, this led to the idea of providing a similar interactive command mode to generate plots on the fly, as MATLAB does.

One of the oldest sources available for matplotlib code online is the GitHub repository. The first commit in this repository was with regard to migration from Subversion to Git, though the original repository was CVS. This commit was authored in May 2003, though this repository records a CHANGELOG file whose first entry was made in December 2002. By the time this book goes into publication, matplotlib will have celebrated its 13th birthday.

 

What's new in matplotlib 1.4


In the past 12 years, a great deal has happened in the matplotlib codebase. Of particular interest are the new features that have been added to the most recent release at the time of writing this book—version 1.4.3. Here are some of its highlights:

  • A new IPython Notebook backend for interactive matplotlib plot support

  • A new style package that allows for greater control over the visual presentation of plots

  • The new Qt5 backend

  • Google App Engine integration

  • New plotting features

  • New configuration options

 

The intermediate matplotlib user


If you've read the preface, then you know who this book is for—developers with intermediate or advanced knowledge of matplotlib as well as the motivated beginners. But who are they exactly? What do such users know?

Answers to such questions are fairly open-ended. We have the following guidelines. The intermediate matplotlib user should have some limited knowledge to passing experience with the following:

  • Installation of matplotlib in multiple environments

  • Creation of basic to moderately complicated matplotlib plots

  • Basic matplotlib APIs, styling, backends, and customizations

  • Using matplotlib objects, subplots, and overlays

  • Advanced third-party tools such as Seaborn, Pandas, ggplot, distributed IPython, and StarCluster

  • Completed reading most or all of the following books, Matplotlib for Python Developers, Sandro Tosi, Packt Publishing, and matplotlib Plotting Cookbook, Alexandre Devert, Packt Publishing

 

Prerequisites for this book


This book assumes that you have previous experience with matplotlib and that it has been installed on your preferred development platform. If you need a refresher on the steps to accomplish that, the first chapter of Sandro Tosi's excellent book, Matplotlib for Python Developers, provides instructions to install matplotlib and its dependencies.

In addition to matplotlib, you will need a recent installation of IPython to run many of the examples and exercises provided. For help in getting started with IPython, there many great resources available on the project's site. Cyrille Rossant has authored Learning IPython for Interactive Computing and Data Visualization, Packt Publishing, which is a great resource as well.

In the course of this book, we will install, configure, and use additional open source libraries and frameworks. We will cover the setup of these as we get to them, but all the programs in this book will require you to have the following installed on your machine:

  • Git

  • GNU make

  • GNU Compiler Collection (gcc)

Your operating system's package manager should have a package that installs common developer tools—these tools should be installed as well, and may provide most of the tools automatically.

All the examples in this book will be implemented using a recent release of Python, version 3.4.2. Many of the examples will not work with the older versions of Python, so please note this carefully. In particular, the setup of virtual environments uses a feature that is new in Python 3.4.2, and some examples use the new type annotations. At the time of writing this book, the latest version of Ubuntu ships with Python 3.4.2.

Though matplotlib, NumPy, IPython, and the other libraries will be installed for you by set scripts provided in the code repositories for each chapter. For the sake of clarity, we will mention the versions used for some of these here:

  • matplotlib 1.4.3

  • NumPy 1.9.2

  • SciPy 0.15.1

  • IPython 3.1.0 (also known as Jupyter)

 

Python 3


On this note, it's probably good to discuss Python 3 briefly as there has been continued debate on the choice between the two most recent versions of the programming language (the other being the 2.7.x series). Python 3 represents a massive community-wide effort to adopt better coding practices as well as improvements in the maintenance of long-lived libraries, frameworks, and applications. The primary impetus and on-going strength of this effort, though, is a general overhaul of the mechanisms underlying Python itself. This will ultimately allow the Python programming language greater maintainability and longevity in the coming years, not to mention better support for the ongoing performance enhancements.

In case you are new to Python 3, the following table, which compares some of the major syntactical differences between Python 2 and Python 3, has been provided:

Syntactical Differences

Python 2

Python 3

Division with floats

x = 15 / 3.0

x = 15 / 3

Division with truncation

x = 15 / 4

x = 15 // 4

Longs

y = long(x * 10)

y = int(x * 10)

Not equal

x <> y

x != y

The unicode function

u = unicode(s)

u = str(s)

Raw unicode

u = ur"\t\s"

u = r"\t\s"

Printing

print x, y, z

print(x, y, z)

Raw user input

y = raw_input(x)

y = input(x)

User input

y = input(x)

y = eval(input(x))

Formatting

"%d %s" % (n, s)

"{} {}".format(n,s)

Representation

'x'

repr(x)

Function application

apply(fn, args)

fn(*args)

Filter

itertools.ifilter

filter

Map

itertools.imap

map

Zip

itertools.izip

zip

Range

xrange

range

Reduce

reduce

functools.reduce

Iteration

iterator.next()

next(iterator)

The execute code

exec code

exec(code)

The execute file

execfile(file)

exec(fh.read())

Exceptions

try:

...

except val, err:

...

try:

...

except val as err:

...

 

Coding style


The coding style used throughout this book and in the example code conforms to the standards laid out in PEP 8, with one exception. When entering code into an IPython Notebook or providing modules that will be displayed in the notebook, we will not use two lines to separate what would be module-level blocks of code. We will just use one line. This is done to save screen space.

Something that might strike you as different in our code is the use of an extraordinary feature of Python 3—function annotations. The work for this was done in PEP 3107 and was added in the first release of Python 3. The use of types and static analysis in programming, though new to Python, is a boon to the world of software. It saves time in development of a program by catching bugs before they even arise as well as streamlining unit tests. The benefit of this in our particular case, with regard to the examples in this book, is quick, intuitive code clarification. When you look at the functions, you will instantly know what is being passed and returned.

Finally, there is one best practice that we adhere to that is not widely adopted in the Python programming community—functions and methods are kept small in all of our code. If more than one logical thing is happening in a function, we break it into multiple functions and compose as needed. This keeps the code clean and clear, making examples much easier to read. It also makes it much easier to write unit tests without some of the excessive parameterization or awkward, large functions and methods that are often required in unit tests. We hope that this leaves a positive, long-lasting impression on you so that this practice receives wider adoption.

 

Installing matplotlib


Given that this is a book on an advanced topic and the target audience will have installed matplotlib and the related dependencies more than once (most likely many times), detailed instructions will not be provided here. Two excellent books on matplotlib that cover this topic in their respective first chapters are Matplotlib for Python Developers and matplotlib Plotting Cookbook.

That being said, each chapter will have its own Git repository with scripts to install dependencies and set up Python's virtual environments. These scripts are a great resource, and reading them should provide additional details to those who seek to know more about installing matplotlib and the related libraries in Python virtual environments.

 

Using IPython Notebooks with matplotlib


Python virtual environments are the recommended way of working with Python projects. They keep your system, Python, and default libraries safe from disruption. We will continue this tradition in this book, but you are welcome to transcend tradition and utilize the matplotlib library and the provided code in whatever way you see fit.

Using the native venv Python environment management package, each project may define its own versions of dependent libraries, including those of matplotlib and IPython. The sample code for this book does just that—listing the dependencies in one or more requirements.txt files.

With the addition of the nbagg IPython Notebook backend to matplotlib in version 1.4, users can now work with plots in a browser very much like they've been able to do in the GTK and Qt apps on the desktop. We will take full advantage of this new feature.

In the IPython examples of this book, most of the notebooks will start off with the following:

In [1]: import matplotlib matplotlib.use('nbagg')
In [2]: %matplotlib inline
In [3]: import matplotlib.pyplot as plt

Tip

Downloading the example code

Each chapter in Mastering matplotlib provides instructions on obtaining the example code and notebook from Github. A master list has been provided at https://github.com/masteringmatplotlib/notebooks. You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you." This configures our notebooks to use matplotlib in the way that we need. The example in the following section starts off with just those commands.

A final note about IPython—the project has recently changed its name to Jupyter in an effort to embrace the language-agnostic growth the project and community has experienced as well as the architectural changes that will make the adding of new language backends much easier. The user experience will not change (except for the better), but you will notice a different name and logo when you open the chapter notebooks for this book.

 

Advanced plots – a preview


To give a taste of what's to come, let's start up a matplotlib IPython Notebook and look at an example. You will need to download the example from a GitHub repository first:

$ git clone https://github.com/masteringmatplotlib/preview.git
$ cd preview

You only need to do the following in order to bootstrap an environment with all the notebook dependencies and start up the notebook server:

$ make

This will do several things for you automatically, some of which are as follows:

  • Clone a support repository holding various include files

  • Create a Python virtual environment

  • Install matplotlib and other scientific computing Python modules into this virtual environment

  • Start an IPython Notebook server that runs on local host

  • Open a browser window and load the preview notebook in it

In this browser window, you can run the code yourself by selecting each code section and hitting the Shift and Enter keys to execute it. Let's go through an example.

 

Setting up the interactive backend


As mentioned above, our notebooks will all start with the following, as does this preview notebook:

In [1]: import matplotlib
        matplotlib.use('nbagg')
        %matplotlib inline
In [2]: import matplotlib.pyplot as plt
        import seaborn as sns
        import numpy as np
        from scipy import stats
        import pandas as pd

These commands do the following:

  • Set up the interactive backend for plotting

  • Allow us to evaluate images in-line, as opposed doing the same in a pop-up window

  • Provide the standard alias to the matplotlib.pyplot sub package and import other packages that we will need

Joint plots with Seaborn

Our first preview example will take a look at the Seaborn package, an open source third-party library for data visualization and attractive statistical graphs. Seaborn depends upon not only matplotlib, but also NumPy and SciPy (among others). These were already installed for you when you ran make (pulled from the requirements.txt file).

We'll cover Seaborn palettes in more detail later in the book, so the following command is just a sample. Let's use a predefined palette with a moderate color saturation level:

In [3]: sns.set_palette("BuPu_d", desat=0.6)
        sns.set_context("notebook", font_scale=2.0)

Next, we'll generate two sets of random data (with a random seed of our choosing), one for the x axis and the other for the y axis. We're then going to plot the overlap of these distributions in a hex plot. Here are the commands for the same:

In [4]: np.random.seed(42424242)
In [5]: x = stats.gamma(5).rvs(420)
        y = stats.gamma(13).rvs(420)
In [6]: with sns.axes_style("white"):
            sns.jointplot(x, y, kind="hex", size=16);

The generated graph is as follows:

Scatter plot matrix graphs with Pandas

In the second preview, we will use Pandas to graph a matrix of scatter plots whose diagonal will be the statistical graphs representing the kernel density estimation. We're going to go easy on the details for now; this is just to whet your appetite for more!

Pandas is a statistical data analysis library for Python that provides high-performance data structures, allowing one to carry out an entire scientific computing workflow in Python (as opposed to having to switch to something like R or Fortran for parts of it).

Let's take the seven columns (inclusive) from the baseball.csv data file between Runs (r) and Stolen Bases (sb) for players between the years of 1871 and 2007 and look at them at the same time in one graph:

In [7]: baseball = pd.read_csv("../data/baseball.csv")
In [8]: plt.style.use('../styles/custom.mplstyle')
        data = pd.scatter_matrix(
             baseball.loc[:,'r':'sb'],
             figsize=(16,10))

The generated graph is as follows:

Command 8 will take a few seconds longer than our previous plot since it's crunching a lot of data.

For now, the plot may look like something only a sabermetrician could read, but by the end of this book, complex graph matrices will be only one of many advanced topics in matplotlib that will have you reaching for new heights.

One last teaser before we close out the chapter—you may have noticed that the plots for the baseball data took a while to generate. Imagine doing 1,000 of these. Or 1,000,000. Traditionally, that's a showstopper for matplotlib projects, but in the latter half of this book, we will cover material that will not only show you how to overcome that limit, but also offer you several options to make it happen.

It's going to be a wild ride.

 

Summary


In this chapter, you got to learn a little more about matplotlib's origins and the latest features that were released at the time of writing this book. You've seen the software that we're going to use, including the version of the Python programming language that we've chosen. Furthermore, we've given you a peek into the future of this book (and matplotlib) with a custom IPython Notebook, which highlights the Seaborn and Pandas projects.

In the next couple of chapters, we're going to focus on matplotlib's internals. In particular, Chapter 2, The matplotlib Architecture will cover the architecture of the project, giving you an insight into how it all works together.

About the Author

  • Duncan M. McGreggor

    Duncan M. McGreggor, having programmed with GOTOs in the 1980s, has made up for that through community service by making open source contributions for more than 20 years. He has spent a major part of the past 10 years dealing with distributed and scientific computing (in languages ranging from Python, Common Lisp, and Julia to Clojure and Lisp Flavored Erlang). In the 1990s, after serving as a linguist in the US Army, he spent considerable time working on projects related to MATLAB and Mathematica, which was a part of his physics and maths studies at the university. Since the mid 2000s, matplotlib and NumPy have figured prominently in many of the interesting problems that he has solved for his customers. With the most recent addition of the IPython Notebook, matplotlib and the suite of the Python scientific computing libraries remain some of his most important professional tools.

    Browse publications by this author

Latest Reviews

(6 reviews total)
Wherever the previous book on matplotlib was not enough, this came in handy.
enough hjgjkhgljg 176r237812364
I think the book is either too short or covers too many things as it doesn't go into enough detail on some interesting topics.
Mastering matplotlib
Unlock this book and the full library for FREE
Start free trial