Chapter 9. Matplotlib in the Real World
At this point, we hope you are equipped with the techniques for creating and customizing plots using Matplotlib. Let's build on what we have learned so far and begin exploring more advanced Matplotlib usage through real-world examples.
First, we will cover how to fetch online data, which is commonly obtained through an application programming interface (API) or plain old web scraping techniques. Next, we will explore how to integrate Matplotlib 2.x with other scientific computing packages in Python for visualizations of different data types.
Many websites distribute data via an API, which connects applications through a standardized architecture. While we are not going to cover the details of using APIs here, we will cover the most common API data exchange formats, namely CSV and JSON.
Note
Interested readers can consult each site's own documentation for details on using its API.
We have briefly covered parsing of CSV files in Chapter 4, Advanced Matplotlib. To aid your understanding, we are going to represent the same data using both CSV and JSON.
Comma-separated values (CSV) is one of the oldest file formats, introduced long before the World Wide Web even existed. It remains widely used, although richer formats such as JSON and XML are gaining popularity. As the name suggests, data values are separated by commas. The preinstalled csv package and the pandas package contain classes to read and write data in CSV format. The following CSV example defines a population table with two countries:
Country,Time...
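To make the CSV and JSON comparison concrete, here is a minimal sketch that reads a small table with the preinstalled csv package and round-trips the same records through JSON. The column names and every value are hypothetical placeholders made up for illustration; they are not the population figures from the table above.

```python
import csv
import io
import json

# Hypothetical placeholder table: names and numbers are invented for illustration.
csv_text = """Country,Year,Population
Country A,2015,1000
Country A,2016,1100
Country B,2015,2000
Country B,2016,2100
"""

# csv.DictReader maps each data row to the header names; all values start as strings.
rows_from_csv = []
for row in csv.DictReader(io.StringIO(csv_text)):
    record = dict(row)
    record["Year"] = int(record["Year"])
    record["Population"] = int(record["Population"])
    rows_from_csv.append(record)

# The same records expressed as a JSON array of objects.
json_text = json.dumps(rows_from_csv, indent=2)

# Parsing the JSON text gives back the same list of dictionaries.
rows_from_json = json.loads(json_text)

print(json_text)
print(rows_from_csv == rows_from_json)
```

Note that CSV carries no type information, so the numeric columns have to be converted by hand, whereas JSON keeps numbers and strings distinct.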
Importing and visualizing data from a JSON API
Now, let's learn how to parse financial data from Quandl's API to create insightful visualizations. Quandl is a financial and economic data warehouse, storing millions of datasets from hundreds of publishers. The best thing about Quandl is that these datasets are delivered via a unified API, so we do not have to worry about the procedures for parsing each source correctly. Anonymous users can make up to 50 API calls per day, or up to 500 free API calls if registered. Readers can sign up for a free API key at https://www.quandl.com/?modal=register.
At Quandl, every dataset is identified by a unique ID, as defined by the Quandl code on each search result web page. For example, the Quandl code GOOG/NASDAQ_SWTX defines the historical NASDAQ index data published by Google Finance. Every dataset is available in three different formats: CSV, JSON, and XML.
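As a rough sketch of how a Quandl code and a format might map onto a request, the snippet below builds a URL and turns a JSON payload into a list of records using only the standard library. The endpoint layout, the api_key query parameter, and the shape of the payload (a dataset object holding column_names and data) are assumptions about the service, and the helper names build_url and records_from_payload are hypothetical. The sample payload is hand-written with made-up values, and no network request is made.

```python
import json

# Assumed base endpoint; check the service documentation before relying on it.
BASE_URL = "https://www.quandl.com/api/v3/datasets"


def build_url(quandl_code, fmt="json", api_key=None):
    """Build a request URL for a Quandl code in one of the three formats."""
    if fmt not in ("csv", "json", "xml"):
        raise ValueError("fmt must be 'csv', 'json', or 'xml'")
    url = "{}/{}.{}".format(BASE_URL, quandl_code, fmt)
    if api_key:
        url += "?api_key={}".format(api_key)  # assumed parameter name
    return url


def records_from_payload(text):
    """Pair each data row with the column names (assumed payload shape)."""
    dataset = json.loads(text)["dataset"]
    return [dict(zip(dataset["column_names"], row)) for row in dataset["data"]]


# Hand-written sample with invented values, standing in for a real response.
sample_payload = """
{"dataset": {"column_names": ["Date", "Close"],
             "data": [["2017-01-04", 10.5], ["2017-01-03", 10.0]]}}
"""

print(build_url("GOOG/NASDAQ_SWTX", fmt="json"))
print(records_from_payload(sample_payload))
```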
Although an official Python client library is available from Quandl, we are not going to use that, for the sake...
Scraping information from websites
Governments or jurisdictions around the world are increasingly embracing the importance of open data, which aims to increase citizen involvement and informed decision-making, and also aims to make policies more open to public scrutiny. Some examples of open data initiatives around the world include https://www.data.gov/ (United States of America), https://data.gov.uk/ (United Kingdom), and https://data.gov.hk/en/ (Hong Kong).
These data portals often provide an API for programmatic access of data. However, an API is not available for some datasets, hence we need to rely on good old web scraping techniques to extract information from websites.
Beautiful Soup (https://www.crummy.com/software/BeautifulSoup/) is an incredibly useful package for scraping information from websites. Basically, everything marked with an HTML tag can be scraped with this wonderful package. Scrapy is also a good package for web scraping, but it is more like a framework for writing...
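To give a feel for tag-based scraping, here is a minimal sketch that extracts the rows of an HTML table with Beautiful Soup. It assumes the bs4 package is installed, and it parses a small hand-written HTML string (with placeholder values) rather than a live web page, so the table id and its contents are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a downloaded web page.
html = """
<table id="population">
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Country A</td><td>1000</td></tr>
  <tr><td>Country B</td><td>2000</td></tr>
</table>
"""

# "html.parser" is the parser bundled with Python, so no extra parser is needed.
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="population")

# Collect the text of every header or data cell, row by row.
rows = []
for tr in table.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    rows.append(cells)

# Use the first row as column names for the remaining rows.
header, *body = rows
records = [dict(zip(header, values)) for values in body]
print(records)
```

All scraped values arrive as strings, so numeric columns still need to be converted before plotting.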
Matplotlib graphical backends
The code for plotting graphs is considered the frontend in Matplotlib's terminology. We first mentioned backends in Chapter 1, Introduction to Matplotlib, when we were talking about output formats. In reality, Matplotlib backends differ in much more than the graphical formats they support. Backends handle many things behind the scenes, and these determine which plotting capabilities are available. For example, the LaTeX text layout is supported only by Agg, PDF, PGF, and PS backends.
We have been using several non-interactive backends so far, which include Agg, Cairo, GDK, PDF, PGF, PS, and SVG. Most of these backends work without extra dependencies, but Cairo and GDK require the Cairo graphics library or GIMP Drawing Kit, respectively, to work.
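As a small illustration of working with a non-interactive backend, the following sketch selects Agg explicitly and writes the same figure to a PNG buffer and an SVG buffer in memory. The data points are arbitrary, and in-memory buffers are used here only so that no files are left behind.

```python
import io

import matplotlib

matplotlib.use("Agg")  # select a non-interactive backend; no display is needed

import matplotlib.pyplot as plt

print(matplotlib.get_backend())

# Arbitrary data, just to have something to draw.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Save the figure as PNG into an in-memory buffer.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")

# Save the same figure as SVG; Matplotlib picks a suitable renderer for the format.
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")

plt.close(fig)
```

Passing a filename such as a path ending in .png or .svg to savefig works the same way, with the format inferred from the extension.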
Non-interactive backends can be further classified into two groups—vector or raster. Vector graphics describe images in terms of points, paths, and shapes that are calculated using mathematical...
Matplotlib was not designed as an animation package from the get-go, so it can appear sluggish in some advanced use cases. For animation-centric applications, PyGame is a very good alternative (https://www.pygame.org) which supports OpenGL- and Direct3D-accelerated graphics for the ultimate speed in animating objects. Nevertheless, Matplotlib has acceptable performance most of the time, and we will guide you through the steps to create animations that are more engaging than static plots.
Before we start making animations, we need to install one of FFmpeg, avconv, mencoder, or ImageMagick on our system. These additional dependencies are not bundled with Matplotlib, so we need to install them separately. We are going to walk you through the steps of installing FFmpeg.
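To see which movie writers Matplotlib can already find, and what a simple animation looks like in code, a sketch along the following lines may help. The sine-wave data, the frame count, and the helper name save_if_ffmpeg are illustrative assumptions; the helper only attempts to encode a video when the FFmpeg writer is reported as available.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no display is needed

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation

# Names of the movie writers Matplotlib can currently find on this system.
available_writers = animation.writers.list()
print("Available writers:", available_writers)

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))


def update(frame):
    """Shift the sine wave by a phase proportional to the frame number."""
    line.set_ydata(np.sin(x + 0.1 * frame))
    return (line,)


anim = animation.FuncAnimation(fig, update, frames=50, interval=40, blit=True)


def save_if_ffmpeg(anim, path):
    """Save the animation with FFmpeg if it is available; return True on save."""
    if not animation.writers.is_available("ffmpeg"):
        return False
    anim.save(path, writer="ffmpeg", fps=25)
    return True
```

Calling save_if_ffmpeg(anim, "sine_wave.mp4") would then write a video file only on systems where FFmpeg has been installed.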
For Debian-based Linux users, FFmpeg can be installed by issuing the following command in a terminal:
sudo apt-get install ffmpeg
For Mac users, Homebrew (https://brew.sh/) is the simplest way...
In this chapter, you learned how to parse online data in CSV or JSON formats using the versatile pandas package. You further learned how to filter, subset, merge, and process data into insights. Finally, you learned how to scrape information directly from websites. You have now equipped yourself with the knowledge to visualize time series, univariate, and bivariate data. The chapter concluded with a number of useful techniques to customize figure aesthetics for effective storytelling.
Phew! We have just completed a long chapter, so go grab a burger, have a break, and relax.