
You're reading from  Matplotlib 2.x By Example

Product type: Book
Published in: Aug 2017
Publisher: Packt
ISBN-13: 9781788295260
Edition: 1st Edition
Authors (3):
Allen Yu

Allen Yu, PhD, is a Chevening Scholar, 2017-18, and an MSc student in computer science at the University of Oxford. He holds a PhD degree in Biochemistry from the Chinese University of Hong Kong, and he has used Python and Matplotlib extensively during his 10 years of bioinformatics experience.

Claire Chung

Claire Chung is pursuing her PhD degree as a Bioinformatician at the Chinese University of Hong Kong. She enjoys using Python daily for work and life hacks. While passionate about science, her challenge-loving character motivates her to go beyond data analytics. She has participated in web development projects, as well as developed skills in graphic design and multilingual translation. She led the Campus Network Support Team in college, and shared her experience in data visualization at PyCon HK 2017.

Aldrin Yim

Aldrin Yim is a PhD candidate and Markey Scholar in the Computation and System Biology program at Washington University, School of Medicine. His research focuses on applying big data analytics and machine learning approaches in studying neurological diseases and cancer. He is also the founding CEO of Codex Genetics Limited, which provides precision medicine solutions to patients and hospitals in Asia.


Chapter 5. Visualizing Multivariate Data

When we have big data that contains many variables, the plot types in Chapter 4, Visualizing Online Data, may no longer be an effective way to visualize it. We may try to cram as many variables into a single plot as possible, but the overcrowded or cluttered details would quickly exceed the limits of human visual perception.

In this chapter, we aim to introduce multivariate data visualization techniques; they enable us to better understand the distribution of data and the relationships between variables. Here is the outline of this chapter:

  • Getting End-of-Day (EOD) stock data from Quandl
  • Two-dimensional faceted plots:
    • Factor plot in Seaborn
    • Faceted grid in Seaborn
    • Pair plot in Seaborn
  • Other two-dimensional multivariate plots:
    • Heatmap in Seaborn
    • Candlestick plot in matplotlib.finance:
      • Visualizing various stock market indicators
    • Building a comprehensive stock chart
  • Three-dimensional plots:
    • Scatter plot
    • Bar chart
    • Caveats of using Matplotlib...

Getting End-of-Day (EOD) stock data from Quandl


Since we are going to discuss stock data extensively, note that we do not guarantee the accuracy, completeness, or validity of the content presented; nor are we responsible for any errors or omissions that may have occurred. The data, visualizations, and analyses are provided on an “as is” basis for educational purposes only, without any representations, warranties, or conditions of any kind. Therefore, the publisher and the authors do not accept liability for your use of the content. It should be noted that past stock performance may not predict future performance. Readers should also be aware of the risks involved in stock investments and should not make any investment decisions based on the content in this chapter. In addition, readers are advised to conduct their own independent research into individual stocks before making an investment decision.

We are going to adapt the Quandl JSON API code in Chapter 4, Visualizing Online Data, to get...

Two-dimensional faceted plots


We are going to introduce three major ways to create faceted plots: seaborn.factorplot(), seaborn.FacetGrid(), and seaborn.pairplot(). You might have seen some faceted plots in the previous chapter, when we talked about seaborn.lmplot(). In fact, seaborn.lmplot() combines seaborn.regplot() with seaborn.FacetGrid(), and the data subsets are defined by the hue, col, and row parameters. The three functions introduced here define facets in much the same way as seaborn.lmplot().
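To make the facet-defining parameters concrete, here is a minimal sketch of seaborn.lmplot() splitting one dataset by hue and col. The data is synthetic and the column names (x, y, ticker, year) and ticker labels are made up for illustration; they are not the chapter's stock data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for illustration): two hypothetical tickers
# observed in two hypothetical years, with a noisy linear relationship.
rng = np.random.RandomState(0)
records = []
for ticker in ["AAA", "BBB"]:
    for year in [2016, 2017]:
        x = rng.uniform(0, 10, size=20)
        y = 2.0 * x + rng.normal(scale=1.0, size=20)
        records.append(pd.DataFrame({"x": x, "y": y, "ticker": ticker, "year": year}))
df = pd.concat(records, ignore_index=True)

# hue colors the subsets within a panel; col places each year in its own column.
# Adding row="..." would split the grid vertically in the same way.
g = sns.lmplot(x="x", y="y", data=df, hue="ticker", col="year")
```

The returned object is a FacetGrid, so g.axes holds one Axes per facet (here a 1 x 2 array), which can be customized further with ordinary Matplotlib calls.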

Factor plot in Seaborn

With the help of seaborn.factorplot(), we can draw categorical point plots, box plots, violin plots, bar plots, or strip plots onto a seaborn.FacetGrid() by tuning the kind parameter. The default plot type for factorplot is a point plot. Unlike other plotting functions in Seaborn, which support a wide variety...
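As a rough sketch of how the kind parameter switches the plot type, the snippet below draws the same synthetic data three ways. Note one assumption about versions: Seaborn 0.9 renamed factorplot() to catplot(), so the code picks whichever name the installed release provides and passes kind explicitly rather than relying on the default. The column names and ticker labels are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# factorplot() was renamed catplot() in Seaborn 0.9; use whichever exists here.
facet_plot = getattr(sns, "catplot", None) or getattr(sns, "factorplot")

# Synthetic data (an assumption for illustration): daily changes for two
# hypothetical tickers, labeled by weekday.
rng = np.random.RandomState(1)
df = pd.DataFrame({
    "weekday": np.tile(["Mon", "Tue", "Wed"], 20),
    "ticker": np.repeat(["AAA", "BBB"], 30),
    "change": rng.normal(size=60),
})

# The same call with a different kind yields a different categorical plot,
# each drawn onto a FacetGrid with one column per ticker.
grids = {}
for kind in ["point", "box", "violin"]:
    grids[kind] = facet_plot(x="weekday", y="change", col="ticker",
                             data=df, kind=kind)
```

Each entry in grids is a separate figure-level grid, so the three plot types can be compared side by side or saved individually.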

Other two-dimensional multivariate plots


FacetGrid, factor plot, and pair plot may take up a lot of space when we need to visualize more variables or samples. Two special plot types come in handy if you want to maximize space efficiency: heatmaps and candlestick plots.

Heatmap in Seaborn

A heatmap is an extremely compact way to display a large amount of data. In the finance world, color-coded blocks can give investors a quick glance at which stocks are up or down. In the scientific world, heatmaps allow researchers to visualize the expression level of thousands of genes.

The seaborn.heatmap() function expects a 2D list, 2D NumPy array, or pandas DataFrame as input. If a list or array is supplied, we can supply column and row labels via xticklabels and yticklabels respectively. On the other hand, if a DataFrame is supplied, the column labels and index values will be used to label the columns and rows respectively.
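The two labeling routes can be compared in a short sketch. The values and labels below are invented for illustration only; the point is that an array needs xticklabels and yticklabels passed in, whereas a DataFrame carries its own labels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up values and labels, purely for illustration.
values = np.array([[1.2, -0.5, 0.3],
                   [-0.8, 0.9, -0.1]])
rows = ["AAA", "BBB"]
cols = ["Mon", "Tue", "Wed"]

fig, (ax_array, ax_frame) = plt.subplots(1, 2, figsize=(8, 3))

# Route 1: a plain NumPy array, with labels supplied explicitly.
sns.heatmap(values, xticklabels=cols, yticklabels=rows, ax=ax_array)

# Route 2: a DataFrame, whose columns and index are used as labels.
frame = pd.DataFrame(values, index=rows, columns=cols)
sns.heatmap(frame, ax=ax_frame)
```

Both panels should show the same picture; the DataFrame route is usually more convenient when the data already lives in pandas.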

To get started, we will plot an overview of the performance of...

Three-dimensional (3D) plots


By transitioning to three-dimensional space, you may enjoy greater creative freedom when creating visualizations. The extra dimension can also accommodate more information in a single plot. However, some may argue that 3D is nothing more than a visual gimmick when projected onto a 2D surface (such as paper), as it can obscure the interpretation of data points.

In Matplotlib version 2, despite significant developments in the 3D API, annoying bugs or glitches still exist. We will discuss some workarounds toward the end of this chapter. More powerful Python 3D visualization packages do exist (such as MayaVi2, Plotly, and VisPy), but it's good to use Matplotlib's 3D plotting functions if you want to use the same package for both 2D and 3D plots, or you would like to maintain the aesthetics of its 2D plots.

For the most part, 3D plots in Matplotlib have similar structures to 2D plots. As such, we will not go through every 3D plot type in this section. We will put...
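To illustrate how closely the 3D interface mirrors the 2D one, here is a minimal sketch that draws the same random points as a 2D and a 3D scatter plot. The data is random and purely illustrative, and this is not the chapter's own example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
# Importing Axes3D registers the "3d" projection on older Matplotlib releases.
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401

# Random illustrative points.
rng = np.random.RandomState(2)
x, y, z = rng.normal(size=(3, 50))

fig = plt.figure(figsize=(8, 4))

# 2D scatter: the familiar Axes API.
ax2d = fig.add_subplot(1, 2, 1)
ax2d.scatter(x, y)
ax2d.set_xlabel("x")
ax2d.set_ylabel("y")

# 3D scatter: the same calls, plus a projection argument and a z dimension.
ax3d = fig.add_subplot(1, 2, 2, projection="3d")
ax3d.scatter(x, y, z)
ax3d.set_xlabel("x")
ax3d.set_ylabel("y")
ax3d.set_zlabel("z")
```

The only structural differences are the projection="3d" argument when creating the Axes and the extra z argument and label.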

Summary


You have successfully learned the techniques for visualizing multivariate data in 2D and 3D forms. Although most examples in this chapter revolved around the topic of stock trading, the data processing and visualization methods can be applied readily to other fields as well. In particular, the divide-and-conquer approach used to visualize multivariate data in facets is extremely useful in the scientific field. 

We didn't go into too much detail of the 3D plotting capability of Matplotlib, as it is yet to be polished. For simple 3D plots, Matplotlib already suffices. The learning curve can be reduced if we use the same package for both 2D and 3D plots. You are advised to take a look at MayaVi2, Plotly, and VisPy if you require more powerful 3D plotting functions.
