You're reading from Hands-On Data Analysis with Pandas - Second Edition

Product type: Book
Published in: Apr 2021
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781800563452
Edition: 2nd Edition
Author (1)
Stefanie Molin

Stefanie Molin is a data scientist and software engineer at Bloomberg LP in NYC, tackling tough problems in information security, particularly revolving around anomaly detection, building tools for gathering data, and knowledge sharing. She has extensive experience in data science, designing anomaly detection solutions, and utilizing machine learning in both R and Python in the AdTech and FinTech industries. She holds a B.S. in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science, with minors in economics, and entrepreneurship and innovation. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

Chapter 6: Plotting with Seaborn and Customization Techniques

In the previous chapter, we learned how to create many different visualizations using matplotlib and pandas on wide-format data. In this chapter, we will see how we can make visualizations from long-format data, using seaborn, and how to customize our plots to improve their interpretability. Remember that the human brain excels at finding patterns in visual representations; by making clear and meaningful data visualizations, we can help others (not to mention ourselves) understand what the data is trying to say.
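To make the wide-format versus long-format distinction concrete, here is a minimal sketch using pandas alone. The column names and prices are made up for illustration and are not taken from the chapter's datasets:

```python
import pandas as pd

# Hypothetical wide-format data: one column per measurement
wide_df = pd.DataFrame({
    'day': [1, 2],
    'open': [100.0, 101.5],
    'close': [101.0, 102.5],
})

# Long format: one row per (day, measurement) pair.
# melt() keeps 'day' as an identifier and stacks the other columns.
long_df = wide_df.melt(
    id_vars='day', var_name='measure', value_name='price'
)
print(long_df)
```

The long-format version has a single `price` column plus a `measure` column saying what each price represents, which is the shape that category-based encodings such as color or facets can work with directly.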

Seaborn is capable of making many of the same plots we created in the previous chapter; however, it also makes quick work of long-format data, allowing us to use subsets of our data to encode additional information into our visualizations, such as facets and/or colors for different categories. We will walk through some implementations of what we did in the previous chapter that are easier (or just more aesthetically...

Chapter materials

The materials for this chapter can be found on GitHub at https://github.com/stefmolin/Hands-On-Data-Analysis-with-Pandas-2nd-edition/tree/master/ch_06. We will be working with three datasets once again, all of which can be found in the data/ directory. In the fb_stock_prices_2018.csv file, we have Facebook's stock price for all trading days in 2018. This data is the OHLC data (opening, high, low, and closing price), along with the volume traded. It was gathered using the stock_analysis package, which we will build in Chapter 7, Financial Analysis – Bitcoin and the Stock Market. The stock market is closed on the weekends, so we only have data for the trading days.

The earthquakes.csv file contains earthquake data pulled from the United States Geological Survey (USGS) API (https://earthquake.usgs.gov/fdsnws/event/1/) for September 18, 2018, through October 13, 2018. For each earthquake, we have the magnitude (the mag column), the scale it was measured...

Utilizing seaborn for advanced plotting

As we saw in the previous chapter, pandas provides implementations for most visualizations we would want to create; however, another library, seaborn, provides additional functionality for more involved visualizations and makes plotting long-format data much easier than pandas does. Its plots also tend to look nicer than the default visualizations generated by matplotlib.

For this section, we will be working with the 1-introduction_to_seaborn.ipynb notebook. First, we must import seaborn, which is traditionally aliased as sns:

>>> import seaborn as sns

Let's also import numpy, matplotlib.pyplot, and pandas, and then read in the CSV files for the Facebook stock prices and earthquake data:

>>> %matplotlib inline
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> import pandas as pd 
>>> fb = pd.read_csv(
...     ...

Formatting plots with matplotlib

A big part of making our visualizations presentable is choosing the right plot type and labeling each plot well so that it is easy to interpret. By carefully tuning the final appearance of our visualizations, we make them easier to read and understand.
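As a small sketch of what labeling a plot can involve, the following adds a title, axis labels, and a legend using matplotlib's `Axes` methods. The series is made up for illustration and the label text is only a suggestion:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up series standing in for real data
dates = pd.date_range('2018-01-01', periods=5, freq='D')
prices = pd.Series(np.linspace(100, 104, 5), index=dates)

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(prices.index, prices.values, label='closing price')

# Labels that tell the reader what they are looking at
ax.set_title('Closing price over time (made-up data)')
ax.set_xlabel('date')
ax.set_ylabel('price ($)')
ax.legend(loc='upper left')

fig.tight_layout()  # reduce the chance of labels being clipped
```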

Let's now move to the 2-formatting_plots.ipynb notebook, run the setup code to import the packages we need, and read in the Facebook stock data and COVID-19 daily new cases data:

>>> %matplotlib inline
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> import pandas as pd 
>>> fb = pd.read_csv(
...     'data/fb_stock_prices_2018.csv', 
...     index_col='date', 
...     parse_dates=True
... ) 
>>> covid = pd.read_csv('data/covid19_cases.csv').assign(
...     date=lambda x: \
...     ...

Customizing visualizations

So far, all of the code we've learned for creating data visualizations has been for making the visualization itself. Now that we have a strong foundation, we are ready to learn how to add reference lines, control colors and textures, and include annotations.
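For a first look at these three ideas together, here is a minimal sketch on made-up data. It draws a horizontal reference line with `axhline()`, shades a region with `axvspan()`, and labels the maximum with `annotate()`; the colors, positions, and label text are arbitrary choices for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up data for illustration
x = np.arange(10)
y = np.array([3, 4, 6, 5, 9, 8, 7, 6, 5, 4])

fig, ax = plt.subplots()
ax.plot(x, y, color='gray', label='value')

# Reference line: dashed horizontal line at the mean
mean = y.mean()
ax.axhline(mean, color='red', linestyle='--', label='mean')

# Shaded region between x=3 and x=5
ax.axvspan(3, 5, color='blue', alpha=0.2, label='region of interest')

# Annotation: arrow pointing at the maximum value
peak = int(y.argmax())
ax.annotate(
    'peak', xy=(x[peak], y[peak]),
    xytext=(x[peak] + 1, y[peak] + 0.5),
    arrowprops=dict(arrowstyle='->')
)
ax.legend()
```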

In the 3-customizing_visualizations.ipynb notebook, let's handle our imports and read in the Facebook stock prices and earthquake datasets:

>>> %matplotlib inline
>>> import matplotlib.pyplot as plt
>>> import pandas as pd
>>> fb = pd.read_csv(
...     'data/fb_stock_prices_2018.csv', 
...     index_col='date', 
...     parse_dates=True
... )
>>> quakes = pd.read_csv('data/earthquakes.csv')

Tip

Changing the style in which the plots are created is an easy way to change their look and feel without setting each aspect separately. To set...

Summary

Whew, that was a lot! We learned how to create impressive and customized visualizations using matplotlib, pandas, and seaborn. We discussed how we can use seaborn for additional plotting types and cleaner versions of some familiar ones. Now we can easily make our own colormaps, annotate our plots, add reference lines and shaded regions, finesse the axes/legends/titles, and control most aspects of how our visualizations will appear. We also got a taste of working with itertools and creating our own generators.

Take some time to practice what we've discussed with the end-of-chapter exercises. In the next chapter, we will apply all that we have learned to finance, as we build our own Python package and compare bitcoin to the stock market.

Exercises

Create the following visualizations using what we have learned so far in this book and the data from this chapter. Be sure to add titles, axis labels, and legends (where appropriate) to the plots:

  1. Using seaborn, create a heatmap to visualize the correlation coefficients between earthquake magnitude and whether there was a tsunami for earthquakes measured with the mb magnitude type.
  2. Create a box plot of Facebook volume traded and closing prices, and draw reference lines for the bounds of a Tukey fence with a multiplier of 1.5. The bounds will be at Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. Be sure to use the quantile() method on the data to make this easier. (Pick whichever orientation you prefer for the plot, but make sure to use subplots.)
  3. Plot the evolution of cumulative COVID-19 cases worldwide, and add a dashed vertical line on the date that it surpassed 1 million. Be sure to format the tick labels on the y-axis accordingly.
  4. Use axvspan(...
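As a hint for the Tukey fence in exercise 2, the bounds can be computed with the `quantile()` method along these lines. The series below is made up and is not the Facebook data; drawing the reference lines is left to the exercise:

```python
import pandas as pd

# Made-up data with one obvious outlier
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

# First and third quartiles, then the interquartile range
q1, q3 = data.quantile([0.25, 0.75])
iqr = q3 - q1

# Tukey fence with a multiplier of 1.5
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fence
outliers = data[(data < lower) | (data > upper)]
print(lower, upper, outliers.tolist())
```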

Further reading

Check out the following resources for more information on the topics covered in this chapter:
