
You're reading from The Data Visualization Workshop

Product type: Book
Published in: Jul 2020
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781800568846
Edition: 1st Edition
Authors (2):
Mario Döbler

Mario Döbler is a Ph.D. student with a focus on deep learning at the University of Stuttgart. He previously interned at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of deep learning. He used state-of-the-art algorithms to develop cutting-edge products. In his master's thesis, he dedicated himself to applying deep learning to medical data to drive medical applications.

Tim Großmann

Tim Großmann is a computer scientist with an interest in diverse topics, ranging from AI and IoT to security. He previously worked in the field of big data engineering at the Bosch Center for Artificial Intelligence in Silicon Valley. In addition, he worked on an Eclipse project for IoT device abstractions in Singapore. He's highly involved in several open-source projects and actively speaks at tech meetups and conferences about his projects and experiences.


4. Simplifying Visualizations Using Seaborn

Overview

In this chapter, we will see how Seaborn differs from Matplotlib and construct effective plots leveraging the advantages of Seaborn. Specifically, you will use Seaborn to plot bivariate distributions, heatmaps, pairwise relationships, and so on. This chapter also teaches you how to use FacetGrid for visualizing plots for multiple variables separately. By the end of this chapter, you will be able to explain the advantages Seaborn has compared to Matplotlib and design visually appealing and insightful plots efficiently.

Introduction

In the previous chapter, we took an in-depth look at Matplotlib, one of the most popular plotting libraries for Python. We covered various plot types and looked into customizing plots to make them more aesthetically pleasing.

Unlike Matplotlib, Seaborn is not a standalone Python library. It is built on top of Matplotlib and provides a higher-level abstraction to make visually appealing statistical visualizations. A neat feature of Seaborn is the ability to integrate with DataFrames from the pandas library.

Seaborn aims to make visualization a central part of exploring and understanding data. Internally, Seaborn operates on DataFrames and arrays that contain the complete dataset. This enables it to perform the semantic mappings (for example, from the values of a column to colors) and statistical aggregations that are essential for informative visualizations. Seaborn can also be used simply to change the style and appearance of Matplotlib visualizations.

The most prominent features of Seaborn are as follows...

Controlling Figure Aesthetics

As we mentioned previously, Matplotlib is highly customizable. However, this flexibility comes at a cost: adjusting all the parameters needed to get your desired visualization can take a long time. Seaborn provides customized themes and a high-level interface for controlling the appearance of Matplotlib figures.

The following code snippet creates a simple line plot in Matplotlib:

%matplotlib inline
import matplotlib.pyplot as plt
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
plt.legend()
plt.show()

This is what the plot looks like with Matplotlib's default parameters:

Figure 4.2: Matplotlib line plot

To switch to the Seaborn defaults, simply call the set() function:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43,...
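The same appearance controls are also available individually. The following sketch reuses the two groups from the Matplotlib example and assumes the non-interactive `Agg` backend so it also runs as a plain script: `set_style()` selects a preset for the background, grid, and ticks, `set_context()` scales fonts and line widths for a target medium such as a talk, and `despine()` removes the top and right spines. The function name `styled_line_plot` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside Jupyter
import matplotlib.pyplot as plt
import seaborn as sns

def styled_line_plot(style="ticks", context="talk"):
    """Hypothetical helper: draw the two example groups with a given style and context."""
    sns.set_style(style)      # background, grid, and tick appearance
    sns.set_context(context)  # scale fonts and line widths for the target medium
    fig, ax = plt.subplots()
    ax.plot([10, 20, 5, 40, 8], label="Group A")
    ax.plot([30, 43, 9, 7, 20], label="Group B")
    ax.legend()
    sns.despine(ax=ax)        # hide the top and right spines
    return fig, ax

fig, ax = styled_line_plot()
```

Because `set_style()` and `set_context()` change Matplotlib's global settings, they affect every figure created afterward, not only this one.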

Color Palettes

Color is a very important factor for your visualization. Color can reveal patterns in data if used effectively or hide patterns if used poorly. Seaborn makes it easy to select and use color palettes that are suited to your task. The color_palette() function provides an interface for many of the possible ways to generate color palettes.

The seaborn.color_palette([palette], [n_colors], [desat]) command returns a list of colors, thus defining a color palette.

The parameters are as follows:

  • palette (optional): Name of palette or None to return the current palette.
  • n_colors (optional): Number of colors in the palette. If the specified number of colors is larger than the number of colors in the palette, the colors will be cycled.
  • desat (optional): The proportion to desaturate each color by.

You can set the palette for all plots with set_palette(). This function accepts the same arguments as color_palette(). In the following sections, we will explain...
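The following sketch illustrates the three parameters of `color_palette()` and the effect of `set_palette()`. It assumes a Seaborn version in which the built-in "deep" palette has 10 colors (true for recent releases); the helper `saturation()` is a hypothetical name introduced only to inspect the result.

```python
import colorsys
import matplotlib.pyplot as plt
import seaborn as sns

# First three colors of Seaborn's "deep" qualitative palette (10 colors in total)
deep3 = sns.color_palette("deep", 3)

# Asking for more colors than the palette holds makes the colors repeat
deep12 = sns.color_palette("deep", 12)

# desat=0.5 multiplies each color's saturation by 0.5
muted3 = sns.color_palette("deep", 3, desat=0.5)

def saturation(rgb):
    """Hypothetical helper: saturation component of an RGB tuple in HLS space."""
    return colorsys.rgb_to_hls(*rgb)[2]

# Make a 4-color "husl" palette the default color cycle for all later plots
sns.set_palette("husl", 4)
cycle_colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
```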

Advanced Plots in Seaborn

In the previous chapter, we discussed various plots in Matplotlib, but a few visualizations remain that we want to discuss. First, we will revisit bar plots, since Seaborn offers some neat additional features for them. Then, we will cover kernel density estimation, correlograms, and violin plots.

Bar Plots

In the last chapter, we explained how to create bar plots with Matplotlib. Creating bar plots with subgroups was quite tedious, but Seaborn offers a very convenient way to create various bar plots. In Seaborn, bar plots can also represent estimates of central tendency with the height of each bar, while uncertainty is indicated by error bars at the top of each bar.

The following example gives you a good idea of how this works:

import pandas as pd
import seaborn as sns
data = pd.read_csv("../Datasets/salary.csv")
sns.set(style="whitegrid")
sns.barplot(x="Education", y="Salary"...
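Because the dataset used above is not reproduced here, the following sketch uses a small hypothetical DataFrame (the `Group`, `Subgroup`, and `Value` columns are made up). The `hue` parameter splits each group into side-by-side subgroup bars; each bar's height is the mean of its cell, and the error bars show a confidence interval around that estimate. The last two lines simply check that the drawn heights match the cell means computed with pandas.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: two groups, each split into two subgroups
df = pd.DataFrame({
    "Group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Subgroup": ["x", "x", "y", "y", "x", "x", "y", "y"],
    "Value":    [1.0, 3.0, 4.0, 6.0, 2.0, 2.0, 7.0, 9.0],
})

sns.set(style="whitegrid")
fig, ax = plt.subplots()
# Bar height = mean Value per (Group, Subgroup); hue places the subgroups side by side
sns.barplot(x="Group", y="Value", hue="Subgroup", data=df, ax=ax)

# Keep only bars with positive height, in case the Seaborn version adds
# zero-size helper patches for the legend
bar_heights = sorted(p.get_height() for p in ax.patches if p.get_height() > 0)
cell_means = sorted(df.groupby(["Group", "Subgroup"])["Value"].mean())
```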

Multi-Plots in Seaborn

In the previous topic, we introduced a multi-plot, namely, the pair plot. In this topic, we want to talk about a different way to create flexible multi-plots.

FacetGrid

The FacetGrid is useful for visualizing a certain plot for multiple variables separately. A FacetGrid can be drawn with up to three dimensions: row, col, and hue. The first two have the obvious relationship with the rows and columns of an array. The hue is the third dimension and is shown in different colors. The FacetGrid class has to be initialized with a DataFrame, and the names of the variables that will form the row, column, or hue dimensions of the grid. These variables should be categorical or discrete.

The seaborn.FacetGrid(data, row, col, hue, …) command initializes a multi-plot grid for plotting conditional relationships.

Here are some interesting parameters:

  • data: A tidy ("long-form") DataFrame where each column corresponds to a variable, and each...

Regression Plots

Regression is a technique in which we estimate the relationship between a dependent variable (usually plotted along the y-axis) and an independent variable (usually plotted along the x-axis). Given a dataset, we can assign the independent and dependent variables and then use various regression methods to find the relationship between them. Here, we will only cover linear regression; however, Seaborn provides a wider range of regression functionality if needed.

The regplot() function offered by Seaborn helps to visualize linear relationships, determined through linear regression. The following code snippet gives a simple example:

import numpy as np
import seaborn as sns
x = np.arange(100)
# add noise drawn from a normal distribution with mean 0 and standard deviation 5
y = x + np.random.normal(0, 5, size=100)
# x and y must be passed as keyword arguments in Seaborn 0.12 and later
sns.regplot(x=x, y=y)

The regplot() function draws a scatter plot, a regression line, and a 95% confidence interval for that regression, as shown in...
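To make the link between the drawn line and linear regression explicit, the following sketch fits the same kind of data with NumPy's ordinary least squares (`np.polyfit` with `deg=1`) and compares the result with the slope of the line that `regplot()` draws. It assumes the `Agg` backend, a fixed random seed for reproducibility, and `ci=None` to skip the bootstrapped confidence band; reading the line from `ax.lines[0]` relies on `regplot()` drawing the scatter points as a collection and the fit as the first line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
x = np.arange(100)
# true relationship y = x, plus Gaussian noise with mean 0 and standard deviation 5
y = x + rng.normal(0, 5, size=100)

# Ordinary least squares fit of a straight line: returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
sns.regplot(x=x, y=y, ci=None, ax=ax)  # ci=None: draw the fitted line without a band

# The fitted line is the first Line2D on the axes (the points are a collection)
xs, ys = ax.lines[0].get_xdata(), ax.lines[0].get_ydata()
drawn_slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
print(f"polyfit slope: {slope:.3f}, regplot slope: {drawn_slope:.3f}")
```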

Squarify

At this point, we will briefly talk about tree maps. Tree maps display hierarchical data as a set of nested rectangles. Each group is represented by a rectangle whose area is proportional to its value. Using color schemes, it is possible to represent hierarchies (groups, subgroups, and so on). Compared to pie charts, tree maps use space efficiently. Matplotlib and Seaborn do not offer tree maps, so we use the Squarify library, which is built on top of Matplotlib. Seaborn is still useful here for creating color palettes.

Note

To install Squarify, first launch the command prompt from the Anaconda Navigator. Then, execute the following command: pip install squarify.

The following code snippet is a basic tree map example. It requires the squarify library:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import squarify
colors = sns.light_palette("brown", 4)
squarify.plot(sizes=[50, 25, 10, 15], \
    ...

Summary

In this chapter, we demonstrated how Seaborn helps to create visually appealing figures. We discussed various options for controlling figure aesthetics, such as figure style, controlling spines, and setting the context of visualizations. We talked about color palettes in detail. Further visualizations were introduced for univariate and bivariate distributions. Moreover, we discussed FacetGrids for creating multi-plots, and regression plots as a way to analyze the relationships between two variables. Finally, we discussed the Squarify library, which is used to create tree maps.

In the next chapter, we will work with a different category of data, called geospatial data. The prominent attribute of such a dataset is the presence of geo-coordinates that can be used to plot elements on a given position on a map. We will visualize poaching points, the density of cities around the world, and create a more interactive visualization that only displays data points of the currently...
