
You're reading from  The Data Visualization Workshop

Product type: Book
Published in: Jul 2020
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781800568846
Edition: 1st Edition
Authors (2):
Mario Döbler

Mario Döbler is a Ph.D. student focusing on deep learning at the University of Stuttgart. He previously interned in deep learning at the Bosch Center for Artificial Intelligence in Silicon Valley, where he used state-of-the-art algorithms to develop cutting-edge products. In his master's thesis, he dedicated himself to applying deep learning to medical data to drive medical applications.

Tim Großmann

Tim Großmann is a computer scientist with an interest in diverse topics, ranging from AI and IoT to security. He previously worked in big data engineering at the Bosch Center for Artificial Intelligence in Silicon Valley and on an Eclipse project for IoT device abstractions in Singapore. He's highly involved in several open-source projects and regularly speaks at tech meetups and conferences about his projects and experiences.

1. The Importance of Data Visualization and Data Exploration

Activity 1.01: Using NumPy to Compute the Mean, Median, Variance, and Standard Deviation of a Dataset

Solution:

  1. Import NumPy:
    import numpy as np
  2. Load the normal_distribution.csv dataset by using the genfromtxt function from NumPy:
    dataset = np.genfromtxt('../../Datasets/normal_distribution.csv', \
                            delimiter=',')
  3. First, print the first two rows of the dataset:
    dataset[0:2]

    The output of the preceding code is as follows:

    Figure 1.57: First two rows of the dataset

  4. Calculate the mean of the third row. Access the third row by using index 2, dataset[2]:
    np.mean(dataset[2])

    The output of the preceding code is as follows:

    100.20466135250001
  5. Index the last element of an ndarray in the same way a regular Python list can be...
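The remaining statistics follow the same pattern as the mean. As a minimal, self-contained sketch, the made-up array below stands in for normal_distribution.csv (the real file is not reproduced in this excerpt):

```python
import numpy as np

# Hypothetical stand-in for normal_distribution.csv: three rows of samples.
dataset = np.array([[ 99.0, 101.0, 100.0,  98.0, 102.0],
                    [105.0,  95.0, 100.0, 110.0,  90.0],
                    [101.0,  99.0, 100.0, 102.0,  98.0]])

row = dataset[2]        # third row, accessed with index 2
print(np.mean(row))     # arithmetic mean
print(np.median(row))   # middle value of the sorted row
print(np.var(row))      # population variance (ddof=0 by default)
print(np.std(row))      # population standard deviation
```

The same calls applied to the real dataset complete the activity's goal of computing the mean, median, variance, and standard deviation.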

2. All You Need to Know about Plots

Activity 2.01: Employee Skill Comparison

Solution:

  1. Bar charts and radar charts are great for comparing multiple variables for multiple groups.
  2. Suggested response: The bar chart is great for comparing the skill attributes of different employees, but it is not the best choice for getting an overall impression of a single employee, because the skills are not displayed directly next to one another.

    The radar chart is great for this scenario because you can both compare performance across employees and directly observe the individual performance for each skill attribute.

  3. Suggested response:

    For both the bar and radar charts, adding a title and labels would help to understand the plots better. Additionally, using different colors for the different employees in the radar chart would help to keep the different employees apart.
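A radar chart with the improvements suggested above (title, labels, one color per employee) can be sketched with Matplotlib's polar projection; the employees, skills, and scores below are made up for illustration:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

skills = ["Python", "SQL", "Statistics", "Visualization", "Communication"]
employees = {"Alice": [4, 3, 5, 4, 3], "Bob": [3, 5, 2, 3, 4]}  # made-up scores

# One angle per skill; repeat the first point to close each polygon.
angles = np.linspace(0, 2 * np.pi, len(skills), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for name, scores in employees.items():
    values = scores + scores[:1]
    ax.plot(angles, values, label=name)  # a distinct color per employee
    ax.fill(angles, values, alpha=0.1)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(skills)
ax.set_title("Employee Skill Comparison")  # title aids interpretation
ax.legend(loc="upper right")
```

Because each skill sits on its own axis, an employee's overall profile is visible at a glance, which is exactly the advantage over the bar chart noted in the suggested response.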

Activity 2.02: Road Accidents Occurring over Two Decades

Solution:

  1. Suggested...

3. A Deep Dive into Matplotlib

Activity 3.01: Visualizing Stock Trends by Using a Line Plot

Solution:

Visualize a stock trend by using a line plot:

  1. Create an Activity3.01.ipynb Jupyter notebook in the Chapter03/Activity3.01 folder to implement this activity.
  2. Import the necessary modules and enable plotting within the Jupyter notebook:
    # Import statements
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    %matplotlib inline
  3. Use pandas to read the datasets (GOOGL_data.csv, FB_data.csv, AAPL_data.csv, AMZN_data.csv, and MSFT_data.csv) located in the Datasets folder. The read_csv() function reads a .csv file into a DataFrame:
    # load datasets
    google = pd.read_csv('../../Datasets/GOOGL_data.csv')
    facebook = pd.read_csv('../../Datasets/FB_data.csv')
    apple = pd.read_csv('../../Datasets/AAPL_data.csv')
    amazon = pd.read_csv('../../Datasets/AMZN_data.csv')
    microsoft = pd.read_csv('../../Datasets/MSFT_data.csv...
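With the DataFrames loaded, the line plot itself follows the pattern below. This sketch substitutes randomly generated closing prices for the real CSV files, which are not reproduced in this excerpt:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# Made-up closing prices standing in for the five stock CSV files.
dates = pd.date_range("2018-01-01", periods=100, freq="D")
rng = np.random.default_rng(0)
stocks = {name: pd.DataFrame({"date": dates,
                              "close": 100 + rng.normal(0, 1, 100).cumsum()})
          for name in ["GOOGL", "FB", "AAPL", "AMZN", "MSFT"]}

# One line per stock, with labels so the legend identifies each trend.
fig, ax = plt.subplots(figsize=(10, 5))
for name, df in stocks.items():
    ax.plot(df["date"], df["close"], label=name)
ax.set_xlabel("Date")
ax.set_ylabel("Closing price (USD)")
ax.legend()
```

Plotting all five series on shared axes makes the relative trends directly comparable, which is the point of using a single line plot here.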

4. Simplifying Visualizations Using Seaborn

Activity 4.01: Using Heatmaps to Find Patterns in Flight Passengers' Data

Solution:

Find the patterns in the flight passengers' data with the help of a heatmap:

  1. Create an Activity4.01.ipynb Jupyter notebook in the Chapter04/Activity4.01 folder to implement this activity.
  2. Import the necessary modules and enable plotting within a Jupyter notebook:
    %matplotlib inline
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.set()
  3. Use pandas to read the flight_details.csv dataset located in the Datasets folder. The given dataset contains the monthly figures for flight passengers for the years 1949 to 1960:
    data = pd.read_csv("../../Datasets/flight_details.csv")
  4. Now, we can use the pivot() function to transform the data into a format that is suitable for heatmaps:
    data = data.pivot(index="Months", columns="Years", \
                      values="Passengers")
    data = data.reindex...
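The pivot step can be sketched on a tiny made-up long-format table shaped like flight_details.csv. Note that the keyword form of pivot() works across pandas versions, whereas the positional form was removed in pandas 2.0:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import seaborn as sns

# Tiny made-up long-format table: one row per (year, month) pair.
data = pd.DataFrame({
    "Years": [1949, 1949, 1950, 1950],
    "Months": ["Jan", "Feb", "Jan", "Feb"],
    "Passengers": [112, 118, 115, 126],
})

# Pivot to wide format: months as rows, years as columns, counts as cells.
pivoted = data.pivot(index="Months", columns="Years", values="Passengers")
ax = sns.heatmap(pivoted, annot=True, fmt="d")
```

The wide format is what the heatmap needs: each cell's color encodes one month/year passenger count, so seasonal patterns appear as horizontal bands.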

5. Plotting Geospatial Data

Activity 5.01: Plotting Geospatial Data on a Map

Solution:

Let's plot the geospatial data on a map and find the densely populated areas of Europe, that is, cities with a population of more than 100,000:

  1. Create an Activity5.01.ipynb Jupyter notebook in the Chapter05/Activity5.01 folder to implement this activity and then import the necessary dependencies:
    import numpy as np
    import pandas as pd
    import geoplotlib
  2. Load the world_cities_pop.csv dataset from the Datasets folder using pandas:
    # load the dataset (make sure it has been downloaded)
    dataset = pd.read_csv('../../Datasets/world_cities_pop.csv', \
                          dtype={'Region': str})

    Note

    If we import our dataset without defining the dtype attribute of the Region column as a String type, we will get a warning telling us that...
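Before handing the data to geoplotlib, the activity filters for densely populated cities. A sketch of that step on a few made-up rows follows; the renaming to lat/lon assumes geoplotlib's usual expectation of columns with those names, and the rows themselves are invented stand-ins for world_cities_pop.csv:

```python
import pandas as pd

# Made-up rows shaped like world_cities_pop.csv.
dataset = pd.DataFrame({
    "Country": ["de", "de", "fr"],
    "City": ["stuttgart", "kleinstadt", "paris"],
    "Population": [630000.0, 8000.0, 2148000.0],
    "Latitude": [48.78, 49.00, 48.86],
    "Longitude": [9.18, 9.10, 2.35],
})

# Keep only cities with a population above 100,000, then rename the
# coordinate columns to the names geoplotlib typically expects.
dense = dataset[dataset["Population"] > 100000]
dense = dense.rename(columns={"Latitude": "lat", "Longitude": "lon"})
print(dense["City"].tolist())
```

The filtered frame can then be passed to geoplotlib's plotting functions to draw one dot per remaining city.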

6. Making Things Interactive with Bokeh

Activity 6.01: Plotting Mean Car Prices of Manufacturers

Solution:

  1. Create an Activity6.01.ipynb Jupyter notebook in the Chapter06/Activity6.01 folder.
  2. Import the necessary libraries:
    import pandas as pd
    from bokeh.io import output_notebook
    output_notebook()
  3. Load the automobiles.csv dataset from the Datasets folder:
    dataset = pd.read_csv('../../Datasets/automobiles.csv')
  4. Use the head method to print the first five rows of the dataset:
    dataset.head()

    The following figure shows the output of the preceding code:

Figure 6.36: Loading the top five rows of the automobile dataset

Plotting each car with its price

  1. Use the plotting interface of Bokeh to do some basic visualization first. Let's plot each car with its price. Import figure and show from the bokeh.plotting interface:
    from bokeh.plotting import figure, show
  2. First, use the index as our x-axis since we just want to plot each car with its price...
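The quantity the activity ultimately plots, the mean price per manufacturer, comes from a pandas groupby. A sketch on a made-up subset of automobiles.csv (the column names `make` and `price` are assumptions about the real file):

```python
import pandas as pd

# Made-up subset shaped like automobiles.csv: one row per car.
dataset = pd.DataFrame({
    "make": ["audi", "audi", "bmw", "bmw", "volvo"],
    "price": [13950, 17450, 16430, 20970, 12940],
})

# Mean price per manufacturer -- the series the final Bokeh plot displays.
mean_prices = dataset.groupby("make")["price"].mean()
print(mean_prices)
```

Feeding `mean_prices.index` and `mean_prices.values` to a Bokeh figure then yields one glyph per manufacturer instead of one per car.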

7. Combining What We Have Learned

Activity 7.01: Implementing Matplotlib and Seaborn on the New York City Database

Solution:

  1. Create an Activity7.01.ipynb Jupyter Notebook in the Chapter07/Activity7.01 folder to implement this activity. Import all the necessary libraries:
    # Import statements
    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib
    import matplotlib.pyplot as plt
    import squarify
    sns.set()
  2. Use pandas to read both CSV files located in the Datasets folder:
    p_ny = pd.read_csv('../../Datasets/acs2017/pny.csv')
    h_ny = pd.read_csv('../../Datasets/acs2017/hny.csv')
  3. Use the given PUMA (public use microdata area code based on the 2010 census definition, which are areas with populations of 100,000 or more) ranges to further divide the dataset into NYC districts (Bronx, Manhattan, Staten Island, Brooklyn, and Queens):
    # PUMA ranges
    bronx = [3701, 3710]
    manhatten = [3801, 3810]
    staten_island = [3901, 3903]
    brooklyn =...
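The PUMA-to-district mapping can be sketched as follows. The Bronx, Manhattan, and Staten Island ranges come from the solution above; the Brooklyn and Queens ranges, the `district_of` helper, and the sample PUMA codes are assumptions, since the excerpt is truncated at that point:

```python
import pandas as pd

# PUMA ranges per district (first three from the solution text; the
# Brooklyn and Queens ranges are assumed for illustration).
puma_ranges = {
    "Bronx": (3701, 3710),
    "Manhattan": (3801, 3810),
    "Staten Island": (3901, 3903),
    "Brooklyn": (4001, 4018),   # assumed -- truncated in the excerpt
    "Queens": (4101, 4114),     # assumed -- truncated in the excerpt
}

def district_of(puma):
    """Map a PUMA code to its NYC district, or None if no range matches."""
    for district, (low, high) in puma_ranges.items():
        if low <= puma <= high:
            return district
    return None

# Made-up sample rows standing in for the ACS person-level data.
p_ny = pd.DataFrame({"PUMA": [3702, 3805, 3902, 4010, 4105]})
p_ny["district"] = p_ny["PUMA"].map(district_of)
print(p_ny)
```

Tagging each row with its district this way lets the later Matplotlib and Seaborn plots group and compare the five boroughs directly.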
