You're reading from The Pandas Workshop

Product type: Book
Published in: Jun 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800208933
Edition: 1st Edition
Languages: Python
Tools: NumPy, pandas
Concepts: Data Science

Authors (4):

Blaine Bateman

Saikat Basak

Thomas V. Joseph

William So


Chapter 8: Understanding Data Visualization

In the previous chapter, you were introduced to data transformation methods in pandas. In this chapter, you will learn more about data visualization in pandas and use different types of charts, such as line, bar, pie, scatter, and box plots, to perform exploratory data analysis. We will also look at two ways to plot these charts: with the plot() function provided by pandas and with matplotlib directly. We will learn the differences between these two methods and which one to use, depending on the desired outcome. The plots covered in this chapter will help us analyze our data and uncover useful insights, such as the distribution of certain features across the population (using histograms) and the presence of outliers (using boxplots). By the end of this chapter, you will know how to select the best chart type for your data, build it, and customize it for the purpose of your analysis.
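As a rough illustration of the two uses of histograms and boxplots mentioned above, the following sketch draws both with the pandas plot() function. The data is synthetic (random normal values with two extreme values appended by hand) and the variable names are hypothetical; the outlier check at the end assumes the default 1.5 * IQR whisker rule used by matplotlib boxplots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 500 roughly normal values plus two extreme values added by hand
rng = np.random.default_rng(seed=0)
values = np.concatenate([rng.normal(loc=50, scale=10, size=500), [120.0, 130.0]])
df = pd.DataFrame({"value": values})

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: how the values are distributed across bins
df["value"].plot(kind="hist", bins=30, ax=ax_hist, title="Distribution")

# Boxplot: points beyond the whiskers are drawn individually as potential outliers
df["value"].plot(kind="box", ax=ax_box, title="Outliers")

# The same 1.5 * IQR rule used by the default whiskers, computed directly
q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["value"] < q1 - 1.5 * iqr) | (df["value"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)
```

The two hand-added values land far outside the whiskers, which is exactly what the boxplot makes visible at a glance; a few of the random values may also fall just beyond the fences.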

This chapter consists of the following...

Introduction to data visualization

Humans can process a large amount of information through their sense of vision. Data visualization takes advantage of this innate ability to make data processing and organization more efficient. A classic visualization process starts by filtering data, transforming it into visual forms, and eventually displaying the data interactively to end users. With data visualization, users find it easier to understand and interpret the meaning of the underlying data. Good data visualization helps identify patterns, trends, and extreme values in a concise presentation. This matters in every domain, and especially when the data is large in volume or highly complex. Being able to make sense of a large amount of data in a short amount of time has great business value.

pandas offers various options for visualizing data. To ensure your visualizations are accurate and that they correctly convey the insights gained from the underlying data, it is critical to identify and clean...

Understanding the basics of pandas visualization

pandas has built-in plot generation capabilities that can be used to visualize both DataFrames and Series. Its plot() function acts as a wrapper on top of matplotlib's plotting functions, which means that pandas is actually using the matplotlib library behind the scenes, but with a simplified syntax. This makes it much easier to use than matplotlib (less code and simpler syntax), while still offering enough functionality and flexibility to produce the most common data analytics charts directly from your data.
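To make the wrapper relationship concrete, here is a minimal sketch with hypothetical sales figures. It shows that DataFrame.plot() returns an ordinary matplotlib Axes object, so matplotlib methods can still be applied afterwards, and then builds roughly the same chart directly with matplotlib for comparison.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
import pandas as pd

# Hypothetical yearly sales, for illustration only
df = pd.DataFrame({"sales": [120, 135, 150, 170, 160]},
                  index=[2016, 2017, 2018, 2019, 2020])

# pandas: one call creates the axes, plots each column against the index, and adds a legend
ax = df.plot(title="Sales per year")

# The return value is a regular matplotlib Axes, so matplotlib methods still work on it
ax.set_ylabel("Units sold")

# Roughly the same chart written directly against matplotlib
fig, ax_mpl = plt.subplots()
ax_mpl.plot(df.index, df["sales"], label="sales")
ax_mpl.set_title("Sales per year")
ax_mpl.set_ylabel("Units sold")
ax_mpl.legend()
```

The pandas version needs a single call for the data, title, and legend, while the matplotlib version spells out each step; that difference is the "less code and simpler syntax" trade-off described above.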

To start using the built-in pandas visualizations, you will need to know several key parameters of the .plot() function, which can be called on a DataFrame or a Series. Some of these are listed as follows:

kind: This is the type of plot, such as line (the default), bar, barh, pie, scatter, kde, and so on.
color: This is the color of the plot.
linestyle: This is the style of the line used in the plot, such as solid, dotted, or dashed.
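The three parameters above can be combined in a single call. In the following sketch the temperature values are hypothetical; color and linestyle are forwarded by pandas to matplotlib, which stores the dashed style under its short form "--".

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import pandas as pd

# Hypothetical average monthly temperatures
df = pd.DataFrame({"month": ["Jan", "Feb", "Mar", "Apr"],
                   "temp": [5.0, 7.5, 11.0, 15.5]}).set_index("month")

# kind picks the chart type; color and linestyle are forwarded to matplotlib
line_ax = df.plot(kind="line", color="red", linestyle="dashed")

# Same data, different kind: a horizontal bar chart without a legend
bar_ax = df.plot(kind="barh", color="steelblue", legend=False)

# Inspect the drawn line to confirm which properties were applied
line = line_ax.get_lines()[0]
print(to_hex(line.get_color()), line.get_linestyle())
```

Switching between chart types only requires changing kind, which is what makes the pandas interface convenient for quick exploration.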

Exploring matplotlib

Matplotlib is one of the most widely used Python plotting libraries. It can generate plots with great flexibility. The pandas plot() function is a wrapper on top of matplotlib that exposes only a subset of its functionality. While it does simplify the syntax, it also limits the many possibilities that matplotlib offers. If you want to build complex visualizations, matplotlib will be your best choice, as it gives you control over all kinds of properties, such as the size, the type of figures and markers, the line width, the colors, and the styles. We will see some of the customizations that can be done more easily with matplotlib than with pandas.

Let's start with an example. Consider the following snippet:

# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
 
# Defining a DataFrame
data_frame = pd.DataFrame({
'Year':['2010','2011','2012','2013','2014',...
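The snippet above is cut off in this preview, so it cannot be run as shown. As a separate, self-contained sketch of the kind of control matplotlib gives over a chart, the following code sets the figure size, marker type and size, line width, color, and line style explicitly. The year labels and revenue values are hypothetical and are not the data from the original snippet.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt

# Hypothetical yearly revenue figures, for illustration only
years = ["2010", "2011", "2012", "2013", "2014"]
revenue = [10, 14, 13, 18, 21]

# figsize sets the figure size in inches as (width, height)
fig, ax = plt.subplots(figsize=(8, 4))

# Per-line control of marker type and size, line width, color, and line style
ax.plot(years, revenue, marker="o", markersize=8, linewidth=2.5,
        color="darkgreen", linestyle="-.", label="Revenue")

# Title, axis labels, a light grid, and a legend, each set explicitly
ax.set_title("Revenue by year")
ax.set_xlabel("Year")
ax.set_ylabel("Revenue")
ax.grid(True, alpha=0.3)
ax.legend()
fig.tight_layout()
```

Every visual property here is a separate, explicit argument or method call, which is more verbose than the pandas wrapper but leaves nothing to defaults.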

Visualizing data of different types

In the previous section, we saw how to use pandas and matplotlib to create charts for data visualization. In a data analytics project, data visualization can be used either to analyze data or to communicate insights. Presenting results visually, in a way that stakeholders can easily understand and interpret, is a must-have skill for any good data analyst. However, you cannot pick just any chart or plot for all of the different types of data that an analyst may encounter. Different chart and plot types suit different types of data: for example, when communicating the reach of social media across different age groups, it is preferable to use a pie chart rather than a bar chart or a boxplot. On the other hand, line plots are more suitable for visualizing gradual change. The trick of data visualization is to know exactly which type of plot is appropriate for each data type you will encounter. This is...
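The two examples in the paragraph above can be sketched side by side. All numbers below are hypothetical: a pie chart shows how a whole (social media users) splits across age groups, and a line plot shows a quantity changing gradually from month to month.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical share (%) of social media users in each age group: parts of a whole
age_share = pd.Series([30, 35, 20, 15],
                      index=["18-24", "25-34", "35-44", "45+"], name="share")

# Hypothetical monthly active users (thousands): a quantity that changes gradually
users = pd.Series([100, 104, 109, 115, 118, 125],
                  index=pd.date_range("2021-01-01", periods=6, freq="MS"),
                  name="users")

fig, (ax_pie, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

# Pie chart: each wedge is one category's proportion of the total
age_share.plot(kind="pie", ax=ax_pie, autopct="%1.0f%%")
ax_pie.set_ylabel("")  # clear any y-axis label (pandas may use the Series name)

# Line plot: emphasizes the trend from one month to the next
users.plot(kind="line", ax=ax_line, marker="o", title="Monthly active users")
```

Swapping the two chart types (a line through age groups, or a pie of monthly counts) would still draw, but it would obscure the part-of-a-whole and trend stories respectively.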

Activity 8.01 – Using data visualization for exploratory data analysis

In this activity, we will apply what we have learned in this chapter by building different types of plots to perform an exploratory data analysis of sale prices. We will work on the Manufactured Housing Survey dataset, published by the United States Census Bureau, which can be found in the GitHub repository at https://raw.githubusercontent.com/PacktWorkshops/The-pandas-Workshop/master/Chapter08/Data/PUF2020final_v1coll.csv.

Note

More details about the Manufactured Housing Survey dataset can be found at https://www.census.gov/data/datasets/2020/econ/mhs/puf.html.

The goal of this activity is to analyze the different factors that contribute to a home's sale price in the housing market. We will use different types of plots to achieve this.
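One common way to relate a factor to a sale price is to match the plot to the factor's type: a scatter plot for a numerical factor and one box per category for a categorical factor. The following sketch uses a tiny hypothetical table; the column names price, sqft, and region are illustrative only and are not the survey's actual columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical homes; these column names are illustrative, NOT the survey's real columns
homes = pd.DataFrame({
    "price": [60000, 72000, 85000, 91000, 105000, 118000],
    "sqft": [900, 1100, 1300, 1400, 1650, 1800],
    "region": ["South", "South", "West", "West", "Midwest", "Midwest"],
})

fig, (ax_num, ax_cat) = plt.subplots(1, 2, figsize=(10, 4))

# Numerical factor vs price: a scatter plot shows the direction and strength of the relationship
homes.plot(kind="scatter", x="sqft", y="price", ax=ax_num)

# Categorical factor vs price: one box per category compares the price distributions
homes.boxplot(column="price", by="region", ax=ax_cat)

# A numerical companion to the scatter plot: the Pearson correlation coefficient
r = homes["sqft"].corr(homes["price"])
print(f"Correlation between sqft and price: {r:.2f}")
```

With real data, the same two patterns can be repeated for each candidate factor, substituting the survey's own column names.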

Your tasks will be as follows:

Open a Jupyter notebook.
Import the pandas, numpy, and matplotlib packages.
Load the CSV file as a DataFrame.
For the...
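The first three tasks can be sketched as follows. The helper names load_survey and numeric_overview are hypothetical, and relying on the default pd.read_csv options for this particular file is an assumption rather than something stated in the activity.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter to see plots inline
import matplotlib.pyplot as plt  # imported per the task list; used by the later plotting steps

# URL given in this activity's description
DATA_URL = ("https://raw.githubusercontent.com/PacktWorkshops/"
            "The-pandas-Workshop/master/Chapter08/Data/PUF2020final_v1coll.csv")


def load_survey(source=DATA_URL):
    """Read the survey CSV from a URL or local path, using read_csv defaults."""
    return pd.read_csv(source)


def numeric_overview(df):
    """Summary statistics (count, mean, std, min, quartiles, max) of the numeric columns only."""
    return df.select_dtypes(include=np.number).describe()
```

In a notebook, `df = load_survey()` followed by `df.shape` and `numeric_overview(df)` gives a first look at the data before plotting; loading from the URL requires network access.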

Summary

In this chapter, we have learned the fundamentals of pandas visualization and how to create charts. After going through the basics of creating charts in pandas, we looked at how to further customize charts using the matplotlib package. Then, we learned which charts suit each type of data, such as numerical, categorical, and statistical data, before learning how to handle multiple plots.

Finally, we applied what we learned in this chapter to a business case in an activity whose goal was to determine how different factors affect a sale price. In the next chapter, you will learn how to model data to derive insights.


Authors (4)

Blaine Bateman

Blaine Bateman has more than 35 years of experience working with various industries from government R&D to startups to $1B public companies. His experience focuses on analytics including machine learning and forecasting. His hands-on abilities include Python and R coding, Keras/Tensorflow, and AWS & Azure machine learning services. As a machine learning consultant, he has developed and deployed actual ML models in industry.

Saikat Basak

Saikat Basak is a data scientist and a passionate programmer. Having worked with multiple industry leaders, he has a good understanding of problem areas that can potentially be solved using data. Apart from being a data guy, he is also a science geek and loves to explore new ideas in the frontiers of science and technology.

Thomas V. Joseph

Thomas V. Joseph is a data science practitioner, researcher, trainer, mentor, and writer with more than 19 years of experience. He has extensive experience in solving business problems using machine learning toolsets across multiple industry segments.

William So

William So is a Data Scientist with both a strong academic background and extensive professional experience. He is currently the Head of Data Science at Douugh and also a Lecturer for the Master of Data Science and Innovation at the University of Technology Sydney. During his career, he has successfully covered the end-to-end spectrum of data analytics, from ML to Business Intelligence, helping stakeholders derive valuable insights and achieve amazing results that benefit the business. William is a co-author of "The Applied Artificial Intelligence Workshop", published by Packt.
