
You're reading from Matplotlib 2.x By Example

Product type: Book
Published in: Aug 2017
Publisher: Packt
ISBN-13: 9781788295260
Edition: 1st Edition
Tools: Matplotlib
Concepts: Data Visualization
Authors (3): Allen Yu, Claire Chung, Aldrin Yim


Chapter 8. Exploratory Data Analytics and Infographics

Let the data speak for themselves.

This quote is well known to many data scientists. However, capturing the hidden characteristics or features of big data is often not trivial, and some exploratory data analysis must be done before we can fully understand a dataset.

In this chapter, we aim to perform some exploratory data analysis on two datasets, using the techniques that we have discussed in previous chapters. Here is a brief outline of this chapter:

Visualizing categorical data
Visualizing geographical data
GeoPandas library
Working with images using the PIL library
Importing/transforming images
Multiple subplots
Heatmap
Survival graph

We assume that readers are now comfortable using pandas DataFrames, as they are used heavily in this chapter.

Readers should also note that most exploratory data analyses actually involve a significant amount of statistics, including dimension reduction approaches such as PCA...
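The passage mentions PCA, so here is a minimal NumPy sketch of its core computation. It centers the data, takes the singular value decomposition, and projects onto the leading right singular vectors. The function name `pca` and the randomly generated toy data are illustrative assumptions, not code from this chapter. For real analyses, scikit-learn's `sklearn.decomposition.PCA` performs the same decomposition.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto their leading principal components via SVD.

    Returns (scores, components, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    # PCA works on mean-centered features (columns)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Each squared singular value is proportional to the variance along its axis
    explained_ratio = S ** 2 / np.sum(S ** 2)
    scores = X_centered @ components.T
    return scores, components, explained_ratio[:n_components]


# Toy example (hypothetical data): 100 samples with 3 features, where the
# third feature is almost a copy of the first, so 2 components capture
# nearly all of the variance
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.column_stack([base[:, 0], base[:, 1],
                     base[:, 0] + 0.01 * rng.normal(size=100)])
scores, components, explained = pca(X, n_components=2)
print(scores.shape)  # (100, 2)
print(explained.sum())
```

The explained-variance ratio offers a quick way to decide how many components to keep. The first two columns of `scores` can then be drawn as a Matplotlib scatter plot.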

Visualizing population health information

The following section combines geographical and population health information for the US. Since this is a Python tutorial, we focus on ways to visualize the data rather than on drawing solid conclusions from it. However, many of the findings below agree with population health research and news reports available online.

To begin, let us download the following datasets:

Top 10 leading causes of death in the United States from 1999 to 2013 from Healthdata.gov
2016 TIGER GeoDatabase from US Census Bureau
Survival data of various types of cancer from The Cancer Genome Atlas (TCGA) project (https://cancergenome.nih.gov/)

Since some of these datasets cannot be downloaded directly through links, we have included the raw data in our code repository:

Top 10 leading causes of death in the United States from 1999-2013: https://www.healthdata.gov/dataset/nchs-age-adjusted-death-rates-top-10-leading-causes...

Survival data analysis on cancer

Since we've spent a significant amount of time discussing death rates, let us conclude this chapter with a final analysis of two cancer datasets. We obtained de-identified clinical datasets of breast cancer and brain tumor patients from http://www.cbioportal.org/. Our goal is to see what the overall survival outcome looks like, and whether the two cancers have statistically different survival outcomes. The datasets are explored for research purposes only:

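# The clinical datasets are in TSV format, so we use the
# pd.read_csv() method with the argument sep='\t'
# to construct the DataFrames.
# A GitHub "blob" URL returns an HTML page, so we append
# ?raw=true to download the raw file instead.
import pandas as pd

gbm_df = pd.read_csv('https://github.com/PacktPublishing/Matplotlib-2.x-By-Example/blob/master/gbm_tcga_clinical_data.tsv?raw=true', sep='\t')
# Keep primary tumor samples that have a recorded overall survival time
gbm_primary_df = (gbm_df[gbm_df['Sample Type'] == 'Primary Tumor']
                  .dropna(subset=['Overall Survival (Months)']))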

brca_df = pd.read_csv('https://github.com/PacktPublishing/Matplotlib-2.x-By-Example/blob/master/brca_metabric_clinical_data...
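The standard way to compare overall survival between two groups, as described above, is to estimate a Kaplan-Meier survival curve for each group and test the difference with a log-rank test. Below is a minimal NumPy sketch of both, plus a Matplotlib helper that draws the curves as step plots. It assumes each patient has a follow-up duration (for example, the values in 'Overall Survival (Months)') and a 0/1 event indicator (1 = death observed, 0 = censored). How that indicator is derived from the clinical table's survival-status column is an assumption to check against the actual files. The function names `kaplan_meier`, `logrank_test` and `plot_km_curves` are hypothetical and not taken from the book's code. The lifelines package offers tested equivalents (`KaplanMeierFitter` and `lifelines.statistics.logrank_test`).

```python
import math

import numpy as np


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the survival function.

    durations: follow-up time per patient (e.g. months)
    events: 1 if the death was observed, 0 if the patient was censored
    Returns the distinct event times and the survival probability just after each.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(durations[events == 1])
    survival = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(durations >= t)  # patients still under observation at t
        deaths = np.sum((durations == t) & (events == 1))
        s *= 1.0 - deaths / at_risk
        survival[i] = s
    return event_times, survival


def logrank_test(d1, e1, d2, e2):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    d1, d2 = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    e1, e2 = np.asarray(e1, dtype=int), np.asarray(e2, dtype=int)
    all_d = np.concatenate([d1, d2])
    all_e = np.concatenate([e1, e2])
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0
    for t in np.unique(all_d[all_e == 1]):
        n1 = np.sum(d1 >= t)
        n2 = np.sum(d2 >= t)
        n = n1 + n2
        deaths1 = np.sum((d1 == t) & (e1 == 1))
        deaths = deaths1 + np.sum((d2 == t) & (e2 == 1))
        o_minus_e += deaths1 - deaths * n1 / n
        if n > 1:
            # Hypergeometric variance of the deaths in group 1 at time t
            variance += deaths * (n1 / n) * (n2 / n) * (n - deaths) / (n - 1)
    if variance == 0:
        return float('nan'), float('nan')
    stat = o_minus_e ** 2 / variance
    # Chi-square (1 df) upper tail: P(Z**2 > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def plot_km_curves(groups):
    """Draw one Kaplan-Meier step curve per (label, durations, events) tuple."""
    import matplotlib.pyplot as plt  # imported here so the estimators run without matplotlib

    fig, ax = plt.subplots()
    for label, durations, events in groups:
        times, survival = kaplan_meier(durations, events)
        # Start each curve at time 0 with survival probability 1
        ax.step(np.concatenate([[0.0], times]), np.concatenate([[1.0], survival]),
                where='post', label=label)
    ax.set_xlabel('Overall survival (months)')
    ax.set_ylabel('Survival probability')
    ax.set_ylim(0, 1.05)
    ax.legend()
    return fig, ax
```

For example, passing `[('GBM', gbm_months, gbm_events), ('BRCA', brca_months, brca_events)]` to `plot_km_curves` draws both curves on one set of axes. Here the four arrays are hypothetical names for the durations and event indicators extracted from `gbm_primary_df` and the breast cancer DataFrame. The p-value relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper tail equals `erfc(sqrt(x / 2))`. This avoids a SciPy dependency.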

Summary

In this chapter, we explored different ways of performing exploratory data analysis, focusing on population health information. With the code provided in this book, readers can combine more datasets and explore their hidden characteristics. For instance, one could explore whether illegal drug use is correlated with suicide, or whether exercise is anti-correlated with heart disease across the USA. One key message is that readers should not confuse association with causality, a mistake frequently made even by experienced data scientists. Hopefully, by now, readers are more comfortable with data analysis in Python, and we, the authors, look forward to your contributions to the Python community.

Happy coding!


Authors (3)

Allen Yu

Allen Yu, PhD, is a Chevening Scholar, 2017-18, and an MSc student in computer science at the University of Oxford. He holds a PhD degree in Biochemistry from the Chinese University of Hong Kong, and he has used Python and Matplotlib extensively during his 10 years of bioinformatics experience.

Claire Chung

Claire Chung is pursuing a PhD as a bioinformatician at the Chinese University of Hong Kong. She enjoys using Python daily for work and life hacks. While passionate about science, her challenge-loving character motivates her to go beyond data analytics. She has participated in web development projects and developed skills in graphic design and multilingual translation. She led the Campus Network Support Team in college and shared her experience in data visualization at PyCon HK 2017.

Aldrin Yim

Aldrin Yim is a PhD candidate and Markey Scholar in the Computation and System Biology program at Washington University, School of Medicine. His research focuses on applying big data analytics and machine learning approaches in studying neurological diseases and cancer. He is also the founding CEO of Codex Genetics Limited, which provides precision medicine solutions to patients and hospitals in Asia.