
You're reading from Matplotlib for Python Developers. - Second Edition

Product type: Book

Published in: Apr 2018

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781788625173

Edition: 2nd Edition

Languages: Python

Tools: Matplotlib

Concepts: Data Visualization

Authors (3):

Aldrin Yim

Claire Chung

Allen Yu


Chapter 10. Integrating Data Visualization into the Workflow

We have now come to the concluding chapter of this book. Throughout the course of this book, you have mastered the techniques to create and customize static and animated plots using real-world data in different formats scraped from the web. To wrap up, we will start a mini-project in this chapter to combine the skills of data analytics with the visualization techniques you've learned. We will demonstrate how to integrate visualization techniques in your current workflow.

In the era of big data, machine learning becomes fundamental to ease analytic work by replacing huge amounts of manual curation with automatic prediction. Yet, before we enter model building, Exploratory Data Analysis (EDA) is always essential to get a good grasp of what the data is like. Constant review during the optimization process also helps improve our training strategy and results.

High-dimensional data typically requires special processing techniques to be...

Getting started

Recall the MNIST dataset we briefly touched upon in Chapter 04, Advanced Matplotlib. It contains 70,000 images of handwritten digits, often used in data mining tutorials as Machine Learning 101. We will continue using a similar image dataset of handwritten digits for our project in this chapter.

We are almost certain that you have already heard the popular keywords deep learning, or machine learning in general, before starting this book. That's why we chose it as our showcase. Detailed concepts in machine learning, such as hyperparameter tuning to optimize performance, are beyond the scope of this book, so we will not go into them. We will, however, cover the model training part in a cookbook style and focus on how visualization helps our workflow. For those of you interested in the details of machine learning, we recommend exploring the many resources available online.

Visualizing sample images from the dataset

Data cleaning and EDA are indispensable components of data science. Before we begin analyzing our data, it is important to understand some basic properties of what we have input. The dataset we are using comprises standardized images with regular shapes and normalized pixel values. The features are simple, thin lines. Our goal is straightforward as well, to recognize digits from images. Yet, in many cases of real-world practice, the problems can be more complicated; the data we collect is going to be raw and often much more heterogeneous. Before tackling the problem, it is usually worth the time to sample a small amount of input data for inspection. Imagine training a model to recognize Ramen just to get you drooling ;). You will probably take a look at some images to decide what features make a good input sample to exemplify the presence of the bowl. Besides the initial preparatory phase, during model building taking out some of the mislabeled...
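As a minimal sketch of this kind of inspection, the snippet below draws the first ten images with their labels in a grid. It assumes the 8x8 digits bundled with scikit-learn (load_digits) as a stand-in for the dataset used in this chapter; the helper name plot_sample_digits is hypothetical and not taken from the book's code.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits


def plot_sample_digits(images, labels, n_rows=2, n_cols=5):
    """Draw the first n_rows * n_cols images in a grid, titled by label."""
    fig, axes = plt.subplots(
        n_rows, n_cols, figsize=(n_cols * 1.5, n_rows * 1.7), squeeze=False
    )
    for ax, image, label in zip(axes.ravel(), images, labels):
        # gray_r renders high intensities as dark strokes on a light background
        ax.imshow(image, cmap="gray_r", interpolation="nearest")
        ax.set_title(f"Label: {label}", fontsize=9)
        ax.axis("off")
    fig.tight_layout()
    return fig


# Assumption: scikit-learn's bundled digits stand in for the chapter's dataset
digits = load_digits()
fig = plot_sample_digits(digits.images, digits.target)
```

Calling plt.show() afterwards displays the grid. Passing a shuffled selection of images instead of the first ten gives a less biased impression of the data.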

Exploring the data nature by the t-SNE method

After visualizing a few images and getting a glimpse of how the samples are distributed, we will go deeper into our EDA.

Each pixel comes with an intensity value, which makes 64 variables for each 8x8 image. The human brain is not good at intuitively perceiving dimensions higher than three. For high-dimensional data, we need more effective visual aids.

Dimensionality reduction methods, such as the commonly used PCA and t-SNE, reduce the number of input variables under consideration, while retaining most of the useful information. As a result, the visualization of data becomes more intuitive.
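To make the idea concrete, here is a small sketch that projects the 64 pixel variables onto two principal components and reports how much of the variance those two components keep. It assumes scikit-learn's bundled digits dataset as example input; the variable names are ours.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Assumption: scikit-learn's bundled 8x8 digits serve as example input
digits = load_digits()
X = digits.data  # one row per image, 64 pixel intensities per row

# Reduce the 64 input variables to 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Fraction of the total variance captured by the two components
retained = pca.explained_variance_ratio_.sum()
print(f"{X.shape[1]} variables -> {X_2d.shape[1]} components")
print(f"Share of variance kept: {retained:.1%}")

# Scatter plot of the reduced data, colored by digit label
fig, ax = plt.subplots(figsize=(6, 5))
points = ax.scatter(X_2d[:, 0], X_2d[:, 1], c=digits.target, cmap="tab10", s=8)
ax.set_xlabel("Principal component 1")
ax.set_ylabel("Principal component 2")
fig.colorbar(points, ax=ax, label="Digit label")
```

Two components rarely keep all of the information, so the printed share is a useful check on how much the 2D picture can be trusted.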

In the following section, we will focus our discussion on the t-SNE method by using the scikit-learn library in Python.

Understanding t-Distributed stochastic neighbor embedding

The t-SNE method was proposed by van der Maaten and Hinton in 2008 in the publication Visualizing Data using t-SNE. It is a nonlinear dimension reduction method that aims to effectively visualize...
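A t-SNE embedding can be sketched with scikit-learn's TSNE class as follows. This is an illustration under our own assumptions rather than the book's exact code: it uses the bundled digits dataset, a 500-image subset to keep the run short, and parameter values (perplexity=30, init="pca", random_state=0) chosen only for the example.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Assumption: scikit-learn's bundled digits; a subset keeps the run short
digits = load_digits()
X, y = digits.data[:500], digits.target[:500]

# Embed the 64-dimensional samples into 2 dimensions
# (parameter values are illustrative choices, not tuned settings)
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_embedded = tsne.fit_transform(X)

# Color each point by its true label to see how well the digits separate
fig, ax = plt.subplots(figsize=(6, 5))
points = ax.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y, cmap="tab10", s=12)
ax.set_xlabel("t-SNE dimension 1")
ax.set_ylabel("t-SNE dimension 2")
ax.set_title("t-SNE embedding of 500 handwritten digits")
fig.colorbar(points, ax=ax, label="Digit label")
```

Because t-SNE is stochastic, fixing random_state makes the layout reproducible between runs; the axes themselves carry no direct meaning, only the relative grouping of points does.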

Creating a CNN to recognize digits

In the following section, we will use Keras. Keras is a Python library for neural networks and provides a high-level interface to TensorFlow libraries. We do not intend to give a complete tutorial on Keras or CNN, but we want to show how we can use Matplotlib to visualize the loss function, accuracy, and outliers of the results.

Readers who are not familiar with machine learning should be able to go through the logic of the remaining chapter and hopefully understand why visualizing the loss function, accuracy, and outliers of the results is important in fine-tuning the CNN model.

Here is a snippet of code for the CNN; the most important part is the evaluation section after this!

# Import sklearn models for preprocessing input data
from sklearn.model_selection import train_test_split 
from sklearn.preprocessing import LabelBinarizer

# Import the necessary Keras libraries
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten...
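The two scikit-learn imports at the top of the snippet are typically used to split the data and one-hot encode the labels before they reach the network. The following is a sketch of that preprocessing step only, assuming the bundled 8x8 digits dataset, an 80/20 split, and a trailing channel axis as expected by convolutional layers; these choices are ours and may differ from the full listing.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

# Assumption: scikit-learn's bundled 8x8 digits serve as example input
digits = load_digits()

# Scale pixel intensities (0 to 16 in this dataset) to the range [0, 1]
X = digits.images / 16.0
# Add a trailing channel axis: (n_samples, 8, 8) -> (n_samples, 8, 8, 1)
X = X[..., None]

# Hold out 20% of the images for testing, keeping class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.2, random_state=0, stratify=digits.target
)

# One-hot encode the labels: digit 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
binarizer = LabelBinarizer()
y_train_onehot = binarizer.fit_transform(y_train)
y_test_onehot = binarizer.transform(y_test)

print(X_train.shape, y_train_onehot.shape)
```

Fitting the binarizer on the training labels and reusing it for the test labels keeps the column order identical in both sets.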

Evaluating prediction results with visualizations

We specified callbacks that record the loss and accuracy for each epoch and saved the result as the variable history. We can retrieve this data from the dictionary history.history. Let's check out the dictionary keys:

print(history.history.keys())

This will output dict_keys(['loss', 'acc']).

Next, we will plot out the loss function and accuracy along epochs in line graphs:

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

# Matplotlib 3.6 and later name this style 'seaborn-v0_8'
matplotlib.style.use('seaborn')

# Plot the training loss for each epoch
pd.DataFrame(history.history['loss']).plot()
plt.legend([])
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training loss across 100 epochs',fontsize=20,fontweight='bold')
plt.show()

# Plot the training accuracy for each epoch
pd.DataFrame(history.history['acc']).plot()
plt.legend([])
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Accuracy across 100 epochs',fontsize=20,fontweight='bold')
plt.show()
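The two curves can also be drawn side by side in a single figure, which makes it easier to compare how loss and accuracy move together. The sketch below takes a plain dictionary shaped like history.history; the helper name plot_history is hypothetical, and the numbers in example_history are made up for illustration and are not results from this chapter's model. It also accepts the key 'accuracy', which newer Keras releases use in place of 'acc'.

```python
import matplotlib.pyplot as plt


def plot_history(history_dict):
    """Plot loss and accuracy per epoch from a Keras-style history dictionary."""
    # Older Keras releases store accuracy under 'acc', newer ones under 'accuracy'
    acc_key = "acc" if "acc" in history_dict else "accuracy"
    epochs = range(1, len(history_dict["loss"]) + 1)

    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(epochs, history_dict["loss"])
    ax_loss.set_xlabel("Epoch")
    ax_loss.set_ylabel("Loss")
    ax_loss.set_title("Training loss")

    ax_acc.plot(epochs, history_dict[acc_key])
    ax_acc.set_xlabel("Epoch")
    ax_acc.set_ylabel("Accuracy")
    ax_acc.set_title("Training accuracy")

    fig.tight_layout()
    return fig


# Made-up numbers for illustration only; not results from the chapter's model
example_history = {
    "loss": [1.9, 1.1, 0.6, 0.4, 0.3],
    "acc": [0.35, 0.68, 0.84, 0.90, 0.93],
}
fig = plot_history(example_history)
```

With a trained model, the call would be plot_history(history.history), followed by plt.show().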

Upon training, we can...

Summary

Congratulations! You have now completed this chapter as well as the whole book. In this chapter, we integrated various data visualization techniques along with an analytic project workflow, from the initial inspection and exploratory analysis of data, to model building and evaluation. Give yourself a huge round of applause, and get ready to leap forward into the journey of data science!


Authors (3)

Aldrin Yim

Aldrin Yim is a PhD candidate and Markey Scholar in the Computation and System Biology program at Washington University, School of Medicine. His research focuses on applying big data analytics and machine learning approaches in studying neurological diseases and cancer. He is also the founding CEO of Codex Genetics Limited, which provides precision medicine solutions to patients and hospitals in Asia.

Claire Chung

Claire Chung is pursuing her PhD degree as a Bioinformatician at the Chinese University of Hong Kong. She enjoys using Python daily for work and lifehack. While passionate in science, her challenge-loving character motivates her to go beyond data analytics. She has participated in web development projects, as well as developed skills in graphic design and multilingual translation. She led the Campus Network Support Team in college, and shared her experience in data visualization in PyCon HK 2017.

Allen Yu

Allen Yu, PhD, is a Chevening Scholar, 2017-18, and an MSC student in computer science at the University of Oxford. He holds a PhD degree in Biochemistry from the Chinese University of Hong Kong, and he has used Python and Matplotlib extensively during his 10 years of bioinformatics experience.