Chapter 9: Letting Your Data Speak for Itself with Machine Learning

While making histograms, we got a glimpse of a technique that visualizes aggregates rather than data points directly. In other words, we visualized data about our data. We will take this concept several steps further in this chapter by using a machine learning technique to categorize, or cluster, our data. As you will see, even with a single technique there are numerous options and combinations of options that can be explored. This is where the value of interactive dashboards comes into play: it would be very tedious if users had to explore every single option by manually creating a chart for it.

This chapter is not an introduction to machine learning, nor does it assume any prior knowledge of it. We will explore a clustering technique called KMeans, using the sklearn machine learning package. This will help us in grouping our data...

Technical requirements

We will be exploring a few options from sklearn, as well as NumPy. Otherwise, we will be using the same tools we have been using. For visualization and building interactivity, Dash, JupyterDash, the Dash Core Component library, Dash HTML Components, Dash Bootstrap Components, Plotly, and Plotly Express will be used. For data manipulation and preparation, we will use pandas and NumPy. JupyterLab will be used for exploring and building independent functionality. Finally, sklearn will be used for building our machine learning models, as well as for preparing our data.

The code files of this chapter can be found on GitHub at https://github.com/PacktPublishing/Interactive-Dashboards-and-Data-Apps-with-Plotly-and-Dash/tree/master/chapter_09.

Check out the following video to see the Code in Action at https://bit.ly/3x8PAmt.

Understanding clustering

So, what exactly is clustering, and when might it be helpful? Let's start with a very simple example. Imagine we have a group of people for whom we want to make T-shirts. We will make a T-shirt for each of them, in whatever size we choose; the main restriction is that we can only make one size. The sizes are as follows: [1, 2, 3, 4, 5, 7, 9, 11]. Think about how you might tackle this problem. We will use the KMeans algorithm for that, so let's start right away, as follows:

  1. Import the required packages and models. NumPy will be imported as a package, but from sklearn we will import the only model that we will be using for now, as illustrated in the following code snippet:
    import numpy as np
    from sklearn.cluster import KMeans
  2. Create a dataset of sizes in the required format. Note that each observation (person's size) should be represented as a list, so we use the reshape method of NumPy arrays to get the data in the required format, as follows...
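
The code for step 2 is truncated in this excerpt; a minimal sketch of how the reshaping, plus a first fit, might look is shown here (the choice of two clusters is just for illustration, not a recommendation):

    import numpy as np
    from sklearn.cluster import KMeans

    # Each observation (a person's size) becomes its own row, which is the
    # two-dimensional format that sklearn expects.
    sizes = np.array([1, 2, 3, 4, 5, 7, 9, 11]).reshape(-1, 1)

    # Fit a model; n_clusters is the option we keep changing in this chapter.
    kmeans = KMeans(n_clusters=2, random_state=0)
    kmeans.fit(sizes)

    print(kmeans.cluster_centers_)  # the size each group of people would get
    print(kmeans.labels_)           # which cluster each person was assigned to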

Finding the optimal number of clusters

We will now see the options we have in choosing the optimal number of clusters and what that entails, but let's first take a look at the following screenshot to visualize how things progress from having one cluster to eight clusters:

Figure 9.3 – Data points and cluster centers for all possible cluster numbers

We can see the full spectrum of possible clusters and how they relate to data points. At the end, when we specified 8, we got the perfect solution, where every data point is a cluster center.

In reality, you might not want to go for the full solution, for two main reasons. Firstly, it is probably going to be prohibitive from a cost perspective. Imagine making 1,000 T-shirts with a few hundred sizes. Secondly, in practical situations, it usually wouldn't add much value to add more clusters after a certain fit has been achieved. Using our T-shirt example, imagine if we have two people with...
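
As a minimal sketch of how these trade-offs can be quantified (not the book's exact code), we can refit the model for every possible number of clusters on the same sizes array and record the inertia_ attribute, the total within-cluster sum of squared distances; watching how it drops as clusters are added is the idea behind the elbow technique covered later in the chapter:

    import numpy as np
    import plotly.express as px
    from sklearn.cluster import KMeans

    sizes = np.array([1, 2, 3, 4, 5, 7, 9, 11]).reshape(-1, 1)

    # inertia_ falls as clusters are added, reaching zero when every data
    # point is its own cluster center.
    inertias = [KMeans(n_clusters=k, random_state=0).fit(sizes).inertia_
                for k in range(1, 9)]

    fig = px.line(x=list(range(1, 9)), y=inertias,
                  labels={'x': 'Number of clusters', 'y': 'Inertia'})
    fig.show()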

Clustering countries by population

We will first explore this with an indicator we are familiar with (population), and then make it interactive. We will cluster countries into groups based on their population.

Let's start with a possible practical situation. Imagine you were asked to group countries by population into two groups, of high and low populations. How would you do that? Where would you draw the line(s), and how large does a population have to be to qualify as "high"? Imagine that you were then asked to group countries into three or four groups based on their population. How would you update your clusters?

We can easily see how KMeans clustering is ideal for that.

Let's now do the same exercise with KMeans using one dimension, and then combine that with our knowledge of mapping, as follows:

  1. Import pandas and open the poverty dataset, like this:
    import pandas as pd
    poverty = pd...
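
The remaining steps are truncated in this excerpt. As a hedged sketch of how the exercise might be wired together, the following clusters one year of population data and colors a choropleth map by cluster; the file name data/poverty.csv and the column names 'year', 'Population, total', 'Country Code', and 'Country Name' are assumptions and should be adjusted to the actual dataset:

    import pandas as pd
    import plotly.express as px
    from sklearn.cluster import KMeans

    # Hypothetical file and column names -- adjust to match the book's dataset.
    poverty = pd.read_csv('data/poverty.csv')

    # Keep one year and drop countries with a missing population value.
    df = poverty[poverty['year'].eq(2010)].dropna(subset=['Population, total'])

    # One-dimensional clustering: each country's population is its own row.
    population = df[['Population, total']].values
    kmeans = KMeans(n_clusters=3, random_state=0).fit(population)

    # Color countries on a map by the cluster they were assigned to.
    fig = px.choropleth(df,
                        locations='Country Code',
                        color=kmeans.labels_.astype(str),
                        hover_name='Country Name',
                        title='Countries clustered by population (k=3)')
    fig.show()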

Preparing data with scikit-learn

scikit-learn is one of the most widely used and comprehensive machine learning libraries in Python. It plays very well with the rest of the data-science ecosystem libraries, such as NumPy, pandas, and matplotlib. We will be using it for modeling our data and for some preprocessing as well.

We now have two issues to tackle first: missing values and scaling data. Let's look at simple examples of each, and then tackle them in our dataset, starting with missing values.

Handling missing values

Models need data, and they can't know what to do with a set of numbers containing missing values. In such cases (and there are many in our dataset), we need to make a decision on what to do with those missing values.

There are several options, and the right choice depends on the application as well as the nature of the data, but we won't get into those details. For simplicity, we will make a generic choice of replacing...
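
The paragraph above is cut off. One generic option (not necessarily the exact choice the book settles on) is to replace missing values with the column mean using scikit-learn's SimpleImputer; the second issue, scaling, can be handled with StandardScaler. A minimal sketch:

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    # A tiny array with a missing value, standing in for an indicator column.
    data = np.array([[1.0], [2.0], [np.nan], [10.0]])

    # One common, generic choice: replace missing values with the column mean.
    imputed = SimpleImputer(strategy='mean').fit_transform(data)

    # Scaling puts indicators with very different ranges on a comparable
    # footing, which matters for a distance-based algorithm like KMeans.
    scaled = StandardScaler().fit_transform(imputed)
    print(scaled)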

Creating an interactive KMeans clustering app

Let's now put everything together and make an interactive clustering application using our dataset. We will give users the option to choose the year, as well as the indicator(s) that they want. They can also select the number of clusters and get a visual representation of those clusters, in the form of a colored choropleth map, based on the discovered clusters.

Please note that it is challenging to interpret such results with multiple indicators, because we will be handling more than one dimension. It can also be difficult if you are not an economist and don't know which indicators make sense to compare with which others, and so on.

The following screenshot shows what we will be working toward:

Figure 9.9 – An interactive KMeans clustering application

As you can see, this is a fairly rich application in terms of the combinations of options that it provides. As I also mentioned...
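
As a rough sketch of how such an app might be wired together (not the book's exact code), assuming Dash 2.x, the hypothetical file data/poverty.csv, and the column and indicator names shown below, which should all be adjusted to the actual dataset:

    import pandas as pd
    import plotly.express as px
    from dash import Dash, dcc, html
    from dash.dependencies import Input, Output
    from sklearn.cluster import KMeans
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    # Hypothetical file, columns, and indicators -- adjust to the real dataset.
    poverty = pd.read_csv('data/poverty.csv')
    years = sorted(poverty['year'].dropna().unique().tolist())
    indicators = ['Population, total',
                  'GNI per capita, Atlas method (current US$)']

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Dropdown(id='year', value=years[-1],
                     options=[{'label': y, 'value': y} for y in years]),
        dcc.Dropdown(id='indicators', value=[indicators[0]], multi=True,
                     options=[{'label': i, 'value': i} for i in indicators]),
        dcc.Slider(id='n_clusters', min=2, max=10, step=1, value=3),
        dcc.Graph(id='clusters_map'),
    ])

    @app.callback(Output('clusters_map', 'figure'),
                  Input('year', 'value'),
                  Input('indicators', 'value'),
                  Input('n_clusters', 'value'))
    def cluster_countries(year, selected_indicators, n_clusters):
        df = poverty[poverty['year'].eq(year)]
        # Impute missing values and scale the chosen indicators before fitting.
        X = StandardScaler().fit_transform(
            SimpleImputer(strategy='mean').fit_transform(df[selected_indicators]))
        labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(X)
        return px.choropleth(df, locations='Country Code',
                             color=labels.astype(str),
                             hover_name='Country Name',
                             title=f'Countries clustered into {n_clusters} groups')

    if __name__ == '__main__':
        app.run_server(debug=True)

The callback refits KMeans on every interaction, so changing the year, the selected indicators, or the number of clusters immediately redraws the map.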

Summary

We first got an idea of how clustering works. We built the simplest possible model for a tiny dataset, ran it a few times, and evaluated the performance and outcome for each number of clusters that we chose.

We then explored the elbow technique to evaluate different numbers of clusters and saw how we might discover the point of diminishing returns, where adding new clusters yields little improvement. With that knowledge, we used the same technique to cluster countries by a metric with which most of us are familiar and got firsthand experience of how it might work on real data.

After that, we planned an interactive KMeans app and explored two techniques for preparing data before running our model. We mainly explored imputing missing values and scaling data.

This gave us enough knowledge to get our data in a suitable format for us to create our interactive app, which we did at the end of the chapter.

Next, we will explore advanced features of Dash...
