
You're reading from Interactive Dashboards and Data Apps with Plotly and Dash

Product type: Book
Published in: May 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800568914
Edition: 1st Edition
Author: Elias Dabbas

Elias Dabbas is an online marketing and data science practitioner. He produces open-source software for building dashboards and data apps, as well as software for online marketing, with a focus on SEO, SEM, crawling, and text analysis.


Chapter 8: Calculating the Frequency of Your Data with Histograms and Building Interactive Tables

All the chart types that we've explored so far displayed our data as is. In other words, every marker, whether it was a circle, a bar, a map, or any other shape, corresponded to a single data point in our dataset. Histograms, on the other hand, display bars that correspond to a summary statistic about groups of data points. A histogram is mainly used to count values in a dataset. It does so by grouping ("binning") the data into intervals, called bins, and displaying the number of observations in each bin. Other functions are possible, of course, such as computing the mean or maximum of each bin, but counting is the typical use case.

The counts are displayed like a bar chart, where the height of each bar corresponds to the count (or other function) of its bin. Another important result is that we can see how the data is distributed, and what shape or kind of distribution it has. Are the observations...
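To make the idea of binning concrete, here is a minimal pandas sketch using a handful of made-up values rather than the book's dataset. It assigns each value to a bin with pd.cut and then applies one aggregation per bin: count (the typical histogram) or mean (one of the other possible functions). The bin edges are arbitrary choices for illustration.

```python
import pandas as pd

# Made-up values standing in for an indicator column (not real data)
values = pd.Series([25.0, 31.5, 33.0, 38.2, 41.0, 42.5, 47.9, 55.3])

# Assign every value to a bin: [20, 30), [30, 40), [40, 50), [50, 60)
bins = pd.cut(values, bins=[20, 30, 40, 50, 60], right=False)

# The typical histogram statistic: how many observations fall in each bin
counts = values.groupby(bins, observed=False).count()

# Another possible function: the mean of the values in each bin
means = values.groupby(bins, observed=False).mean()

print(counts.tolist())  # [1, 3, 3, 1]
```

The bar heights of a count histogram over these bins would be exactly the numbers in counts; swapping count() for mean() changes what the bars represent, not how the data is grouped.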

Technical requirements

We will use tools similar to the ones we used in the previous chapter, with one addition. We will be using Plotly Express as well as the graph_objects module for creating our charts. The packages to use are Plotly, Dash, Dash Core Components, Dash HTML Components, Dash Bootstrap Components, pandas, and the new dash_table package. You don't need to install dash_table separately (although you can), as it is installed together with Dash.

The code files of this chapter can be found on GitHub at https://github.com/PacktPublishing/Interactive-Dashboards-and-Data-Apps-with-Plotly-and-Dash/tree/master/chapter_08.

Check out the following video to see the Code in Action at https://bit.ly/3sGSCes.

Creating a histogram

We want to see how we can get the distribution of a sample of data and get an idea of where values are concentrated, as well as how much variability/spread it has. We will do this by creating a histogram.

As always, we'll start with the simplest possible example:

  1. We open the poverty DataFrame and create a subset of it, containing only countries and data from the year 2015:
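    import pandas as pd
    # Load the dataset (path relative to the chapter's working directory)
    poverty = pd.read_csv('data/poverty.csv')
    # Keep rows flagged as countries (is_country is True) for the year 2015
    df = poverty[poverty['is_country'] & poverty['year'].eq(2015)]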
  2. Import Plotly Express and run the histogram function with df as the argument to the data_frame parameter and the indicator of our choice for the x parameter:
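    import plotly.express as px
    gini = 'GINI index (World Bank estimate)'
    # Each bar counts how many countries fall into a range of Gini values
    fig = px.histogram(data_frame=df, x=gini)
    fig.show()  # renders the figure; in a notebook, evaluating fig also works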

    As a result, we get the histogram that you can see in Figure 8.1:

Figure 8.1 – A histogram of the Gini indicator

...

Customizing the histogram by modifying its bins and using multiple histograms

We can change the number of bins through the nbins parameter. We will first see the effect of using two extreme values for the number of bins. Setting nbins=2 generates the chart in Figure 8.2:

Figure 8.2 – A histogram of the Gini indicator with two bins


As you can see, the values were split into two equal-width bins, (20, 39.9) and (40, 59.9), and we can see how many countries are in each bin. It's simple and easy to understand, but not as nuanced as the histogram in Figure 8.1. On the other hand, setting nbins=500 produces the chart in Figure 8.3:

Figure 8.3 – A histogram of the Gini indicator with 500 bins


It is now much more detailed, perhaps more detailed than is useful. With too many bins, it is almost like looking at the raw data.
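The trade-off between few and many bins can be reproduced outside Plotly with NumPy. This sketch uses randomly generated, Gini-like values (not the book's data) and compares two bins with 500 bins over the same range. One difference worth noting: NumPy's bins argument is exact, whereas Plotly documents nbins as the maximum number of desired bins, which its autobinning uses to choose a bin size.

```python
import numpy as np

# Made-up Gini-like values in [20, 60) (not the book's data)
rng = np.random.default_rng(0)
values = rng.uniform(20, 60, size=150)

# Two equal-width bins over [20, 60]: coarse but easy to read
coarse_counts, coarse_edges = np.histogram(values, bins=2, range=(20, 60))

# 500 bins over the same range: most bins hold zero or one value,
# which is close to looking at the raw data
fine_counts, fine_edges = np.histogram(values, bins=500, range=(20, 60))

print(coarse_edges)               # [20. 40. 60.]
print((fine_counts <= 1).mean())  # share of (nearly) empty bins
```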

With the default number of bins, each bin spanned an interval of five. Now...

Adding interactivity to histograms

Just like we did with maps in Chapter 7, Exploring Map Plots and Enriching Your Dashboards with Markdown, we can make histograms interactive, so that users get a better idea of the distribution of a chosen indicator in one or more years. The difference is that we also want to allow them to customize the number of bins. Since we are now comfortable with handling multiple inputs and outputs, let's add a few more options for our users: they can select multiple years, and each selected year is displayed in its own subplot using faceting. Figure 8.10 shows what we will be working toward:

Figure 8.10 – A histogram app allowing the selection of indicator, year(s), and bins


Let's start building right away. We won't be discussing the layout elements such as color and width, but you can always refer to the code repository for the exact solution. We will focus on building the...

Creating a 2D histogram

In the first case, we counted the observations in each bin of a single variable. In this case, we will do the same, but for combinations of bins of two variables. Binning each variable along its own axis creates a grid, or matrix, of cells, and each cell holds the count of observations that fall into that combination of bins. A simple example will make this clear. Let's create one:

  1. Create a subset of poverty containing only countries, where the year is equal to 2000:
    df = poverty[poverty['year'].eq(2000) & poverty['is_country']]
  2. Create a Figure object and add a histogram2d trace (at the time of writing, this chart type is not available in Plotly Express). We simply select any two indicators that we would like to plot together and pass them to x and y:
    import plotly.graph_objects as go
    fig = go.Figure()
    fig.add_histogram2d(x=df['Income share held by fourth 20%'],
                        y=df['GINI index (World Bank estimate)'],
     ...

Creating a DataTable

Technically, dash_table is a separate package, as mentioned at the beginning of the chapter, and it can be installed on its own. However, it is installed automatically with Dash, in the correct, up-to-date version, and this is the recommended approach.

Displaying tables, especially interactive ones, can often add a lot of value for the users of our dashboards. Also, if our dashboards or data visualizations are not sufficient for users, or if they want to run their own analysis, it is probably a good idea to let them access the raw data. Finally, the DataTable component supports its own forms of data visualization through custom colors, fonts, sizes, and so on, which gives us another way to visualize and understand our data. We will explore a few options in this chapter, but definitely not all of them.

Let's see how to create a basic DataTable from a DataFrame in a simple app:

  1. Create a subset of poverty containing only countries, from...

Controlling the look and feel of the table (cell width, height, text display, and more)

There are numerous options available to modify how your tables look, and it's always good to consult the documentation for ideas and solutions. The potentially tricky part is combining options: some of them can affect each other, so the table may not display exactly the way you want. When debugging, it is therefore good to isolate the options as much as possible and test them one at a time.

In Figure 8.13, we displayed only three columns and the first few rows. We will now see how to display more columns and enable users to explore more rows:

  1. Modify df to include all columns that contain Income share:
    df = (poverty[poverty['year'].eq(2000) & poverty['is_country']]
          .filter(regex='Country Name|Income share'))
  2. Place the DataTable in a dbc.Col component with the desired width, 7 in this case. The table automatically takes the width of the container it is in,...

Adding histograms and tables to the app

We are now ready to incorporate the table functionality into our app and add it to the callback function that we already created. We will display the data used to generate the histograms right under the histogram figure. Since histograms show aggregates rather than individual data points, as we discussed, users might find it interesting to see the underlying data for themselves.

Let's add this functionality right away:

  1. Add a new div right underneath the histogram figure:
    html.Div(id='table_histogram_output')
  2. Add this as an Output to the callback function:
    @app.callback(Output('indicator_year_histogram', 'figure'),
                  Output('table_histogram_output', 'children'),
                  Input('hist_multi_year_selector'...

Summary

In this chapter, we first learned about the main difference between histograms and the other types of charts we have covered so far. We saw how easy it is to create them, and more importantly, we saw how customizable they can be with bins, barmode, colors, and facets. We then explored how to add interactivity to histograms by connecting them to other components with a callback function.

We then explored the 2D histogram and saw how it can provide an even richer view of two columns visualized against each other.

We introduced a new interactive component, the DataTable. We barely scratched the surface of what can be done with tables. We used them to make it easier for users to obtain, interact with, or simply view the raw data behind our histograms. We also explored the different ways to control the look and feel of our tables.

Finally, we incorporated the table functionality with the callback function we created and added the interactivity to our app.

Let's...

What we have covered so far

In the first part of the book, we covered the basics of Dash apps. We first explored how they are structured and how to manage the visual elements. Then, we explored how interactivity is created, mainly through callback functions. This allowed us to create fully interactive apps. We then explored the structure of the Figure object and learned how to modify and manipulate it to generate the charts we desire. After that, we saw how important data manipulation and preparation are for data visualization. We reshaped our dataset to make it more intuitive to work with, which paved the way for easily learning and using Plotly Express.

Part 2 was about getting thoroughly familiar with several types of charts, as well as interactive components. We applied all the knowledge we built in Part 1, but most importantly, we did this in a practical setting. We gradually added more and more charts, components, and functionality to one...
