Apache Superset Quick Start Guide

Product type Book
Published in Dec 2018
ISBN-13 9781788992244
Pages 188 pages
Edition 1st Edition
Author (1): Shashank Shekhar

Table of Contents (10 chapters)

Preface
Getting Started with Data Exploration
Configuring Superset and Using SQL Lab
User Authentication and Permissions
Visualizing Data in a Column
Comparing Feature Values
Drawing Connections between Entity Columns
Mapping Data That Has Location Information
Building Dashboards
Other Books You May Enjoy

Comparing Feature Values

Given a table with many columns, understanding the range and simple statistics of the values in each column often makes you curious about how different features affect one another. Relationships between features are commonly quantified with correlation measures, but formulating and computing correlations between features in a dataset can be a complex problem. Sometimes, joint distribution plots capture and visualize these relationships very well.
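As a concrete instance of a correlation measure, the following is a minimal sketch of the Pearson correlation coefficient between two feature columns, written with the Python standard library only. The chapter does not prescribe a particular measure, so the choice of Pearson is an assumption, and the column values are made up for illustration.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

# Made-up feature columns, purely for illustration
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [2.0, 4.0, 6.0, 8.0, 10.0]  # rises with feature_a
feature_c = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls as feature_a rises

r_ab = pearson_correlation(feature_a, feature_b)
r_ac = pearson_correlation(feature_a, feature_c)
```

A value near 1 means the two columns rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no linear relationship. A single number like this can hide structure that a joint distribution plot would reveal.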

We can visualize multiple features for every row at once as points on a chart. The bubble chart in Superset can be used to visualize a feature type on the y axis perpendicular to the x axis timeline. A second feature is color-coded, and a third feature value is reflected as bubble size in a group of one or more rows in a dataset. In this chapter, we will make the following charts...

Dataset

We will be working with trading data on commodities in this chapter. The Federal Reserve Bank of St. Louis, United States, compiles data on commodities. Datasets are available at http://fred.stlouisfed.org. You can obtain time series data on import values and import volumes of commodities traded by the United States. We will download data on bananas, olive oil, sugar, uranium, cotton, oranges, wheat, aluminium, iron, and corn.

Inside the chapter directory of the GitHub repository, you will find the generate_dataset.ipynb Jupyter Notebook. Just run the Notebook to download, transform, and generate the two CSV files we will upload. If you want to skip running the Notebook, the two CSV files, fsb_st_louis_commodities.csv and usda_oranges_and_bananas_data.csv, are also present in the repository, ready for upload.
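The Notebook's transformation step is not reproduced here. As a rough sketch of the kind of reshaping such a step might involve, the code below turns a wide table (one column per commodity) into long rows with feature and value columns, which are the column names used in the charts later in this chapter. The date column name, the input layout, and all the numbers are assumptions for illustration only.

```python
import csv
import io

# Hypothetical wide-format input: one column per commodity (values are made up)
wide_csv = """date,bananas,oranges
2018-01-01,1.10,0.80
2018-02-01,1.15,0.85
"""

def wide_to_long(text, id_column="date"):
    """Turn one-column-per-commodity rows into (date, feature, value) rows."""
    long_rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, cell in row.items():
            if column == id_column:
                continue
            long_rows.append(
                {id_column: row[id_column], "feature": column, "value": float(cell)}
            )
    return long_rows

long_rows = wide_to_long(wide_csv)

# Write the long-format rows back out as CSV text
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["date", "feature", "value"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(long_rows)
long_csv = out.getvalue()
```

A long layout like this lets a single column act as the grouping key in a chart, instead of requiring one metric per commodity column.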

The FSB data on commodity prices in fsb_st_louis_commodities...

Comparing multiple time series

The time series line chart is useful for visualizing the price trends for every type of commodity together. Using the first dataset that was uploaded, we will plot commodity prices against time on the x axis and see how they compare with each other, as follows:

Setting the parameters for the time series chart

Remember to clear the time thresholds in the Time section. Then, select feature as the Group by value, AVG(value) as Metrics, and render the graph:

The time series line chart for all values
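Conceptually, choosing feature as the Group by value and AVG(value) as the metric produces one line per commodity, where each point is the mean of the values that fall in the same time bucket. The sketch below imitates that aggregation in plain Python. It is not Superset's implementation, and the rows are made up for illustration.

```python
from collections import defaultdict

# Made-up long-format rows: (date, feature, value)
rows = [
    ("2018-01-01", "bananas", 1.0),
    ("2018-01-01", "bananas", 2.0),
    ("2018-01-01", "oranges", 0.5),
    ("2018-02-01", "bananas", 3.0),
    ("2018-02-01", "oranges", 1.5),
]

def average_by_feature(rows):
    """Return {feature: [(date, mean value), ...]} with dates in ascending order."""
    buckets = defaultdict(list)
    for date, feature, value in rows:
        buckets[(feature, date)].append(value)
    series = defaultdict(list)
    # ISO-formatted date strings sort chronologically
    for (feature, date), values in sorted(buckets.items()):
        series[feature].append((date, sum(values) / len(values)))
    return dict(series)

series = average_by_feature(rows)
```

Each key of series corresponds to one line on the chart, and each tuple to one plotted point.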

The tooltip shows the y axis price values for each commodity type and the units used. Notice that the higher-priced commodities have mostly non-overlapping price ranges. The data extends from January 1980 to June 2018. After the expensive commodities, bananas and oranges have fairly overlapping price ranges. It will be easier to compare...

Comparing two time series

Stacked charts are often useful for measuring the combined area covered and relative differences in y axis values for two or more series. We will use the time series stacked chart to compare the prices of oranges and bananas:

Setting parameters for the time series stacked chart

The Style section of the chart provides a stream style option. The width of each stream is proportional to the value in that category:

Time series stacked chart

In the stacked chart, the increase in price of both bananas and oranges is visualized through the increasing width of the stream. Since 2010, the color-coded streams show that oranges have had a relatively higher price variance than bananas. We can switch to the expand style and see whether, besides the higher price variation, oranges show a higher upward trend in prices:

Changing the variation
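One way to think about the expand style is as a rescaling in which, at every time step, each series is divided by the total across all series, so the stacked streams always fill the full height of the chart. The sketch below assumes that behavior and uses made-up prices. It illustrates the arithmetic only and is not Superset's own code.

```python
def expand_shares(series_by_name):
    """Rescale aligned series so the values at each time step sum to 1."""
    names = list(series_by_name)
    length = len(series_by_name[names[0]])
    if any(len(series_by_name[name]) != length for name in names):
        raise ValueError("all series must have the same length")
    shares = {name: [] for name in names}
    for i in range(length):
        total = sum(series_by_name[name][i] for name in names)
        for name in names:
            # Guard against a time step where every series is zero
            value = series_by_name[name][i]
            shares[name].append(value / total if total else 0.0)
    return shares

# Made-up prices, purely for illustration
prices = {"bananas": [1.0, 1.0, 2.0], "oranges": [1.0, 3.0, 2.0]}
shares = expand_shares(prices)
```

Because absolute levels are normalized away, a view like this emphasizes how the relative share of each series changes over time rather than how the combined price level moves.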

After switching to expand...

Identifying differences in trends for two feature values

Bananas are a year-round fruit. By comparison, oranges are harvested from December to June. Perhaps the seasonality of oranges has something to do with the higher price variation. The second dataset that we uploaded has values and volumes of oranges imported in different forms, such as fresh oranges, orange juice, and preserved oranges:

Running the query for extracting the data of oranges imported in different forms

In the SQL Editor inside SQL Lab, I wrote a query to list the different forms of oranges. We can focus on the effect of seasonality by only selecting fresh oranges and fresh bananas in subsequent charts.
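The query itself appears only in the screenshot. A query of this kind might look like the sketch below, which runs against an in-memory SQLite database so that it can be tried outside Superset. The table name fruit_imports, the column names commodity, form, and value, and all the rows are hypothetical and do not come from the actual dataset.

```python
import sqlite3

# In-memory database with a hypothetical schema and made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit_imports (commodity TEXT, form TEXT, value REAL)")
conn.executemany(
    "INSERT INTO fruit_imports VALUES (?, ?, ?)",
    [
        ("oranges", "fresh", 10.0),
        ("oranges", "juice", 20.0),
        ("oranges", "preserved", 5.0),
        ("oranges", "fresh", 12.0),
        ("bananas", "fresh", 30.0),
    ],
)

# List each form in which oranges appear, once
query = """
    SELECT DISTINCT form
    FROM fruit_imports
    WHERE commodity = 'oranges'
    ORDER BY form
"""
orange_forms = [row[0] for row in conn.execute(query)]
conn.close()
```

Once the distinct forms are known, a WHERE clause on the fresh form is enough to restrict later charts to fresh oranges and fresh bananas.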

We will make a bubble chart to compare the import value of oranges to bananas. Bubble charts also support visualization of a third data dimension using Bubble Size. Since we are interested in comparing the import...

Summary

With two datasets, we were able to compare the prices of food commodities. We then dived deep into a comparison of the import prices of oranges and bananas in the United States. We made use of five chart types that gave us a better understanding of how banana prices relate to orange prices, although we did not attempt to quantify the relationship between banana and orange import prices. Still, we were able to see that they differed in a very significant way.

In the next chapter, we will visualize relationships as graphs instead of coordinates on orthogonal axes. This will help us to visualize features in a dataset connected in a network.
