You're reading from Apache Superset Quick Start Guide

Product type: Book
Published in: Dec 2018
Reading level: Intermediate
ISBN-13: 9781788992244
Edition: 1st Edition
Languages: Python
Tools: Apache Superset
Concepts: Business Intelligence
Author: Shashank Shekhar

Comparing Feature Values

Given a table with many columns, once we understand the range and simple statistics of the feature values in each column, we naturally become curious about how the features affect one another. Relationships between features are commonly quantified with correlation measures, but formulating and computing correlations between features in a dataset is a complex problem. Joint distribution plots can sometimes capture and visualize these relationships very well.
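Pearson's correlation coefficient is one common correlation measure. As a minimal sketch (not something Superset computes for the charts in this chapter), the function below computes it for two numeric columns using only the standard library; the sample columns are made up for illustration. On Python 3.10 and later, statistics.correlation computes the same coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations (covariance numerator) and each column's sum of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up columns: the second is an exact multiple of the first
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]
print(pearson(a, b))  # 1.0
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship, although a nonlinear one may still exist.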

We can visualize several features of every row at once as points on a chart. Superset's bubble chart can plot one feature on the y axis against an x axis timeline, color-code a second feature, and reflect a third feature's value as the bubble size for each group of one or more rows in a dataset. In this chapter, we will make the following charts...
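To make the bubble encoding concrete, here is a minimal sketch of how rows could be collapsed into bubbles: one bubble per group, each with an x value, a y value, a color series, and a size. Averaging within each group and the column names (year, commodity, price, volume) are illustrative assumptions, not Superset's internals or the chapter's schema.

```python
from collections import defaultdict


def bubble_points(rows, entity, series, x, y, size):
    """Collapse rows into one bubble per (series, entity) group.

    x, y and size are column names; each is averaged within its group
    (the choice of average is an assumption; other aggregates work the same way).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row[series], row[entity])].append(row)
    points = []
    for (series_value, entity_value), members in sorted(groups.items()):
        n = len(members)
        points.append({
            "series": series_value,  # drives the bubble color
            "entity": entity_value,  # one bubble per entity within a series
            "x": sum(r[x] for r in members) / n,
            "y": sum(r[y] for r in members) / n,
            "size": sum(r[size] for r in members) / n,  # drives the bubble size
        })
    return points


# Hypothetical rows with placeholder numbers
rows = [
    {"year": 2000, "commodity": "bananas", "price": 1.0, "volume": 10.0},
    {"year": 2000, "commodity": "bananas", "price": 3.0, "volume": 30.0},
    {"year": 2000, "commodity": "oranges", "price": 2.0, "volume": 5.0},
]
points = bubble_points(rows, entity="year", series="commodity",
                       x="year", y="price", size="volume")
print(points)
```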

Dataset

We will be working with commodity trading data in this chapter. The Federal Reserve Bank of St. Louis, in the United States, compiles data on commodities, and its datasets are available at http://fred.stlouisfed.org. You can obtain time series data on the import values and import volumes of commodities traded by the United States. We will download data on bananas, olive oil, sugar, uranium, cotton, oranges, wheat, aluminium, iron, and corn.

Inside the chapter directory of the GitHub repository, you will find the generate_dataset.ipynb Jupyter Notebook. Run the Notebook to download and transform the data and generate the two CSV files that we will upload. If you want to skip running the Notebook, the two CSV files, fsb_st_louis_commodities.csv and usda_oranges_and_bananas_data.csv, are also in the repository, ready for upload.

The FSB data on commodity prices in fsb_st_louis_commodities...
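The Notebook's transformation code is not reproduced in this excerpt. Assuming the commodity file uses a long layout with one row per date and feature (which grouping by feature and averaging value, as done later in this chapter, suggests), the reshaping step could look roughly like the sketch below. The date column name, the input dictionary, and the numbers are hypothetical placeholders, not FRED data.

```python
import csv
import io


def to_long_rows(series_by_feature):
    """Flatten {feature: [(date, value), ...]} into sorted (date, feature, value) rows."""
    rows = [
        (day, feature, value)
        for feature, observations in series_by_feature.items()
        for day, value in observations
    ]
    rows.sort()  # order by date, then feature
    return rows


def write_long_csv(rows, handle):
    """Write the rows with a header; 'date' is a hypothetical column name."""
    writer = csv.writer(handle)
    writer.writerow(["date", "feature", "value"])
    writer.writerows(rows)


# Placeholder numbers, not FRED observations
buffer = io.StringIO()
write_long_csv(
    to_long_rows({
        "sugar": [("1980-01-01", 2.25)],
        "bananas": [("1980-01-01", 1.5)],
    }),
    buffer,
)
print(buffer.getvalue().splitlines())
# ['date,feature,value', '1980-01-01,bananas,1.5', '1980-01-01,sugar,2.25']
```

When writing to a real file instead of a StringIO buffer, open it with newline='' as the csv module documentation recommends.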

Comparing multiple time series

The time series line chart is useful for visualizing the price trends of every commodity type together. Using the first dataset we uploaded, we will plot commodity prices against time on the x axis and see how they compare with one another, as follows:

Setting the parameters for the time series chart

Remember to clear the time thresholds in the Time section. Then, select feature as the Group by value, AVG(value) as Metrics, and render the graph:

The time series line chart for all values
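With feature as the Group by column and AVG(value) as the metric, each point on the chart is, roughly, the average of value per time bucket for one feature. The SQL that Superset actually generates depends on the database and the chosen time grain, so the query below is only an approximation, run with Python's built-in sqlite3 module against a hypothetical in-memory table named commodities holding placeholder rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring a long layout: one row per date and feature
conn.execute("CREATE TABLE commodities (date TEXT, feature TEXT, value REAL)")
conn.executemany(
    "INSERT INTO commodities VALUES (?, ?, ?)",
    [
        ("2018-06-01", "bananas", 1.0),
        ("2018-06-01", "bananas", 3.0),
        ("2018-06-01", "oranges", 4.0),
    ],
)
# One output row per (timestamp, feature): the points of one line per feature
rows = conn.execute(
    """
    SELECT date, feature, AVG(value) AS avg_value
    FROM commodities
    GROUP BY date, feature
    ORDER BY date, feature
    """
).fetchall()
conn.close()
print(rows)  # [('2018-06-01', 'bananas', 2.0), ('2018-06-01', 'oranges', 4.0)]
```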

The tooltip shows the y axis price value and its unit for each commodity type. Notice that the most expensive commodities have mostly non-overlapping price ranges. The data extends from January 1980 to June 2018. Among the cheaper commodities, the price ranges of bananas and oranges overlap considerably. It will be easier to compare...
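Whether two commodities' price ranges overlap can be checked directly from each series' minimum and maximum. This is a small sketch with placeholder series names and prices, not values from the dataset; two closed ranges overlap when each one starts no later than the other ends.

```python
def price_range(values):
    """Closed interval (lowest, highest) covered by a price series."""
    return min(values), max(values)


def ranges_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Placeholder price series, not figures from the dataset
prices = {
    "series_a": [1.0, 1.4, 1.2],
    "series_b": [1.3, 1.8, 1.6],
    "series_c": [90.0, 120.0],
}
ranges = {name: price_range(vals) for name, vals in prices.items()}
# Check every unordered pair once (a < b avoids duplicates and self-pairs)
overlapping = sorted(
    (a, b)
    for a in ranges
    for b in ranges
    if a < b and ranges_overlap(ranges[a], ranges[b])
)
print(overlapping)  # [('series_a', 'series_b')]
```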

Comparing two time series

Stacked charts are often useful for measuring the combined area covered and relative differences in y axis values for two or more series. We will use the time series stacked chart to compare the prices of oranges and bananas:

Setting parameters for the time series stacked chart
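A stacked chart draws each series on top of the ones below it, so the top edge of the last band is the combined total at each timestamp. The sketch below computes those bands for equal-length series; it illustrates the general stacking idea and is not taken from Superset's charting code. The input numbers are placeholders.

```python
def stack_series(series):
    """Return (baseline, top) pairs for each series at every time index.

    series: list of equal-length value lists, bottom series first.
    """
    n = len(series[0])
    if any(len(values) != n for values in series):
        raise ValueError("all series must have the same length")
    bands = []
    baseline = [0.0] * n  # the first series sits on zero
    for values in series:
        top = [b + v for b, v in zip(baseline, values)]
        bands.append(list(zip(baseline, top)))
        baseline = top  # the next series starts where this one ends
    return bands


bands = stack_series([[1.0, 2.0], [3.0, 1.0]])
print(bands)  # [[(0.0, 1.0), (0.0, 2.0)], [(1.0, 4.0), (2.0, 3.0)]]
```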

The Style section of the chart provides a stream style option. The width of each stream is proportional to the value in that category:

Time series stacked chart
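For the stream style, the stack is shifted so that it flows around a central axis instead of sitting on zero, while each band's thickness still equals its value. The exact offset used by Superset's charting library may differ (streamgraphs often use a more involved "wiggle" offset); this sketch uses the simpler silhouette offset, which centers each timestamp's total on zero. The inputs are placeholders.

```python
def silhouette_bands(series):
    """Stack equal-length series symmetrically around zero (silhouette offset)."""
    n = len(series[0])
    totals = [sum(values[i] for values in series) for i in range(n)]
    baseline = [-t / 2.0 for t in totals]  # center each timestamp's total on zero
    bands = []
    for values in series:
        top = [b + v for b, v in zip(baseline, values)]
        bands.append(list(zip(baseline, top)))  # band thickness == value
        baseline = top
    return bands


bands = silhouette_bands([[1.0, 2.0], [3.0, 2.0]])
print(bands)  # [[(-2.0, -1.0), (-2.0, 0.0)], [(-1.0, 2.0), (0.0, 2.0)]]
```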

In the stacked chart, the rising prices of both bananas and oranges show up as streams that grow wider over time. Since 2010, the color-coded streams show that orange prices have varied relatively more than banana prices. We can switch to the expand style to see whether, besides varying more, orange prices also show a stronger upward trend:

Changing the variation

After switching to expand...
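In an expand-style chart, every timestamp's total is stretched to the full height, so each band shows that series' share of the total rather than its absolute price. If one series' share grows over time, its price is rising faster than the other's, which is the question raised above about oranges. A minimal normalization sketch, assuming equal-length series and using placeholder numbers:

```python
def expand(series):
    """Rescale equal-length series so the values at every time index sum to 1."""
    n = len(series[0])
    totals = [sum(values[i] for values in series) for i in range(n)]
    # A zero total has no meaningful shares; report 0.0 instead of dividing by zero
    return [
        [v / t if t else 0.0 for v, t in zip(values, totals)]
        for values in series
    ]


shares = expand([[1.0, 3.0], [3.0, 1.0]])
print(shares)  # [[0.25, 0.75], [0.75, 0.25]]
```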

Identifying differences in trends for two feature values

Bananas are a year-round fruit. By comparison, oranges are harvested from December to June. Perhaps the seasonality of oranges has something to do with the higher price variation. The second dataset that we uploaded has values and volumes of oranges imported in different forms, such as fresh oranges, orange juice, and preserved oranges:

Running the query for extracting the data of oranges imported in different forms

In the SQL Editor inside SQL Lab, I wrote a query to list the different forms in which oranges are imported. We can focus on the effect of seasonality by selecting only fresh oranges and fresh bananas in subsequent charts.
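The query itself appears only as a screenshot in this excerpt. As a hedged stand-in, the sketch below runs two comparable queries with Python's sqlite3 module: one listing the distinct forms in which oranges are imported, and one keeping only fresh fruit. The table name usda_imports, its columns (fruit, form, year, value), and the rows are all hypothetical; the real schema of usda_oranges_and_bananas_data.csv may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and placeholder rows; not the chapter's actual table
conn.execute("CREATE TABLE usda_imports (fruit TEXT, form TEXT, year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO usda_imports VALUES (?, ?, ?, ?)",
    [
        ("oranges", "fresh", 2017, 1.0),
        ("oranges", "juice", 2017, 2.0),
        ("oranges", "preserved", 2017, 3.0),
        ("bananas", "fresh", 2017, 4.0),
    ],
)
# 1) List the different forms in which oranges are imported
forms = [row[0] for row in conn.execute(
    "SELECT DISTINCT form FROM usda_imports WHERE fruit = 'oranges' ORDER BY form"
)]
# 2) Keep only fresh fruit so later comparisons focus on seasonal produce
fresh = conn.execute(
    "SELECT fruit, year, value FROM usda_imports WHERE form = 'fresh' ORDER BY fruit"
).fetchall()
conn.close()
print(forms)  # ['fresh', 'juice', 'preserved']
print(fresh)  # [('bananas', 2017, 4.0), ('oranges', 2017, 1.0)]
```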

We will make a bubble chart to compare the import value of oranges to bananas. Bubble charts also support visualization of a third data dimension using Bubble Size. Since we are interested in comparing the import...
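Because a bubble's visual weight is its area, a common convention (assumed here, not taken from Superset's source) is to scale the radius with the square root of the value, so a value four times larger gets a bubble with four times the area rather than sixteen. In this sketch, max_radius is a hypothetical parameter giving the radius of the largest bubble.

```python
import math


def bubble_radius(value, max_value, max_radius):
    """Radius that makes bubble area proportional to value.

    The largest value (max_value) is drawn with radius max_radius.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("sizes must be non-negative with a positive maximum")
    return max_radius * math.sqrt(value / max_value)


print(bubble_radius(25, 100, 10))   # 5.0: a quarter of the value, a quarter of the area
print(bubble_radius(100, 100, 10))  # 10.0
```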

Summary

With two datasets, we were able to compare the prices of food commodities. We then dived deep into a comparison of the import prices of oranges and bananas in the United States. We used five chart types that gave us a better understanding of how banana prices relate to orange prices. Although we did not attempt to quantify the relationship between banana and orange import prices, we still gained a clear understanding of how significantly they differ.

In the next chapter, we will visualize relationships as graphs instead of as coordinates on orthogonal axes. This will help us visualize features of a dataset that are connected in a network.

Shashank Shekhar is a data analyst and open source enthusiast. He has contributed to Superset and pymc3 (the Python Bayesian machine learning library) and maintains several public GitHub repositories for his own machine learning and data analysis projects. He heads up the data science team at HyperTrack, where he designs and implements machine learning algorithms to obtain insights from movement data. Previously, he worked at Amino on claims data. He has worked as a data scientist in Silicon Valley for 5 years. His background is in systems engineering and optimization theory, and he carries that perspective when thinking about data science, biology, culture, and history.