Reader small image

You're reading from  Apache Superset Quick Start Guide

Product typeBook
Published inDec 2018
Reading LevelIntermediate
Publisher
ISBN-139781788992244
Edition1st Edition
Languages
Right arrow
Author (1)
Shashank Shekhar
Shashank Shekhar
author image
Shashank Shekhar

Shashank Shekhar is a data analyst and open source enthusiast. He has contributed to Superset and pymc3 (the Python Bayesian machine learning library), and maintains several public repositories on machine learning and data analysis projects of his own on GitHub. He heads up the data science team at HyperTrack, where he designs and implements machine learning algorithms to obtain insights from movement data. Previously, he worked at Amino on claims data. He has worked as a data scientist in Silicon Valley for 5 years. His background is in systems engineering and optimization theory, and he carries that perspective when thinking about data science, biology, culture, and history.
Read more about Shashank Shekhar

Right arrow

Comparison – box plots for groups of feature values

The previous charts described the relationship between days taken to finish reading a book and page count. Now, we will try to understand the highest page counts in calendar months, where a book was finished after x number of reading days. In the first chart, we plotted the number of samples we have for each group of books, which were completed in the same number of days. There are multiple samples in many groups. Here, we will plot a distribution for multiple samples in each group.

We can define a statistic to summarize the average page counts of books completed in the same calendar month as the a book was completed after x number of days.

We will make a box plot chart as follows:


Parameters to set box plot chart

The data that we are visualizing in this box plot is made using multiple group by operations, because the...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Apache Superset Quick Start Guide
Published in: Dec 2018Publisher: ISBN-13: 9781788992244

Author (1)

author image
Shashank Shekhar

Shashank Shekhar is a data analyst and open source enthusiast. He has contributed to Superset and pymc3 (the Python Bayesian machine learning library), and maintains several public repositories on machine learning and data analysis projects of his own on GitHub. He heads up the data science team at HyperTrack, where he designs and implements machine learning algorithms to obtain insights from movement data. Previously, he worked at Amino on claims data. He has worked as a data scientist in Silicon Valley for 5 years. His background is in systems engineering and optimization theory, and he carries that perspective when thinking about data science, biology, culture, and history.
Read more about Shashank Shekhar