You're reading from Apache Superset Quick Start Guide

Product type: Book
Published in: Dec 2018
Reading level: Intermediate
ISBN-13: 9781788992244
Edition: 1st Edition
Author: Shashank Shekhar

Shashank Shekhar is a data analyst and open source enthusiast. He has contributed to Superset and pymc3 (the Python Bayesian machine learning library), and maintains several public repositories on machine learning and data analysis projects of his own on GitHub. He heads up the data science team at HyperTrack, where he designs and implements machine learning algorithms to obtain insights from movement data. Previously, he worked at Amino on claims data. He has worked as a data scientist in Silicon Valley for 5 years. His background is in systems engineering and optimization theory, and he carries that perspective when thinking about data science, biology, culture, and history.

Distribution – histogram

After uploading the file as a table, open it for visualization and select the Histogram option. Make sure that start_date is selected as the Time Column. Set the time window between Since and Until wide enough to include all the books, because this analysis is not specific to any time window.

Each row in the dataset is a book, and page count is one of its important numerical features. To begin, let's look at a distribution plot of page counts, which gives a sense of how much the feature varies:

Data form for a histogram chart

The number of bins in a histogram limits the granularity of questions we can answer about the variance of the feature:

Distribution plot of page counts

Because we have set five bins, the most we can identify is that about 41-42 out of 93 books (approx. 44%-45%) have page counts of...
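
To see how the bin count limits what a histogram can tell us, the binning can be sketched outside Superset in plain Python. This is a minimal illustration, not how Superset computes its chart: the `histogram` helper is hypothetical, and the page counts are randomly generated stand-ins for the 93 books, not the chapter's dataset. The last two lines only recheck the arithmetic behind the 44%-45% share quoted above.

```python
import random


def histogram(values, n_bins):
    """Count values into n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value sits on the last edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return counts, edges


# Hypothetical page counts for 93 books; NOT the dataset used in this chapter.
rng = random.Random(0)
page_counts = [rng.randint(80, 900) for _ in range(93)]

# Fewer bins give wider ranges, so each bar answers a coarser question.
for n_bins in (5, 20):
    counts, edges = histogram(page_counts, n_bins)
    print(f"{n_bins} bins, each about {edges[1] - edges[0]:.0f} pages wide")
    for left, right, count in zip(edges, edges[1:], counts):
        share = 100 * count / len(page_counts)
        print(f"  {left:6.0f} to {right:6.0f}: {count:2d} books ({share:4.1f}%)")

# Share of books quoted in the text: 41 or 42 out of 93.
low_share = round(100 * 41 / 93, 1)
high_share = round(100 * 42 / 93, 1)
```

With five bins, each bar covers a wide range of page counts, so the chart can only say how many books fall somewhere in that range. Raising the bin count narrows each range, at the cost of fewer books per bar.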
