Reader small image

You're reading from  Apache Superset Quick Start Guide

Product typeBook
Published inDec 2018
Reading LevelIntermediate
Publisher
ISBN-139781788992244
Edition1st Edition
Languages
Right arrow
Author (1)
Shashank Shekhar
Shashank Shekhar
author image
Shashank Shekhar

Shashank Shekhar is a data analyst and open source enthusiast. He has contributed to Superset and pymc3 (the Python Bayesian machine learning library), and maintains several public repositories on machine learning and data analysis projects of his own on GitHub. He heads up the data science team at HyperTrack, where he designs and implements machine learning algorithms to obtain insights from movement data. Previously, he worked at Amino on claims data. He has worked as a data scientist in Silicon Valley for 5 years. His background is in systems engineering and optimization theory, and he carries that perspective when thinking about data science, biology, culture, and history.
Read more about Shashank Shekhar

Right arrow

Comparison – relationship between feature values

Let's say we are curious about a trend where the time taken to read a book increases with respect to the page count.

Books often have a gripping effect on a reader, once they find them interesting. So, we cannot expect the number of pages to proportionately grow with the number of days taken to read a book, because books that the reader finds gripping will be read at a faster pace than others. In any dataset, there are samples that are noisy and hard to explain. In this dataset, we will find that some books with lower page counts take more days to finish than books with higher page counts.

It will be useful to look at the number of samples we have available for each group, defined by number of reading days. Select COUNT(*) as Metrics to plot the number_of_books read:

Defining the reading capacity for each group of...
lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Apache Superset Quick Start Guide
Published in: Dec 2018Publisher: ISBN-13: 9781788992244

Author (1)

author image
Shashank Shekhar

Shashank Shekhar is a data analyst and open source enthusiast. He has contributed to Superset and pymc3 (the Python Bayesian machine learning library), and maintains several public repositories on machine learning and data analysis projects of his own on GitHub. He heads up the data science team at HyperTrack, where he designs and implements machine learning algorithms to obtain insights from movement data. Previously, he worked at Amino on claims data. He has worked as a data scientist in Silicon Valley for 5 years. His background is in systems engineering and optimization theory, and he carries that perspective when thinking about data science, biology, culture, and history.
Read more about Shashank Shekhar