
You're reading from Apache Superset Quick Start Guide

Product type: Book
Published in: Dec 2018
Reading level: Intermediate
ISBN-13: 9781788992244
Edition: 1st Edition
Author: Shashank Shekhar

Shashank Shekhar is a data analyst and open source enthusiast. He has contributed to Superset and pymc3 (the Python Bayesian machine learning library), and maintains several public GitHub repositories for his own machine learning and data analysis projects. He heads the data science team at HyperTrack, where he designs and implements machine learning algorithms to obtain insights from movement data. Previously, he worked at Amino on claims data. He has worked as a data scientist in Silicon Valley for 5 years. His background is in systems engineering and optimization theory, and he brings that perspective to thinking about data science, biology, culture, and history.

Data

In this chapter, we will use two datasets. The first is the global list of airports as of 2017, which we will download from the OpenFlights website. OpenFlights provides several datasets (airports, airlines, routes, and aeroplanes in use), all available on its data page at: https://openflights.org/data.html.

This is how the data page should appear: https://github.com/PacktPublishing/Superset-Quick-Start-Guide/blob/master/Graphics/Chapter%207/Chart%201.png

We will fetch the airports dataset from the following GitHub link: https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat. The file needs some changes before it can be uploaded to Superset. The Chapter07 directory of the book's GitHub repository contains the generate_dataset.ipynb Jupyter Notebook; run it to create the two datasets.
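To illustrate the kind of preprocessing involved, here is a minimal sketch using only the Python standard library. It is not the code from generate_dataset.ipynb. It assumes the 14-column layout that OpenFlights documents for airports.dat, which has no header row and uses the literal string \N for missing values. The column names, the helper functions (fetch_airports_text, parse_airports, write_airports_csv), and the subset of columns written out are all hypothetical choices made for this example.

```python
import csv
import io
import urllib.request

# Assumed column names for the headerless airports.dat file, following the
# field order described on the OpenFlights data page.
AIRPORT_COLUMNS = [
    "airport_id", "name", "city", "country", "iata", "icao",
    "latitude", "longitude", "altitude", "timezone", "dst",
    "tz_database_time_zone", "type", "source",
]

AIRPORTS_URL = (
    "https://raw.githubusercontent.com/jpatokal/openflights/"
    "master/data/airports.dat"
)


def fetch_airports_text(url=AIRPORTS_URL):
    """Download the raw airports.dat file and return it as a string."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_airports(text):
    """Parse headerless airports.dat text into a list of dicts.

    OpenFlights marks missing values with the literal string \\N;
    these are converted to None.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        values = [None if value == "\\N" else value for value in record]
        rows.append(dict(zip(AIRPORT_COLUMNS, values)))
    return rows


def write_airports_csv(
    rows,
    path,
    columns=("name", "city", "country", "iata", "latitude", "longitude"),
):
    """Write the selected (hypothetical) columns to a CSV with a header row.

    None values are written as empty fields; unselected keys are ignored.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns), extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

Under these assumptions, a call such as write_airports_csv(parse_airports(fetch_airports_text()), "airports_modified.csv") downloads the file once and writes a CSV with a header row, which is the shape Superset's CSV upload expects. Parsing and writing are kept separate so the parsing step can be checked on a small sample without network access.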

This is the code for creating the airports_modified.csv file:

import...