You're reading from  Apache Superset Quick Start Guide

Product type: Book
Published in: Dec 2018
Reading level: Intermediate
ISBN-13: 9781788992244
Edition: 1st Edition
Author: Shashank Shekhar

Shashank Shekhar is a data analyst and open source enthusiast. He has contributed to Superset and pymc3 (the Python Bayesian machine learning library), and maintains several public repositories on machine learning and data analysis projects of his own on GitHub. He heads up the data science team at HyperTrack, where he designs and implements machine learning algorithms to obtain insights from movement data. Previously, he worked at Amino on claims data. He has worked as a data scientist in Silicon Valley for 5 years. His background is in systems engineering and optimization theory, and he carries that perspective when thinking about data science, biology, culture, and history.

Mapping Data That Has Location Information

Location information in a dataset describes points that exist in our world, which makes it something we can relate to and one of the most interesting types of data to analyze. Raw location coordinates, however, are not intuitive to read, so analyzing and summarizing them without a geographical map is a challenge. Tools such as Mapbox and deck.gl provide a variety of apps and APIs for visualizing location information on beautiful maps.

In this chapter, we will render location data as scatter plots on maps as follows:

  • A scatter point
  • A scatter grid

Then, we will plot arcs and lines on a map:

  • Arcs
  • Path

Remember the MAPBOX_API_KEY variable in the superset_config.py file that we wrote in the Superset configuration chapter? That API key must be...
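For reference, a minimal sketch of how that setting might look in superset_config.py. It assumes the key is supplied through an environment variable of the same name, which is one common arrangement and not necessarily the one used in the configuration chapter:

```python
import os

# Read the Mapbox key from the environment so it stays out of version control.
# Assumption: the environment variable shares the setting's name.
MAPBOX_API_KEY = os.environ.get("MAPBOX_API_KEY", "")
```

If the variable is unset, the key falls back to an empty string, and the deck.gl charts would most likely render without base map tiles.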

Data

In this chapter, we will use two datasets. The first is the global list of airports as of 2017, which we will download from the OpenFlights website. OpenFlights publishes multiple datasets (airports, airlines, routes, and aeroplanes in use) on its data page: https://openflights.org/data.html.

This is how it should appear: https://github.com/PacktPublishing/Superset-Quick-Start-Guide/blob/master/Graphics/Chapter%207/Chart%201.png

We will fetch the airports dataset from the following GitHub link: https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat. Some changes have to be made before uploading it. In the Chapter07 directory of the GitHub repository, you will find the generate_dataset.ipynb Jupyter Notebook. Run it to create the two datasets.
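To give a feel for the kind of change involved, here is a hedged sketch that is not the notebook's code. The airports.dat file ships without a header row, so one plausible preparation step is to prepend column names. The names below are hypothetical snake_case labels based on the field list on the OpenFlights data page, and the sample row uses rounded, illustrative values:

```python
import csv
import io

# Hypothetical column names; airports.dat itself has no header row.
AIRPORT_COLUMNS = [
    "airport_id", "name", "city", "country", "iata", "icao",
    "latitude", "longitude", "altitude", "timezone", "dst",
    "tz_database_time_zone", "type", "source",
]


def add_header(raw_text: str) -> str:
    """Return CSV text with a header row prepended to headerless rows."""
    reader = csv.reader(io.StringIO(raw_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(AIRPORT_COLUMNS)
    for row in reader:
        # Keep only rows that have exactly the expected number of fields.
        if len(row) == len(AIRPORT_COLUMNS):
            writer.writerow(row)
    return out.getvalue()


# One row in the airports.dat layout, for illustration only.
SAMPLE = (
    '1,"Goroka Airport","Goroka","Papua New Guinea","GKA","AYGA",'
    '-6.08,145.39,5282,10,"U","Pacific/Port_Moresby","airport","OurAirports"\n'
)

result = add_header(SAMPLE)
```

The resulting text can be written to a .csv file and uploaded; the notebook's actual transformations may differ.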

This is the code for creating the airports_modified.csv file:

import...

Scatter point

After uploading the airports_modified.csv file from the GitHub directory, open the table and select the Deck.gl - Scatter plot chart. In the Query section, select Longitude | Latitude as the coordinates. This dataset contains the locations of all airports across the globe. We will be plotting a point for each airport on the world map.
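Because the chart expects the two coordinate columns in a specific order, a swapped or out-of-range pair is an easy mistake to make. A small check along these lines, run before uploading, can catch it. The function name and sample pairs are hypothetical and not part of the book's notebook:

```python
def split_by_validity(pairs):
    """Split (latitude, longitude) pairs into plottable and rejected lists."""
    valid, rejected = [], []
    for latitude, longitude in pairs:
        in_range = -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0
        (valid if in_range else rejected).append((latitude, longitude))
    return valid, rejected


# Made-up pairs: the second one has latitude and longitude swapped.
valid, rejected = split_by_validity([(35.4, 51.2), (145.4, -6.1)])
```

A range check cannot detect every swap, since a pair such as (51.2, 35.4) is still a legal coordinate, so it is only a first line of defense.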

In the Point Size section, set Point Size to 1000 so that each airport location is visible. Choosing Dark as the Map Style and a color scheme with easily distinguishable colors for each country makes the chart easy to understand:

Setting the parameters for plotting a point to each airport on the world map

This is how the scatter points will appear: https://github.com/PacktPublishing/Superset-Quick-Start-Guide/blob/master/Graphics/Chapter%207/Chart%202.png

In 2017, we can see a world with regions that have significantly different densities...

Scatter grid

We will use the same airports_2017 table to see where South-East Asia's largest airports are located:

Using the Filters option to select airports in South-East Asian countries

After you select Visualization Type, use the Filters option present in the Query section for selecting airports in South-East Asian countries as follows:

Setting the parameters for plotting South-East Asia's largest airports

You can change the Viewport of the map to view it at an angle. The grid boxes representing airports in the Sichuan and Gansu provinces of China, and in Jammu and Kashmir in India, have the deepest color, implying that these airports are at higher altitudes than those in other regions.

This is how it will appear:

Overview of South-East Asia's largest airports (the other place names are not important)
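The idea behind a grid chart can be sketched in a few lines: points are binned into cells, and each cell is colored by an aggregate of some metric. The sketch below is a simplification under stated assumptions. It bins by whole degrees rather than by a cell size in meters, it uses the mean altitude as the metric, and the sample points are made up; the chart's actual binning and aggregation may differ:

```python
import math
from collections import defaultdict


def grid_mean(points, cell_deg=1.0):
    """Bin (lat, lon, value) points into square cells and average the values.

    Returns a dict mapping (row, col) cell indices to the mean value.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for lat, lon, value in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        sums[cell][0] += value
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


# Made-up (latitude, longitude, altitude) triples, for illustration only.
cells = grid_mean([(30.2, 103.1, 3000), (30.8, 103.9, 5000), (1.3, 103.9, 20)])
```

Here the two high-altitude points fall into the same cell and are averaged, while the third point gets a cell of its own.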

Arcs

After taking a close look at airports in South-East Asia, let's look at global flight routes. We saw that there are airports almost everywhere on the world map, and there are far more possible flight routes than airports, because the number of airport pairs grows with the square of the number of airports. So, to get a view of flight routes that we can comprehend and summarize in one chart, we filter for routes that start from the city of Tehran, in Iran:

Setting the parameters for flight routes across Asia

You can select Fixed Color and Stroke Width to change the aesthetics of the Arc map. We can see that Iran in 2017 has only one international flight route, to Saudi Arabia:

The flight routes in Asia (the other place names are not important)
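An arc is defined by a start coordinate and an end coordinate, so filtering routes by origin city amounts to a lookup and a filter. The sketch below illustrates this with toy records. The dictionary layout, the function name, and the approximate coordinates are assumptions for illustration and do not reproduce the chapter's dataset:

```python
# Toy airport records keyed by code; coordinates are approximate.
AIRPORTS = {
    "IKA": {"city": "Tehran", "lat": 35.42, "lon": 51.15},
    "JED": {"city": "Jeddah", "lat": 21.68, "lon": 39.16},
    "DXB": {"city": "Dubai", "lat": 25.25, "lon": 55.36},
}

# Toy routes as (source code, destination code) pairs.
ROUTES = [("IKA", "JED"), ("DXB", "JED")]


def arcs_from_city(routes, airports, city):
    """Return start/end coordinate pairs for routes departing from `city`."""
    arcs = []
    for source, destination in routes:
        if airports[source]["city"] != city:
            continue
        start, end = airports[source], airports[destination]
        arcs.append({
            "start": (start["lat"], start["lon"]),
            "end": (end["lat"], end["lon"]),
        })
    return arcs


arcs = arcs_from_city(ROUTES, AIRPORTS, "Tehran")
```

In Superset, the same effect comes from a filter on the source city combined with start and end coordinate columns.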

We will now use the Map Style feature of the Arc chart to use satellite images. This time, we select one of the largest cities in the world, and one that is mentioned in the chronicles—...

Path

Routes are often found in location datasets to represent road or rail networks. We can plot those types of datasets using the Path visualization option. After uploading the rail routes dataset, either by downloading from the GitHub directory or creating it using the Jupyter Notebook, open it for visualization purposes:

CSV to Database configuration

If you take a look at the Jupyter Notebook, you will find that we create a feature named Polyline. It uses geometry information given in the file. Polylines are an encoded representation of latitudes and longitudes, as follows:

Encoded representation of latitudes and longitudes
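To make the encoding less opaque, here is a sketch of the widely used encoded polyline algorithm, assuming that is the format behind the Polyline feature (the notebook may rely on a library instead). Each coordinate is scaled to an integer, stored as a difference from the previous point, and packed into printable characters in 5-bit chunks:

```python
def _encode_value(value: int) -> str:
    # Left-shift to make room for the sign, inverting negative values.
    value = ~(value << 1) if value < 0 else (value << 1)
    chars = []
    while value >= 0x20:
        # Emit the low 5 bits with a continuation flag, offset into ASCII.
        chars.append(chr((0x20 | (value & 0x1F)) + 63))
        value >>= 5
    chars.append(chr(value + 63))
    return "".join(chars)


def encode_polyline(points, precision=5):
    """Encode (latitude, longitude) pairs as a polyline string."""
    factor = 10 ** precision
    out, prev_lat, prev_lon = [], 0, 0
    for lat, lon in points:
        lat_i, lon_i = round(lat * factor), round(lon * factor)
        out.append(_encode_value(lat_i - prev_lat))
        out.append(_encode_value(lon_i - prev_lon))
        prev_lat, prev_lon = lat_i, lon_i
    return "".join(out)


def decode_polyline(encoded, precision=5):
    """Decode a polyline string back into (latitude, longitude) pairs."""
    factor = 10 ** precision
    points, index, lat, lon = [], 0, 0, 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):
            shift = result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:  # no continuation flag: last chunk
                    break
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        points.append((lat / factor, lon / factor))
    return points


path = [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]
encoded = encode_polyline(path)
```

Storing differences rather than absolute values keeps the string short when consecutive points on a route are close together.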

The dataset represents rail routes over which crude oil is transported. It might be helpful to locate it on map tiles made of satellite imagery, because we can then see terrain information pertaining to the rail routes.

The bearing value in the Viewport...

Summary

In this chapter, we made charts using latitudes, longitudes, and attributes such as altitude. They helped us to visualize airports, two flight networks, and rail routes on geographical maps. With technologies such as GPS and satellite imagery, more location data is being generated. We now know how to visualize and analyze such datasets in Superset.

In the next chapter, we will make some beautiful dashboards and complete our Superset quick-start journey.
