Chapter 5. Data Looks Better on Maps: Master Geo-Spatiality
Location-based search has become an integral part of daily life. Whether we are looking for shopping centers, hospitals, restaurants, or any other place, we usually want to know how far away it is and what else is in the area. Elasticsearch lets you combine geo-location data with full-text search, structured search, and analytics.
In this chapter, we will cover the following topics:
Introducing geo-spatial data
Geo-spatial data describes the location of an object on the earth. The location is expressed as a pair of numeric values, latitude and longitude (lat-lon), in a geographic coordinate system. Apart from lat-lon, a geo-spatial object can also carry other information such as its name, size, and shape. Elasticsearch is well suited to this kind of data: in addition to geo-location searches, it supports sorting by geo-distance, creating geo clusters, scoring based on location, and working with arbitrary geo-shapes.
Elasticsearch has two data types dedicated to geo-spatial data:
geo_point: A single latitude-longitude pair that defines one location point
geo_shape: Also based on latitude-longitude coordinates, but describes shapes such as points, multi-points, lines, circles, polygons, and multi-polygons, defined using a GeoJSON data structure
Working with geo-point data
Geo-points are single location points defined by a latitude-longitude pair on the surface of the earth. Using geo-points you can do the following things:
Calculate the distance between two points
Find documents that fall within a specified rectangular area
Sort documents by distance and score results based on distance
Create clusters of geo-points using aggregations
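To make the first item concrete, here is a minimal sketch of a great-circle distance calculation between two lat-lon points using the haversine formula. Elasticsearch performs distance calculations internally, so this is only an illustration of the idea; the function name `haversine_km` and the mean earth radius of 6371 km are assumptions made for this example.

```python
import math

# Assumed mean earth radius in kilometers (illustrative value).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Return the great-circle distance in km between two lat-lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
print(round(one_degree, 1))
```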
Unlike most other data types in Elasticsearch, geo-point fields are not detected dynamically, so you have to define the mapping before indexing data. The mapping for a geo-point field can be defined in the following format:
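A minimal sketch of such a mapping, written as a Python dictionary, might look like the following. The field name `location` comes from the text; the helper name `geo_point_mapping` is hypothetical, and how the `properties` block is wrapped when it is sent to the cluster (for example, under a document type name) depends on the Elasticsearch version in use.

```python
import json


def geo_point_mapping(field_name="location"):
    """Build a mapping body that declares one geo_point field."""
    return {
        "properties": {
            # The field must be declared explicitly as geo_point.
            field_name: {"type": "geo_point"}
        }
    }


mapping = geo_point_mapping()
print(json.dumps(mapping, indent=2))
```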
A geo_point mapping indexes a single field (the location in our example) in the lat-lon format. You can optionally index .lat and .lon separately by setting the lat_lon parameter to true.
Elasticsearch supports the following three formats to index geo_point data with the same mapping that...
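As an illustration, three commonly accepted representations of a geo_point value are an object with `lat` and `lon` keys, a `"lat,lon"` string, and a `[lon, lat]` array. The sketch below builds all three from the same coordinates; the helper name `geo_point_formats` and the sample coordinates are assumptions made for this example. Note that the array form puts longitude first, which is a frequent source of mistakes.

```python
def geo_point_formats(lat, lon):
    """Return the same location in three geo_point representations."""
    return {
        # Object form: explicit keys, so the order cannot be confused.
        "object": {"lat": lat, "lon": lon},
        # String form: latitude first, then longitude.
        "string": f"{lat},{lon}",
        # Array form: longitude first, then latitude (GeoJSON order).
        "array": [lon, lat],
    }


formats = geo_point_formats(28.61, 77.23)
for name, value in formats.items():
    # Each value could be used as the "location" field of a document.
    print(name, {"location": value})
```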
Sometimes a search returns too many results when you are only interested in how many documents fall within a particular distance range of a location. A simple example is counting crime-related news events in an area and plotting them on a map, or generating a heatmap cluster of the events on the map, as shown in the following image:
Elasticsearch offers both metric and bucket aggregations for geo_point fields.
Geo distance aggregation is an extension of range aggregation. It allows you to create buckets of documents based on specified distance ranges from an origin point. Let's see how this can be done using an example.
Python example
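The following is a minimal sketch of how a geo distance aggregation request body could be built in Python. The aggregation name `news_hotspots`, the field name `location`, the origin coordinates, and the range boundaries are all assumptions chosen for illustration. The resulting dictionary would be passed as the search body to whichever Elasticsearch client you use.

```python
import json


def geo_distance_agg(field, origin, ranges, unit="km", name="news_hotspots"):
    """Build a search body with one geo_distance bucket aggregation."""
    return {
        # size 0: return only the aggregation buckets, not the hits.
        "size": 0,
        "aggs": {
            name: {
                "geo_distance": {
                    "field": field,
                    "origin": origin,
                    "unit": unit,
                    "ranges": ranges,
                }
            }
        },
    }


body = geo_distance_agg(
    field="location",
    origin={"lat": 28.61, "lon": 77.23},
    # Three rings around the origin: 0-50 km, 50-200 km, beyond 200 km.
    ranges=[{"to": 50}, {"from": 50, "to": 200}, {"from": 200}],
)
print(json.dumps(body, indent=2))
```

Each entry in `ranges` produces one bucket, so the response would contain a document count for each distance ring around the origin.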
Geo-shapes are quite different from geo-points. Until now we have worked with simple geo-location and rectangle searches, but geo-shapes allow far more flexible queries. On a map, you can simply draw a line, polygon, or circle and ask Elasticsearch to return the data that matches the coordinates of your query, as seen in the following image:
Let's see some of the most important geo-shapes.
A point is a single geographical coordinate, such as your current location shown by your smart-phone. A point in Elasticsearch is represented as follows:
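A minimal sketch of a point shape, built as a Python dictionary in the GeoJSON style used by geo_shape fields, is shown below. The helper name `geo_shape_point` and the sample coordinates are assumptions; note that GeoJSON coordinates are ordered longitude first, then latitude.

```python
def geo_shape_point(lon, lat):
    """Build a GeoJSON-style point; coordinates are [lon, lat]."""
    return {"type": "point", "coordinates": [lon, lat]}


point = geo_shape_point(77.23, 28.61)
print(point)
```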
A linestring can be defined in two ways. If it contains two coordinates, it will be a straight line, but if it contains more than two points, it will be an arbitrary path:
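Both cases can be sketched with the same GeoJSON-style structure; only the number of coordinate pairs differs. The helper name `geo_shape_linestring` and the sample coordinates below are assumptions made for this example, and each pair is ordered longitude first, then latitude.

```python
def geo_shape_linestring(points):
    """Build a GeoJSON-style linestring from (lon, lat) pairs."""
    if len(points) < 2:
        raise ValueError("a linestring needs at least two points")
    return {
        "type": "linestring",
        "coordinates": [[lon, lat] for lon, lat in points],
    }


# Two coordinates: a straight line.
straight = geo_shape_linestring([(77.0, 28.0), (78.0, 29.0)])
# More than two coordinates: an arbitrary path.
path = geo_shape_linestring([(77.0, 28.0), (77.5, 28.2), (78.0, 29.0)])
print(straight)
print(path)
```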
In this chapter, we learned about geo data concepts and covered the rich geo search functionalities offered by Elasticsearch, including creating mappings for geo-points and geo-shapes, indexing documents, geo-aggregations, and sorting data based on geo-distance. We also covered code examples for the most widely used geo-queries in both Python and Java.
In the next chapter, you will learn how document relationships can be managed in Elasticsearch using nested and parent-child relationships.