Chapter 7: Aggregations

When developing search solutions, the results are not the only thing that matters: additional information computed on them helps us improve search quality and focus. Elasticsearch provides a powerful tool to achieve these goals: aggregations. Aggregations are mainly used to provide additional data alongside the search results, either to improve their quality or to enrich them with extra information.

For example, in a search for news articles, useful facets to calculate could be the authors who wrote the articles and a date histogram of the publication dates. Thus, aggregations are used not only to improve the focus of the results but also to provide insight into the stored data (analytics). This is how many tools, such as Kibana (https://www.elastic.co/products/kibana), were born.

Generally, aggregations are displayed to the end user with graphs or a group of filtering options (for example, a list of categories for the search results). Because the Elasticsearch...

Executing an aggregation

Elasticsearch provides several functionalities beyond search; through aggregations, it allows you to compute statistics and real-time analytics on search results.
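An aggregation is declared in the `aggs` section of a standard `_search` request, next to (or instead of) a query. Each aggregation gets a user-chosen name, which identifies its result in the response. The following Python sketch only builds the request body. The helper `search_with_aggs` and the aggregation name `top_tags` are hypothetical, and it assumes `tag` is a keyword field of index-agg.

```python
import json


def search_with_aggs(aggs, query=None, size=0):
    """Build a _search request body carrying named aggregations.

    size=0 asks Elasticsearch not to return hits, only aggregation results.
    """
    body = {"size": size, "aggs": aggs}
    if query is not None:
        body["query"] = query  # aggregations run on the documents matching it
    return body


# Hypothetical example: a terms aggregation named "top_tags" on the `tag` field.
body = search_with_aggs({"top_tags": {"terms": {"field": "tag", "size": 10}}})
print(json.dumps(body, indent=2))  # send it with: POST /index-agg/_search
```

In the response, the buckets appear under `aggregations.top_tags.buckets`, keyed by the name chosen in the request.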

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using the Kibana console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands available in the online code.

The index that’s used in this recipe is index-agg.

How to do it...

To execute an aggregation, we will perform the following steps:

Compute the top 10 tags by name using the command...

Executing a stats aggregation

The most commonly used metric aggregation is stats, which computes several metrics (count, min, max, avg, and sum) over a bucket of documents in one go.

They are generally used as a terminal aggregation step to compute values that will be used directly or for further sorting.
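The following is a hedged sketch that assumes index-agg has a numeric `age` field; the names `age_stats` and `local_stats` are hypothetical. It builds a stats request body. To make the returned metrics explicit, it also computes the same five values on a plain, non-empty Python list. The real aggregation skips documents that have no value in the field, which the local helper does not model.

```python
import json

# Request body: one stats aggregation over the (assumed) numeric `age` field.
body = {
    "size": 0,
    "aggs": {"age_stats": {"stats": {"field": "age"}}},
}


def local_stats(values):
    """Compute locally the five metrics a stats aggregation reports for a bucket."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


print(json.dumps(body, indent=2))
print(local_stats([20, 30, 40]))
```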

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands available in the online code.

The index that’s used in this recipe is index-agg.

How to do it...

To execute a stats aggregation...

Executing a terms aggregation

The most commonly used bucket aggregation is terms, which groups documents into buckets, one for each distinct value (term) of a field. This aggregation is often used to narrow down the search by using the computed values as filters for subsequent queries.
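The narrowing pattern described above can be sketched in Python under a few assumptions. `tag` is assumed to be a keyword field of index-agg, and the helper `narrow_by_bucket` and the bucket key `"awesome"` are hypothetical. The first body asks for the 10 most frequent tags. The helper then turns a bucket the user selected into a `term` filter for the next query.

```python
import json

# Terms aggregation: the 10 most frequent `tag` values, most frequent first.
terms_body = {
    "size": 0,
    "aggs": {
        "tags": {
            "terms": {
                "field": "tag",
                "size": 10,
                "order": {"_count": "desc"},
            }
        }
    },
}


def narrow_by_bucket(user_query, field, bucket_key):
    """Turn a selected bucket key into a filter that narrows the next search."""
    return {
        "bool": {
            "must": [user_query],                        # keeps scoring as before
            "filter": [{"term": {field: bucket_key}}],   # restricts, no scoring
        }
    }


# Hypothetical follow-up: the user clicked the bucket whose key is "awesome".
next_query = narrow_by_bucket({"match_all": {}}, "tag", "awesome")
print(json.dumps({"query": next_query}, indent=2))
```

Putting the bucket value in `filter` rather than `must` is a design choice: the selected facet restricts the results without changing their relevance scores.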

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands available in the online code.

The index that’s used in this recipe is index-agg.

How to do it...

To execute a terms aggregation, we will perform the following...

Executing a significant terms aggregation

The significant terms aggregation is an evolution of the terms aggregation that is able to cover several scenarios, such as the following:

Suggesting relevant terms related to the current query text
Discovering relations between terms
Discovering common patterns in text

In these scenarios, the result cannot be as simple as that of a plain terms aggregation; it must be computed from the difference in term frequency between a foreground set (generally the query results) and a background set (a large bulk of data, by default the whole index).
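The Python sketch below builds a significant terms request and scores a term by comparing its foreground and background frequencies. The foreground query on `price` is a hypothetical example. The `jlh` function follows the idea behind Elasticsearch's default JLH heuristic: absolute change in frequency times relative change. It is an illustration, not the exact server-side implementation.

```python
import json

# Foreground: documents matching the query; background: the whole index.
sig_body = {
    "size": 0,
    "query": {"range": {"price": {"gte": 100}}},  # hypothetical foreground
    "aggs": {"significant_tags": {"significant_terms": {"field": "tag"}}},
}


def jlh(fg_count, fg_size, bg_count, bg_size):
    """JLH-style score: absolute change times relative change of a term's frequency."""
    if fg_count == 0 or bg_count == 0:
        return 0.0
    fg = fg_count / fg_size
    bg = bg_count / bg_size
    if fg <= bg:
        return 0.0  # not more frequent in the foreground: not significant
    return (fg - bg) * (fg / bg)


print(json.dumps(sig_body, indent=2))
print(jlh(10, 100, 20, 10_000))  # term in 10% of hits but only 0.2% of the index
```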

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To...

Executing a range aggregation

The previous recipes describe aggregation types that are very useful when buckets must be computed on fixed terms or on a limited number of items. In other cases, the buckets must be aggregated in ranges; the range aggregation meets this requirement. Common scenarios are as follows:

Price range (used in shops)
Size range
Alphabetical range
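The following Python sketch builds a range aggregation on the `price` field of index-agg; the price bands themselves are hypothetical. The local helper `matching_ranges` makes the boundary rule explicit: `from` is inclusive and `to` is exclusive. Ranges may overlap, so a value can fall into more than one bucket.

```python
import json

# Hypothetical price bands; each range may define "from", "to", or both.
price_ranges = [{"to": 10}, {"from": 10, "to": 100}, {"from": 100}]

range_body = {
    "size": 0,
    "aggs": {"price_bands": {"range": {"field": "price", "ranges": price_ranges}}},
}


def matching_ranges(value, ranges):
    """Indexes of the ranges that include value ("from" inclusive, "to" exclusive)."""
    hits = []
    for i, r in enumerate(ranges):
        if "from" in r and value < r["from"]:
            continue
        if "to" in r and value >= r["to"]:
            continue
        hits.append(i)
    return hits


print(json.dumps(range_body, indent=2))
print(matching_ranges(10, price_ranges))  # [1]: 10 is excluded by "to": 10
```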

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands available...

Executing a histogram aggregation

Numeric field values in Elasticsearch can be aggregated into histograms.

The histogram representation is a very powerful way to show data to end users, mainly using bar charts.
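The sketch below builds a histogram aggregation on the `age` field of index-agg; the interval of 5 is a hypothetical choice. It also reproduces locally the bucket-key rule that the Elasticsearch documentation gives for this aggregation: `floor((value - offset) / interval) * interval + offset`. Each document lands in the fixed-width bucket whose key is at or below its value.

```python
import json
import math

hist_body = {
    "size": 0,
    "aggs": {"ages": {"histogram": {"field": "age", "interval": 5}}},
}


def histogram_key(value, interval, offset=0):
    """Key of the bucket that a value falls into, for a given interval and offset."""
    return math.floor((value - offset) / interval) * interval + offset


print(json.dumps(hist_body, indent=2))
print([histogram_key(v, 5) for v in (0, 4, 5, 23, -3)])  # [0, 0, 5, 20, -5]
```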

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands available in the online code.

The index that’s used in this recipe is index-agg.

How to do it...

Using the items populated with the script, we will calculate the following histogram aggregations:

...

Executing a date histogram aggregation

The previous recipe used mainly numeric fields. Elasticsearch provides special functionalities to compute the date histogram aggregation, which operates on date or datetime values.

This aggregation is required because date values need more customization to solve problems, such as timezone conversion and special time intervals.
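The following Python sketch builds a date histogram on the `date` field of index-agg. The time zone and intervals are hypothetical choices. Recent Elasticsearch versions expect either `calendar_interval` or `fixed_interval` in place of the older `interval` parameter. `calendar_interval` uses calendar-aware units, so a month or year can vary in length. `fixed_interval` uses fixed SI durations. `time_zone` controls where bucket boundaries fall.

```python
import json

date_hist_body = {
    "size": 0,
    "aggs": {
        "per_year": {
            "date_histogram": {
                "field": "date",
                "calendar_interval": "year",  # calendar-aware unit
                "time_zone": "Europe/Rome",   # hypothetical zone for boundaries
                "min_doc_count": 0,           # keep empty buckets in the range
            }
        },
        "per_10_days": {
            "date_histogram": {"field": "date", "fixed_interval": "10d"}
        },
    },
}

print(json.dumps(date_hist_body, indent=2))
```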

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands available in the online code.

The index that’...

Executing a filter aggregation

Sometimes, we need to restrict the documents used by an aggregation to those that satisfy a particular filter; the filter aggregation is used for this purpose.

The filter aggregation is one of the simplest ways to manipulate a bucket by filtering out values.
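The following Python sketch builds a filter aggregation for index-agg; the price threshold is hypothetical. The filter aggregation creates a single bucket that holds only the documents matching its query, and its sub-aggregations run on that bucket. The sibling `avg` runs on all the search hits, so the two averages can be compared in one request.

```python
import json

filter_body = {
    "size": 0,
    "aggs": {
        "expensive": {
            "filter": {"range": {"price": {"gte": 100}}},  # hypothetical threshold
            "aggs": {"avg_age": {"avg": {"field": "age"}}},  # only filtered docs
        },
        "all_avg_age": {"avg": {"field": "age"}},  # every hit of the search
    },
}

print(json.dumps(filter_body, indent=2))
```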

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands available in the online code.

The index that’s used in this recipe is index-agg.

How to do it...

We need to compute two different filter...

Executing a filters aggregation

The filters aggregation addresses the common requirement of splitting the documents of a bucket using custom filters, which can be any kind of query supported by Elasticsearch.
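The following Python sketch builds a filters aggregation. The bucket names, the date and price conditions, and the `yyyy/MM/dd` date format are hypothetical. Each named query produces its own bucket. `other_bucket_key` adds a bucket for the documents that match none of the filters.

```python
import json

filters_body = {
    "size": 0,
    "aggs": {
        "segments": {
            "filters": {
                "filters": {
                    # "format" tells the range query how to parse the date string
                    "recent": {"range": {"date": {"gte": "2023/01/01",
                                                  "format": "yyyy/MM/dd"}}},
                    "cheap": {"range": {"price": {"lt": 10}}},
                },
                "other_bucket_key": "everything_else",
            },
            "aggs": {"price_stats": {"stats": {"field": "price"}}},
        }
    },
}

print(json.dumps(filters_body, indent=2))
```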

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands available in the online code.

The index used in this recipe is index-agg.

How to do it...

We need to compute a filters aggregation composed of the following queries:

Date greater than 2022/01/01 and price greater or...

Executing a global aggregation

Aggregations are generally executed on the results of the query; Elasticsearch also provides a special aggregation, global, that is executed on all the documents of the searched indices without being influenced by the query.
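The Python sketch below contrasts two averages in one request: one computed on the query results and one computed on every document. The tag value `"awesome"` is hypothetical. The `global` aggregation takes an empty body and must be placed at the top level of `aggs`; its sub-aggregations ignore the query.

```python
import json

global_body = {
    "size": 0,
    "query": {"term": {"tag": "awesome"}},  # hypothetical tag value
    "aggs": {
        "query_avg_price": {"avg": {"field": "price"}},  # only matching docs
        "all_docs": {
            "global": {},  # top-level only; ignores the query above
            "aggs": {"all_avg_price": {"avg": {"field": "price"}}},
        },
    },
}

print(json.dumps(global_body, indent=2))
```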

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands available in the online code.

The index used in this recipe is index-agg.

How to do it...

To execute global aggregations, we will perform the following steps:

Compare a global...

Executing a geo distance aggregation

In addition to the standard field types used in the previous aggregations, Elasticsearch allows you to execute aggregations against geo points: the geo distance aggregation. It is an evolution of the previously discussed range aggregation, built to work on geolocations: documents are grouped into distance ranges from an origin point.
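The following Python sketch assumes a `geo_point` field called `position`; this field name, the origin coordinates, and the distance rings are all hypothetical. Like the range aggregation, each range includes `from` and excludes `to`, here measured in the chosen `unit` from `origin`.

```python
import json

geo_dist_body = {
    "size": 0,
    "aggs": {
        "rings": {
            "geo_distance": {
                "field": "position",                   # hypothetical geo_point field
                "origin": {"lat": 41.9, "lon": 12.5},  # hypothetical center point
                "unit": "km",
                "ranges": [{"to": 10}, {"from": 10, "to": 100}, {"from": 100}],
            }
        }
    },
}

print(json.dumps(geo_dist_body, indent=2))
```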

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands available in the online code.

The index used in this recipe is index-agg.

How to do it...

...

Executing a children aggregation

The children aggregation allows you to execute analytics based on parent documents and their child documents. When working with complex structures, parent/child relations are very common.
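The Python sketch below follows the usual pattern for a children aggregation. A terms aggregation buckets the parent documents. A `children` aggregation, given the child relation name of the join field, then moves each bucket to its child documents, where further sub-aggregations run. All field and relation names here (`genre`, `review`, `rating`) are hypothetical placeholders, not the real mapping of the join index used in this recipe.

```python
import json


def children_body(parent_field, child_type, child_field, size=10):
    """Terms on parent docs, then a jump to their children via the join relation."""
    return {
        "size": 0,
        "aggs": {
            "parents": {
                "terms": {"field": parent_field, "size": size},
                "aggs": {
                    "to_children": {
                        "children": {"type": child_type},  # child relation name
                        "aggs": {
                            "child_values": {
                                "terms": {"field": child_field, "size": size}
                            }
                        },
                    }
                },
            }
        },
    }


# Hypothetical names: parents bucketed by "genre", children of relation "review".
body = children_body("genre", "review", "rating")
print(json.dumps(body, indent=2))
```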

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch04/populate_kibana.sh commands available in the online code.

The index used in this recipe is mybooks-join.

How to do it...

To execute children aggregations, we will perform the following steps:

Index documents with child or parent relations,...

Executing a nested aggregation

Nested aggregation allows you to execute analytics on nested documents. When working with complex structures, nested objects are very common.
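The following Python sketch assumes a nested field with the hypothetical path `versions` and a numeric sub-field `versions.size`. The `nested` aggregation switches the context to the nested documents at `path`, so its sub-aggregations, here a `min`, read fields of the nested objects. Sub-field names are referenced by their full path.

```python
import json

nested_body = {
    "size": 0,
    "aggs": {
        "versions": {
            "nested": {"path": "versions"},  # hypothetical nested path
            "aggs": {
                "min_size": {"min": {"field": "versions.size"}}  # full field path
            },
        }
    },
}

print(json.dumps(nested_body, indent=2))
```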

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch04/populate_kibana.sh commands available in the online code.

The index used in this recipe is mybooks-join.

How to do it...

To execute nested aggregations, we will perform the following steps:

Create a nested aggregation to return the minimum size of the product version that...

Executing a top hit aggregation

The top hit aggregation is different from the other aggregation types. All the previous aggregations return metric (simple) values or buckets; the top hit aggregation returns search hits (documents).

Generally, the top hit aggregation is used as a sub-aggregation, so that the top matching documents can be aggregated in buckets. The most common scenario for this aggregation is to have, for example, the top n documents grouped by category (very common in search results in e-commerce websites).
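The Python sketch below shows the "top n documents per category" pattern on index-agg. A `top_hits` aggregation is nested under a terms aggregation on `tag`. The bucket count, the hit count, and the returned fields are hypothetical choices. Sorting by `price` ascending returns the three cheapest documents per tag, and `_source` limits the returned fields.

```python
import json

top_hits_body = {
    "size": 0,
    "aggs": {
        "by_tag": {
            "terms": {"field": "tag", "size": 5},
            "aggs": {
                "cheapest": {
                    "top_hits": {
                        "size": 3,                                  # hits per bucket
                        "sort": [{"price": {"order": "asc"}}],
                        "_source": {"includes": ["name", "price"]},
                    }
                }
            },
        }
    },
}

print(json.dumps(top_hits_body, indent=2))
```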

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly...

Executing a matrix stats aggregation

Since Elasticsearch 5.x, a special module called aggs-matrix-stats has been available that automatically computes advanced statistics on several fields.
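The following Python sketch requests a `matrix_stats` aggregation over the `price` and `age` fields. Among other values, such as means, variances, and covariance, the response reports the correlation between the fields. To show what that number means, the sketch also includes a local Pearson correlation helper, `pearson`. This is an illustration of the statistic, not Elasticsearch's own code.

```python
import json
import math

matrix_body = {
    "size": 0,
    "aggs": {"price_age": {"matrix_stats": {"fields": ["price", "age"]}}},
}


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant numeric lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


print(json.dumps(matrix_body, indent=2))
print(pearson([1, 2, 3], [2, 4, 6]))  # perfectly correlated inputs
```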

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands available in the online code.

The index used in this recipe is index-agg.

How to do it...

To execute a matrix stats aggregation, we will perform the following steps:

First, we will evaluate statistics related to price and age in...

Executing a geo bounds aggregation

A very common scenario is having a set of documents that match a query and needing to know the bounding box that contains them; the geo bounds metric aggregation solves this.
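The following Python sketch assumes a hypothetical `geo_point` field called `position`; the query is also a hypothetical example. It builds a `geo_bounds` request over the documents matching a query. The local helper `local_bounds` computes the box of a few points in a `top_left`/`bottom_right` shape similar to the aggregation's result. Unlike `wrap_longitude`, it does not handle boxes that cross the 180th meridian.

```python
import json

bounds_body = {
    "size": 0,
    "query": {"range": {"price": {"gte": 100}}},  # hypothetical query
    "aggs": {
        "box": {"geo_bounds": {"field": "position", "wrap_longitude": True}}
    },
}


def local_bounds(points):
    """Bounding box of {"lat", "lon"} points (no dateline wrapping)."""
    lats = [p["lat"] for p in points]
    lons = [p["lon"] for p in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


print(json.dumps(bounds_body, indent=2))
print(local_bounds([{"lat": 45.0, "lon": 9.0}, {"lat": 41.0, "lon": 12.5}]))
```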

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands available in the online code.

The index used in this recipe is index-agg.

How to do it...

To execute geo bounds aggregations, we will perform the following steps:

Execute a query...

Executing a geo centroid aggregation

If you have a lot of geo-localized events and you need to know the center of these events, the geo centroid aggregation allows you to compute this geopoint.

Common scenarios could be as follows:

During Twitter monitoring for events (earthquakes or tsunamis, for example): to detect the center of the event by monitoring the first n tweets about it.
Having documents that have coordinates: to find the common center of these documents.
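The following Python sketch again assumes a hypothetical `geo_point` field called `position`. It computes one centroid over all the hits and one centroid per `tag` bucket, which gives a center for each group of events. As a metric aggregation, `geo_centroid` can be used either at the top level or as a sub-aggregation.

```python
import json

centroid_body = {
    "size": 0,
    "aggs": {
        "center": {"geo_centroid": {"field": "position"}},  # over all hits
        "by_tag": {
            "terms": {"field": "tag", "size": 5},
            "aggs": {"tag_center": {"geo_centroid": {"field": "position"}}},
        },
    },
}

print(json.dumps(centroid_body, indent=2))
```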

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an...

Executing a geotile grid aggregation

Using Elasticsearch to show data on maps is a very common pattern among Elasticsearch users. One of the most commonly used map formats is the tile one, in which a map is split into many small square parts; when a location must be rendered, the tiles around it are fetched from a tile server.

Apart from commercial solutions, OpenStreetMap (https://www.openstreetmap.org/) maps are the most used, and a lot of Kibana maps are based on their tile servers. OpenStreetMap is open source, and you can easily provide your own tile server via a Docker container (https://switch2osm.org/serving-tiles/using-a-docker-container/).

The geotile grid aggregation returns buckets of documents with geopoints or geoshapes, keyed by cell in the standard map tile format {zoom}/{x}/{y}.
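The following Python sketch assumes a hypothetical `geo_point` field called `position`; `precision` is the zoom level of the returned cells. It builds a `geotile_grid` request. To show where a `{zoom}/{x}/{y}` key comes from, it also computes a tile key locally with the standard slippy-map (Web Mercator) tile numbering. This numbering is the map tile scheme the key format refers to; the helper `tile_key` is illustrative, not Elasticsearch code.

```python
import json
import math

tile_body = {
    "size": 0,
    "aggs": {"tiles": {"geotile_grid": {"field": "position", "precision": 8}}},
}


def tile_key(lat, lon, zoom):
    """Slippy-map tile key "{zoom}/{x}/{y}" for a point (Web Mercator)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp to valid indices (lon == 180 or latitudes near the poles).
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return f"{zoom}/{x}/{y}"


print(json.dumps(tile_body, indent=2))
print(tile_key(0.0, 0.0, 0), tile_key(10.0, -10.0, 1))  # 0/0/0 1/0/0
```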

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter...

Executing a sampler aggregation

It’s quite common to use a significant sample of documents to return the most important analytics/Key Performance Indicators (KPIs) without computing the analytics on the whole dataset.

This aggregation works with a query that scores the documents and, based on the score, takes the top n ranked documents (per shard) to compute its sub-aggregations; hence its name, sampler.
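The following Python sketch wraps a significant terms analysis in a `sampler` aggregation. The `description` field and the search text are hypothetical. `shard_size` limits how many top-scoring documents each shard contributes to the sample, so the sub-aggregation runs on the best matches only rather than on every hit.

```python
import json

sampler_body = {
    "size": 0,
    "query": {"match": {"description": "fast delivery"}},  # hypothetical text query
    "aggs": {
        "sample": {
            "sampler": {"shard_size": 100},  # top-100 scoring docs per shard
            "aggs": {"keywords": {"significant_terms": {"field": "tag"}}},
        }
    },
}

print(json.dumps(sampler_body, indent=2))
```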

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch07/populate_aggregation.sh commands...

Executing a pipeline aggregation

Elasticsearch allows you to define aggregations that are a mix of the results of other aggregations (for example, by comparing the results of two metric aggregations); these are pipeline aggregations.

They are very common when you need to compute results from different aggregations, such as statistics on results.
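The following Python sketch shows sibling pipeline aggregations. A date histogram with a `sum` sub-aggregation produces one value per month. `max_bucket` and `avg_bucket` then read those values through `buckets_path`, where `>` steps from the multi-bucket aggregation into its metric. The sketch assumes an index with a `date` field and a numeric `price` field; these are hypothetical names. The local helper `local_max_bucket` mimics the shape of the `max_bucket` result.

```python
import json

pipeline_body = {
    "size": 0,
    "aggs": {
        "per_month": {
            "date_histogram": {"field": "date", "calendar_interval": "month"},
            "aggs": {"total_price": {"sum": {"field": "price"}}},
        },
        # Sibling pipelines: they read the per-month sums computed above.
        "best_month": {"max_bucket": {"buckets_path": "per_month>total_price"}},
        "avg_month": {"avg_bucket": {"buckets_path": "per_month>total_price"}},
    },
}


def local_max_bucket(buckets):
    """Mimic max_bucket: the highest value and every bucket key that reaches it."""
    best = max(value for _, value in buckets)
    return {"value": best, "keys": [key for key, value in buckets if value == best]}


print(json.dumps(pipeline_body, indent=2))
print(local_max_bucket([("2022-01", 5.0), ("2022-02", 9.0), ("2022-03", 9.0)]))
```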

Getting ready

You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need to create index-pipagg with the following command:

PUT /index-pipagg
{ "mappings": {
    "properties...

You're reading from Elasticsearch 8.x Cookbook - Fifth Edition
