You're reading from  Elasticsearch 7 Quick Start Guide

Product type: Book
Published in: Oct 2019
Publisher: Packt
ISBN-13: 9781789803327
Edition: 1st Edition
Authors (2):
Anurag Srivastava

Anurag Srivastava is a senior technical lead in a multinational software company. He has more than 12 years' experience in web-based application development. He is proficient in designing architecture for scalable and highly available applications. He has handled development teams and multiple clients from all over the globe over the past 10 years of his professional career. He has significant experience with the Elastic Stack (Elasticsearch, Logstash, and Kibana) for creating dashboards using system metrics data, log data, application data, and relational databases. He has authored three other books: Mastering Kibana 6.x, Kibana 7 Quick Start Guide, and Learning Kibana 7 - Second Edition, all published by Packt.

Douglas Miller

Douglas Miller is an expert in helping fast-growing companies to improve performance and stability, and in building search platforms using Elasticsearch. Clients (including Walgreens, Nike, Boeing, and Dish Networks) have seen increased sales, faster performance, and a lower total cost of ownership for their Elasticsearch clusters.

Aggregating Datasets

Data aggregation gives us a way to extract information from a huge set of data and present it in summary form. We can group the information into buckets to get an idea of the categories or ranges it falls into. Take the example of a shopping site where we have complete data on products and their prices. If we want to categorize the products into different price ranges, we have to apply data aggregation. In the same way, we can aggregate by product category. In this chapter, we will cover the different types of aggregations that Elasticsearch provides, namely metrics, bucket, pipeline, and matrix aggregations, through the following topics:

  • What is an aggregation framework?
  • Advantages of aggregations
  • Structure of aggregations
  • Metrics aggregations
  • Bucket aggregations
  • Pipeline...
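
As a rough sketch of the shopping-site example above, the request body below groups products into price ranges and, separately, by category. It is written as a Python dictionary that mirrors the JSON body of a search request. The field names `price` and `category`, the aggregation names, and the range boundaries are assumptions made for illustration, not values taken from the book.

```python
import json

# Hypothetical field names ("price", "category") and range boundaries,
# chosen only to illustrate the shopping-site example.
price_ranges_request = {
    "size": 0,  # return aggregation results only, no search hits
    "aggs": {
        "price_ranges": {
            "range": {
                "field": "price",
                "ranges": [
                    {"to": 50},               # price < 50
                    {"from": 50, "to": 200},  # 50 <= price < 200
                    {"from": 200},            # price >= 200
                ],
            }
        },
        "by_category": {
            "terms": {"field": "category"}
        },
    },
}

# The serialized form is what would be sent as the body of a search request.
print(json.dumps(price_ranges_request, indent=2))
```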

What is an aggregation framework?

An aggregation framework collects analytic data from a set of documents and combines the information to build complex data summaries and statistics. There are four families of aggregations, each of which has a different role:

  • Metrics: This family of aggregations computes metrics, such as averages or sums, over the values of different fields of the Elasticsearch documents.
  • Bucketing: This is a family of aggregations that build buckets. Each individual bucket is associated with a key and a document criterion. When executing an aggregation, the bucket criteria are evaluated on all documents. A document falls into a bucket if it meets that bucket's criterion. Each aggregation process results in a list of buckets, each holding the documents that belong to it.
  • Pipeline: The pipeline family aggregates the output of other aggregations and their associated metrics.
  • Matrix: A matrix is created...
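
The bucketing idea described above can be imitated in a few lines of plain Python. This is only a conceptual sketch, not how Elasticsearch implements it: the documents, the bucket keys, and the `build_buckets` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical documents and bucket criteria, for illustration only.
documents = [
    {"name": "pen", "price": 3},
    {"name": "lamp", "price": 45},
    {"name": "desk", "price": 180},
    {"name": "chair", "price": 95},
]

# Each bucket is a key plus a criterion that a document either meets or not.
bucket_criteria = {
    "cheap": lambda doc: doc["price"] < 50,
    "mid": lambda doc: 50 <= doc["price"] < 150,
    "expensive": lambda doc: doc["price"] >= 150,
}


def build_buckets(docs, criteria):
    """Evaluate every criterion on every document and collect matches per key."""
    buckets = defaultdict(list)
    for doc in docs:
        for key, matches in criteria.items():
            if matches(doc):
                buckets[key].append(doc["name"])
    return dict(buckets)


buckets = build_buckets(documents, bucket_criteria)
# A simple metric computed per bucket: the number of documents in it.
doc_counts = {key: len(names) for key, names in buckets.items()}
print(buckets)
print(doc_counts)
```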

Metrics aggregations

The metrics aggregation family computes metrics based on values collected from the aggregated documents. Numeric metrics aggregations are a special kind of metrics aggregation that output numeric values. Single-value numeric metrics aggregations output a single numeric metric, whereas multi-value numeric metrics aggregations return multiple metrics. This distinction matters when the aggregations are used as sub-aggregations within a bucket aggregation. The following are some of the main metrics aggregation types:

  • Avg aggregation
  • Weighted avg aggregation
  • Cardinality aggregation
  • Extended stats aggregation
  • Max aggregation
  • Min aggregation
  • Scripted metric aggregation
  • Stats aggregation
  • Sum aggregation
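
To make the single-value versus multi-value distinction concrete, here is a small plain-Python imitation of what an avg and a stats aggregation report. It does not call Elasticsearch; the `prices` list and the two helper functions are hypothetical stand-ins. The five keys mirror the values a stats aggregation returns (count, min, max, avg, and sum).

```python
# Hypothetical field values collected from the aggregated documents.
prices = [3.0, 45.0, 180.0, 95.0]


def avg(values):
    """Single-value numeric metric: exactly one number comes back."""
    return sum(values) / len(values)


def stats(values):
    """Multi-value numeric metric: several numbers come back together."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


print(avg(prices))
print(stats(prices))
```

Because the multi-value result carries several metrics, anything that refers to it has to name the metric it wants, whereas a single-value result can be referenced by the aggregation name alone.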

Now, let's cover these aggregation types in detail.

...

Bucket aggregations

Bucket aggregations create buckets of documents. Each bucket has a criterion that determines which documents fall into it. Unlike metrics aggregations, bucket aggregations can hold sub-aggregations, so a parent bucket can contain child buckets. Let's cover some of the bucket aggregation types.
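
A minimal sketch of a parent bucket holding sub-aggregations is shown below as a Python dictionary mirroring a search request body. The `category` and `price` fields and all aggregation names are assumptions for illustration.

```python
import json

# Hypothetical fields ("category", "price") and aggregation names.
bucket_with_children_request = {
    "size": 0,  # aggregation results only
    "aggs": {
        "by_category": {  # parent bucket aggregation: one bucket per category
            "terms": {"field": "category"},
            "aggs": {
                "price_ranges": {  # child bucket aggregation inside each category
                    "range": {
                        "field": "price",
                        "ranges": [{"to": 100}, {"from": 100}],
                    }
                },
                "avg_price": {  # metric computed once per category bucket
                    "avg": {"field": "price"}
                },
            },
        }
    },
}

print(json.dumps(bucket_with_children_request, indent=2))
```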

Adjacency matrix aggregation

This bucket aggregation returns a form of adjacency matrix. Each bucket represents a cell in the matrix of intersecting filters. For example, given three filters, A, B and C, the response will return the following:

      A     B     C
A     A     A&B   A&C
B           B     B&C
C                 C

The buckets B&A, C&A, and C&B are not included as they are already in the table (as...
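
The set of bucket keys in that matrix can be reproduced with a short plain-Python sketch. The `adjacency_bucket_keys` helper is hypothetical and only enumerates keys; it does not count documents or talk to Elasticsearch. The `&` separator matches the keys shown in the table above.

```python
from itertools import combinations


def adjacency_bucket_keys(filter_names, separator="&"):
    """List one key per filter plus one key per unordered pair of filters."""
    names = sorted(filter_names)
    keys = list(names)
    # combinations() yields each pair once, so B&A never appears alongside A&B.
    keys += [separator.join(pair) for pair in combinations(names, 2)]
    return sorted(keys)


print(adjacency_bucket_keys(["A", "B", "C"]))
```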

Pipeline aggregations

Pipeline aggregations work on the outputs produced by other aggregations rather than on document sets, and add information to the output tree. There are two families of pipeline aggregation:

  • Parent aggregations: Pipeline aggregations that take the output of their parent aggregation and compute new buckets or new aggregations to add to the existing buckets
  • Sibling aggregations: Pipeline aggregations that take the output of a sibling aggregation and compute a new aggregation at the same level as that sibling
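
The difference between the two families is mostly a matter of where the pipeline aggregation is placed in the request. The sketch below uses a derivative aggregation as the parent example and an avg_bucket aggregation as the sibling example. The `date` and `price` fields and the aggregation names are assumptions, and `calendar_interval` is the parameter name used from Elasticsearch 7.2 onward (earlier 7.x releases use `interval`).

```python
import json

# Hypothetical fields ("date", "price") and aggregation names.
pipeline_request = {
    "size": 0,
    "aggs": {
        "sales_per_month": {
            # calendar_interval assumes Elasticsearch 7.2+; older 7.x uses "interval".
            "date_histogram": {"field": "date", "calendar_interval": "month"},
            "aggs": {
                "monthly_sales": {"sum": {"field": "price"}},
                # Parent pipeline: nested inside the histogram, it adds a
                # value to each of the histogram's existing buckets.
                "sales_change": {
                    "derivative": {"buckets_path": "monthly_sales"}
                },
            },
        },
        # Sibling pipeline: declared next to the histogram, it reads the
        # histogram's buckets through buckets_path.
        "avg_monthly_sales": {
            "avg_bucket": {"buckets_path": "sales_per_month>monthly_sales"}
        },
    },
}

print(json.dumps(pipeline_request, indent=2))
```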

The pipeline aggregations use buckets_path to reference the aggregations. This allows the pipelines to be chained. The syntax for the path is as follows:

PATH = <AGG_NAME>[<AGG_SEPARATOR>,<AGG_NAME>]*[<METRIC_SEPARATOR>, <METRIC>];

Here, the parameters are as follows:

  • AGG_NAME: Represents the name of the aggregation
  • AGG_SEPARATOR: Represents the separator of the aggregation...
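
As a small illustration of the path syntax, the helper below assembles a buckets_path string from aggregation names and an optional metric. The function `build_buckets_path` is hypothetical. The separator values it uses (`>` between aggregation names, `.` before a metric) are the ones given in the Elasticsearch reference documentation and are stated here as an assumption, since they are not visible in this excerpt.

```python
# Separator values as documented by Elasticsearch; not shown in this excerpt.
AGG_SEPARATOR = ">"
METRIC_SEPARATOR = "."


def build_buckets_path(agg_names, metric=None):
    """Join aggregation names, then append an optional metric name."""
    if not agg_names:
        raise ValueError("at least one aggregation name is required")
    path = AGG_SEPARATOR.join(agg_names)
    if metric is not None:
        path += METRIC_SEPARATOR + metric
    return path


# Path to a single-value metric nested in a bucket aggregation.
print(build_buckets_path(["sales_per_month", "monthly_sales"]))
# Path to one metric of a multi-value aggregation.
print(build_buckets_path(["my_bucket", "my_stats"], metric="avg"))
```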

Matrix aggregations

Matrix aggregations produce a matrix result from multiple fields. They do not support scripting. Let's cover the different types of matrix aggregations.

Matrix stats

The matrix_stats aggregation computes the following statistics over a set of document fields:

  • count: The number of samples per field
  • mean: The average value for each field
  • variance: A measure of how spread out the samples are from the mean, per field
  • skewness: The measurement quantifying the asymmetric distribution per field
  • kurtosis: The measurement quantifying the shape of distribution per field
  • covariance: The matrix that describes how changes in fields are associated with one another
  • correlation: The covariance matrix scaled to a range of -1 to 1; describes the relationship between field distributions
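
Several of these statistics can be worked through by hand in plain Python. The sketch below is illustrative only: the field names and values are made up, skewness and kurtosis are left out for brevity, and the use of the sample (n - 1) denominator for variance and covariance is an assumption of this sketch rather than a statement about Elasticsearch's internals.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical numeric fields, one list of samples per field.
fields = {
    "income": [10.0, 20.0, 30.0, 40.0],
    "poverty": [8.0, 6.0, 4.0, 2.0],
}


def covariance(xs, ys):
    """Sample covariance of two equally long lists (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance scaled into the range -1 to 1."""
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))


def matrix_stats(data):
    """Per-field count, mean, and variance, plus pairwise matrices."""
    result = {}
    for name, values in data.items():
        result[name] = {
            "count": len(values),
            "mean": mean(values),
            "variance": variance(values),
            "covariance": {o: covariance(values, v) for o, v in data.items()},
            "correlation": {o: correlation(values, v) for o, v in data.items()},
        }
    return result


summary = matrix_stats(fields)
print(summary["income"])
```

In this made-up data, poverty falls by a fixed amount whenever income rises by a fixed amount, so the correlation between the two fields sits at the negative end of the range.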

It...

Summary

In this chapter, we learned about Elasticsearch aggregations, which let us summarize data to gain insights. We covered the four main families of aggregations: metrics, bucketing, pipeline, and matrix. We also covered different aggregation types within each of these families.

In the next chapter, we will cover the best practices we can follow in order to manage the Elasticsearch cluster.
