You're reading from Learning Elastic Stack 6.0

Product type: Book

Published in: Dec 2017

Publisher: Packt

ISBN-13: 9781787281868

Edition: 1st Edition

Tools: Kibana, Elasticsearch

Concepts: Enterprise Search

Authors (2):

Pranav Shukla

Sharath Kumar M N

Chapter 4. Analytics with Elasticsearch

On our journey of learning about Elastic Stack 6.0, we have built a strong understanding of Elasticsearch. In the previous two chapters, we covered its foundations and gained an in-depth understanding of its search use cases.

The underlying technology, Apache Lucene, was originally developed for text search use cases. Due to innovations in Apache Lucene and additional innovations in Elasticsearch, Elasticsearch has also emerged as a very powerful analytics engine. In this chapter, we will understand how Elasticsearch can serve as your analytics engine. We will look at the following:

The basics of aggregations
Preparing data for analysis
Metric aggregations
Bucket aggregations
Pipeline aggregations

We will learn all of this by using a real-world dataset. Let us start by understanding the basics of aggregations.

The basics of aggregations

In contrast to search, analytics deals with the bigger picture. Searching addresses the need for zooming in to a few records; analytics addresses the need for zooming out and slicing the data in different ways. While learning about searching, we used the API of the following form:

POST /<index_name>/<type_name>/_search
{
  "query": 
  {
    ... type of query ...
  }
}

The aggregations or aggs element allows us to aggregate data. All aggregation requests share a common structure, which takes the following form:

POST /<index_name>/<type_name>/_search
{
  "aggs": {
    ... type of aggregation ...
  },
  "query": { ... type of query ... },   // optional query part
  "size": 0                             // size typically set to 0
}

The aggs element should contain the actual aggregation query. The body depends on the type of aggregation that...
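
As a minimal sketch of the request form above, the following Python snippet assembles an aggregation request body as a plain dictionary. The helper name build_agg_request, the aggregation name total_usage, and the field name usage are hypothetical and chosen only for illustration; the sum aggregation is just one possible body for the aggs element.

```python
import json


def build_agg_request(aggs, query=None, size=0):
    """Assemble a search request body that carries an aggregation.

    aggs  -- dict mapping a caller-chosen aggregation name to its definition
    query -- optional query restricting the documents that are aggregated
    size  -- number of search hits to return; 0 returns only aggregation results
    """
    body = {"aggs": aggs, "size": size}
    if query is not None:
        body["query"] = query
    return body


# Hypothetical example: sum a numeric field named "usage" over all documents.
request_body = build_agg_request({"total_usage": {"sum": {"field": "usage"}}})
print(json.dumps(request_body, indent=2))
```

The printed JSON could be sent as the body of POST /<index_name>/<type_name>/_search. Setting size to 0 suppresses the search hits, so the response carries only the aggregation results.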

Preparing data for analysis

We will consider an example of network traffic data generated from Wi-Fi routers. Throughout this chapter, we will analyze the data from this example. It is important to understand what the records in the underlying system look like and what they represent. We will cover the following topics while we prepare and load the data into the local Elasticsearch instance:

Understanding the structure of data
Loading the data using Logstash

Understanding the structure of data

The following diagram depicts the design of the system, to help you gain a better understanding of the problem and the structure of data collected:

Fig 4.1 Network traffic and bandwidth usage data for Wi-Fi traffic and storage in Elasticsearch

The data is collected by the system with the following objectives:

In the left half of the figure, there are multiple squares representing one customer's premises, with the Wi-Fi routers deployed on that site, along with all devices connected to those Wi-Fi routers...

Metric aggregations

Metric aggregations work with numeric data, computing one or more aggregate metrics within the given context. The context could be a query, a filter, or no query at all, in which case the whole index/type is included. Metric aggregations can also be nested inside bucket aggregations. In this case, the metrics are computed for each bucket of the bucket aggregation.

We will start with simple metric aggregations without nesting them inside bucket aggregations. When we learn about bucket aggregations later in the chapter, we will also learn how to use metric aggregations inside bucket aggregations.
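
The role of the context can be illustrated with a small in-memory sketch. This is plain Python that mimics the semantics described above, not how Elasticsearch computes metrics internally; the sample records and the field names department and usage are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions for illustration.
records = [
    {"department": "Sales", "usage": 100},
    {"department": "Sales", "usage": 300},
    {"department": "Engineering", "usage": 250},
]


def sum_metric(docs, field):
    """Compute a single sum metric over the given context (a list of documents)."""
    return sum(doc[field] for doc in docs)


def sum_per_bucket(docs, bucket_field, metric_field):
    """Group documents into buckets, then compute the metric inside each bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc)
    return {key: sum_metric(members, metric_field) for key, members in buckets.items()}


# Context 1: no query, so the metric covers every record.
overall = sum_metric(records, "usage")

# Context 2: a filter narrows the context before the metric is computed.
sales_only = sum_metric([d for d in records if d["department"] == "Sales"], "usage")

# Context 3: the metric is nested inside buckets and computed once per bucket.
per_department = sum_per_bucket(records, "department", "usage")

print(overall, sales_only, per_department)
```

The same metric yields one number for the whole dataset, one number for the filtered subset, and one number per bucket when it is nested.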

We will learn about the following metric aggregations:

Sum, average, min, and max aggregations
Stats and extended stats aggregations
Cardinality aggregation

Let us learn about them one by one.

Sum, average, min, and max aggregations

Finding the sum, the minimum value, the maximum value, or the average of a field is a very common operation. For the people who are familiar with...
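
As a hedged sketch, the four operations map onto the Elasticsearch aggregation types sum, avg, min, and max, and each request body has the same shape. The helper metric_request, the generated aggregation names, and the field name usage are hypothetical and used only for illustration.

```python
import json

FIELD = "usage"  # hypothetical numeric field name, assumed for illustration

# Elasticsearch aggregation type for each of the four operations.
METRIC_TYPES = {"sum": "sum", "average": "avg", "min": "min", "max": "max"}


def metric_request(metric_type, field):
    """Build a request body for a single metric aggregation on one field."""
    name = f"{metric_type}_{field}"  # caller-chosen aggregation name
    return {"aggs": {name: {metric_type: {"field": field}}}, "size": 0}


requests_by_operation = {
    label: metric_request(agg_type, FIELD) for label, agg_type in METRIC_TYPES.items()
}

print(json.dumps(requests_by_operation["average"], indent=2))
```

Only the aggregation type changes between the four requests; the field reference and the size of 0 stay the same.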

Bucket aggregations

Bucket aggregations are useful to analyze how the whole relates to its parts to gain better insight. They help in segmenting the data into smaller parts. Each type of bucket aggregation slices the data into different segments or buckets. Bucket aggregations are the most common type of aggregation used in any analysis process.

We will cover the following topics, keeping the network traffic data example at the center:

Bucketing on string data
Bucketing on numeric data
Aggregating filtered data
Nesting aggregations
Bucketing on custom conditions
Bucketing on date/time data
Bucketing on geo-spatial data
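
To make the idea of slicing data into buckets concrete, here is a minimal sketch for numeric data using the range aggregation, followed by a plain Python emulation of how documents fall into the buckets. The field name usage, the aggregation name by_usage, the boundaries, and the sample values are all assumptions for illustration.

```python
# Hypothetical numeric field and boundaries, assumed for illustration.
range_agg = {
    "aggs": {
        "by_usage": {
            "range": {
                "field": "usage",
                "ranges": [
                    {"to": 1000},
                    {"from": 1000, "to": 5000},
                    {"from": 5000},
                ],
            }
        }
    },
    "size": 0,
}


def count_in_ranges(values, ranges):
    """Count values per range; "from" is inclusive and "to" is exclusive."""
    counts = []
    for r in ranges:
        low = r.get("from", float("-inf"))
        high = r.get("to", float("inf"))
        counts.append(sum(1 for v in values if low <= v < high))
    return counts


sample_usage = [200, 1000, 4999, 5000, 12000]
ranges = range_agg["aggs"]["by_usage"]["range"]["ranges"]
print(count_in_ranges(sample_usage, ranges))
```

Because the lower bound is inclusive and the upper bound is exclusive, a value that sits exactly on a boundary lands in the higher bucket, and every value is counted exactly once.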

Bucketing on string data

Sometimes, we may need to bucket or segment the data based on a field that has a string datatype, typically a keyword-typed field in Elasticsearch. This is very common. Some examples of scenarios in which you may want to segment the data by a string-typed field are:

Segmenting the network traffic data per department
Segmenting the network traffic data...
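
Segmenting by a string field, such as the per-department scenario above, is typically expressed with the terms aggregation. The sketch below shows one possible request body and a plain Python emulation of the resulting buckets; the field name department, the aggregation name by_department, and the sample documents are hypothetical.

```python
from collections import Counter

# Hypothetical keyword field name, assumed for illustration.
terms_request = {
    "aggs": {"by_department": {"terms": {"field": "department"}}},
    "size": 0,
}

# Hypothetical sample documents standing in for network traffic records.
docs = [
    {"department": "Sales"},
    {"department": "Engineering"},
    {"department": "Sales"},
    {"department": "Marketing"},
    {"department": "Sales"},
]


def terms_buckets(documents, field, size=10):
    """Return (key, doc_count) pairs for the most frequent values of a field."""
    counts = Counter(doc[field] for doc in documents)
    return counts.most_common(size)


print(terms_buckets(docs, "department"))
```

The emulation mirrors two defaults of the terms aggregation: buckets are ordered by document count, highest first, and only the top 10 buckets are returned unless a different size is given.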

Pipeline aggregations

Pipeline aggregations, as their name suggests, allow you to aggregate over the result of another aggregation. They let you pipe the result of an aggregation as an input to another aggregation. Pipeline aggregations are a relatively new feature and they are still experimental. At a high level, there are two types of pipeline aggregation:

Parent pipeline aggregations have the pipeline aggregation nested inside other aggregations
Sibling pipeline aggregations have the pipeline aggregation as the sibling of the original aggregation from which pipelining is done

Let us understand how pipeline aggregations work by taking the example of the cumulative sum aggregation, which is a parent pipeline aggregation.
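
The structural difference between the two types can be sketched as request bodies. The field names time and usage and all aggregation names are assumptions for illustration, and the interval parameter follows the 6.x form of the date_histogram aggregation; cumulative_sum stands in for a parent pipeline aggregation and avg_bucket for a sibling one.

```python
# Hypothetical field names ("time", "usage") and aggregation names, assumed for illustration.
# The "interval" parameter follows the 6.x form of date_histogram.
hourly = {"date_histogram": {"field": "time", "interval": "1h"}}
hourly_usage = {"sum": {"field": "usage"}}

# Parent pipeline: cumulative_sum sits INSIDE the date_histogram,
# next to the metric it reads through buckets_path.
parent_pipeline_request = {
    "size": 0,
    "aggs": {
        "per_hour": {
            **hourly,
            "aggs": {
                "hourly_usage": hourly_usage,
                "cumulative_usage": {"cumulative_sum": {"buckets_path": "hourly_usage"}},
            },
        }
    },
}

# Sibling pipeline: avg_bucket sits BESIDE the date_histogram
# and reaches into it with the ">" path separator.
sibling_pipeline_request = {
    "size": 0,
    "aggs": {
        "per_hour": {**hourly, "aggs": {"hourly_usage": hourly_usage}},
        "avg_hourly_usage": {"avg_bucket": {"buckets_path": "per_hour>hourly_usage"}},
    },
}

print(sorted(parent_pipeline_request["aggs"]["per_hour"]["aggs"]))
print(sorted(sibling_pipeline_request["aggs"]))
```

In the parent form, the pipeline aggregation adds a value to every bucket of the histogram; in the sibling form, it produces a single result alongside the histogram.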

Calculating the cumulative sum of usage over time

While learning about the Date Histogram aggregation, in the section Focusing on a specific day and changing intervals, we looked at an aggregation that computes hourly bandwidth usage for one particular day. After completing...
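
What a cumulative sum adds on top of the hourly buckets can be shown with a short in-memory sketch. The hourly totals below are made-up values, not results from the dataset, and the code only emulates the running total rather than calling Elasticsearch.

```python
from itertools import accumulate

# Hypothetical hourly usage totals, one value per date histogram bucket.
hourly_usage = [120, 80, 0, 200]

# Each bucket's cumulative value is the running total of all buckets
# up to and including that bucket.
cumulative_usage = list(accumulate(hourly_usage))

for hour, (usage, running_total) in enumerate(zip(hourly_usage, cumulative_usage)):
    print(f"hour {hour}: usage={usage} cumulative={running_total}")
```

The last cumulative value equals the total usage for the whole period, and an hour with no usage simply repeats the previous running total.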

Summary

In this chapter, we have learnt how to use Elasticsearch to build powerful analytics applications. We have covered how to slice and dice the data to get powerful insight. We started with metric aggregation to deal with numeric datatypes. We then covered bucket aggregation to find out how to slice the data into buckets or segments in order to drill down into specific segments.

We also understood how pipeline aggregations work. We did all of this while dealing with a realistic dataset of network traffic data. We have seen how flexible Elasticsearch is as an analytics engine. Without much additional data modelling and extra effort, we can analyze any field, even when the data is on a big data scale. This is a rare capability that few data stores offer. As you will see in Chapter 7, Visualizing Data with Kibana, Kibana leverages many of the aggregations that we learnt about in this chapter.

This concludes the chapters on Elasticsearch, the core of Elastic Stack. We have a very...

Authors (2)

Pranav Shukla

Pranav Shukla is the founder and CEO of Valens DataLabs, a technologist, husband, and father of two. He is a big data architect and software craftsman who uses JVM-based languages. Pranav has diverse experience of over 14 years in architecting enterprise applications for Fortune 500 companies and start-ups. His core expertise lies in building JVM-based, scalable, reactive, and data-driven applications using Java/Scala, the Hadoop ecosystem, Apache Spark, and NoSQL databases. He is a big data engineering, analytics, and machine learning enthusiast.

Sharath Kumar M N

Sharath Kumar M N did his master's in computer science at the University of Texas, Dallas, USA. He is currently working as a senior principal architect at Broadcom. Prior to this, he was working as an Elasticsearch solutions architect at Oracle. He has given several tech talks at conferences such as Oracle Code events. Sharath is a certified trainer (Elastic Certified Instructor), one of the few technology experts in the world who have been certified by Elastic Inc. to deliver its official "from the creators of Elastic" training. He is also a data science and machine learning enthusiast. In his free time, he likes playing with his lovely niece, Monisha; nephew, Chirayu; and his pet, Milo.