You're reading from Getting Started with Elastic Stack 8.0

Product type: Book
Published in: Mar 2022
Publisher: Packt
ISBN-13: 9781800569492
Edition: 1st Edition
Author: Asjad Athick

Asjad Athick is a security specialist at Elastic with demonstrable experience in architecting enterprise-scale solutions on the cloud. He believes in empowering people with the right tools to help them achieve their goals. At Elastic, he works with a broad range of customers across Australia and New Zealand to help them understand their environment; this allows them to build robust threat detection, prevention, and response capabilities. He previously worked in the telecommunications space to build a security capability to help analysts identify and contextualize unknown cyber threats. With a background in application development and technology consulting, he has worked with various small businesses and start-up organizations across Australia.

Chapter 4: Leveraging Insights and Managing Data on Elasticsearch

In the previous chapter, we looked at getting data into an Elasticsearch cluster and running searches to return relevant results for our application. This chapter will focus on how this data can be leveraged to gain analytical insights. We will also look at some important features that help with manipulating, transforming, and managing data sources when building your use cases.

Specifically, we will focus on the following topics:

  • Aggregating data for analytical insights
  • Managing the life cycle of time series data
  • Manipulating data using ingest pipelines
  • Responding to changes in data with Watcher

Technical requirements

The code and the relevant artifacts for this chapter can be found in the GitHub repository for this book:

https://github.com/PacktPublishing/Getting-Started-with-Elastic-Stack-8.0/tree/main/Chapter4

This chapter builds on the work we did in Chapter 3, Indexing and Searching for Data.

Getting insights from data using aggregations

When you want to draw insights from your data, retrieving the documents that match your question is only the first part of the problem. For example, if an analyst wants to understand how much traffic their web servers served on a given day, running a query to retrieve logs for that period may still return millions of events.

Aggregations allow you to summarize large volumes of data into something easier to consume. Elasticsearch can perform two primary types of aggregations:

  • Metric aggregations can calculate metrics such as count, sum, min, max, and average on numeric data.
  • Bucket aggregations can be used to organize large datasets into groups, depending on the value of a field. Buckets can be created based on a range, date, the frequency of a term in the search results (or corpus), and so on.
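
As a rough illustration of the two aggregation types, the following Python sketch builds a search request body that combines a sum metric aggregation with a terms bucket aggregation, and then computes the same summaries locally over a few sample events. The field names (bytes, status_code), the aggregation names, and the sample events are assumptions made for this example; they are not taken from the book's dataset.

```python
from collections import Counter

# Request body for the _search API. The field names are hypothetical.
# "size": 0 asks Elasticsearch to return only aggregation results, not hits.
search_body = {
    "size": 0,
    "aggs": {
        # Metric aggregation: a single number computed over a numeric field.
        "total_bytes": {"sum": {"field": "bytes"}},
        # Bucket aggregation: groups documents by the value of a field.
        "events_by_status": {"terms": {"field": "status_code"}},
    },
}

# A handful of made-up log events standing in for indexed documents.
sample_events = [
    {"status_code": 200, "bytes": 1200},
    {"status_code": 200, "bytes": 800},
    {"status_code": 404, "bytes": 300},
    {"status_code": 500, "bytes": 150},
]


def summarize(events):
    """Compute locally what the two aggregations above would summarize."""
    total_bytes = sum(e["bytes"] for e in events)
    by_status = Counter(e["status_code"] for e in events)
    return total_bytes, dict(by_status)


total_bytes, events_by_status = summarize(sample_events)
print(total_bytes)
print(events_by_status)
```

The local summarize function is only there to show what each aggregation type produces; on a real cluster, the request body would be sent to the search endpoint of the relevant index.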

An exhaustive list of all supported aggregations can be found in the Elasticsearch guide...

Managing the life cycle of time series data

Most machine data sources can be characterized as time series data. Logs and metrics generally include a timestamp for recording when the event occurred or was observed. This type of data is generally not updated after it is ingested. Information changes are generally recorded as new events.

The following documents illustrate the append-only nature of time series data:

[
      {
          "sensor_name" : "living_room",
          "lights_on" : 1,
          "timestamp" : "2021-02-14T00:00:00.000Z"
      },
      {
          "sensor_name" : "living_room",
     ...
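
The append-only pattern can also be sketched in a few lines of Python. The events below are hypothetical: they reuse the sensor_name, lights_on, and timestamp field names from the documents above, but the second reading and the latest_state helper are assumptions introduced only for illustration.

```python
from datetime import datetime

# Hypothetical sensor readings. A state change is appended as a new event;
# the earlier event is never updated in place.
events = [
    {"sensor_name": "living_room", "lights_on": 1, "timestamp": "2021-02-14T00:00:00.000Z"},
    {"sensor_name": "living_room", "lights_on": 0, "timestamp": "2021-02-14T06:00:00.000Z"},
]


def parse_ts(value):
    """Parse a UTC timestamp such as 2021-02-14T00:00:00.000Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def latest_state(all_events, sensor_name):
    """Return the most recent event for a sensor, or None if there is none."""
    matching = [e for e in all_events if e["sensor_name"] == sensor_name]
    return max(matching, key=lambda e: parse_ts(e["timestamp"]), default=None)


current = latest_state(events, "living_room")
print(current["lights_on"], len(events))
```

Both events remain in the list; the current state is derived by picking the newest event rather than by overwriting an older one.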

Manipulating incoming data with ingest pipelines

Elasticsearch is a "schema on write" data store. Once a document has been indexed into Elasticsearch, the field names and values that have been indexed cannot be changed unless the document is reindexed. Therefore, documents must be parsed, transformed, and cleansed before ingestion.

Runtime fields can be used to compute or evaluate the value of a field at query time. Runtime fields can be used to manipulate and transform field values when searching for data, but they can be costly and time-consuming to run across large volumes of search requests. The intended use of runtime fields is to apply temporary or one-off changes to data, rather than on every search request.
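
As a minimal sketch of a query-time runtime field, the Python snippet below builds a search request body that uses runtime_mappings to derive a value with a Painless script. The field names (bytes, bytes_kb) are hypothetical, and the bytes_kb_local function is an assumption added here only to show the equivalent calculation outside Elasticsearch.

```python
import json

# Search request body defining a runtime field at query time.
# "bytes" is assumed to be an existing numeric field; "bytes_kb" is the
# hypothetical runtime field computed from it for this request only.
search_body = {
    "runtime_mappings": {
        "bytes_kb": {
            "type": "double",
            "script": {"source": "emit(doc['bytes'].value / 1024.0)"},
        }
    },
    "fields": ["bytes_kb"],
    "query": {"match_all": {}},
}


def bytes_kb_local(doc):
    """Local equivalent of the script above, for illustration only."""
    return doc["bytes"] / 1024.0


print(json.dumps(search_body, indent=2))
print(bytes_kb_local({"bytes": 2048}))
```

Because the script runs for every matching document on every request that uses it, a transformation that is needed permanently is usually better applied before indexing.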

Ingest pipelines on Elasticsearch offer lightweight and convenient data transformation and manipulation functionality for when an ETL tool such as Logstash is not used. As ingest pipelines run on Elasticsearch nodes, they can scale easily as part of the...
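
To make the idea concrete, the sketch below defines a small pipeline body using the lowercase, rename, and set processors, and then imitates its effect on one document in plain Python. The field names, the values, and the apply_pipeline helper are assumptions for this example; the helper is a simplified local imitation and not how Elasticsearch executes pipelines.

```python
import copy

# Pipeline body as it could be sent when creating an ingest pipeline.
# All field names and values here are hypothetical.
pipeline = {
    "description": "Hypothetical pipeline: normalize the log level and tag the source",
    "processors": [
        {"lowercase": {"field": "log_level"}},
        {"rename": {"field": "msg", "target_field": "message"}},
        {"set": {"field": "environment", "value": "production"}},
    ],
}


def apply_pipeline(doc, pipeline_def):
    """Apply the three processor types used above to a copy of doc, in order."""
    result = copy.deepcopy(doc)
    for processor in pipeline_def["processors"]:
        # Each processor entry is a single-key dict: {name: options}.
        name, options = next(iter(processor.items()))
        if name == "lowercase":
            result[options["field"]] = result[options["field"]].lower()
        elif name == "rename":
            result[options["target_field"]] = result.pop(options["field"])
        elif name == "set":
            result[options["field"]] = options["value"]
        else:
            raise ValueError(f"unsupported processor in this sketch: {name}")
    return result


incoming = {"log_level": "ERROR", "msg": "Disk full"}
processed = apply_pipeline(incoming, pipeline)
print(processed)
```

Processors run in the order they are listed, so each one sees the document as left by the previous one.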

Responding to changing data with Watcher

From the previous sections, we know how to search for data, aggregate it for analytics, and how to transform documents so that they comply with the desired schema. These capabilities power user-driven data exploration and visualization (using frontend tools such as Kibana). The same capabilities can also be used to provide automated alerting and response actions for your incoming data.

Watcher is a flexible tool that can be used to solve various alerting use cases. The following list describes some of the common alerting use cases:

  • Alert on a single event with a particular value:
      a. Alert when event.severity: critical
      b. Alert when disk_free < 1GB
  • Alert if the number of events matching a filter exceeds a threshold:
      a. Alert if 10 or more events with event.severity: critical have occurred in the last 5 minutes.
      b. Alert if 5 or more login_failed events per username have occurred in the last 5 minutes.

  • Alert...
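
The threshold use case above (10 or more critical events in the last 5 minutes) could be expressed as a watch along the following lines. This is a sketch under stated assumptions: the index pattern logs-*, the @timestamp field, the action name log_alert, and the should_alert helper are all hypothetical and are not taken from the book's examples.

```python
# Watch body with the four usual parts: trigger, input, condition, actions.
# The index pattern and the timestamp field name are assumptions.
watch = {
    "trigger": {"schedule": {"interval": "5m"}},
    "input": {
        "search": {
            "request": {
                "indices": ["logs-*"],
                "body": {
                    "query": {
                        "bool": {
                            "filter": [
                                {"term": {"event.severity": "critical"}},
                                {"range": {"@timestamp": {"gte": "now-5m"}}},
                            ]
                        }
                    }
                },
            }
        }
    },
    "condition": {"compare": {"ctx.payload.hits.total": {"gte": 10}}},
    "actions": {
        "log_alert": {
            "logging": {
                "text": "{{ctx.payload.hits.total}} critical events in the last 5 minutes"
            }
        }
    },
}


def should_alert(hit_count, watch_def):
    """Local imitation of the compare condition, for illustration only."""
    threshold = watch_def["condition"]["compare"]["ctx.payload.hits.total"]["gte"]
    return hit_count >= threshold


print(should_alert(12, watch))
print(should_alert(3, watch))
```

The trigger decides when the watch runs, the input gathers the data, the condition decides whether anything needs to happen, and the actions describe what to do when it does.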

Summary

In this chapter, we learned how data in Elasticsearch can be aggregated for statistical insights. We explored how metric and bucket aggregations help slice and dice a large dataset to analyze data for insights.

We also looked at how ingest pipelines can be used to manipulate and transform incoming data to prepare it for use cases on Elasticsearch. We explored a range of common use cases for ingest pipelines in this section.

Lastly, we looked at how Watcher can be used to implement alerting and response actions to changes in data. Again, we explored a range of common alerting use cases in this section.

In the next chapter, we will dive into getting started with and using machine learning jobs to find anomalies in our data, run inference for new documents using the inference ingest processor, and run transformation jobs to pivot incoming datasets for machine learning.
