You're reading from Machine Learning with the Elastic Stack - Second Edition

Product type: Book
Published in: May 2021
Reading Level: Beginner
Publisher: Packt
ISBN-13: 9781801070034
Edition: 2nd Edition
Languages: Python
Tools: Elasticsearch
Concepts: Machine Learning

Authors (3):

Rich Collier

Camilla Montonen

Bahaaldine Azarmi

Chapter 7: AIOps and Root Cause Analysis

Up until this point, we have extensively explained the value of detecting anomalies across metrics and logs separately. This is extremely valuable, of course. In some cases, however, the knowledge that a particular metric or log file has gone awry may not tell the whole story of what is going on. It may, for example, be pointing to a symptom and not the cause of the problem. To have a better understanding of the full scope of an emerging problem, it is often helpful to look holistically at many aspects of a system or situation. This involves smartly analyzing multiple kinds of related datasets together.

In this chapter, we will cover the following topics:

Demystifying the term "AIOps"
Understanding the importance and limitations of KPIs
Moving beyond KPIs
Organizing data for better analysis
Leveraging the contextual information
Bringing it all together for RCA

Technical requirements

The information and examples demonstrated in this chapter are relevant as of v7.11 of the Elastic Stack and utilize sample datasets from the GitHub repo found at https://github.com/PacktPublishing/Machine-Learning-with-Elastic-Stack-Second-Edition.

Demystifying the term "AIOps"

We learned in Chapter 1, Machine Learning for IT, that many companies are drowning in an ever-increasing cascade of IT data while simultaneously being asked to "do more with less" (fewer people, lower costs, and so on). Some of that data is collected and/or stored in specialized tools, but some may be collected in general-purpose data platforms such as the Elastic Stack. But the question still remains: what percentage of that data is being paid attention to? By this, we mean the percentage of collected data that is actively inspected by humans or watched by some type of automated means (defined alarms based on rules, thresholds, and so on). Even generous estimates might put the percentage in the range of single digits. So, with 90% or more of the collected data going unwatched, what's being missed? The proper answer might be that we don't actually know.

Before we admonish IT organizations for...

Understanding the importance and limitations of KPIs

Because of the problem of scale and the desire to make some amount of progress in making the collected data actionable, it is natural that some of the first metrics to be tackled for active inspection are those that are the best indicators of performance or operation. The KPIs that an IT organization chooses for measurement, tracking, and flagging can span diverse indicators, including the following:

Customer experience: These metrics measure customer experience, such as application response times or error rates.
Availability: Metrics such as uptime or Mean Time to Repair (MTTR) are often important to track.
Business: Here we may have metrics that directly measure business performance, such as orders per minute or number of active users.
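To make the availability indicators above concrete, here is a minimal sketch of how uptime and MTTR could be derived from an incident log. The incident timestamps and the function names are hypothetical, and the sketch assumes that incidents do not overlap and fall entirely within the reporting period.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (outage start, service restored) pairs.
incidents = [
    (datetime(2021, 2, 1, 9, 0), datetime(2021, 2, 1, 9, 30)),
    (datetime(2021, 2, 10, 14, 0), datetime(2021, 2, 10, 15, 30)),
]


def mean_time_to_repair(incidents):
    """Average time between an outage starting and service being restored."""
    if not incidents:
        return timedelta(0)
    total = sum((end - start for start, end in incidents), timedelta(0))
    return total / len(incidents)


def availability(incidents, period_start, period_end):
    """Fraction of the reporting period during which the service was up."""
    period = period_end - period_start
    downtime = sum((end - start for start, end in incidents), timedelta(0))
    return 1 - downtime / period


mttr = mean_time_to_repair(incidents)
uptime = availability(incidents, datetime(2021, 2, 1), datetime(2021, 3, 1))
print(f"MTTR: {mttr}, availability: {uptime:.4%}")
```

Numbers like these are easy to track and flag, which is part of why KPIs tend to be the first metrics to receive active inspection.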

As such, these types of metrics are usually displayed, front and center, on most high-level operational dashboards or on staff reports for employees ranging from technicians...

Moving beyond KPIs

The process of selecting KPIs, in general, should be relatively easy, as it is likely obvious what metrics are the best indicators (if online sales are down, then the application is likely not working). But if we want to get a more holistic view of what may be contributing to an operational problem, we must expand our analysis beyond the KPIs to indicators that emanate from the underlying systems and technology that support the application.

Fortunately, there are a plethora of ways to collect all kinds of data for centralization in the Elastic Stack. The Elastic Agent, for example, is a single, unified agent that you can deploy to hosts or containers to collect data and send it to the Elastic Stack. Behind the scenes, the Elastic Agent runs the Beats shippers or Elastic Endpoint required for your configuration. Starting from version 7.11, the Elastic Agent is managed in Kibana in the Fleet user interface and can be used to add and manage integrations for popular...

Organizing data for better analysis

One of the nicest things about ingesting data via the Elastic Agent is that by default, the data collected is normalized using the Elastic Common Schema (ECS). ECS is an open source specification that defines a common taxonomy and naming conventions across data that is stored in the Elastic Stack. As such, the data becomes easier to manage, analyze, visualize, and correlate across disparate data types – including across both performance metrics and log files.

Even if you are not using the Elastic Agent or other legacy Elastic ingest tools (such as Beats and Logstash) and are instead relying on other, third-party data collection or ingest pipelines, it is still recommended that you conform your data to ECS because it will pay big dividends when users expect to use this data for queries, dashboards, and, of course, ML jobs.
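As an illustration of conforming third-party data to ECS, the following sketch renames the flat fields of a record into nested ECS fields. The source field names (ts, hostname, app, and so on) and the sample record are hypothetical; the target names are standard ECS fields, and anything unmapped is kept under labels, which ECS provides for custom key/value pairs.

```python
# Hypothetical mapping from a third-party collector's flat field names
# to ECS dotted field names.
FIELD_MAP = {
    "ts": "@timestamp",
    "hostname": "host.name",
    "app": "service.name",
    "severity": "log.level",
    "msg": "message",
    "status": "http.response.status_code",
}


def to_ecs(record):
    """Return a nested, ECS-style document built from a flat source record."""
    doc = {}
    for source_field, value in record.items():
        ecs_field = FIELD_MAP.get(source_field)
        if ecs_field is None:
            # Keep unmapped fields as string labels rather than dropping them.
            doc.setdefault("labels", {})[source_field] = str(value)
            continue
        # Walk (and create) the nested objects implied by the dotted name.
        node = doc
        *parents, leaf = ecs_field.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return doc


raw = {
    "ts": "2021-02-08T15:10:00Z",
    "hostname": "web-01",
    "app": "storefront",
    "severity": "error",
    "msg": "transaction failed",
    "status": 500,
    "dc": "east",
}
doc = to_ecs(raw)
```

Once metrics and logs share names such as host.name and service.name, the same field can be used to filter, split, and correlate across both kinds of data.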

Note

More information on ECS can be found in the reference section of the website at https://www.elastic.co/guide...

Leveraging the contextual information

With our data organized and/or enriched, the two primary ways we can leverage contextual information are via analysis splits and statistical influencers.

Analysis splits

We have already seen that an anomaly detection job can be split based on any categorical field. As such, we can model behavior separately for each instance of that field. This could be extremely valuable, especially in a case where each instance needs its own separate model.

Take, for example, the case where we have data for different regions of the world:

Figure 7.7 – Differing data behaviors based on region

Whatever data this is (sales KPIs, utilization metrics, and so on), clearly it has very distinctive patterns that are unique to each region. In this case, it makes sense to split any analysis we do with anomaly detection for each region to capitalize on this uniqueness. We would be able to detect anomalies in the behavior...
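A split of this kind can be sketched as an anomaly detection job configuration in which the detector's partition_field_name is the categorical field, with the same field also listed as an influencer. The helper name and the field names sales_total and region are hypothetical; the keys follow the structure of the Elasticsearch anomaly detection job body, and the bucket span is only a placeholder.

```python
import json


def build_split_job(metric_field, split_field, bucket_span="15m"):
    """Build a job body that models each value of split_field separately."""
    return {
        "description": f"mean({metric_field}) split by {split_field}",
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [
                {
                    "function": "mean",
                    "field_name": metric_field,
                    # One independent model per distinct value of this field.
                    "partition_field_name": split_field,
                }
            ],
            # Listing the split field as an influencer lets results be
            # attributed to the specific values involved in an anomaly.
            "influencers": [split_field],
        },
        "data_description": {"time_field": "@timestamp"},
    }


job = build_split_job("sales_total", "region")
print(json.dumps(job, indent=2))
```

A body like this would be sent when creating the job (for example, through the create anomaly detection jobs API or the Kibana job wizard); the sketch only builds the configuration and does not contact a cluster.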

Bringing it all together for RCA

We are now at the point where we can discuss how to bring everything together. In our desire to increase our effectiveness in IT operations and look more holistically at application health, we need to operationalize what we've prepared in the prior sections and configure our anomaly detection jobs accordingly. To that end, let's work through a real-life scenario in which Elastic ML helped us get to the root cause of an operational problem.

Outage background

This scenario is loosely based on a real application outage, although the data has been somewhat simplified and sanitized to obfuscate the original customer. The problem was with a retail application that processed gift card transactions. Occasionally, the app would stop working and transactions could not be processed. This would only be discovered when individual stores called headquarters to complain. The root cause of the issue was unknown and couldn't be ascertained...

Summary

Elastic ML can certainly boost the amount of data that IT organizations pay attention to, and thus get more insight and proactive value out of their data. The ability to organize, correlate, and holistically view related anomalies across data types is critical to problem isolation and root cause identification. It reduces application downtime and limits the possibility of problem recurrence.

In the next chapter, we will see how other apps within the Elastic Stack (APM, Security, and Logs) take advantage of Elastic ML to provide an out-of-the-box experience that's custom-tailored for specific use cases.

Authors (3)

Rich Collier

Rich Collier is a solutions architect at Elastic. Joining the Elastic team from the Prelert acquisition, Rich has over 20 years' experience as a solutions architect and pre-sales systems engineer for software, hardware, and service-based solutions. Rich's technical specialties include big data analytics, machine learning, anomaly detection, threat detection, security operations, application performance management, web applications, and contact center technologies. Rich is based in Boston, Massachusetts.

Camilla Montonen

Camilla Montonen is a Senior Machine Learning Engineer at Elastic.

Bahaaldine Azarmi

Bahaaldine Azarmi, Global VP Customer Engineering at Elastic, guides companies as they leverage data architecture, distributed systems, machine learning, and generative AI. He leads the customer engineering team, focusing on cloud consumption, and is passionate about sharing knowledge to build and inspire a community skilled in AI.
