Using unsupervised ML for anomaly detection

To get a more intuitive understanding of how Elastic ML's anomaly detection works using unsupervised ML, we will discuss the following:

  • A rigorous definition of unusual with respect to the technology
  • An intuitive example of learning in an unsupervised manner
  • A description of how the technology models, de-trends, and scores the data

Defining unusual

Anomaly detection is something almost all of us have a basic intuition about. Humans are quite good at pattern recognition, so it should come as no surprise that if I asked 100 people on the street what's unusual in the following graph, a vast majority (including non-technical people) would identify the spike in the green line:

Figure 1.1 – A line graph showing an anomaly

Similarly, let's say we ask what's unusual in the following photo:

Figure 1.2 – A photograph showing a seal among penguins

Again, a majority will likely (and rightly) claim that the seal is the unusual thing, but people may struggle to articulate the actual heuristics they used to come to that conclusion.

There are two different heuristics that we could use to define the different kinds of anomalies shown in these images:

  • Something is unusual if its behavior has significantly deviated from an established pattern or range based upon its past history.
  • Something is unusual if some characteristic of that entity is significantly different from the same characteristic of the other members of a set or population.

These key definitions will be relevant to Elastic ML's anomaly detection, as they form the two main fundamental modes of operation of the anomaly detection algorithms (temporal versus population analysis, as will be explored in Chapter 3, Anomaly Detection). As we will see, the user will have control over what mode of operation is employed for a particular use case.
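
To make these two heuristics concrete, the following is a minimal, hypothetical sketch (not Elastic ML's implementation) that flags both kinds of unusualness with a simple standard-deviation test; the function names, thresholds, and example values are illustrative only:

    import statistics

    def unusual_vs_history(history, value, n_sigmas=3):
        # Temporal: compare an entity's new value against its own past behavior
        mean, stdev = statistics.mean(history), statistics.stdev(history)
        return abs(value - mean) > n_sigmas * stdev

    def unusual_vs_population(peer_values, value, n_sigmas=3):
        # Population: compare one entity's value against its peers' values
        mean, stdev = statistics.mean(peer_values), statistics.stdev(peer_values)
        return abs(value - mean) > n_sigmas * stdev

    # A server that normally logs ~100 requests/min suddenly logs 900
    print(unusual_vs_history([98, 102, 101, 97, 100, 103], 900))  # True
    # One user transfers far more data (MB) than their peers do
    print(unusual_vs_population([10, 12, 9, 11, 10], 500))        # True

The only difference between the two tests is the reference set: the entity's own history in the first case, and its peers' values in the second. Elastic ML's actual models are far richer than a simple standard-deviation test, as the following sections describe.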

Learning what's normal

As we've stated, Elastic ML's anomaly detection uses unsupervised learning in that the learning occurs without anything being taught. There is no human assistance to shape the decisions of the learning; it simply does so on its own, via inspection of the data it is presented with. This is slightly analogous to the learning of a language via the process of immersion, as opposed to sitting down with books of vocabulary and rules of grammar.

To go from a completely naive state where nothing is known about a situation to one where predictions could be made with good certainty, a model of the situation needs to be constructed. How this model is created is extremely important, as the efficacy of all subsequent actions taken based upon this model will be highly dependent on the model's accuracy. The model will need to be flexible and continuously updated based upon new information, because that is all that it has to go on in this unsupervised paradigm.

Probability models

Probability distributions can serve this purpose quite well. There are many fundamental types of distributions (and Elastic ML uses a variety of distribution types, such as Poisson, Gaussian, log-normal, or even mixtures of models), but the Poisson distribution is a good one to discuss first, because it is appropriate in situations where there are discrete occurrences (the "counts") of things with respect to time:

Figure 1.3 – A graph demonstrating Poisson distributions (source: https://en.wikipedia.org/wiki/Poisson_distribution#/media/File:Poisson_pmf.svg)

Three variants of the distribution are shown here, each with a different mean (λ), which is also where the most likely values of k cluster. As an analogy, these distributions could model the amount of postal mail that a person gets delivered to their home on a daily basis, represented by k on the x axis:

  • For λ = 1, there is about a 37% chance each that zero pieces or one piece of mail is delivered daily. Perhaps this is appropriate for a college student who doesn't receive much postal mail.
  • For λ = 4, there is about a 20% chance each that three or four pieces are received. This might be a good model for a young professional.
  • For λ = 10, there is about a 13% chance that 10 pieces are received per day, perhaps representing a larger family or a household that has somehow found itself on many mailing lists!

The discrete points on each curve also give the likelihood (probability) of other values of k. As such, the model can be informative and answer questions such as "Is getting 15 pieces of mail likely?" As we can see, it is not likely for a student (λ = 1) or a young professional (λ = 4), but it is somewhat likely for a large family (λ = 10). Of course, we simply asserted here that these models fit the people described; in practice, there needs to be a mechanism to learn the model for each individual situation, not just assert it. Fortunately, the process for learning it is intuitive.
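
To make these likelihoods concrete, here is a small sketch using SciPy to answer the "15 pieces of mail" question for each model; the λ values and personas are taken from the example above:

    from scipy.stats import poisson

    # Probability of receiving exactly 15 pieces of mail under each model
    for lam, persona in [(1, "student"), (4, "young professional"),
                         (10, "large family")]:
        print(f"lambda={lam:2d} ({persona}): P(k=15) = {poisson.pmf(15, lam):.1e}")

    # lambda= 1 (student):            P(k=15) = 2.8e-13  -> essentially never
    # lambda= 4 (young professional): P(k=15) = 1.5e-05  -> very unlikely
    # lambda=10 (large family):       P(k=15) = 3.5e-02  -> somewhat likely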

Learning the models

Sticking with the postal mail analogy, it is instinctive to realize that the best-fitting model for a particular household could be determined simply by waiting at the mailbox every day and recording what the postal carrier delivers. It should also seem obvious that the more observations are made, the higher your confidence that your model is accurate. In other words, spending only 3 days at the mailbox would provide less complete information (and less confidence) than spending 30 days, or 300 for that matter.

Algorithmically, a similar process could be designed to self-select the appropriate model based upon observations. This self-selection must carefully scrutinize both the choice of model type itself (Poisson, Gaussian, log-normal, and so on) and the specific coefficients of that model type (such as λ in the preceding example). To do this, the appropriateness of the model is constantly re-evaluated. Bayesian techniques are also employed to assess the model's likely parameter values given the dataset as a whole, while tempering those decisions based upon how much information has been seen prior to a particular point in time. The ML algorithms accomplish all of this automatically.
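
As a flavor of that Bayesian tempering, here is a minimal sketch using a conjugate Gamma prior over the Poisson rate λ (a textbook construction, not Elastic ML's actual algorithm); the estimate stays cautious while observations are few and sharpens as more arrive:

    # Conjugate Gamma(alpha, beta) prior over the Poisson rate lambda.
    # After observing a count x, the posterior is Gamma(alpha + x, beta + 1).
    alpha, beta = 1.0, 1.0          # weak prior: roughly one piece of mail/day

    for day, x in enumerate([3, 5, 4, 4, 6, 3, 5, 4], start=1):
        alpha, beta = alpha + x, beta + 1.0
        # Posterior mean for lambda; its uncertainty shrinks as beta
        # (the effective number of observations) grows
        print(f"after day {day}: lambda estimate = {alpha / beta:.2f}")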

Note

For those that want a deeper dive into some of the representative mathematics going on behind the scenes, please refer to the academic paper at http://www.ijmlc.org/papers/398-LC018.pdf.

Most importantly, the modeling that is done is continuous, so that new information is considered along with the old, with an exponential weighting given to information that is fresher. Such a model, after 60 observations, could resemble the following:

Figure 1.4 – Sample model after 60 observations

It will then look very different after 400 observations, once the data has presented a slew of new values between 5 and 10:

Figure 1.5 – Sample model after 400 observations

Also, notice that the model can have multiple modes, or areas/clusters of higher probability. How closely the learned model (shown as the blue curve) fits the theoretically ideal model (in black) matters greatly: the more accurate the model, the better the representation of the state of normal for that dataset and thus, ultimately, the more accurate the prediction of how future values comport with this model.
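
The continuous updating, with fresher information weighted more heavily, can be pictured with a minimal sketch; the decay weight w below is an illustrative choice, not Elastic ML's actual factor:

    def update_rate(estimate, observation, w=0.1):
        # Exponentially weighted update: recent data contributes more, and
        # the influence of older data decays geometrically over time
        return (1 - w) * estimate + w * observation

    rate = 4.0                    # estimate after the early observations
    for x in [7, 8, 9, 8, 7, 9]:  # the behavior shifts upward
        rate = update_rate(rate, x)
    print(f"updated rate estimate: {rate:.2f}")  # drifts toward the new regime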

The continuous nature of the modeling also drives the requirement that this model is capable of serialization to long-term storage, so that if model creation/analysis is paused, it can be reinstated and resumed at a later time. As we will see, the operationalization of this process of model creation, storage, and utilization is a complex orchestration, which is fortunately handled automatically by Elastic ML.
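
The serialization requirement itself is easy to picture; the following is a toy sketch (Elastic ML actually persists model snapshots in internal Elasticsearch indices, so this file-based version is purely illustrative):

    import json

    # Snapshot the learned state so analysis can pause and resume later
    model_state = {"alpha": 35.0, "beta": 9.0, "observations_seen": 400}
    with open("model_snapshot.json", "w") as f:
        json.dump(model_state, f)

    # ...later: reload the snapshot and continue updating where we left off
    with open("model_snapshot.json") as f:
        model_state = json.load(f)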

De-trending

Another important aspect of faithfully modeling real-world data is to account for prominent overtone trends and patterns that naturally occur. Does the data ebb and flow hourly and/or daily with more activity during business hours or business days? If so, this needs to be accounted for. Elastic ML automatically hunts for prominent trends in the data (linear growth, cyclical harmonics, and so on) and factors them out. Let's observe the following graph:

Figure 1.6 – Periodicity detection in action

Here, the periodic daily cycle is learned, then factored out. The model's prediction boundaries (represented in the light-blue envelope around the dark-blue signal) dramatically adjust after automatically detecting three successive iterations of that cycle.
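
A crude stand-in for this periodicity handling (not Elastic ML's actual detector): estimate the typical value for each hour of the day, then subtract that seasonal component so that only the residual is judged for unusualness:

    from collections import defaultdict

    def remove_daily_cycle(observations):
        # observations: list of (hour_of_day, value) pairs
        buckets = defaultdict(list)
        for hour, value in observations:
            buckets[hour].append(value)
        # Seasonal component: the typical value for each hour of the day
        seasonal = {h: sum(v) / len(v) for h, v in buckets.items()}
        # Residuals: what remains once the daily pattern is factored out
        return [(h, v - seasonal[h]) for h, v in observations]

    # Three days of a strong business-hours cycle, plus one odd 3 a.m. value
    data = [(h, 100 + 50 * (9 <= h <= 17)) for _ in range(3) for h in range(24)]
    data.append((3, 160))                # modest in absolute terms...
    print(remove_daily_cycle(data)[-1])  # ...but a large residual: (3, 45.0)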

Therefore, as more data is observed over time, the models gain accuracy both from the perspective of the probability distribution function getting more mature, as well as via the auto-recognizing and de-trending of other routine patterns (such as business days, weekends, and so on) that might not emerge for days or weeks. In the following example, several trends are discovered over time, including daily, weekly, and an overall linear slope:

Figure 1.7 – Multiple trends being detected

These model changes are recorded as system annotations. Annotations, as a general concept, will be covered in later chapters.

Scoring of unusualness

Once a model has been constructed, the likelihood of any future observed value can be found within the probability distribution. Earlier, we had asked the question "Is getting 15 pieces of mail likely?" This question can now be empirically answered, depending on the model, with a number between 0 (no possibility) and 1 (absolute certainty). Elastic ML will use the model to calculate this fractional value out to approximately 300 significant figures (which can be helpful when dealing with very low probabilities). Let's observe the following graph:

Figure 1.8 – Anomaly scoring

Here, the probability of observing the actual value of 921 is calculated to be 1.444e-9 (or, more commonly stated, a mere 0.0000001444% chance). This very small value is perhaps not that intuitive to most people. As such, ML takes this probability calculation and, via a process of quantile normalization, re-casts that observation on a severity scale between 0 and 100, where 100 is the highest level of unusualness possible for that particular dataset. In the preceding case, the probability calculation of 1.444e-9 normalizes to a score of 94. This normalized score will come in handy later as a means of assessing the severity of the anomaly for the purposes of alerting and/or triage.
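
A rough sketch of the quantile-normalization idea (Elastic ML's actual scoring is more sophisticated): rank a new observation's probability against the probabilities of everything seen so far, and re-cast that rank onto the 0-100 severity scale:

    import bisect

    def severity_score(p, past_probabilities):
        # The rarer the observation relative to everything seen so far,
        # the closer its score is to 100
        ranked = sorted(past_probabilities)
        rarer = bisect.bisect_left(ranked, p)  # past events rarer than this one
        return 100.0 * (1.0 - rarer / len(ranked))

    history = [0.6, 0.4, 0.7, 0.2, 0.5, 0.3, 0.05, 0.8, 0.9, 0.1]
    print(severity_score(1.444e-9, history))   # 100.0: nothing seen was rarer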

The element of time

In Elastic ML, all of the anomaly detection that we will discuss throughout the rest of the book has an intrinsic element of time associated with the data and analysis. In other words, for anomaly detection, Elastic ML expects the data to be time series data, and that data will be analyzed in increments of time. This is a key point, and it also helps distinguish anomaly detection from data frame analytics, beyond the unsupervised/supervised distinction.
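
In practice, this means the raw events are summarized into fixed-width time buckets (configured in Elastic ML via the job's bucket_span) before being modeled; the following is a minimal sketch of that bucketing, with hypothetical timestamps:

    from collections import Counter

    BUCKET_SPAN = 900  # seconds; a 15-minute bucket_span

    def bucket_counts(event_timestamps):
        # Summarize raw events into counts per fixed-width time bucket; the
        # model then sees one value per bucket rather than the raw events
        return Counter(ts // BUCKET_SPAN for ts in event_timestamps)

    events = [1000, 1200, 1450, 2000, 2100, 9999]   # epoch seconds
    print(bucket_counts(events))   # Counter({1: 3, 2: 2, 11: 1})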

You will see that there is a slight nuance with respect to population analysis (covered in Chapter 3, Anomaly Detection) and outlier detection (covered in Chapter 10, Outlier Detection). While both effectively find entities that are distinctly different from their peers, population analysis in anomaly detection does so with respect to time, whereas outlier detection is not constrained by time. This will become clearer as these topics are covered in depth in later chapters.
