You're reading from Machine Learning with the Elastic Stack - Second Edition

Product type: Book

Published in: May 2021

Reading level: Beginner

Publisher: Packt

ISBN-13: 9781801070034

Edition: 2nd Edition

Languages: Python

Tools: Elasticsearch

Concepts: Machine Learning

Authors (3): Rich Collier, Camilla Montonen, Bahaaldine Azarmi


Appendix: Anomaly Detection Tips

As we wind down the content for this book, it occurred to us that there's still a plethora of good, bite-sized explanations, examples, and pieces of advice that didn't quite fit into sections of the other chapters. It therefore made sense to give them a home all to themselves here in the Appendix. Enjoy this potpourri of tips, tricks, and advice!

The following topics will be covered here in the Appendix:

Understanding influencers in split versus non-split jobs
Using one-sided functions to your advantage
Ignoring time periods
Using custom rules and filters to your advantage
Anomaly detection job throughput considerations
Avoiding the over-engineering of a use case
Using anomaly detection on runtime fields

Technical requirements

This chapter uses the Elastic Stack as it exists in v7.12.

Understanding influencers in split versus non-split jobs

You might wonder whether it is necessary to split the analysis by a field, or whether you can simply rely on influencers to identify the offending entity.

Let's remind ourselves of the difference between the purpose of influencers and the purpose of splitting a job. An anomaly detection job identifies an entity as an influencer if that entity contributed significantly to the existence of the anomaly. This process of determining influential entities is completely independent of whether or not the job is split. An entity can be deemed influential on an anomaly only if an anomaly happens in the first place; if no anomaly is detected, there is nothing for influencers to explain. However, whether the job finds something anomalous at all may depend on whether or not the job is split into multiple time series. When splitting the job, you are modeling (creating...
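To make the distinction concrete, here is a minimal Python sketch that builds two job bodies for the farequote data: one non-split, and one split with partition_field_name. Both list airline as an influencer, but only the split job models each airline as its own time series. The 15m bucket span, the @timestamp time field, and the helper name count_job are assumptions for illustration. Each dictionary would be sent as the body of a PUT _ml/anomaly_detectors/<job_id> request.

```python
import json


def count_job(split_field=None, influencers=("airline",)):
    """Build a hypothetical anomaly detection job body for a count analysis.

    If split_field is given, the detector is partitioned by that field, so each
    entity gets its own model. Otherwise, a single aggregate time series is
    modeled. Influencers are listed the same way in both cases.
    """
    detector = {"function": "count"}
    if split_field is not None:
        detector["partition_field_name"] = split_field
    return {
        "analysis_config": {
            "bucket_span": "15m",  # assumed bucket span
            "detectors": [detector],
            "influencers": list(influencers),
        },
        "data_description": {"time_field": "@timestamp"},  # assumed time field
    }


non_split = count_job()                  # one model for all airlines combined
split = count_job(split_field="airline")  # one model per airline
print(json.dumps(split, indent=2))
```

In the non-split job, an airline can only be reported as an influencer if the combined count is anomalous. In the split job, a deviation in a single airline can be detected even when it is too small to disturb the total.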

Using one-sided functions to your advantage

Many people recognize the usefulness of one-sided functions in ML, such as low_count and high_mean, which allow anomalies to be detected only on the low side or only on the high side. This is useful when you only care about a drop in revenue or a spike in response time.

When you care about deviations in both directions, however, you are often inclined to use just the regular function (such as count or mean). Yet on some datasets, it is better to use both the high and low versions of the function as two separate detectors. Why is this the case, and under what conditions, you might ask?

The condition where this makes sense is when the dynamic range of the possible deviations is asymmetrical. In other words, the magnitude of potential spikes in the data is far, far bigger than the magnitude of the potential drops, possibly because the count or sum of something cannot be less than zero. Let's look at the following screenshot...
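As a sketch of the two-detector approach, the following Python builds a job body that pairs high_count with low_count instead of using a single count detector. Each detector in a job is modeled independently. That independence is what allows the smaller drops to be judged separately from the much larger spikes. The helper name one_sided_detectors, the 15m bucket span, and the @timestamp time field are illustrative assumptions.

```python
import json


def one_sided_detectors(base_function="count", field_name=None):
    """Return a high_* and a low_* detector for the given base function.

    Metric functions such as mean or sum also need a field_name;
    count-based functions do not.
    """
    detectors = []
    for side in ("high", "low"):
        detector = {"function": f"{side}_{base_function}"}
        if field_name is not None:
            detector["field_name"] = field_name
        detectors.append(detector)
    return detectors


# Hypothetical job body: two one-sided detectors replace one two-sided "count".
job_body = {
    "analysis_config": {
        "bucket_span": "15m",  # assumed bucket span
        "detectors": one_sided_detectors("count"),
    },
    "data_description": {"time_field": "@timestamp"},  # assumed time field
}
print(json.dumps(job_body, indent=2))
```

The same helper produces high_mean/low_mean, or high_sum/low_sum, when given a field name, for example one_sided_detectors("mean", "responsetime").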

Ignoring time periods

Often, people ask how they can get ML to ignore the fact that a certain event has occurred. Perhaps it was an expected maintenance window, or perhaps something was broken within the data ingest pipeline and data was lost for a few moments. There are a few ways that you can get ML to ignore time periods, and for clarity, we'll separate them into two groups:

A known, upcoming window of time
An unexpected window of time that is discovered only after the fact

To illustrate, we'll use a single-metric count job (from Figure A.1) on the farequote dataset that has an anomaly on February 9th:

Figure A.10 – An analysis on the farequote dataset with an anomaly we'd like to ignore

Now, let's explore how we can ignore the anomaly on February 9th in each of these situations.

Ignoring an upcoming (known) window of time

Two methods can be used to ignore an upcoming window of...
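For a known window, Elastic ML lets you define a calendar that contains scheduled events. Jobs attached to the calendar are intended to ignore the event period rather than flag it as anomalous. The minimal Python sketch below builds two request bodies: one for creating a calendar (PUT _ml/calendars/<calendar_id>) and one for adding an event to it (POST _ml/calendars/<calendar_id>/events). The job ID farequote-count, the helper names, and the year 2021 are hypothetical placeholders and do not come from the farequote example.

```python
import json
from datetime import datetime, timezone


def epoch_ms(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def scheduled_event(description, start, end):
    """Build one scheduled-event entry; rejects an empty or inverted window."""
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "description": description,
        "start_time": epoch_ms(start),
        "end_time": epoch_ms(end),
    }


# Hypothetical one-day maintenance window (the year is a placeholder).
start = datetime(2021, 2, 9, tzinfo=timezone.utc)
end = datetime(2021, 2, 10, tzinfo=timezone.utc)

calendar_body = {"job_ids": ["farequote-count"]}  # hypothetical job ID
events_body = {"events": [scheduled_event("Planned maintenance", start, end)]}
print(json.dumps(events_body, indent=2))
```

Because the calendar is created before the window occurs, this approach fits the "known, upcoming" case. It does not address a window that you only discover after the fact.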

Using custom rules and filters to your advantage

While anomaly detection jobs are incredibly useful, they are also agnostic to the domain and to the relevance of the raw data. In other words, the unsupervised machine learning algorithms do not know that a tenfold increase in CPU utilization (from 1% to 10%, for example) may be irrelevant to the proper operation of an application, even though it may be statistically anomalous/unlikely. Likewise, anomaly detection jobs treat every analyzed entity equally. The user, however, might want to disregard results for a certain IP address or user ID because they know that anomalies found for these entities are not desired or useful. Custom rules and filters allow the user to inject domain knowledge into the anomaly detection job configuration. This gives the user a fair amount of control over what gets deemed or marked anomalous, or even whether entities are considered part of the modeling process in...
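As an illustration, the Python sketch below builds a filter body (for PUT _ml/filters/<filter_id>) and a detector with two custom rules. The first rule skips results when the actual value is below a threshold. The second rule skips both results and model updates for any client_ip listed in the filter. The field names cpu_pct and client_ip, the filter ID safe_ips, the IP addresses, and the threshold of 20 are all hypothetical.

```python
import json

# Hypothetical filter: entities whose anomalies the user does not care about.
safe_ips_filter = {
    "description": "Hypothetical list of IP addresses to ignore",
    "items": ["10.0.0.1", "10.0.0.2"],
}

detector = {
    "function": "high_mean",
    "field_name": "cpu_pct",              # hypothetical metric field
    "partition_field_name": "client_ip",  # a rule scope must use a by/over/partition field
    "custom_rules": [
        {
            # Domain knowledge: CPU below 20% is never interesting, however unlikely.
            "actions": ["skip_result"],
            "conditions": [{"applies_to": "actual", "operator": "lt", "value": 20}],
        },
        {
            # Entities in the filter produce no results and do not update the model.
            "actions": ["skip_result", "skip_model_update"],
            "scope": {"client_ip": {"filter_id": "safe_ips", "filter_type": "include"}},
        },
    ],
}
print(json.dumps(detector, indent=2))
```

Using skip_result alone would still let those entities shape the model. Adding skip_model_update also keeps them out of the modeling process.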

Anomaly detection job throughput considerations

Elastic ML is awesome and is no doubt very fast and scalable, but there will still be a practical upper bound on the number of events per second that any anomaly detection job can process. That bound depends on two factors:

The speed at which data can be delivered to the algorithms (that is, query performance)
The speed at which the algorithms can chew through the data, given the desired analysis

For the latter, much of the performance depends on the following:

The function(s) chosen for the analysis; for example, count is faster than lat_long
The bucket_span value chosen (longer bucket spans are faster than shorter ones, because analyzing more buckets per unit of time multiplies the per-bucket processing overhead, such as writing results)
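The effect of the bucket span can be quantified directly. The number of buckets a job must finalize per day is 86,400 seconds divided by the bucket span. The short Python calculation below shows that a 1m bucket span incurs the per-bucket overhead 15 times as often as a 15m span (1,440 versus 96 buckets per day). The helper name buckets_per_day is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def buckets_per_day(bucket_span_seconds):
    """Number of buckets (each with its own results to write) per day."""
    return SECONDS_PER_DAY // bucket_span_seconds


for label, seconds in [("1m", 60), ("5m", 300), ("15m", 900), ("1h", 3600)]:
    print(label, buckets_per_day(seconds))
```

This only counts per-bucket overhead. The cost of processing the raw events themselves depends on the data volume and the chosen function, not on the bucket span.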

However, if you have a defined analysis set up and can't change it for other reasons, then there's not that much you can do unless you get creative and...

Avoiding the over-engineering of a use case

I once worked with a user with whom I discussed different use cases for anomaly detection. In particular, this customer was building a hosted security operations center as part of their managed security service provider (MSSP) business, so they were keen to think about use cases in which ML could help.

A high-level theme of their use cases was to look at each user's behavior and find unexpected activity. One example that was discussed was login activity from unusual/rare locations, such as "Bob just logged in from Ukraine, but he doesn't normally log in from there."

While thinking the implementation through, they pointed out that they had multiple clients, each of which had multiple users. Therefore, they were thinking of ways to split/partition the data so that they could execute rare by country for each and every user of every client.

I asked them to take a step back and said, "Is it worthy of an anomaly if anyone...

Using anomaly detection on runtime fields

In some cases, it might be necessary to analyze the value of a field that doesn't exist in the index mappings but can be calculated dynamically from other field values. Elasticsearch has long supported dynamically defined field values in the form of script fields, but starting in v7.11, these are largely superseded by an updated concept known as runtime fields. In short, runtime fields are treated as first-class citizens in the Elasticsearch mapping (if defined there), and they will eventually allow the user to promote a runtime field into an indexed field.

Users can define runtime fields either in the mapping or only in the search request. Note that, at the time of writing, runtime fields cannot be defined in the datafeed of an anomaly detection job. However, if the runtime fields are defined in the mappings, then the anomaly detection job can use them seamlessly.
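As a minimal sketch, assuming hypothetical numeric fields named bytes and requests, the following Python builds two bodies. The first is a mapping update (for PUT <index>/_mapping) that defines a runtime field with a Painless script. The second is a job body whose detector refers to that runtime field by name, just as it would refer to an ordinary mapped field. The field names, the 15m bucket span, and the @timestamp time field are assumptions.

```python
import json

# Hypothetical mapping update: a runtime field computed from two indexed fields.
# The script emits a value only when both inputs exist and the divisor is positive.
mapping_body = {
    "runtime": {
        "bytes_per_request": {
            "type": "double",
            "script": {
                "source": (
                    "if (doc['bytes'].size() > 0 && doc['requests'].size() > 0 "
                    "&& doc['requests'].value > 0) "
                    "{ emit(doc['bytes'].value / (double) doc['requests'].value); }"
                )
            },
        }
    }
}

# Hypothetical job body: the detector uses the runtime field defined in the mapping.
job_body = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [{"function": "high_mean", "field_name": "bytes_per_request"}],
    },
    "data_description": {"time_field": "@timestamp"},
}
print(json.dumps(mapping_body, indent=2))
print(json.dumps(job_body, indent=2))
```

The runtime field lives in the index mapping rather than in the datafeed, which matches the v7.12 behavior described above.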

Note

For more...

Summary

Elastic ML is a powerful, flexible, yet easy-to-use feature that brings the power of data science to non-data scientists so that they can gain insight into massive amounts of data. Throughout this book, we have seen many different ways in which users can take advantage of this technology to solve real-world challenges in IT. We hope that you will take the knowledge you have gained in this book and implement some great use cases of your own. Don't worry about solving all possible problems on day 1; start small, get some tangible wins, and grow your usage as you gain more confidence. Success will breed success!
