Chapter 5: Running Machine Learning Jobs on Elasticsearch

In the previous chapter, we looked at how large volumes of data can be managed and leveraged for analytical insight, and at how changes in data can be detected and responded to using rules (also called alerts). This chapter explores the use of machine learning techniques to look for unknowns in data and to understand trends that cannot be captured using a rule-based approach.

Machine learning is a dense subject with a wide range of theoretical and practical concepts to cover. In this chapter, we will focus on some of the more important aspects of running machine learning jobs on Elasticsearch. Specifically, we will cover the following:

  • Preparing data for machine learning
  • Running single- and multi-metric anomaly detection jobs on time series data
  • Classifying data using supervised machine learning models
  • Running machine learning inference on incoming data

Technical requirements

To use machine learning features, ensure that the Elasticsearch cluster contains at least one node with the role ml. This enables the running of machine learning jobs on the cluster:

  • If you're running with default settings on a single node, this role should already be enabled, and no further configuration is necessary.
  • If you're running nodes with custom roles, ensure the role is added to elasticsearch.yml, as follows:
    node.roles: [data, ml]
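You can confirm which nodes carry the ml role by querying the cat nodes API; nodes with the machine learning role include l in the node.role column. For example:
    GET _cat/nodes?v=true&h=name,node.role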

The value of running machine learning on Elasticsearch

Elasticsearch is a powerful tool when it comes to storing, searching, and aggregating large volumes of data. Dashboards and visualizations help with user-driven interrogation and exploration of data, while tools such as Watcher and Kibana alerting allow users to take automatic action when data changes in a predefined or expected manner.

However, many data sources contain trends or insights that are hard to capture as a predefined rule or query. Consider the following example:

  • A logging platform collects application logs (using an agent) from about 5,000 endpoints across an environment.
  • The application generates a log line for every transaction executed as soon as the transaction completes.
  • After a software patch, a small subset of the endpoints can intermittently and temporarily fail to write logs successfully. The machines don't fail entirely, as the failure is intermittent.
  • ...

Preparing data for machine learning jobs

For machine learning jobs to analyze document field values when building baselines and identifying anomalies, it is important to ensure that the index mappings are accurately defined. Furthermore, it is useful to parse complex fields (using ETL tools or ingest pipelines) into their own subfields for use in machine learning jobs.
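For instance, a raw message field can be split into dedicated subfields with an ingest pipeline before indexing. The following is a minimal sketch only; the pipeline name, field names, and dissect pattern are illustrative and not taken from the dataset used later in this chapter:
    PUT _ingest/pipeline/parse-webapp-logs
    {
      "processors": [
        {
          "dissect": {
            "field": "message",
            "pattern": "%{client_ip} %{http_method} %{url_path} %{status_code}"
          }
        }
      ]
    }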

The machine learning application provides useful functionality to visualize the index you're looking to run jobs on, and ensure mappings and values are as expected. The UI lists all fields, data types, and some sample values where appropriate.
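The same checks can also be made directly against Elasticsearch if preferred; for example, assuming the data view is backed by an index named webapp, the mappings and field capabilities can be inspected as follows:
    GET webapp/_mapping

    GET webapp/_field_caps?fields=*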

Navigate to the machine learning app on Kibana and perform the following steps:

  1. Click on the Data Visualizer tab.
  2. Select the webapp data view you created in the previous section.
  3. Click on Use full webapp data to automatically update the time range filter for the full duration of your dataset.
  4. Inspect the fields in the index and confirm all...

Looking for anomalies in time series data

Given the logs in the webapp index, there is concern that potentially undesired activity has been taking place on the application. This could be completely benign or have malicious consequences. This section will look at how a series of machine learning jobs can be implemented to better understand and analyze the activity in the logs.

Looking for anomalous event rates in application logs

We will use a single-metric machine learning job to build a baseline for the number of log events generated by the application during normal operation.
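The job can be configured through the Kibana UI (as described in the steps that follow), or defined directly with the anomaly detection APIs. A rough sketch of the API approach, assuming a time field of @timestamp, a 15-minute bucket span, and an index named webapp (all assumptions for illustration):
    PUT _ml/anomaly_detectors/webapp-event-rate
    {
      "analysis_config": {
        "bucket_span": "15m",
        "detectors": [ { "function": "count" } ]
      },
      "data_description": { "time_field": "@timestamp" }
    }

    PUT _ml/datafeeds/datafeed-webapp-event-rate
    {
      "job_id": "webapp-event-rate",
      "indices": [ "webapp" ]
    }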

Follow these steps to configure the job:

  1. Open the machine learning app from the navigation menu and click on the Anomaly Detection tab.
  2. Click on Create job and select the webapp data view. You could optionally use a saved search here with predefined filters applied to narrow down the data used for the job.
  3. Create a single-metric job as we're only interested in the event...

Running classification on data

Unsupervised anomaly detection is useful when looking for abnormal or unexpected behavior in a dataset to guide investigation and analysis. It can unearth silent faults, unexpected usage patterns, resource abuse, or malicious user activity. This is just one class of use cases enabled by machine learning.

It is common to have historical data that, with post analysis, can easily be labeled or tagged with a meaningful value. For example, if you have access to service usage data for your subscription-based online application, along with a record of canceled subscriptions, you could tag snapshots of the usage activity with a label indicating whether the customer churned.
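Labeled data like this can drive a supervised classification job using data frame analytics. The sketch below is purely illustrative, assuming a hypothetical subscriptions index with a churned label field; the job, index, and field names are not from this chapter's dataset:
    PUT _ml/data_frame/analytics/churn-classification
    {
      "source": { "index": "subscriptions" },
      "dest": { "index": "subscriptions-churn-predictions" },
      "analysis": {
        "classification": {
          "dependent_variable": "churned",
          "training_percent": 80
        }
      }
    }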

Consider a different example where an IT team has access to web application logs. With post analysis, given that the request payloads differ from normal requests originating from the application, the team can label events that indicate malicious activity, such as password...

Inferring against incoming data using machine learning

As we learned in Chapter 4, Leveraging Insights and Managing Data on Elasticsearch, ingest pipelines can be used to transform, process, and enrich incoming documents before indexing. Ingest pipelines provide an inference processor to run new documents through a trained machine learning model to infer classification or regression results.

Follow these instructions to create and test an ingest pipeline to run inference using the trained machine learning model:

  1. Create a new ingest pipeline as follows. model_id will differ across Kibana instances and can be retrieved from the model pane in the Data Frame Analytics tab in Kibana. In this case, model_id is classification-request-payloads-1615680927179:
    PUT _ingest/pipeline/ml-malicious-request
    {
      "processors": [
        {
          "inference": {
            "model_id": "classification-request-payloads-1615680927179"
          }
        }
      ]
    }

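Before attaching a pipeline like this to an index or indexing request, it can be tested against a sample document with the simulate API. A small sketch using made-up field values; the feature field names here are assumptions, not confirmed from the dataset:
    POST _ingest/pipeline/ml-malicious-request/_simulate
    {
      "docs": [
        {
          "_source": {
            "http": {
              "request": { "bytes": 1024 },
              "response": { "bytes": 512 }
            }
          }
        }
      ]
    }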
Summary

In this chapter, we looked at applying supervised and unsupervised machine learning techniques on data in Elasticsearch for various use cases.

First, we explored the use of unsupervised learning to look for anomalous behavior in time series data. We used single-metric, multi-metric, and population jobs to analyze a dataset of web application logs to look for potentially malicious activity.

Next, we looked at the use of supervised learning to train a machine learning model to classify requests to the web application as malicious, using features in the request (primarily the HTTP request/response size values).

Finally, we looked at how the inference processor in ingest pipelines can be used to run continuous inference on incoming data using a trained model.

In the next chapter, we will move our focus to Beats and their role in the data pipeline. We will look at how different types of events can be collected by Beats agents and sent to Elasticsearch or Logstash...
