You're reading from Machine Learning with the Elastic Stack - Second Edition

Product type: Book
Published in: May 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801070034
Edition: 2nd Edition
Authors (3):

Rich Collier

Rich Collier is a solutions architect at Elastic. Joining the Elastic team from the Prelert acquisition, Rich has over 20 years' experience as a solutions architect and pre-sales systems engineer for software, hardware, and service-based solutions. Rich's technical specialties include big data analytics, machine learning, anomaly detection, threat detection, security operations, application performance management, web applications, and contact center technologies. Rich is based in Boston, Massachusetts.

Camilla Montonen

Camilla Montonen is a Senior Machine Learning Engineer at Elastic.

Bahaaldine Azarmi

Bahaaldine Azarmi, Global VP Customer Engineering at Elastic, guides companies as they leverage data architecture, distributed systems, machine learning, and generative AI. He leads the customer engineering team, focusing on cloud consumption, and is passionate about sharing knowledge to build and inspire a community skilled in AI.

Chapter 11: Classification Analysis

When we speak about the field of machine learning and specifically the types of machine learning algorithms, we tend to invoke a taxonomy of three different classes of algorithms: supervised learning, unsupervised learning, and reinforcement learning. The third one falls outside of the scope of both this book and the current features available in the Elastic Stack, while the second one has been our topic of investigation throughout the chapters on anomaly detection, as well as the previous chapter on outlier detection. In this chapter, we will finally start dipping our toes into the world of supervised learning. The Elastic Stack provides two flavors of supervised learning: classification and regression. This chapter will be dedicated to understanding the former, while the subsequent chapter will tackle the latter.

The goal of supervised learning is to take a labeled dataset and extract the patterns from it, encode the knowledge obtained from...

Technical requirements

The material in this chapter requires Elasticsearch 7.9 or later. The examples were tested on Elasticsearch 7.10.1 but should work on any release after 7.9. Please note that running the examples in this chapter requires a Platinum license. Where a particular example or section requires a later version of Elasticsearch, this is noted in the text.

Classification: from data to a trained model

Training a classification model from a source dataset is a multi-step process. In this section, we will take a bird's-eye view (depicted in Figure 11.1) of this whole process, which begins with a labeled training dataset (Figure 11.1, part A).

Figure 11.1 – An overview of the supervised learning process that takes a labeled dataset and outputs a trained model

This labeled dataset is usually split into a training portion, which is fed into the training algorithm (Figure 11.1, part B), and a testing portion, which is set aside. The output of the training algorithm is a trained model (Figure 11.1, part C). The trained model is then used to classify the testing dataset (Figure 11.1, part D). The performance of the model on the testing dataset is captured in a set of evaluation metrics that can be used to determine whether the model generalizes well enough to previously...
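The split described above can be sketched in plain Python. This is purely illustrative: in the Elastic Stack, the data frame analytics job performs this hold-out split for you, and the dataset below is synthetic.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the labeled dataset and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cutoff = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cutoff], shuffled[cutoff:]

# Each row pairs a feature vector with its known class label.
dataset = [([i, i * 2], i % 2) for i in range(100)]
train, test = train_test_split(dataset)
print(len(train), len(test))  # 80 20
```

The training portion is what the algorithm learns from; the held-out testing portion is never seen during training, which is what makes it a fair basis for the evaluation metrics.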

Taking your first steps with classification

In this section, we will create a sample classification job using the public Wisconsin Breast Cancer dataset. The original dataset is available at https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original). For this exercise, we will use a slightly sanitized version of the dataset, which removes the need for data cleaning (an important step in the lifecycle of a machine learning project, but not one we have space to discuss in this book) and allows us to focus on the basics of creating a classification job:

  1. Download the sanitized dataset file breast-cancer-wisconsin-outlier.csv from the Chapter 11 - Classification Analysis folder in the book's GitHub repository (https://github.com/PacktPublishing/Machine-Learning-with-Elastic-Stack-Second-Edition/tree/main/Chapter%2011%20-%20Classification%20Analysis) and store it locally on your machine. In your Kibana instance, navigate to the Machine...
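Behind the Kibana wizard, a classification job boils down to a configuration document sent to the data frame analytics API (PUT _ml/data_frame/analytics/&lt;job_id&gt;). The sketch below only builds such a body; the index names are assumptions for illustration, and `dependent_variable` names the label column the model should learn to predict.

```python
import json

# Sketch of a classification job configuration. Index names are
# illustrative assumptions; "Class" is the label column in the
# Wisconsin Breast Cancer dataset.
job_config = {
    "source": {"index": "breast-cancer-wisconsin"},
    "dest": {"index": "breast-cancer-wisconsin-results"},
    "analysis": {
        "classification": {
            "dependent_variable": "Class",  # column to predict
            "training_percent": 80,         # hold out 20% for testing
        }
    },
}
print(json.dumps(job_config, indent=2))
```

Note the `training_percent` setting: it controls the training/testing split discussed in the previous section.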

Classification under the hood: gradient boosted decision trees

The ultimate goal of a classification task is to take previously unseen data points and infer which of several possible classes they belong to. We achieve this by taking a labeled training dataset that contains a representative number of data points, extracting relevant features that allow us to learn a decision boundary, and then encoding the knowledge about this decision boundary in a classification model. This model then decides which class a given data point belongs to. How does the model learn to do this? This is the question we will try to answer in this section.

In accordance with our habits throughout the book, let's start by exploring conceptually what tools humans use to navigate a set of complicated decisions. A familiar tool that many of us have used before to help make decisions when several, possibly complex factors are involved, is...
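A decision process of the kind this section builds toward can be sketched as a chain of explicit if/else questions, each one a node in a conceptual flowchart. The feature names and thresholds below are invented purely for illustration; a real decision tree learns its questions and thresholds from the training data.

```python
def classify_sample(uniformity, clump_thickness):
    """A hand-written 'flowchart' of decisions: each branch asks one
    question about one feature, like a node in a decision tree.
    Feature names and thresholds are invented for illustration."""
    if uniformity > 4:
        if clump_thickness > 6:
            return "malignant"
        return "benign"
    return "benign"

print(classify_sample(uniformity=2, clump_thickness=3))  # benign
print(classify_sample(uniformity=8, clump_thickness=7))  # malignant
```

The point of the sketch is the shape, not the rules: a tree of simple questions partitions the feature space into regions, and the boundaries between those regions form the decision boundary.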

Hyperparameters

In the previous section, we took a conceptual look at how decision trees are constructed. In particular, we established that one criterion for deciding where a decision tree should be split (in other words, when a new path should be added to our conceptual flowchart) is the purity of the resulting nodes. We also noted that allowing the algorithm to focus exclusively on node purity as a criterion for constructing the decision tree would quickly lead to trees that overfit the training data. Such decision trees are so tuned to the training data that they capture not only the most salient features for classifying a given data point but even the noise in the data, as though it were a real signal. Therefore, while this kind of decision tree, allowed to optimize for specific metrics without restrictions, will perform very well on the training data, it will neither perform well on the testing dataset nor generalize...
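Node purity is commonly measured with Gini impurity, which is zero for a node containing a single class and grows as the classes mix. A minimal sketch (the class labels here are illustrative):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions.
    0.0 for a perfectly pure node; 0.5 for an evenly mixed
    two-class node."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

pure = ["benign"] * 10
mixed = ["benign"] * 5 + ["malignant"] * 5
print(gini_impurity(pure))   # 0.0
print(gini_impurity(mixed))  # 0.5
```

A splitting algorithm that greedily drives this number to zero at every leaf is exactly the unrestricted optimization described above, which is why hyperparameters that limit tree depth or leaf size are needed to keep the model from memorizing noise.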

Interpreting results

In the previous section, we examined the theoretical underpinnings of decision trees and how they are constructed. In this section, we will return to the classification example we examined earlier in the chapter and take a closer look at the format of the results as well as how to interpret them.

Earlier in the chapter, we created a trained model to predict whether a given breast tissue sample was malignant or benign (as a reminder, in this dataset, benign is denoted by class 2 and malignant by class 4). A snippet of the classification results for this model is shown in Figure 11.17.

Figure 11.17 – Classification results for a sample data point in the Wisconsin breast cancer dataset

With this trained model, we can take previously unseen data points and make predictions. What form do these predictions take? In the simplest form, a data point is assigned a class label (the field ml.Class_prediction in...
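A result document pairs that class label with per-class probabilities. The sketch below shows how such a document might be read; the nested field names follow the results format referenced above (ml.Class_prediction, top_classes), but the values are invented for illustration.

```python
# Invented example of a classification result document; the field
# layout mirrors the data frame analytics results format.
result_doc = {
    "ml": {
        "Class_prediction": 4,
        "top_classes": [
            {"class_name": 4, "class_probability": 0.97},
            {"class_name": 2, "class_probability": 0.03},
        ],
    }
}

# The predicted label corresponds to the most probable class.
top = max(result_doc["ml"]["top_classes"],
          key=lambda c: c["class_probability"])
print(top["class_name"], top["class_probability"])  # 4 0.97
```

Reading the probabilities alongside the label matters: a prediction made at 0.97 probability deserves very different trust than the same label assigned at 0.51.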

Summary

In this chapter, we have taken a deep dive into supervised learning. We have examined what supervised learning means, what role is played by training data in constructing the model, what it means to train a supervised learning model, what features are and how they should be engineered to obtain optimal performance, as well as how a model is evaluated and what various model performance measures mean.

After learning about the basics of supervised learning in general, we took a closer look at classification and examined how one can create and run classification jobs in the Elastic Stack as well as how one can evaluate the trained models that are produced by these jobs. In addition to looking at basic concepts such as confusion matrices, we also examined situations where it is good to be skeptical about results that seem to be too good to be true and the potential underlying reasons why classification results can sometimes appear perfect and why this does not necessarily mean...
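As a compact refresher on the confusion matrix mentioned above, the four cells for a binary classifier can be computed in a few lines of plain Python (the labels and predictions here are invented):

```python
def confusion_matrix(actual, predicted, positive="malignant"):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

actual    = ["malignant", "benign", "malignant", "benign", "benign"]
predicted = ["malignant", "benign", "benign", "benign", "malignant"]
print(confusion_matrix(actual, predicted))
# {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 2}
```

A matrix with zero off-diagonal counts (no false positives or false negatives) is exactly the "too good to be true" result worth being skeptical about.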

Further reading

For more information about how the class score is calculated, please take a look at the text and code examples in this Jupyter notebook: https://github.com/elastic/examples/blob/master/Machine%20Learning/Class%20Assigment%20Objectives/classification-class-assignment-objective.ipynb.

