You're reading from Machine Learning with the Elastic Stack - Second Edition

Product type: Book
Published in: May 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801070034
Edition: 2nd Edition
Authors (3):
Rich Collier

Rich Collier is a solutions architect at Elastic. Joining the Elastic team from the Prelert acquisition, Rich has over 20 years' experience as a solutions architect and pre-sales systems engineer for software, hardware, and service-based solutions. Rich's technical specialties include big data analytics, machine learning, anomaly detection, threat detection, security operations, application performance management, web applications, and contact center technologies. Rich is based in Boston, Massachusetts.

Camilla Montonen

Camilla Montonen is a Senior Machine Learning Engineer at Elastic.

Bahaaldine Azarmi

Bahaaldine Azarmi, Global VP Customer Engineering at Elastic, guides companies as they leverage data architecture, distributed systems, machine learning, and generative AI. He leads the customer engineering team, focusing on cloud consumption, and is passionate about sharing knowledge to build and inspire a community skilled in AI.

Chapter 12: Regression

In the previous chapter, we studied classification – one of the two supervised learning techniques available in the Elastic Stack. However, not all real-world applications of supervised learning lend themselves to the format required for classification. What if, for example, we wanted to predict the sales prices of apartments in our neighborhood? Or the amount of money a customer will spend in our online store? Notice that the quantity we are interested in here is not a discrete class, but a number that can take any value within a continuous range.

This is exactly the problem solved by regression analysis. Instead of predicting which class a given datapoint belongs to, we predict a continuous value. Although the end goal is slightly different from that of classification, the underlying algorithm used for regression is the same as the one we examined for classification in the previous chapter. Thus, we already know a lot about...

Technical requirements

The material in this chapter will require an Elasticsearch cluster running version 7.10.1 or later. Some examples may include screenshots or guidance about details that are only available in later versions of Elasticsearch. In such cases, the text will explicitly mention which later version is required to run the example.
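
As a small aid for checking this requirement, here is a minimal sketch that compares a version string against the 7.10.1 minimum. It assumes you have already obtained the cluster's version string (for example, from the `version.number` field returned by the cluster's root endpoint); the helper names `parse_version` and `meets_requirement` are hypothetical and not part of any Elastic client.

```python
MINIMUM_VERSION = (7, 10, 1)  # minimum version stated in this chapter


def parse_version(version):
    """Turn '7.10.1' or '8.0.0-SNAPSHOT' into a tuple of ints such as (7, 10, 1)."""
    core = version.split("-", 1)[0]  # drop any suffix such as '-SNAPSHOT'
    return tuple(int(part) for part in core.split("."))


def meets_requirement(version):
    """Return True if the version is 7.10.1 or later."""
    return parse_version(version) >= MINIMUM_VERSION


# Hypothetical version strings, used only to exercise the helper
for candidate in ["7.9.3", "7.10.1", "7.13.0"]:
    print(candidate, meets_requirement(candidate))
```

Comparing tuples of integers rather than raw strings matters here: as plain strings, "7.9.3" sorts after "7.10.1", which would give the wrong answer.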

Using regression analysis to predict house prices

In the previous chapter, we examined the first of the two supervised learning methods in the Elastic Stack – classification. The goal of classification analysis is to use a labeled dataset to train a model that can predict a class label for a previously unseen datapoint. For example, we could train a model on historical measurements of cell samples coupled with information about whether or not the cell was malignant and use this to predict the malignancy of previously unseen cells. In classification, the class or dependent variable that we are interested in predicting is always a discrete quantity. In regression, on the other hand, we are interested in predicting a continuous variable.

Before we examine the theoretical underpinnings of regression a bit closer, let's dive right in and do a practical walk-through of how to train a regression model in Elasticsearch. The dataset we will be using is available on Kaggle (https...
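
For readers who prefer to see the job as a request rather than as steps in the Kibana UI, the following is a minimal sketch of a request body for the data frame analytics API (`PUT _ml/data_frame/analytics/<job_id>`) with a regression analysis. The index names, the job ID, and the dependent variable name `price` are assumptions made for illustration; they are not taken from the chapter's dataset. The sketch only builds and prints the JSON body and does not contact a cluster.

```python
import json


def build_regression_job(source_index, dest_index, dependent_variable,
                         training_percent=80):
    """Build the body of a data frame analytics job that runs regression."""
    return {
        "source": {"index": source_index},   # index holding the labeled data
        "dest": {"index": dest_index},       # index that receives the predictions
        "analysis": {
            "regression": {
                # the continuous field the model should learn to predict
                "dependent_variable": dependent_variable,
                # share of eligible documents used for training
                "training_percent": training_percent,
            }
        },
    }


# Hypothetical names, chosen only for this illustration
job_id = "house-prices-regression"
job = build_regression_job("house-prices", "house-prices-predictions", "price")

print(f"PUT _ml/data_frame/analytics/{job_id}")
print(json.dumps(job, indent=2))
```

The printed body can be pasted into the Kibana Dev Tools console after adjusting the names to match your own indices and fields.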

Using decision trees for regression

As we have discussed in the preceding chapters, regression is a supervised learning technique. Recall from Chapter 11, Classification Analysis, that the goal of supervised learning is to take a labeled dataset (for example, a dataset that has features of houses and their sales price – the dependent variable) and distill the knowledge in this data into an artifact known as a trained model. This trained model can then be used to predict the sales prices of houses that the model has not previously seen. When the dependent variable that we are trying to predict is continuous rather than discrete (the domain of classification), we are dealing with regression.

Regression – the task of distilling the information presented in real-world observations or data – is a field of machine learning that encompasses techniques far broader than the decision tree technique that is used in Elasticsearch's...
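
To make the idea of a decision tree predicting a continuous value more concrete, here is a toy sketch of the smallest possible regression tree: a single split (a "stump") on one feature, where each side of the split predicts the mean of the training values that fall on it. This is an illustration under simplifying assumptions, not Elasticsearch's implementation, and the house areas and prices are made up.

```python
def mean(values):
    return sum(values) / len(values)


def sum_squared_error(values):
    """Squared error incurred by predicting the mean for every value."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(xs, ys):
    """Find the threshold on a single feature that minimizes total squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    points = sorted(set(xs))
    # candidate thresholds sit halfway between neighboring feature values
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = sum_squared_error(left) + sum_squared_error(right)
        if best is None or error < best[0]:
            best = (error, threshold, mean(left), mean(right))
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value


def predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


# Made-up data: living area in square meters, sales price in thousands
areas = [50, 60, 70, 120, 130, 140]
prices = [150, 160, 170, 300, 310, 320]

stump = fit_stump(areas, prices)
print(stump)                 # threshold and the two leaf predictions
print(predict(stump, 65))    # prediction for a small house
print(predict(stump, 125))   # prediction for a large house
```

A full decision tree repeats this split search inside each side of the split, and on more than one feature, until a stopping rule is reached.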

Summary

Regression is the second of the two supervised learning methods in the Elastic Stack. The goal of regression is to take a labeled dataset (a dataset that contains some features and a dependent variable that we want to predict) and distill it into a trained model. In regression, the dependent variable is a continuous value, which makes it distinct from classification, which handles discrete values. In this chapter, we have used the Elastic Stack's machine learning functionality to train a regression model that predicts the sales price of a house based on a number of attributes, such as the house's location and the number of bedrooms. While there are numerous regression techniques available, the Elastic Stack uses gradient boosted decision trees to train a model.
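
As a reminder of what "gradient boosted" means in practice, the following toy sketch builds a model from many tiny trees, each one fitted to the errors (residuals) left by the trees before it. It is a simplified illustration for squared error on a single feature, with made-up data and hypothetical function names; the Elastic Stack's actual implementation is considerably more sophisticated.

```python
def mean(values):
    return sum(values) / len(values)


def fit_stump(xs, targets):
    """Best single-threshold split; each side predicts the mean of its targets."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = mean(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # take only a partial step toward the residuals (shrinkage)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(ys, predictions):
    return mean([(y - p) ** 2 for y, p in zip(ys, predictions)])


# Made-up data: living area in square meters, sales price in thousands
areas = [50, 60, 70, 120, 130, 140]
prices = [150, 160, 170, 300, 310, 320]

model = boost(areas, prices)
baseline_mse = mse(prices, [mean(prices)] * len(prices))
boosted_mse = mse(prices, [boosted_predict(model, a) for a in areas])
print(baseline_mse, boosted_mse)
```

On this tiny dataset the boosted model's training error ends up far below that of simply predicting the average price, which is the effect boosting is designed to achieve.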

In the next chapter, we will take a look at how supervised learning models can be used together with inference processors and ingest pipelines to create powerful, machine learning-powered data analysis pipelines...

Further reading

For further information on how feature importance values are computed, please see the blog post Feature importance for data frame analytics with Elastic machine learning, available at https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning.

If you are looking for a more mathematical introduction to regression, please consult the book Mathematics for Machine Learning, available at https://mml-book.github.io/.
