You're reading from Advanced Elasticsearch 7.0

Product type: Book
Published in: Aug 2019
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781789957754
Edition: 1st Edition
Author: Wai Tak Wong

Wai Tak Wong is a faculty member in the Department of Computer Science at Kean University, NJ, USA. He has more than 15 years of professional experience in cloud software design and development. He obtained his PhD in computer science from NJIT, NJ, USA. Wai Tak has served as an associate professor in the Information Management Department of Chung Hua University, Taiwan. A co-founder of Shanghai Shellshellfish Information Technology, Wai Tak acted as the Chief Scientist of the R&D team, and he has published more than a dozen algorithms in prestigious journals and conferences. Wai Tak began his search and analytics technology career with Elasticsearch in the real estate market and later applied this to data management and FinTech data services.

Machine Learning with Elasticsearch

In the last chapter, we learned about the Elasticsearch analysis plugins. We tried three plugins: two from the official Elastic team and one from the community. The two core analysis plugins, ICU Analysis and Smart Chinese Analysis, did not meet our expectations; at least in our simple tests, neither produced good tokens. In contrast, the Lucene IK Analysis plugin from the community worked better. In this chapter, we begin exploring the advanced features supported by Elasticsearch, starting with machine learning.

In the context of Elasticsearch, machine learning can be thought of as a natural extension of search and analysis. Recall that we looked at Bollinger Bands in Chapter 10, Using Elasticsearch for Exploratory Data...

Machine learning with Elastic Stack

Historically, one of the main machine learning problems tackled with Elasticsearch has been anomaly detection. It was already a hot topic in 2014 and earlier (see https://www.businesswire.com/news/home/20140826005072/en/Prelert-Extends-Anomaly-Detection-Elasticsearch and https://speakerdeck.com/elasticsearch/real-time-analytics-and-anomalies-detection-using-elasticsearch-hadoop-and-storm). Fundamentally, anomaly detection is a statistical problem that can be solved in a simple way: by flagging the data points that deviate from the common statistical properties of the input data distribution. However, we can also solve the problem with machine learning-based approaches, such as cluster-based anomaly detection and support vector machine-based anomaly detection. The machine learning feature provided by Elastic Stack can involve...
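To make the simple statistical approach concrete, here is a minimal sketch that flags values whose z-score (distance from the mean, measured in standard deviations) exceeds a cutoff. The function name `zscore_anomalies`, the 2.5 cutoff, and the sample volumes are all assumptions made for illustration; they are not taken from the book or from the Elastic Stack.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return (index, value) pairs whose absolute z-score exceeds threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can deviate
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical daily trading volumes containing one obvious spike.
volumes = [100, 102, 98, 101, 99, 103, 97, 100, 500, 101, 99, 102]
anomalies = zscore_anomalies(volumes, threshold=2.5)
print(anomalies)
```

Note that a single large outlier also inflates the standard deviation it is measured against, which is one reason robust statistics or machine learning-based methods are often preferred on real data.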

Machine learning using Elasticsearch and scikit-learn

Scikit-learn is a Python machine learning library built on top of NumPy, SciPy, and Matplotlib. It provides simple tools for data mining and data analysis. According to the description on its website (see https://scikit-learn.org/stable/), we can use it in six major areas:

  • Classification: A supervised learning approach that learns from labeled data to build a classifier model. We then use the model to predict the category of new data.
  • Regression: Using statistical methods to predict continuous values from a given set of data.
  • Clustering: Automatically grouping similar data points into clusters.
  • Dimensionality reduction: Reducing the number of dimensions (features) of the data.
  • Model selection: Comparing and validating models, and tuning their hyperparameters.
  • Preprocessing: Feature extraction and normalization.
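Several of the areas listed above can be combined in a few lines. The following is a minimal sketch on synthetic data, not an example from the book: it chains preprocessing (`StandardScaler`), dimensionality reduction (`PCA`), and classification (`LogisticRegression`) in a `Pipeline`, then uses `GridSearchCV` for model selection. The dataset sizes, the number of components, and the candidate values of `C` are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data, purely for illustration.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

pipeline = Pipeline([
    ("scale", StandardScaler()),                      # preprocessing
    ("reduce", PCA(n_components=5)),                  # dimensionality reduction
    ("classify", LogisticRegression(max_iter=1000)),  # classification
])

# Model selection: 5-fold cross-validated search over the regularization strength.
search = GridSearchCV(pipeline, {"classify__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

best_c = search.best_params_["classify__C"]
test_accuracy = search.score(X_test, y_test)
print(best_c, round(test_accuracy, 3))
```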

In the last...

Summary

Hooray! We have completed the first of the advanced features covered in this book: machine learning with Elasticsearch. We introduced the machine learning feature of the Elastic Stack and created a single-metric job that tracks the volume field to detect anomalies in the data of the cf_rfem_hist_price index. We also introduced the Python scikit-learn library and the unsupervised learning algorithm, k-means clustering, which is provided by the KMeans class in the sklearn.cluster module. We extracted data from the cf_rfem_hist_price index and used three fields, changeOverTime, changePercent, and volume, to construct multidimensional input data for k-means clustering to find the anomalies. Using the matplotlib.pyplot module, we plotted a graph showing the anomalies and the regular data.
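As a reminder of the k-means idea, here is a minimal sketch, not the chapter's actual code. It uses synthetic rows standing in for the changeOverTime, changePercent, and volume fields rather than data read from the cf_rfem_hist_price index. Treating the smaller of two clusters as the anomalies is an assumption made for this illustration, as are all the numeric values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-ins for (changeOverTime, changePercent, volume) rows.
regular = rng.normal(
    loc=[0.0, 0.0, 1_000_000], scale=[0.01, 0.5, 50_000], size=(200, 3)
)
unusual = rng.normal(
    loc=[0.2, 8.0, 3_000_000], scale=[0.01, 0.5, 50_000], size=(5, 3)
)
data = np.vstack([regular, unusual])  # rows 200..204 are the unusual ones

# Scale the features so that volume does not dominate the distance measure.
scaled = StandardScaler().fit_transform(data)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)

# Assumption: the cluster with fewer members holds the anomalies.
cluster_sizes = np.bincount(kmeans.labels_)
minority_cluster = int(np.argmin(cluster_sizes))
anomaly_idx = np.where(kmeans.labels_ == minority_cluster)[0]
print(anomaly_idx)
```

This convention only works when the anomalies are few and lie far from the bulk of the data; other conventions, such as thresholding the distance to the nearest centroid, are equally common.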

In the next chapter, we will provide an overview of...
