Chapter 8: Scripting in Elasticsearch

Elasticsearch has a powerful way of extending its capabilities by using custom scripts, which can be written in several programming languages; the most common ones are Painless, Expression, and Mustache. In this chapter, we will explore how to create custom scoring algorithms, specially processed return fields, custom sorting, complex update operations on records, and ingest processors. Scripting in Elasticsearch is the NoSQL world's equivalent of an advanced stored-procedure system; because of this, every advanced user of Elasticsearch should learn how to master it.

Elasticsearch natively provides scripting in Java (that is, Java code compiled in JAR files), Painless, Expression, and Mustache; however, a lot of other interesting languages are also available as plugins, such as Kotlin and Velocity. In older Elasticsearch releases, prior to version 5.0, the official scripting language was Groovy; however, for better sandboxing and performance, Painless has since replaced it as the default scripting language.

Painless scripting

Painless is a simple, secure scripting language that is available in Elasticsearch by default. It was designed by the Elasticsearch team to be used specifically with Elasticsearch, and it can safely be used with inline and stored scripting. Its syntax is similar to Groovy, from which it was derived.

In this recipe, we will see how to create a custom score function in Painless.
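
As a preview, the following is a minimal sketch of a custom score computed with a Painless script through a function_score query; the index name index-agg and the price field are illustrative assumptions, not names taken from the recipe:

    POST /index-agg/_search
    {
      "query": {
        "function_score": {
          "query": { "match_all": {} },
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "Math.log(2 + doc['price'].value)"
            }
          }
        }
      }
    }

Here, the script result replaces the flat match_all score with a logarithm of the price, dampening large price differences between documents.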

Getting ready

You will need an up-and-running Elasticsearch installation, similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

In order to execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). You can use Kibana Console as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index that is populated with the ch07/populate_aggregation.txt commands – these are available in the online code.

Installing additional scripting languages

Elasticsearch provides native scripting (that is, Java code compiled in JAR files) and Painless, but a lot of other interesting languages are also available, such as Kotlin.

Note

At the time of writing, no language plugins are available among Elasticsearch's official plugins. Plugin authors usually take from a week up to a month after a major release to update their plugins to the new version, so this section serves as a reference for this use case, based on Elasticsearch 7.x. As previously stated, the official scripting language is now Painless, which is provided by default in Elasticsearch for better sandboxing and performance.

Getting ready

You will need an up-and-running Elasticsearch installation, similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

How to do it...

In order to install the Kotlin language support for Elasticsearch, we...

Managing scripts

Depending on your scripting usage, there are several ways of customizing Elasticsearch in order to use your script extensions.

In this recipe, we will demonstrate how you can manage scripts by storing them in Elasticsearch or providing them inline in API calls.
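
As a preview, a stored script is created under an ID with the _scripts endpoint and later referenced by that ID instead of being sent inline; the script ID price_with_factor and its body are illustrative assumptions:

    POST /_scripts/price_with_factor
    {
      "script": {
        "lang": "painless",
        "source": "doc['price'].value * params.factor"
      }
    }

    GET /_scripts/price_with_factor

    DELETE /_scripts/price_with_factor

An inline script, by contrast, is embedded directly in the API call as a script object with source, lang, and params keys.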

Getting ready

You will need an up-and-running Elasticsearch installation, similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

In order to execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). You can use Kibana Console, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index populated with the ch08/populate_aggregation.txt commands – these are available in the online code.

In order to be able to use regular expressions in Painless scripting, you will need to activate them in elasticsearch.yml by adding script.painless.regex.enabled: true.

Sorting data using scripts

Elasticsearch provides scripting support for sorting functionality. In real-world applications, there is often a need to modify the default sorting using an algorithm that is dependent on the context and some external variables. Some common scenarios are as follows:

  • Sorting places near a point
  • Sorting by most read articles
  • Sorting items by custom user logic
  • Sorting items by revenue

Because computing scores on a large dataset is very CPU-intensive, if you use scripting it's better to execute it on a small dataset: use standard scoring queries to detect the top documents first, and then execute the rescoring on that top subset only.
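
A minimal sketch of a script-based sort follows; the index name index-agg, the price field, and the factor parameter are illustrative assumptions:

    POST /index-agg/_search
    {
      "sort": {
        "_script": {
          "type": "number",
          "order": "desc",
          "script": {
            "lang": "painless",
            "source": "doc['price'].value * params.factor",
            "params": { "factor": 1.1 }
          }
        }
      }
    }

The script must return a value of the declared type (number here), which Elasticsearch then uses in place of a field value when sorting the results.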

Getting ready

You will need an up-and-running Elasticsearch installation, similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). You can use Kibana Console, as it provides code completion and better character escaping for Elasticsearch.

Computing return fields with scripting

Elasticsearch allows us to define custom complex expressions that can be used to return a newly calculated field value.

The most common scenarios for these use cases are as follows:

  • Merge field values (for example, first name + last name)
  • Compute values (for example, total = quantity * price)
  • Apply transformations (for example, converting dollars to euros, or string manipulation)

These special fields are called script_fields, and they can be expressed with a script in every available Elasticsearch scripting language.
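
For example, a total computed as quantity multiplied by price can be returned as a script field; the index name index-agg and the field names are illustrative assumptions:

    POST /index-agg/_search
    {
      "script_fields": {
        "total": {
          "script": {
            "lang": "painless",
            "source": "doc['quantity'].value * doc['price'].value"
          }
        }
      }
    }

Each hit in the response then carries a fields section with the computed total value.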

Getting ready

You will need an up-and-running Elasticsearch installation, similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). You can use Kibana Console, as it provides code completion and better character escaping for Elasticsearch.

Filtering a search using scripting

In Chapter 4, Exploring Search Capabilities, we explored many filters. Elasticsearch scripting allows the extension of a traditional filter by using custom scripts.

Using scripting to create a custom filter is a convenient way to write scripting rules that are not provided by Lucene or Elasticsearch, and to implement business logic that is not available in a DSL query.
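
A minimal sketch of a script filter inside a bool query follows; the index name index-agg, the price field, and the threshold parameter are illustrative assumptions:

    POST /index-agg/_search
    {
      "query": {
        "bool": {
          "filter": {
            "script": {
              "script": {
                "lang": "painless",
                "source": "doc['price'].value > params.threshold",
                "params": { "threshold": 50 }
              }
            }
          }
        }
      }
    }

Because a script filter is evaluated against every candidate document, it is slower than native filters, so it should be combined with selective standard filters whenever possible.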

Getting ready

You will need an up-and-running Elasticsearch installation, similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). You can use Kibana Console, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index that is populated with the ch07/populate_aggregation.txt commands – these are available in the online code.

Using scripting in aggregations

Scripting can be used in aggregations to extend Elasticsearch's analytics capabilities, either to manipulate and transform the values used in metric aggregations or to define new rules for creating buckets.
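
As a preview, a metric aggregation can take a script instead of a plain field; the following sketch averages a discounted price (the index name index-agg, the price field, and the discount factor are illustrative assumptions):

    POST /index-agg/_search
    {
      "size": 0,
      "aggs": {
        "avg_discounted_price": {
          "avg": {
            "script": {
              "lang": "painless",
              "source": "doc['price'].value * 0.9"
            }
          }
        }
      }
    }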

Getting ready

You will need an up-and-running Elasticsearch installation, similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). You can use Kibana Console, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index that is populated with the ch07/populate_aggregation.txt commands – these are available in the online code.

In order to be able to use regular expressions in Painless scripting, you will need to activate them in elasticsearch.yml by adding script.painless.regex.enabled: true.

Updating a document using scripts

Elasticsearch allows you to update a document in place. Updating a document with a script reduces network traffic (otherwise, you would need to fetch the document, change the field or fields, and then send the whole document back) and improves performance when you need to process a large number of documents.
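
As a preview, the following sketch increments a counter field on a single document in place; the index name index-agg, the document ID, and the counter field are illustrative assumptions:

    POST /index-agg/_update/1
    {
      "script": {
        "lang": "painless",
        "source": "ctx._source.counter += params.increment",
        "params": { "increment": 1 }
      }
    }

Within an update script, ctx._source exposes the stored document so that its fields can be modified directly.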

Getting ready

You will need an up-and-running Elasticsearch installation, similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). You can use Kibana Console, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need an index that is populated with the ch07/populate_aggregation.txt commands – these are available in the online code.

In order to be able to use regular expressions in Painless scripting, you will need to activate them in elasticsearch.yml by adding script.painless.regex.enabled: true.

Reindexing with a script

Reindexing is a functionality for automatically copying your data into a new index. This action is often performed to cover different scenarios, such as the following (a sketch is shown after this list):

  • Reindexing after a mapping change
  • Removing a field from an index
  • Adding new fields based on a function
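
The following is a minimal sketch of the remove-a-field scenario using the _reindex API with a script; the index names and the field name obsolete_field are illustrative assumptions:

    POST /_reindex
    {
      "source": { "index": "index-agg" },
      "dest": { "index": "index-agg-v2" },
      "script": {
        "lang": "painless",
        "source": "ctx._source.remove('obsolete_field')"
      }
    }

The script runs once per copied document, so every document lands in the destination index without the removed field.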

Getting ready

You will need an up-and-running Elasticsearch installation, similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute curl using the command line, you will need to install curl for your operating system.

In order to correctly execute the following commands, you will need an index that is populated with the ch07/populate_aggregation.txt script (available in the online code), and the JavaScript or Python language scripting plugins installed.

How to do it...

For reindexing with a script, we will perform the following steps:

  1. Create the destination index, as this is not created...

Scripting in ingest processors

In Chapter 12, Using the Ingest Module, we will see several types of ingest processors.

Ingest processors are the building blocks for an ingestion pipeline; they describe an action that can be executed on a document to modify it.

Scripting is the most flexible processor functionality, as it lets you implement logic that the built-in processors do not cover. Its use in ingest pipelines is discussed in this recipe.

Getting ready

You will need an up-and-running Elasticsearch installation, similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

How to do it...

We will simulate a pipeline with a set processor and a script processor to modify our documents before ingesting them. We will perform the following steps:

  1. Execute a pipeline simulation API call with the two processor steps and two documents as a sample:
    POST /_ingest/pipeline/_simulate
    { "pipeline": {
       ...
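
A complete version of such a simulation call might look like the following sketch; the set processor value, the script logic, and the sample documents are illustrative assumptions:

    POST /_ingest/pipeline/_simulate
    {
      "pipeline": {
        "processors": [
          { "set": { "field": "source", "value": "simulated" } },
          {
            "script": {
              "lang": "painless",
              "source": "ctx.total = ctx.quantity * ctx.price"
            }
          }
        ]
      },
      "docs": [
        { "_source": { "quantity": 2, "price": 10 } },
        { "_source": { "quantity": 5, "price": 3 } }
      ]
    }

The response shows each sample document after both processors have run, without indexing anything.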