Elasticsearch 8.x Cookbook - Fifth Edition


Chapter 12: Using the Ingest Module

Elasticsearch 8.x provides a set of powerful functionalities, exposed through the ingest node, that target the problems that arise during document ingestion.

In Chapter 1, Getting Started, we discussed how an Elasticsearch node can take on different roles, the most important being master, data, and ingest; the idea behind splitting the ingest component from the others is to create a more stable cluster, since preprocessing documents can cause problems (mainly due to custom plugins used in the ingest part, which may require the ingest nodes to be restarted in order to be updated).

To create a more stable cluster, the ingest nodes should be isolated from the master nodes (and possibly from the data nodes as well) in case problems occur, such as a crash caused by a plugin (for example, the attachment plugin) or high load due to complex type manipulation.

An ingest node can replace a Logstash installation in simple scenarios.

In this chapter, we...

Pipeline definition

The job of ingest nodes is to preprocess documents before sending them to the data nodes. This preprocessing is described by a pipeline definition, and every single step of the pipeline is a processor definition.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. We will use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To define an ingestion pipeline, you need to provide a description and some processors, as follows.

Define a pipeline that adds a field called user with the value john:

{ "description": "Add user john field",
  "processors": [
    { "set":...

Inserting an ingest pipeline

The power of the pipeline definition is that it can be created and updated without a node restart (unlike a Logstash configuration). The definition is stored in the cluster state via the put pipeline API.

Now that we've defined a pipeline, we need to provide it to the Elasticsearch cluster.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To store or update an ingestion pipeline in Elasticsearch, we will do the following.

Store the ingest pipeline using a PUT call:

PUT /_ingest/pipeline/add-user-john
{ "description": "Add user john field",
  "processors": [
    { "set": { "field": "user", "value": "john" } }
  ] }

Getting an ingest pipeline

After storing your pipeline, it is common to retrieve its contents so that you can check the definition. This action can be done via the get pipeline API.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To retrieve an ingestion pipeline in Elasticsearch, we will perform the following steps:

We can retrieve the ingest pipeline using a GET call:

GET /_ingest/pipeline/add-user-john

The result that's returned by Elasticsearch, if everything is okay, should be as follows:

{ "add-user-john" : {
    "description...

Deleting an ingest pipeline

To clean up our Elasticsearch cluster of obsolete or unwanted pipelines, we need to call the delete pipeline API with the ID of the pipeline.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To delete an ingestion pipeline in Elasticsearch, we will perform the following step.

We can delete the ingest pipeline using a DELETE call:

DELETE /_ingest/pipeline/add-user-john

The result that's returned by Elasticsearch, if everything is okay, should be as follows:

{ "acknowledged" : true }

How it works...

The delete pipeline API removes the named pipeline definition from the cluster state.

Simulating an ingest pipeline

The ingest part of every architecture is very sensitive, so the Elasticsearch team has made it possible to simulate your pipelines without the need to store them in Elasticsearch.

The simulate pipeline API allows you to test, improve, and check the functionality of your pipeline without deploying it in the Elasticsearch cluster.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To simulate an ingestion pipeline in Elasticsearch, we will perform the following step.

Execute a call by passing both the pipeline and a sample subset of a document to test the pipeline against.
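
A minimal sketch of such a call, reusing the add-user-john definition from the previous recipes against a made-up sample document, could look as follows:

POST /_ingest/pipeline/_simulate
{ "pipeline": {
    "description": "Add user john field",
    "processors": [
      { "set": { "field": "user", "value": "john" } }
    ] },
  "docs": [
    { "_index": "index", "_id": "1", "_source": { "message": "hello world" } }
  ] }

The response should contain a docs array with the transformed documents, each carrying the new user field.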

Built-in processors

Elasticsearch provides a large set of ingest processors by default. Their number and functionality can change between minor versions, as they are extended to cover new scenarios.

In this recipe, we will look at the most commonly used ones.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To use several processors in an ingestion pipeline in Elasticsearch, we will perform the following step.

Execute a simulate pipeline API call using several processors with a sample subset of a document that you can test the pipeline against:

POST /_ingest/pipeline/_simulate
{ "...

The grok processor

Elasticsearch provides a large number of built-in processors that increases with every release. In the preceding examples, we have seen the set and replace ones. In this recipe, we will cover one that's mostly used for log analysis: the grok processor, which is well known to Logstash users.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To test a grok pattern against some log lines, we will perform the following step.

Execute a call by passing both the pipeline with our grok processor and a sample subset of a document to test the pipeline against:

POST /_ingest...
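
As an illustrative sketch (the grok pattern, field names, and log line here are assumptions), a call parsing a simple log line could be:

POST /_ingest/pipeline/_simulate
{ "pipeline": {
    "description": "Parse a log line with grok",
    "processors": [
      { "grok": {
          "field": "message",
          "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes:int}"]
        } }
    ] },
  "docs": [
    { "_source": { "message": "55.3.244.1 GET /index.html 15824" } }
  ] }

The grok processor should extract client, method, request, and bytes as separate fields from the message.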

Using the ingest attachment plugin

In Elasticsearch versions prior to 5.x, it was easy to make a cluster unresponsive by using the attachment mapper. Extracting metadata from a document is a very CPU-intensive operation, and if you are ingesting a lot of documents, your cluster comes under heavy load.

To prevent this scenario, Elasticsearch introduced the ingest node. An ingest node can be put under very high pressure without causing problems for the rest of the Elasticsearch cluster.

The attachment processor allows us to use the document extraction capabilities of Tika in an ingest node.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.
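
As a rough sketch of the typical usage (assuming the ingest-attachment plugin has been installed, for example with elasticsearch-plugin install ingest-attachment, and that the document body arrives base64-encoded in a data field; the my-index name is arbitrary), the pipeline and an indexing call might look like this:

PUT /_ingest/pipeline/attachment
{ "description": "Extract attachment information with Tika",
  "processors": [
    { "attachment": { "field": "data" } }
  ] }

PUT /my-index/_doc/1?pipeline=attachment
{ "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=" }

The data value here is a base64-encoded RTF snippet; the processor should populate an attachment object with fields such as content, content_type, and language.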

Using the ingest GeoIP processor

Another interesting processor is the GeoIP processor, which allows us to map an IP address to a geopoint and other location data. It has been provided by default in every Elasticsearch installation since version 7.x.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To be able to use the ingest GeoIP processor, perform the following steps:

  1. We can create an ingest pipeline with the geoip processor, using the following command:
    PUT /_ingest/pipeline/geoip
    { "description": "Extract geopoint from an IP",
      "processors"...

Using the enrichment processor

It's quite common to enrich your indexed fields with lookups from other sources. Typical examples are as follows:

  • Resolving IDs into the values they reference, as with external foreign keys in a database
  • Enriching a name with its full data object, so that all the data is aggregated in a single document

To solve these use cases, X-Pack provides a special processor called the enrich processor.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To be able to use the ingest enrich processor, perform the following steps:

  1. We need to...
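
As a rough end-to-end sketch (the users index, users-policy name, and all field names are hypothetical), the enrich workflow consists of a source index, an enrich policy that is executed to build an internal enrich index, and a pipeline that uses the enrich processor:

PUT /users/_doc/1?refresh=wait_for
{ "email": "john@example.com", "first_name": "John", "last_name": "Smith" }

PUT /_enrich/policy/users-policy
{ "match": {
    "indices": "users",
    "match_field": "email",
    "enrich_fields": ["first_name", "last_name"]
  } }

POST /_enrich/policy/users-policy/_execute

PUT /_ingest/pipeline/user-lookup
{ "description": "Enrich documents with user data",
  "processors": [
    { "enrich": {
        "policy_name": "users-policy",
        "field": "email",
        "target_field": "user"
      } }
  ] }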