Elasticsearch 8.x Cookbook - Fifth Edition


Chapter 12: Using the Ingest Module

Elasticsearch 8.x provides a set of powerful functionalities, exposed through the ingest node, that target the problems that arise during document ingestion.

In Chapter 1, Getting Started, we discussed how an Elasticsearch node can take on different roles, the most important being master, data, and ingest; the idea behind splitting the ingest component from the others is to create a more stable cluster, since preprocessing documents can cause problems (mainly due to custom plugins used in the ingest part, which may require the ingest nodes to be restarted in order to be updated).

To create a more stable cluster, the ingest nodes should be isolated from the master nodes (and possibly from the data nodes as well) in case problems occur, such as a crash caused by a plugin (for example, the attachment plugin) or high load due to complex type manipulation.

An ingest node can replace a Logstash installation in simple scenarios.

In this chapter, we...

Pipeline definition

The job of ingest nodes is to preprocess documents before sending them to the data nodes. This preprocessing is described by a pipeline definition, and every single step of the pipeline is a processor definition.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. We will use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To define an ingestion pipeline, you need to provide a description and some processors, as follows.

Define a pipeline that adds a field called user with the value john:

{ "description": "Add user john field",
  "processors": [
    { "set":...

Inserting an ingest pipeline

The power of the pipeline definition is that it can be created and updated without a node restart (unlike a Logstash configuration). The definition is stored in the cluster state via the put pipeline API.

Now that we've defined a pipeline, we need to provide it to the Elasticsearch cluster.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To store or update an ingestion pipeline in Elasticsearch, we will do the following.

Store the ingest pipeline using a PUT call:

PUT /_ingest/pipeline/add-user-john
{ "description": "Add user john field",
  "processors": [
    { "set": { "field": "user", "value": "john" } }
  ] }

Getting an ingest pipeline

After storing your pipeline, it is common to retrieve its contents so that you can check the definition. This action can be done via the get pipeline API.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To retrieve an ingestion pipeline in Elasticsearch, we will perform the following steps:

We can retrieve the ingest pipeline using a GET call:

GET /_ingest/pipeline/add-user-john

The result that's returned by Elasticsearch, if everything is okay, should be as follows:

{ "add-user-john" : {
    "description...

Deleting an ingest pipeline

To clean up our Elasticsearch cluster of obsolete or unwanted pipelines, we need to call the delete pipeline API with the ID of the pipeline.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To delete an ingestion pipeline in Elasticsearch, we will perform the following step.

We can delete the ingest pipeline using a DELETE call:

DELETE /_ingest/pipeline/add-user-john

The result that's returned by Elasticsearch, if everything is okay, should be as follows:

{ "acknowledged" : true }

How it works...

The delete pipeline API removes the named pipeline definition from the cluster state.

Simulating an ingest pipeline

The ingest part of every architecture is very sensitive, so the Elasticsearch team has made it possible to simulate your pipelines without the need to store them in Elasticsearch.

The simulate pipeline API allows you to test, improve, and check the functionality of your pipeline without deploying it in the Elasticsearch cluster.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To simulate an ingestion pipeline in Elasticsearch, we will perform the following step.

Execute a call by passing both the pipeline and a sample subset of a document to test the pipeline against.
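
A minimal sketch of such a call, reusing the add-user-john definition from the previous recipes against a made-up sample document, could look as follows:

POST /_ingest/pipeline/_simulate
{ "pipeline": {
    "description": "Add user john field",
    "processors": [
      { "set": { "field": "user", "value": "john" } }
    ] },
  "docs": [
    { "_index": "index", "_id": "1", "_source": { "message": "hello world" } }
  ] }

The response should contain a docs array with the transformed documents, each carrying the new user field.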

Built-in processors

Elasticsearch provides a large set of ingest processors by default. Their number and functionality can change between minor versions, as they are extended to cover new scenarios.

In this recipe, we will look at the most commonly used ones.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To use several processors in an ingestion pipeline in Elasticsearch, we will perform the following step.

Execute a simulate pipeline API call using several processors with a sample subset of a document that you can test the pipeline against:

POST /_ingest/pipeline/_simulate
{ "...

The grok processor

Elasticsearch provides a large number of built-in processors that increases with every release. In the preceding examples, we have seen the set and replace ones. In this recipe, we will cover one that's mostly used for log analysis: the grok processor, which is well known to Logstash users.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To test a grok pattern against some log lines, we will perform the following step.

Execute a call by passing both the pipeline with our grok processor and a sample subset of a document to test the pipeline against:

POST /_ingest...
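
As an illustrative sketch (the grok pattern, field names, and log line here are assumptions), a call parsing a simple log line could be:

POST /_ingest/pipeline/_simulate
{ "pipeline": {
    "description": "Parse a log line with grok",
    "processors": [
      { "grok": {
          "field": "message",
          "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes:int}"]
        } }
    ] },
  "docs": [
    { "_source": { "message": "55.3.244.1 GET /index.html 15824" } }
  ] }

The grok processor should extract client, method, request, and bytes as separate fields from the message.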

Using the ingest attachment plugin

In Elasticsearch versions prior to 5.x, it was easy to make a cluster unresponsive by using the attachment mapper. Extracting metadata from a document is a very CPU-intensive operation, and if you are ingesting a lot of documents, your cluster comes under heavy load.

To prevent this scenario, Elasticsearch introduced the ingest node. An ingest node can be put under very high pressure without causing problems for the rest of the Elasticsearch cluster.

The attachment processor allows us to use the document extraction capabilities of Tika in an ingest node.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.
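
As a rough sketch of the typical usage (assuming the ingest-attachment plugin has been installed, for example with elasticsearch-plugin install ingest-attachment, and that the document body arrives base64-encoded in a data field; the my-index name is arbitrary), the pipeline and an indexing call might look like this:

PUT /_ingest/pipeline/attachment
{ "description": "Extract attachment information with Tika",
  "processors": [
    { "attachment": { "field": "data" } }
  ] }

PUT /my-index/_doc/1?pipeline=attachment
{ "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=" }

The data value here is a base64-encoded RTF snippet; the processor should populate an attachment object with fields such as content, content_type, and language.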

Using the ingest GeoIP processor

Another interesting processor is the GeoIP processor, which allows us to map an IP address to a geopoint and other location data. It has been provided by default in every Elasticsearch installation since version 7.x.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To be able to use the ingest GeoIP processor, perform the following steps:

  1. We can create an ingest pipeline with the geoip processor, using the following command:
    PUT /_ingest/pipeline/geoip
    { "description": "Extract geopoint from an IP",
      "processors"...

Using the enrichment processor

It's quite common to enrich your indexed fields with lookups from other sources. Typical examples are as follows:

  • Resolving IDs into the values they reference, as with external foreign keys in a database
  • Enriching a name with its full data object, so that all the data is aggregated in a single document

To solve these use cases, X-Pack provides a special processor called the enrich processor.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Use the Kibana console, as it provides code completion and better character escaping for Elasticsearch.

How to do it...

To be able to use the ingest enrich processor, perform the following steps:

  1. We need to...
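
As a rough end-to-end sketch (the users index, users-policy name, and all field names are hypothetical), the enrich workflow consists of a source index, an enrich policy that is executed to build an internal enrich index, and a pipeline that uses the enrich processor:

PUT /users/_doc/1?refresh=wait_for
{ "email": "john@example.com", "first_name": "John", "last_name": "Smith" }

PUT /_enrich/policy/users-policy
{ "match": {
    "indices": "users",
    "match_field": "email",
    "enrich_fields": ["first_name", "last_name"]
  } }

POST /_enrich/policy/users-policy/_execute

PUT /_ingest/pipeline/user-lookup
{ "description": "Enrich documents with user data",
  "processors": [
    { "enrich": {
        "policy_name": "users-policy",
        "field": "email",
        "target_field": "user"
      } }
  ] }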