Preprocessing Documents in Ingest Pipelines

In the previous chapter, we learned all four aggregation families and practiced different types of aggregations with many examples, using Investors Exchange (IEX) and exchange-traded fund (ETF) historical data. We have now completed our study of two key features of Elasticsearch: search and aggregation. In this chapter, we'll switch to the data preparation and enrichment features. You will recall from the Elasticsearch Architecture section of Chapter 1, Overview of Elasticsearch 7, that there are four types of Elasticsearch nodes, and one of them is the ingest node. You can preprocess documents through the predefined pipeline processors before the actual indexing operation starts. All nodes have the ingest role enabled by default; you can disable this capability for a node in its configuration file.
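For reference, the ingest role is controlled per node in elasticsearch.yml. The following one-line setting, valid for Elasticsearch 7.x, disables it:

# elasticsearch.yml: exclude this node from ingest duties
node.ingest: false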

In this chapter, we will cover...

Ingest APIs

Basic ingest CRUD APIs allow you to manage the entire life cycle of an ingest pipeline: creation, update, retrieval, deletion, and execution. A pipeline is formed by a list of supported processors that are executed sequentially. Let's describe each CRUD API as follows:

  • Create/update the ingest pipeline: To define a pipeline, you need to specify a list of processors (which will be executed in order) and a description of what the pipeline does. The PUT request is used to create the pipeline with an identifier. If the pipeline was created previously, the request acts as an update and overwrites the original contents. Let's take an example of creating a pipeline with the range_ratio identifier by using a script processor. The range ratio computes the difference between the high and low prices, and then sets the ratio between...
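The chapter text truncates here, but a minimal sketch of such a request might look like the following. The high and low field names, and dividing the range by the low price, are assumptions made for illustration; the book's actual fields and formula may differ:

PUT _ingest/pipeline/range_ratio
{
  "description": "compute the ratio between the high-low price range and the low price",
  "processors": [{
    "script": {
      "lang": "painless",
      "source": "ctx.range_ratio = (ctx.high - ctx.low) / ctx.low"
    }
  }]
}

The same identifier drives the other CRUD operations: GET _ingest/pipeline/range_ratio retrieves the definition, and DELETE _ingest/pipeline/range_ratio deletes it.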

Accessing data in pipelines

Fields in the _source field can be accessed directly in a pipeline definition, either by using the field name alone or by adding the _source prefix to it. On the other hand, if you are referring to the value of a field, you can use the {{field_name}} template snippet to retrieve it. Let's take an example of using a set processor to add an ingest_timestamp field to the document, which records the timestamp at which the ingest processing occurs. This value is provided by the API, as we mentioned in the result of the _simulate pipeline example. The relevant part of the code is as follows:

"processors": [{
"set": {
"field": "ingest_timestamp",
"value": "{{_ingest.timestamp}}"
}
}
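To see the processor in action without indexing anything, you can run it through the _simulate endpoint. The following request is a minimal sketch; the sample document and its symbol field are illustrative, not taken from the book's listing:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "add an ingest timestamp to the document",
    "processors": [{
      "set": {
        "field": "ingest_timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }]
  },
  "docs": [
    {"_source": {"symbol": "ACWF"}}
  ]
}

The response echoes each document with the ingest_timestamp field added by the set processor.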

Dynamic mapping fields and field values are also supported in the same...

Processors

Nearly 30 processors are supported in the ingest pipeline. When a document is indexed, the processors execute in the order in which they are declared in the pipeline. A processor is defined by name and configured with its own parameters. Before we introduce each processor, let's examine some of the common parameters, as described in the following table:

Parameter Name | Description
field | The name of the field to be accessed by the processor. Most processors require this parameter.
target_field | The name of the destination field to be written by the processor. The default value depends on the individual processor. About half of the processors support this optional parameter.
ignore_missing | If the field referenced by the field parameter is missing, or its value is null in the indexing document, the execution fails. If this boolean parameter is set...
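To make these common parameters concrete, here is a hedged sketch of a convert processor that uses all three; the rating and rating_numeric field names are assumptions for illustration:

"processors": [{
  "convert": {
    "field": "rating",
    "target_field": "rating_numeric",
    "type": "integer",
    "ignore_missing": true
  }
}]

With ignore_missing set to true, documents without a rating field pass through untouched instead of failing the pipeline.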

Conditional execution in pipelines

As we mentioned in the Processors section, the optional if parameter is designed to let users define conditions for executing a pipeline processor. Let's demonstrate with a simple example. The rating field of the documents in the cf_etf index is a single space string when no rating is given for the ETF in the original source. We can use the remove processor to drop the rating field in such a condition before the indexing operation, as shown in the following code block:

"pipeline": {
"description":"remove the rating field if the rating is equal to a single space string",
"processors":[{
"remove": {
"field": "rating",
"if": "ctx.rating == ' '"
}
}]
}
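You can verify the condition with the _simulate endpoint before indexing. In the following sketch, the two sample documents are illustrative: the first, whose rating is a single space, comes back without the field, while the second passes through unchanged:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "remove the rating field if the rating is equal to a single space string",
    "processors": [{
      "remove": {
        "field": "rating",
        "if": "ctx.rating == ' '"
      }
    }]
  },
  "docs": [
    {"_source": {"symbol": "ACWF", "rating": " "}},
    {"_source": {"symbol": "AGGY", "rating": "A"}}
  ]
}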

You will recall the dividend information...

Handling failures in pipelines

As discussed in the Ingest APIs section of this chapter, a pipeline is formed by a list of supported processors that are executed sequentially. If an exception occurs, the whole process is halted. Let's trigger an exception with an example. The processor in the pipeline removes the rating field from the indexing document. However, the rating field is optional and may not be present. When an error occurs, you can check the root cause in the error field. When the rating field is missing, the remove processor reports that the reason is field [rating] not present as part of path [rating].

If the error can be ignored, you can set the optional ignore_failure parameter to true to silently ignore the failure and continue with the next processor. Another choice is to use the on_failure parameter to catch the exception...
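A minimal sketch of the on_failure approach follows. Nesting a set processor that records the built-in _ingest.on_failure_message metadata is a common pattern; the error_message field name is an assumption for illustration:

"processors": [{
  "remove": {
    "field": "rating",
    "on_failure": [{
      "set": {
        "field": "error_message",
        "value": "{{_ingest.on_failure_message}}"
      }
    }]
  }
}]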

Summary

Time flies so fast! We are in the middle of this book. In this chapter, we have worked with the ingest APIs and practiced most of the pipeline processors. We have also learned how to access the data of documents passing through the pipeline processors. Finally, we have discussed how to handle exceptions when errors occur during pipeline processing.

In the next chapter, we will discuss how to use the aggregation framework for exploratory data analysis. We'll give you a few examples, such as collecting metrics and log data generated by the system for operational data analytics, ingesting financial investment fund data before performing analytic operations, and performing simple sentiment analysis using Elasticsearch.
