Chapter 3: Basic Operations

Before we start with indexing and searching in Elasticsearch, we need to cover how to manage indices and perform operations on documents. In this chapter, we'll start by discussing the different operations that can be performed on indices, such as create, delete, update, open, and close. These operations are very important because they allow you to define the container (index) that will store your documents. The index create/delete actions are similar to the CREATE DATABASE and DROP DATABASE commands in SQL.

After that, we'll learn how to manage mappings to complete the discussion we started in the previous chapter and lay down the basis for the next chapter, which is mainly centered on searching.

A large portion of this chapter is dedicated to performing create, read, update, and delete (CRUD) operations on records, which are at the core of storing and managing records in Elasticsearch.

To improve indexing performance, it's also important to understand...

Technical requirements

To follow along and test the commands in this chapter, you must have a working Elasticsearch cluster installed.

For recipes that are marked as (XPACK), your Elasticsearch installation should include at least the free version of X-Pack.

To simplify the management and execution of commands, I suggest installing Kibana.

Creating an index

The first thing you must do before you can start indexing data in Elasticsearch is create an index – the main container of our data.

An index is similar to the concept of a database in SQL; it is a container for types (tables in SQL) and documents (records in SQL).

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

How to do it...

The HTTP method for creating an index is PUT; the REST URL contains the index's name:

http://<server>/<index_name>

To create an index, follow these steps:

  1. From the command line, execute a PUT...
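
A minimal sketch of such a call follows (the host localhost:9200, the index name myindex, and the shard/replica values are illustrative assumptions, not values from this recipe):

# Hypothetical example: create "myindex" with explicit shard and replica settings
curl -X PUT "http://localhost:9200/myindex" -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'

If the call succeeds, Elasticsearch answers with "acknowledged": true in the response body.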

Deleting an index

The counterpart of creating an index is deleting one. Deleting an index means deleting its shards, settings, mappings, and data. There are many common scenarios where we need to delete an index, such as the following:

  • Removing the index to clean unwanted or obsolete data (for example, old Logstash indices).
  • Resetting an index for a scratch restart.
  • Deleting an index that has some missing shards, mainly due to some failures, to bring the cluster back to a valid state. (If a node dies and it's storing a single replica shard of an index, this index will be missing a shard, so the cluster state becomes red. In this case, you'll need to bring the cluster back to a green state, but you will lose the data contained in the deleted index.)

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands...
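
A minimal sketch of the delete call (again assuming a local cluster at localhost:9200 and a hypothetical index named myindex) is a single request:

# Hypothetical example: delete "myindex" and all of its shards, mappings, and data
curl -X DELETE "http://localhost:9200/myindex"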

Opening or closing an index

If you want to keep your data but save resources (memory or CPU), a good alternative to deleting indices is to close them.

Elasticsearch allows you to open and close an index, putting it into online or offline mode, respectively.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, you will need the index we created in the Creating an index recipe.

How to do it...

The HTTP method for opening/closing an index is POST.

The URL format for opening an index is as follows:

http://<server>/<index_name...
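
A minimal sketch of both calls, assuming localhost:9200 and the hypothetical index myindex:

# Hypothetical example: take "myindex" offline, then bring it back online
curl -X POST "http://localhost:9200/myindex/_close"
curl -X POST "http://localhost:9200/myindex/_open"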

Putting a mapping in an index

In the previous chapter, we learned how to build mappings by indexing documents. In this recipe, you will learn how to put a type mapping in an index. This operation can be considered the Elasticsearch equivalent of SQL's CREATE TABLE command.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, you will need the index we created in the Creating an index recipe.

How to do it...

The HTTP method for putting a mapping in an index is PUT (POST also works).

The URL format for putting a mapping in an index...
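
As a minimal sketch (localhost:9200, the index myindex, and the field names name and date are illustrative assumptions):

# Hypothetical example: add two field mappings to "myindex"
curl -X PUT "http://localhost:9200/myindex/_mapping" -H 'Content-Type: application/json' -d '
{
  "properties": {
    "name": { "type": "text" },
    "date": { "type": "date" }
  }
}'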

Getting a mapping

After setting our mappings, we may need to check or analyze them to prevent issues such as wrong type detection, new fields being created due to a data mismatch, and broken index template configurations.

Getting the mapping of an index helps us understand its structure and how it has evolved due to merging and implicit type guessing.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, you will need the mapping we created in the Putting a mapping in an index recipe.

How to do it...

The...
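
As a minimal sketch (assuming localhost:9200 and the hypothetical index myindex), the mapping can be fetched with:

# Hypothetical example: return the current mapping of "myindex" as JSON
curl -X GET "http://localhost:9200/myindex/_mapping"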

Reindexing an index

There are a lot of common scenarios that involve changing a mapping. Because Elasticsearch mappings are additive, it is not possible to delete a defined field mapping, so you often need to reindex the data into a new index with a new mapping. The most common scenarios are as follows:

  • Changing the analyzer of the mapping
  • Adding a new subfield to the mapping, where you need to reprocess all the records to search for the new subfield
  • Removing unused mappings
  • Changing a record structure that requires a new mapping

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character...
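
A minimal sketch of a reindex call follows (the index names myindex and myindex-v2 are illustrative assumptions; the destination index should already exist with the new mapping):

# Hypothetical example: copy all documents from "myindex" into "myindex-v2"
curl -X POST "http://localhost:9200/_reindex" -H 'Content-Type: application/json' -d '
{
  "source": { "index": "myindex" },
  "dest":   { "index": "myindex-v2" }
}'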

Refreshing an index

When you send data to Elasticsearch, the data is not instantly searchable. It becomes searchable only after a time interval (generally a second) known as the refresh interval. This delayed approach to reading/writing data makes it possible to write large blocks of data efficiently by reducing small disk operations and increasing throughput.

Elasticsearch allows the user to control the state of the searcher by forcefully refreshing an index. If a refresh is not forced, newly indexed documents will only be searchable after the fixed time interval (usually 1 second).

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character...
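
A minimal sketch of the call (assuming localhost:9200 and the hypothetical index myindex):

# Hypothetical example: make all documents indexed so far in "myindex" searchable now
curl -X POST "http://localhost:9200/myindex/_refresh"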

Flushing an index

For performance reasons, Elasticsearch stores some data in memory and in a transaction log. If we want to free that memory, empty the transaction log, and ensure that our data is safely written to disk, we need to flush the index.

Elasticsearch automatically provides periodic flushing on disk, but forcing flushing can be useful in the following situations:

  • When we need to shut down a node, to prevent stale data
  • When we need to have all the data in a safe state (for example, after a big indexing operation, so that all the data is flushed and refreshed)

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides...
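
A minimal sketch of the call (assuming localhost:9200 and the hypothetical index myindex):

# Hypothetical example: flush "myindex", writing in-memory data safely to disk
curl -X POST "http://localhost:9200/myindex/_flush"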

Using ForceMerge on an index

The Elasticsearch core is based on Lucene, which stores data in segments on disk. During the life of an index, many segments are created and changed. As in many other NoSQL systems that avoid rewriting data in place (such as Cassandra, Accumulo, and HBase), records are not deleted in place; instead, they are put in a tombstone state, which means the document is marked as deleted in metadata without the data being changed on disk. As the number of segments grows, search speed decreases because of the time required to read all of them and to skip the records that are no longer live (the tombstones). The ForceMerge operation allows us to consolidate the index's segments, reducing their number and improving search performance.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands...
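
A minimal sketch of the call (assuming localhost:9200 and the hypothetical index myindex; the segment count is an illustrative choice):

# Hypothetical example: merge the segments of "myindex" down to a single segment
curl -X POST "http://localhost:9200/myindex/_forcemerge?max_num_segments=1"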

Shrinking an index

Elasticsearch provides a way to optimize an index: by using the shrink API, it's possible to reduce the number of shards in an index.

This feature targets several common scenarios:

  • The wrong number of shards was chosen during the initial design sizing. Sizing the shards without knowing the real data or text distribution often leads to an oversized number of shards.
  • You need to reduce the number of shards to reduce memory and resource usage.
  • You need to reduce the number of shards to speed up searching.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion...
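
A minimal sketch of the flow follows (localhost:9200 and the index names are illustrative assumptions; the shrink API also requires a copy of every shard to be allocated on a single node beforehand, and the target shard count must be a factor of the source count):

# Hypothetical example: block writes, then shrink "myindex" into "myindex-small"
curl -X PUT "http://localhost:9200/myindex/_settings" -H 'Content-Type: application/json' -d '
{ "index.blocks.write": true }'

curl -X POST "http://localhost:9200/myindex/_shrink/myindex-small" -H 'Content-Type: application/json' -d '
{ "settings": { "index.number_of_shards": 1 } }'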

Checking whether an index exists

A common pitfall is querying indices that don't exist. To prevent this issue, Elasticsearch allows you to check whether an index exists.

This check is often used during application startup to create indices that are required for the application to work correctly.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, please use the index we created in the Creating an index recipe.

How to do it...

The HTTP method for checking an index's existence is HEAD.

The URL format...
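
A minimal sketch of the check (assuming localhost:9200 and the hypothetical index myindex); the result is conveyed by the HTTP status code alone:

# Hypothetical example: HEAD request; HTTP 200 means "myindex" exists, 404 means it does not
curl -I "http://localhost:9200/myindex"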

Managing index settings

Index settings are important because they allow you to control several key Elasticsearch functionalities, such as sharding and replication, caching, term management, routing, and analysis. The goal of this recipe is to understand how to manage index settings.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, please use the index we created in the Creating an index recipe.

How to do it...

To retrieve the settings of your current index, use the following URL format:

http://<server>/<index_name...
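
A minimal sketch of reading and then changing settings (localhost:9200, the index myindex, and the replica value are illustrative assumptions):

# Hypothetical examples: read the settings of "myindex", then change its replica count
curl -X GET "http://localhost:9200/myindex/_settings"

curl -X PUT "http://localhost:9200/myindex/_settings" -H 'Content-Type: application/json' -d '
{ "index": { "number_of_replicas": 2 } }'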

Using index aliases

Real-world applications have a lot of indices, and queries often span multiple indices. This scenario requires defining all the index names that queries will run against; aliases allow you to group those indices under a common name/label.

Some common scenarios for this usage are as follows:

  • Log indices divided by date (that is, logstash-YYYY-MM-DD) for which we want to create an alias for the last week, the last month, today, yesterday, and so on. This pattern is commonly used in log applications such as Logstash (https://www.elastic.co/products/logstash).
  • Collecting website content in several indices (New York Times, The Guardian, and so on) that we want to refer to using the index alias sites.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP...
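
A minimal sketch of an alias action, following the sites scenario above (localhost:9200 and the index names nytimes and guardian are illustrative assumptions):

# Hypothetical example: point the alias "sites" at two content indices
curl -X POST "http://localhost:9200/_aliases" -H 'Content-Type: application/json' -d '
{
  "actions": [
    { "add": { "index": "nytimes", "alias": "sites" } },
    { "add": { "index": "guardian", "alias": "sites" } }
  ]
}'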

Managing dangling indices

In the case of a node failure, if there are not enough replicas, you can lose some shards (and the data within those shards).

Indices with missing shards are marked red; they are put in read-only mode, and you will encounter issues if you try to query their data.

In this situation, the only available option is to drop the broken indices and recover them from the original data or a backup. When the failed node becomes active in the cluster again, its orphan shards will show up as dangling indices.

The APIs that we will look at in this recipe can be used to manage these indices.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides...
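
A minimal sketch of the flow follows (localhost:9200 is an illustrative assumption; <index-uuid> is a placeholder for a UUID taken from the list response, and accept_data_loss must be acknowledged explicitly):

# Hypothetical example: list dangling indices, then import one by its UUID
curl -X GET "http://localhost:9200/_dangling"

curl -X POST "http://localhost:9200/_dangling/<index-uuid>?accept_data_loss=true"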

Resolving index names

In the previous recipe, we saw how to use a wildcard to select indices and their aliases.

If you have a large number of indices and aliases and you select them using wildcards, some expected results may be missing, and you'll need to understand why. It's also common to need to debug a slow query (due to how much data was queried) or an error caused by querying closed indices.

To help you solve such issues, you can use the resolve index API, which returns the information about the indices that can be queried for a given name or wildcard expression.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides...
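
A minimal sketch of the call (localhost:9200 and the wildcard pattern log* are illustrative assumptions):

# Hypothetical example: resolve everything (indices, aliases, data streams) matching log*
curl -X GET "http://localhost:9200/_resolve/index/log*"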

Rolling over an index

When you're using a system that manages logs, it is very common to use rolling files for your log entries. By doing so, you can have indices that behave like rolling files.

You can define conditions to be checked and leave it to Elasticsearch to roll over to new indices automatically, using an alias as a stable reference to the virtual index.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

How to do it…

To enable a rolling index, we need an index with an alias that points to it alone. For example, to set a log rolling index, we would follow these...
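
A minimal sketch of a rollover call follows (mylogs is an assumed alias pointing to a single index, as the recipe requires; the condition values are illustrative):

# Hypothetical example: roll "mylogs" over to a new index when either condition is met
curl -X POST "http://localhost:9200/mylogs/_rollover" -H 'Content-Type: application/json' -d '
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 100000
  }
}'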

Indexing a document

In Elasticsearch, there are two vital operations: index and search.

Indexing means storing one or more documents in an index; this is a similar concept to inserting records in a relational database.

In Lucene, the core engine of Elasticsearch, inserting or updating a document has the same cost: in Lucene and Elasticsearch, to update means to replace.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, please use the index and mapping we created in the Putting a mapping in an index recipe.

How to do it...

Several...
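
A minimal sketch of indexing calls (localhost:9200, the index myindex, and the field values are illustrative assumptions):

# Hypothetical example: index a document with an explicit ID of 1 into "myindex"
curl -X PUT "http://localhost:9200/myindex/_doc/1" -H 'Content-Type: application/json' -d '
{ "name": "test document", "date": "2022-05-01" }'

# Or POST without an ID and let Elasticsearch generate one
curl -X POST "http://localhost:9200/myindex/_doc" -H 'Content-Type: application/json' -d '
{ "name": "another document" }'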

Getting a document

Once you've indexed a document, during your application's life, it will probably need to be retrieved.

The GET REST call allows us to retrieve a document in real time, without needing to refresh the index.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, please use the indexed document from the Indexing a document recipe.

How to do it...

The GET method allows us to return a document, given its index and ID.

The REST API's URL is as follows:

http://<server>/<index_name>/_doc/<...
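
A minimal sketch of the call (assuming localhost:9200, the index myindex, and a document with ID 1):

# Hypothetical example: fetch the document with ID 1 from "myindex"
curl -X GET "http://localhost:9200/myindex/_doc/1"

The response contains the document's _source along with metadata such as _index, _id, and _version; a missing document returns HTTP 404.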

Deleting a document

Deleting documents in Elasticsearch can be done in two ways: using the DELETE call or the delete_by_query call, which we'll look at in the next chapter.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, please use the indexed document from the Indexing a document recipe.

How to do it...

The REST API URL is the same as it is for GET calls, but the HTTP method is DELETE:

http://<server>/<index_name>/_doc/<id>

To delete a document, follow these steps:

  1. If we consider the order...
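
A minimal sketch of the call (assuming localhost:9200, the index myindex, and a document with ID 1, all illustrative):

# Hypothetical example: delete the document with ID 1 from "myindex"
curl -X DELETE "http://localhost:9200/myindex/_doc/1"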

Updating a document

Documents stored in Elasticsearch can be updated over their lifetime. There are two available solutions for performing this operation in Elasticsearch: adding a new document or using the update call.

The update call can work in two ways:

  • By providing a script that contains the update logic
  • By providing a document that must be merged with the original one

The main advantages of an update compared to re-indexing the whole document are reduced network traffic and a lower chance of conflicts due to concurrent changes.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for...
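
A minimal sketch of both update styles follows (localhost:9200, the index myindex, document ID 1, and the field values are illustrative assumptions):

# Hypothetical example: partial update of document 1, merging the given fields
curl -X POST "http://localhost:9200/myindex/_update/1" -H 'Content-Type: application/json' -d '
{ "doc": { "name": "updated name" } }'

# Or apply a script instead of a partial document
curl -X POST "http://localhost:9200/myindex/_update/1" -H 'Content-Type: application/json' -d '
{
  "script": {
    "source": "ctx._source.name = params.name",
    "params": { "name": "updated via script" }
  }
}'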

Speeding up atomic operations (bulk operations)

When we are inserting, deleting, or updating a large number of documents, the HTTP overhead is significant. To speed up this process, Elasticsearch allows us to execute CRUD calls in bulk.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

How to do it...

Since we are changing the state of the data, we must use the POST HTTP method. The REST URL will be as follows:

http://<server>/<index_name>/_bulk

To execute a bulk action, we will perform the following steps via curl (because it's very common to prepare your...
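
A minimal sketch of a bulk request follows (localhost:9200, the index myindex, and the document IDs/fields are illustrative assumptions):

# Hypothetical example: one bulk request mixing index and delete actions
# (the body is newline-delimited JSON and must end with a newline)
curl -X POST "http://localhost:9200/myindex/_bulk" -H 'Content-Type: application/x-ndjson' -d '
{ "index": { "_id": "1" } }
{ "name": "first document" }
{ "index": { "_id": "2" } }
{ "name": "second document" }
{ "delete": { "_id": "3" } }
'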

Speeding up GET operations (multi-GET)

The standard GET operation is very fast, but if you need to fetch a lot of documents by ID, Elasticsearch provides the _mget operation.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, please use the indexed document we created in the Indexing a document recipe.

How to do it...

The multi-GET REST URLs are as follows:

http://<server>/_mget
http://<server>/<index_name>/_mget

To execute a multi-GET action, follow these steps:

  1. First, we must use the POST method with...
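
A minimal sketch of the call (localhost:9200, the index myindex, and the document IDs are illustrative assumptions):

# Hypothetical example: fetch documents 1 and 2 from "myindex" in a single call
curl -X POST "http://localhost:9200/myindex/_mget" -H 'Content-Type: application/json' -d '
{ "ids": ["1", "2"] }'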