Chapter 3: Basic Operations

Before we start with indexing and searching in Elasticsearch, we need to cover how to manage indices and perform operations on documents. In this chapter, we'll start by discussing the different operations that can be performed on indices, such as create, delete, update, open, and close. These operations are very important because they allow you to define the container (index) that will store your documents. The index create/delete actions are similar to the CREATE DATABASE and DROP DATABASE commands in SQL.

After that, we'll learn how to manage mappings to complete the discussion we started in the previous chapter and lay down the basis for the next chapter, which is mainly centered on searching.

A large portion of this chapter is dedicated to performing create, read, update, and delete (CRUD) operations on records, which are at the core of storing and managing records in Elasticsearch.

To improve indexing performance, it's also important to understand...

Technical requirements

To follow along and test the commands in this chapter, you must have a working Elasticsearch cluster installed.

For recipes that are marked as (XPACK), your Elasticsearch installation should include at least the free version of X-Pack.

To simplify the management and execution of commands, I suggest installing Kibana.

Creating an index

The first thing you must do before you can start indexing data in Elasticsearch is create an index – the main container of our data.

An index is similar to the concept of a database in SQL; it is a container for types (tables in SQL) and documents (records in SQL).

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

How to do it...

The HTTP method for creating an index is PUT; the REST URL contains the index's name:

http://<server>/<index_name>

To create an index, follow these steps:

  1. From the command line, execute a PUT...
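
A minimal sketch of such a call follows (the host localhost:9200, the index name myindex, and the shard/replica values are illustrative assumptions, not values from this recipe):

# Hypothetical example: create "myindex" with explicit shard and replica settings
curl -X PUT "http://localhost:9200/myindex" -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'

If the call succeeds, Elasticsearch answers with "acknowledged": true in the response body.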

Deleting an index

The counterpart of creating an index is deleting one. Deleting an index means deleting its shards, settings, mappings, and data. There are many common scenarios where we need to delete an index, such as the following:

  • Removing the index to clean unwanted or obsolete data (for example, old Logstash indices).
  • Resetting an index for a scratch restart.
  • Deleting an index that has some missing shards, mainly due to some failures, to bring the cluster back to a valid state. (If a node dies and it's storing a single replica shard of an index, this index will be missing a shard, so the cluster state becomes red. In this case, you'll need to bring the cluster back to a green state, but you will lose the data contained in the deleted index.)

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands...
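
A minimal sketch of the delete call (again assuming a local cluster at localhost:9200 and a hypothetical index named myindex) is a single request:

# Hypothetical example: delete "myindex" and all of its shards, mappings, and data
curl -X DELETE "http://localhost:9200/myindex"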

Opening or closing an index

If you want to keep your data but save resources (memory or CPU), a good alternative to deleting indices is to close them.

Elasticsearch allows you to open and close an index, putting it into online or offline mode, respectively.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, you will need the index we created in the Creating an index recipe.

How to do it...

The HTTP method for opening/closing an index is POST.

The URL format for opening an index is as follows:

http://<server>/<index_name...
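
A minimal sketch of both calls, assuming localhost:9200 and the hypothetical index myindex:

# Hypothetical example: take "myindex" offline, then bring it back online
curl -X POST "http://localhost:9200/myindex/_close"
curl -X POST "http://localhost:9200/myindex/_open"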

Putting a mapping in an index

In the previous chapter, we learned how to build mappings by indexing documents. In this recipe, you will learn how to put a type mapping in an index. This operation can be considered the Elasticsearch equivalent of SQL's CREATE TABLE command.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, you will need the index we created in the Creating an index recipe.

How to do it...

The HTTP method for putting a mapping in an index is PUT (POST also works).

The URL format for putting a mapping in an index...
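
As a minimal sketch (localhost:9200, the index myindex, and the field names name and date are illustrative assumptions):

# Hypothetical example: add two field mappings to "myindex"
curl -X PUT "http://localhost:9200/myindex/_mapping" -H 'Content-Type: application/json' -d '
{
  "properties": {
    "name": { "type": "text" },
    "date": { "type": "date" }
  }
}'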

Getting a mapping

After setting our mappings, we may need to check or analyze them to prevent issues such as wrong type detection, new fields being created due to a data mismatch, and broken index template configurations.

Getting the mapping of an index helps us understand its structure and how it has evolved due to merging and implicit type guessing.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, you will need the mapping we created in the Putting a mapping in an index recipe.

How to do it...

The...
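
As a minimal sketch (assuming localhost:9200 and the hypothetical index myindex), the mapping can be fetched with:

# Hypothetical example: return the current mapping of "myindex" as JSON
curl -X GET "http://localhost:9200/myindex/_mapping"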

Reindexing an index

There are a lot of common scenarios that involve changing a mapping. Because Elasticsearch mappings are additive, it is not possible to delete a defined field mapping, so you often need to reindex the data into a new index with a new mapping. The most common scenarios are as follows:

  • Changing the analyzer of the mapping
  • Adding a new subfield to the mapping, where you need to reprocess all the records to search for the new subfield
  • Removing unused mappings
  • Changing a record structure that requires a new mapping

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character...
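
A minimal sketch of a reindex call follows (the index names myindex and myindex-v2 are illustrative assumptions; the destination index should already exist with the new mapping):

# Hypothetical example: copy all documents from "myindex" into "myindex-v2"
curl -X POST "http://localhost:9200/_reindex" -H 'Content-Type: application/json' -d '
{
  "source": { "index": "myindex" },
  "dest":   { "index": "myindex-v2" }
}'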

Refreshing an index

When you send data to Elasticsearch, the data is not instantly searchable. It becomes searchable only after a time interval (generally a second) known as the refresh interval. This delayed approach to reading/writing data makes it possible to write large blocks of data efficiently by reducing small disk operations and increasing throughput.

Elasticsearch allows the user to control the state of the searcher by forcefully refreshing an index. If a refresh is not forced, newly indexed documents will only be searchable after the fixed time interval (usually 1 second).

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character...
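
A minimal sketch of the call (assuming localhost:9200 and the hypothetical index myindex):

# Hypothetical example: make all documents indexed so far in "myindex" searchable now
curl -X POST "http://localhost:9200/myindex/_refresh"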

Flushing an index

For performance reasons, Elasticsearch stores some data in memory and in a transaction log. If we want to free that memory, empty the transaction log, and ensure that our data is safely written to disk, we need to flush the index.

Elasticsearch automatically provides periodic flushing on disk, but forcing flushing can be useful in the following situations:

  • When we need to shut down a node, to prevent stale data
  • When we need to have all the data in a safe state (for example, after a big indexing operation, so that all the data is flushed and refreshed)

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides...
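
A minimal sketch of the call (assuming localhost:9200 and the hypothetical index myindex):

# Hypothetical example: flush "myindex", writing in-memory data safely to disk
curl -X POST "http://localhost:9200/myindex/_flush"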

Using ForceMerge on an index

The Elasticsearch core is based on Lucene, which stores data in segments on disk. During the life of an index, many segments are created and changed. As in many other NoSQL systems that avoid rewriting data in place (such as Cassandra, Accumulo, and HBase), records are not deleted in place; instead, they are put in a tombstone state, which means the document is marked as deleted in metadata without the data being changed on disk. As the number of segments grows, search speed decreases because of the time required to read all of them and to skip the records that are no longer live (the tombstones). The ForceMerge operation allows us to consolidate the index's segments, reducing their number and improving search performance.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands...
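
A minimal sketch of the call (assuming localhost:9200 and the hypothetical index myindex; the segment count is an illustrative choice):

# Hypothetical example: merge the segments of "myindex" down to a single segment
curl -X POST "http://localhost:9200/myindex/_forcemerge?max_num_segments=1"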

Shrinking an index

Elasticsearch provides a way to optimize an index: by using the shrink API, it's possible to reduce the number of shards in an index.

This feature targets several common scenarios:

  • The wrong number of shards was chosen during the initial design sizing. Sizing the shards without knowing the real data or text distribution often leads to an oversized number of shards.
  • You need to reduce the number of shards to reduce memory and resource usage.
  • You need to reduce the number of shards to speed up searching.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion...
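
A minimal sketch of the flow follows (localhost:9200 and the index names are illustrative assumptions; the shrink API also requires a copy of every shard to be allocated on a single node beforehand, and the target shard count must be a factor of the source count):

# Hypothetical example: block writes, then shrink "myindex" into "myindex-small"
curl -X PUT "http://localhost:9200/myindex/_settings" -H 'Content-Type: application/json' -d '
{ "index.blocks.write": true }'

curl -X POST "http://localhost:9200/myindex/_shrink/myindex-small" -H 'Content-Type: application/json' -d '
{ "settings": { "index.number_of_shards": 1 } }'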

Checking whether an index exists

A common pitfall is querying indices that don't exist. To prevent this issue, Elasticsearch allows you to check whether an index exists.

This check is often used during application startup to create indices that are required for the application to work correctly.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, please use the index we created in the Creating an index recipe.

How to do it...

The HTTP method for checking an index's existence is HEAD.

The URL format...
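
A minimal sketch of the check (assuming localhost:9200 and the hypothetical index myindex); the result is conveyed by the HTTP status code alone:

# Hypothetical example: HEAD request; HTTP 200 means "myindex" exists, 404 means it does not
curl -I "http://localhost:9200/myindex"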

Managing index settings

Index settings are important because they allow you to control several key Elasticsearch functionalities, such as sharding and replication, caching, term management, routing, and analysis. The goal of this recipe is to understand how to manage index settings.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, please use the index we created in the Creating an index recipe.

How to do it...

To retrieve the settings of your current index, use the following URL format:

http://<server>/<index_name...
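
A minimal sketch of reading and then changing settings (localhost:9200, the index myindex, and the replica value are illustrative assumptions):

# Hypothetical examples: read the settings of "myindex", then change its replica count
curl -X GET "http://localhost:9200/myindex/_settings"

curl -X PUT "http://localhost:9200/myindex/_settings" -H 'Content-Type: application/json' -d '
{ "index": { "number_of_replicas": 2 } }'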

Using index aliases

Real-world applications have a lot of indices, and queries often span multiple indices. This scenario requires defining all the index names that queries will run against; aliases allow you to group those indices under a common name/label.

Some common scenarios for this usage are as follows:

  • Log indices divided by date (that is, logstash-YYYY-MM-DD) for which we want to create an alias for the last week, the last month, today, yesterday, and so on. This pattern is commonly used in log applications such as Logstash (https://www.elastic.co/products/logstash).
  • Collecting website content in several indices (New York Times, The Guardian, and so on) that we want to refer to using the index alias sites.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP...
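
A minimal sketch of an alias action, following the sites scenario above (localhost:9200 and the index names nytimes and guardian are illustrative assumptions):

# Hypothetical example: point the alias "sites" at two content indices
curl -X POST "http://localhost:9200/_aliases" -H 'Content-Type: application/json' -d '
{
  "actions": [
    { "add": { "index": "nytimes", "alias": "sites" } },
    { "add": { "index": "guardian", "alias": "sites" } }
  ]
}'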

Managing dangling indices

In the case of a node failure, if there are not enough replicas, you can lose some shards (and the data within those shards).

Indices with missing shards are marked red; they are put in read-only mode, and you will encounter issues if you try to query their data.

In this situation, the only available option is to drop the broken indices and recover them from the original data or a backup. When the failed node becomes active in the cluster again, its orphan shards will show up as dangling indices.

The APIs that we will look at in this recipe can be used to manage these indices.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides...
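
A minimal sketch of the flow follows (localhost:9200 is an illustrative assumption; <index-uuid> is a placeholder for a UUID taken from the list response, and accept_data_loss must be acknowledged explicitly):

# Hypothetical example: list dangling indices, then import one by its UUID
curl -X GET "http://localhost:9200/_dangling"

curl -X POST "http://localhost:9200/_dangling/<index-uuid>?accept_data_loss=true"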

Resolving index names

In the previous recipe, we saw how to use a wildcard to select indices and their aliases.

If you have a large number of indices and aliases and you select them using wildcards, some expected results may be missing, and you'll need to understand why. It's also common to need to debug a slow query (due to how much data was queried) or an error caused by querying closed indices.

To help you solve such issues, you can use the resolve index API, which returns the information about the indices that can be queried for a given name or wildcard expression.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides...
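
A minimal sketch of the call (localhost:9200 and the wildcard pattern log* are illustrative assumptions):

# Hypothetical example: resolve everything (indices, aliases, data streams) matching log*
curl -X GET "http://localhost:9200/_resolve/index/log*"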

Rolling over an index

When you're using a system that manages logs, it is very common to use rolling files for your log entries. By doing so, you can have indices that behave like rolling files.

You can define conditions to be checked and leave it to Elasticsearch to roll over to new indices automatically, using an alias as a stable reference to the virtual index.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

How to do it…

To enable a rolling index, we need an index with an alias that points to it alone. For example, to set a log rolling index, we would follow these...
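
A minimal sketch of a rollover call follows (mylogs is an assumed alias pointing to a single index, as the recipe requires; the condition values are illustrative):

# Hypothetical example: roll "mylogs" over to a new index when either condition is met
curl -X POST "http://localhost:9200/mylogs/_rollover" -H 'Content-Type: application/json' -d '
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 100000
  }
}'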

Indexing a document

In Elasticsearch, there are two vital operations: index and search.

Indexing means storing one or more documents in an index; this is a similar concept to inserting records in a relational database.

In Lucene, the core engine of Elasticsearch, inserting or updating a document has the same cost: in Lucene and Elasticsearch, to update means to replace.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, please use the index and mapping we created in the Putting a mapping in an index recipe.

How to do it...

Several...
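
A minimal sketch of indexing calls (localhost:9200, the index myindex, and the field values are illustrative assumptions):

# Hypothetical example: index a document with an explicit ID of 1 into "myindex"
curl -X PUT "http://localhost:9200/myindex/_doc/1" -H 'Content-Type: application/json' -d '
{ "name": "test document", "date": "2022-05-01" }'

# Or POST without an ID and let Elasticsearch generate one
curl -X POST "http://localhost:9200/myindex/_doc" -H 'Content-Type: application/json' -d '
{ "name": "another document" }'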

Getting a document

Once you've indexed a document, during your application's life, it will probably need to be retrieved.

The GET REST call allows us to retrieve a document in real time, without needing to refresh the index.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, please use the indexed document from the Indexing a document recipe.

How to do it...

The GET method allows us to return a document, given its index and ID.

The REST API's URL is as follows:

http://<server>/<index_name>/_doc/<...
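
A minimal sketch of the call (assuming localhost:9200, the index myindex, and a document with ID 1):

# Hypothetical example: fetch the document with ID 1 from "myindex"
curl -X GET "http://localhost:9200/myindex/_doc/1"

The response contains the document's _source along with metadata such as _index, _id, and _version; a missing document returns HTTP 404.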

Deleting a document

Deleting documents in Elasticsearch can be done in two ways: using the DELETE call or the delete_by_query call, which we'll look at in the next chapter.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, please use the indexed document from the Indexing a document recipe.

How to do it...

The REST API URL is the same as it is for GET calls, but the HTTP method is DELETE:

http://<server>/<index_name>/_doc/<id>

To delete a document, follow these steps:

  1. If we consider the order...
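
A minimal sketch of the call (assuming localhost:9200, the index myindex, and a document with ID 1, all illustrative):

# Hypothetical example: delete the document with ID 1 from "myindex"
curl -X DELETE "http://localhost:9200/myindex/_doc/1"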

Updating a document

Documents stored in Elasticsearch can be updated over their lifetime. There are two available solutions for performing this operation in Elasticsearch: adding a new document or using the update call.

The update call can work in two ways:

  • By providing a script that contains the update logic
  • By providing a document that must be merged with the original one

The main advantages of an update compared to re-indexing the whole document are reduced network traffic and a lower chance of conflicts due to concurrent changes.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for...
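
A minimal sketch of both update styles follows (localhost:9200, the index myindex, document ID 1, and the field values are illustrative assumptions):

# Hypothetical example: partial update of document 1, merging the given fields
curl -X POST "http://localhost:9200/myindex/_update/1" -H 'Content-Type: application/json' -d '
{ "doc": { "name": "updated name" } }'

# Or apply a script instead of a partial document
curl -X POST "http://localhost:9200/myindex/_update/1" -H 'Content-Type: application/json' -d '
{
  "script": {
    "source": "ctx._source.name = params.name",
    "params": { "name": "updated via script" }
  }
}'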

Speeding up atomic operations (bulk operations)

When we are inserting, deleting, or updating a large number of documents, the HTTP overhead is significant. To speed up this process, Elasticsearch allows us to execute CRUD calls in bulk.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

How to do it...

Since we are changing the state of the data, we must use the POST HTTP method. The REST URL will be as follows:

http://<server>/<index_name>/_bulk

To execute a bulk action, we will perform the following steps via curl (because it's very common to prepare your...
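
A minimal sketch of a bulk request follows (localhost:9200, the index myindex, and the document IDs/fields are illustrative assumptions):

# Hypothetical example: one bulk request mixing index and delete actions
# (the body is newline-delimited JSON and must end with a newline)
curl -X POST "http://localhost:9200/myindex/_bulk" -H 'Content-Type: application/x-ndjson' -d '
{ "index": { "_id": "1" } }
{ "name": "first document" }
{ "index": { "_id": "2" } }
{ "name": "second document" }
{ "delete": { "_id": "3" } }
'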

Speeding up GET operations (multi-GET)

The standard GET operation is very fast, but if you need to fetch a lot of documents by ID, Elasticsearch provides the _mget operation.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, please use the indexed document we created in the Indexing a document recipe.

How to do it...

The multi-GET REST URLs are as follows:

http://<server>/_mget
http://<server>/<index_name>/_mget

To execute a multi-GET action, follow these steps:

  1. First, we must use the POST method with...
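
A minimal sketch of the call (localhost:9200, the index myindex, and the document IDs are illustrative assumptions):

# Hypothetical example: fetch documents 1 and 2 from "myindex" in a single call
curl -X POST "http://localhost:9200/myindex/_mget" -H 'Content-Type: application/json' -d '
{ "ids": ["1", "2"] }'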