Basic Operations

Before we start with indexing and searching in Elasticsearch, we need to cover how to manage indices and perform operations on documents. In this chapter, we'll start by discussing different operations on indices, such as create, delete, update, open, and close. These operations are very important because they allow you to define the container (index) that will store your documents. The index create/delete actions are similar to the SQL create/delete database commands.

After the indices management part, we'll learn how to manage mappings to complete the discussion we started in the previous chapter and to lay down the basis for the next chapter, which is mainly centered on searching.

A large portion of this chapter is dedicated to create-read-update-delete (CRUD) operations on records that are at the core of record storing and management in...

Creating an index

The first operation to perform before you start indexing data in Elasticsearch is to create an index, the main container of our data.

An index is similar to the concept of a database in SQL; it is a container for types (tables in SQL) and documents (records in SQL).
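As a minimal sketch of what an index creation call looks like, the following Python snippet only builds the request (method, path, and JSON body) and prints it in a form that can be pasted into the Kibana console. The index name myindex and the shard and replica counts are hypothetical values chosen for illustration.

```python
import json


def create_index_request(index, shards=1, replicas=1):
    """Build (method, path, body) for creating an index; nothing is sent."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return "PUT", f"/{index}", body


# "myindex" is a hypothetical index name used only for this example.
method, path, body = create_index_request("myindex", shards=2, replicas=1)
print(method, path)
print(json.dumps(body, indent=2))
```

Keeping the request as plain data makes it easy to send with whichever HTTP client you prefer.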

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), and others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

...

Deleting an index

The counterpart of creating an index is deleting one. Deleting an index means deleting its shards, mappings, and data. There are many common scenarios when we need to delete an index, such as the following:

Removing the index to clean unwanted or obsolete data (for example, old Logstash indices).
Resetting an index for a scratch restart.
Deleting an index that has some missing shards, mainly due to failures, to bring the cluster back to a valid state. (If a node dies while holding the only copy of one of an index's shards, the index will be missing that shard and the cluster state becomes red. Deleting the index brings the cluster back to green status, but you lose the data contained in the deleted index.)
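The delete call itself is a single HTTP DELETE on the index name. The sketch below builds that request without sending it. The guard against wildcard targets is a design choice made for this example, not something Elasticsearch requires, and myindex is a hypothetical name.

```python
def delete_index_request(index):
    """Build (method, path) for deleting one index; nothing is sent."""
    # Guard added for this sketch: refuse wildcard or _all targets, because
    # deleting an index removes its shards, mappings, and data.
    if index == "_all" or "*" in index:
        raise ValueError(f"refusing to build a delete for {index!r}")
    return "DELETE", f"/{index}"


# "myindex" is a hypothetical index name.
print(*delete_index_request("myindex"))
```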

Getting ready

...

Opening or closing an index

If you want to keep your data but save resources (memory or CPU), a good alternative to deleting indices is to close them.

Elasticsearch allows you to open and close an index, putting it into online or offline mode.
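Both operations are POST calls on the index, using the _open and _close endpoints. A small sketch that builds either request (the index name is hypothetical, and nothing is sent):

```python
def index_state_request(index, online):
    """Build (method, path) to open (online=True) or close an index."""
    action = "_open" if online else "_close"
    return "POST", f"/{index}/{action}"


# "myindex" is a hypothetical index name.
print(*index_state_request("myindex", online=False))
print(*index_state_request("myindex", online=True))
```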

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly...

Putting a mapping in an index

In the previous chapter, we saw how to build mappings by indexing documents. This recipe shows how to put a type mapping in an index. This kind of operation can be considered the Elasticsearch version of the SQL CREATE TABLE statement.
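To make the comparison with table creation concrete, here is a sketch that builds a put mapping request. It assumes the typeless _mapping endpoint of Elasticsearch 7.x; older versions include a type name in the path. The index name and the fields (id, name, price) are hypothetical.

```python
import json


def put_mapping_request(index, properties):
    """Build (method, path, body) for putting a mapping; nothing is sent."""
    return "PUT", f"/{index}/_mapping", {"properties": properties}


# Hypothetical fields, comparable to the columns of an SQL table.
fields = {
    "id": {"type": "keyword"},
    "name": {"type": "text"},
    "price": {"type": "double"},
}
method, path, body = put_mapping_request("myindex", fields)
print(method, path)
print(json.dumps(body, indent=2))
```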

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands...

Getting a mapping

After having set our mappings for processing types, we sometimes need to inspect or analyze the mapping to prevent issues. Getting the mapping for a type helps us to understand its structure, or how it has evolved through merges and implicit type guessing.
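A mapping is returned by a GET call on the _mapping endpoint of the index. When analyzing it, a flat list of field names and types is often easier to read than the nested JSON. The sketch below flattens a mapping; the sample response is hand-written for illustration and assumes the typeless response shape of Elasticsearch 7.x.

```python
def get_mapping_request(index):
    """Build (method, path) for fetching the mapping of an index."""
    return "GET", f"/{index}/_mapping"


def flatten_properties(properties, prefix=""):
    """Return {dotted_field_name: type} for a mapping's properties."""
    fields = {}
    for name, spec in properties.items():
        full = prefix + name
        # Object fields may omit "type" and carry nested "properties" instead.
        fields[full] = spec.get("type", "object")
        fields.update(flatten_properties(spec.get("properties", {}), full + "."))
        # Multi-fields (subfields) are listed under "fields".
        fields.update(flatten_properties(spec.get("fields", {}), full + "."))
    return fields


# Hand-written sample response, not output captured from a real cluster.
sample_response = {
    "myindex": {
        "mappings": {
            "properties": {
                "name": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
                "address": {"properties": {"city": {"type": "keyword"}}},
            }
        }
    }
}
flat = flatten_properties(sample_response["myindex"]["mappings"]["properties"])
for field, field_type in flat.items():
    print(field, field_type)
```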

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the...

Reindexing an index

There are a lot of common scenarios that involve changing your mapping. Due to the limitations of Elasticsearch mapping, it is not possible to delete a defined one, so you often need to reindex the index data. The most common scenarios are as follows:

Changing an analyzer for a mapping
Adding a new subfield to a mapping, whereupon you need to reprocess all the records to search for the new subfield
Removing an unused mapping
Changing a record structure that requires a new mapping
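In all of these scenarios, the data is copied from the old index into a new one that has the desired mapping. A minimal sketch of the _reindex request body follows; the index names are hypothetical, and the destination index is assumed to have been created with the new mapping beforehand.

```python
def reindex_request(source, dest):
    """Build (method, path, body) to copy documents from source to dest."""
    if source == dest:
        raise ValueError("source and destination must be different indices")
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return "POST", "/_reindex", body


# "myindex" and "myindex2" are hypothetical index names.
print(reindex_request("myindex", "myindex2"))
```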

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used...

Refreshing an index

Elasticsearch allows the user to control the state of the searcher using a forced refresh on an index. If not forced, a newly indexed document will only be searchable after a fixed time interval (1 second by default).
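A forced refresh is a POST call on the _refresh endpoint. The sketch below builds the request for one index, for several indices, or for all of them when no name is given; the index names are hypothetical.

```python
def refresh_request(*indices):
    """Build (method, path) for refreshing the given indices (all if none)."""
    target = ",".join(indices)
    path = f"/{target}/_refresh" if target else "/_refresh"
    return "POST", path


print(*refresh_request("myindex"))
print(*refresh_request("index1", "index2"))
print(*refresh_request())
```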

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly...

Flushing an index

For performance reasons, Elasticsearch stores some data in memory and in a transaction log. To free memory, empty the transaction log, and be sure that our data is safely written to disk, we need to flush an index.

Elasticsearch automatically provides periodic flushing to disk, but forcing flushing can be useful, for example:

When we need to shut down a node to prevent stale data
To have all the data in a safe state (for example, after a big indexing operation to have all the data flushed and refreshed)
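For the second scenario, a flush is typically followed by a refresh. The sketch below returns both requests in the order in which we would send them after a big indexing operation; the helper name and the index name are hypothetical.

```python
def flush_and_refresh_requests(index):
    """Return the two requests, in order: flush to disk, then refresh."""
    return [
        ("POST", f"/{index}/_flush"),    # write data to disk, empty the transaction log
        ("POST", f"/{index}/_refresh"),  # make the new documents searchable
    ]


for method, path in flush_and_refresh_requests("myindex"):
    print(method, path)
```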

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting...

ForceMerge an index

The Elasticsearch core is based on Lucene, which stores the data in segments on disk. During the life of an index, a lot of segments are created and changed. As the number of segments grows, searching slows down because of the time required to read all of them. The ForceMerge operation allows us to consolidate the index, reducing the number of segments for faster search performance.
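The operation is exposed through the _forcemerge endpoint, and the max_num_segments parameter sets how many segments should remain. A sketch that builds the request (the index name is hypothetical):

```python
from urllib.parse import urlencode


def forcemerge_request(index, max_num_segments=None):
    """Build (method, path) for a force merge; nothing is sent."""
    path = f"/{index}/_forcemerge"
    if max_num_segments is not None:
        if max_num_segments < 1:
            raise ValueError("max_num_segments must be at least 1")
        path += "?" + urlencode({"max_num_segments": max_num_segments})
    return "POST", path


# Merge the hypothetical index "myindex" down to a single segment.
print(*forcemerge_request("myindex", max_num_segments=1))
```

Merging down to a single segment is usually reserved for indices that no longer receive writes.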

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others...

Shrinking an index

The latest version of Elasticsearch provides a new way to optimize the index. Using the shrink API, it's possible to reduce the number of shards of an index.

This feature targets several common scenarios:

The number of shards was wrong in the initial design sizing. Sizing the shards without knowing the correct data or text distribution often leads to too many shards.
Reducing the number of shards to reduce memory and resource usage.
Reducing the number of shards to speed up searching.
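Shrinking has preconditions: the source index must be read-only, a copy of every shard must sit on a single node, and the target shard count must be a factor of the source shard count. The sketch below builds the two requests involved; the index names, the node name, and the shard counts are hypothetical.

```python
def shrink_plan(source, target, source_shards, target_shards, node_name):
    """Return the preparation and shrink requests, in order."""
    if target_shards < 1 or source_shards % target_shards != 0:
        raise ValueError("target shard count must be a factor of the source shard count")
    prepare = ("PUT", f"/{source}/_settings", {
        "settings": {
            # Relocate a copy of every shard to one node and block writes.
            "index.routing.allocation.require._name": node_name,
            "index.blocks.write": True,
        }
    })
    # Send this only after the relocation has completed.
    shrink = ("POST", f"/{source}/_shrink/{target}", {
        "settings": {"index.number_of_shards": target_shards}
    })
    return [prepare, shrink]


for request in shrink_plan("myindex", "reduced_index", 8, 2, "node-1"):
    print(request)
```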

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started...

Checking if an index exists

A common pitfall is to query indices that don't exist. To prevent this issue, Elasticsearch gives the user the ability to check for an index's existence.

This check is often used during an application startup to create indices that are required for correct working.
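The check is an HTTP HEAD call on the index name: a 200 status means that the index exists, and 404 means that it does not. The following sketch shows the startup pattern described above. The status codes are passed in as plain data rather than fetched from a cluster, and all names are hypothetical.

```python
def index_exists_request(index):
    """Build (method, path) for the existence check."""
    return "HEAD", f"/{index}"


def interpret_exists_status(status_code):
    """Translate the HTTP status of the HEAD call into a boolean."""
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    raise RuntimeError(f"unexpected status code: {status_code}")


def startup_plan(required, status_by_index):
    """Return create requests for every required index that is missing."""
    return [
        ("PUT", f"/{name}")
        for name in required
        if not interpret_exists_status(status_by_index[name])
    ]


# Statuses as they might come back from the HEAD calls (made-up values).
print(startup_plan(["users", "orders"], {"users": 200, "orders": 404}))
```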

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping...

Managing index settings

Index settings are important because they allow you to control several Elasticsearch functionalities, such as sharding or replication, caching, term management, routing, and analysis.
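Settings are read and changed through the _settings endpoint of an index. As a sketch, the following builds a request to read the settings and one to change the number of replicas, a setting that can be updated on a live index. The index name and the replica count are hypothetical.

```python
def get_settings_request(index):
    """Build (method, path) for reading the settings of an index."""
    return "GET", f"/{index}/_settings"


def update_replicas_request(index, replicas):
    """Build (method, path, body) for changing the replica count."""
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    return "PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}


print(get_settings_request("myindex"))
print(update_replicas_request("myindex", 2))
```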

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, use the index...

Using index aliases

Real-world applications have a lot of indices and queries that span multiple indices. This scenario requires listing the names of all the indices that a query is based on; aliases allow us to group them under a common name.

Some common scenarios for this usage are as follows:

Log indices divided by date (for example, log_YYMMDD), for which we want to create an alias for the last week, the last month, today, yesterday, and so on. This pattern is commonly used in log applications such as Logstash (https://www.elastic.co/products/logstash).
Collecting website contents in several indices (New York Times, The Guardian, ...) that we want to refer to by the index alias sites.
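The first scenario can be sketched as follows: given a date, we build a single _aliases request that adds the daily log indices of the last few days to one alias. The alias name and the dates are hypothetical, and the indices are assumed to exist already, since adding an alias to a missing index fails.

```python
from datetime import date, timedelta


def last_days_alias_request(alias, today, days):
    """Build (method, path, body) adding the last `days` log indices to alias."""
    actions = []
    for offset in range(days):
        day = today - timedelta(days=offset)
        index = "log_" + day.strftime("%y%m%d")  # the log_YYMMDD pattern
        actions.append({"add": {"index": index, "alias": alias}})
    return "POST", "/_aliases", {"actions": actions}


method, path, body = last_days_alias_request("last_3_days", date(2024, 1, 10), 3)
print(method, path)
for action in body["actions"]:
    print(action)
```

A complete solution would also emit remove actions for the indices that have fallen out of the time window.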

Getting ready

You need an up-and-running...

Rolling over an index

When using a system that manages logs, it is very common to use rolling files for your log entries. By using this idea, we can have indices that are similar to rolling files.

We can define some conditions to be checked and leave it to Elasticsearch to roll over to new indices automatically, while clients keep using an alias that acts as a virtual index.
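The conditions are sent to the _rollover endpoint of the alias. A minimal sketch of such a request follows; the alias name and the thresholds are hypothetical, and only the max_age and max_docs conditions are shown.

```python
def rollover_request(alias, max_age=None, max_docs=None):
    """Build (method, path, body) for a rollover; nothing is sent."""
    conditions = {}
    if max_age is not None:
        conditions["max_age"] = max_age    # for example "7d"
    if max_docs is not None:
        conditions["max_docs"] = max_docs  # for example 100000
    return "POST", f"/{alias}/_rollover", {"conditions": conditions}


# "logs_write" is a hypothetical alias that points to the current log index.
print(rollover_request("logs_write", max_age="7d", max_docs=100000))
```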

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides...

Indexing a document

In Elasticsearch, there are two vital operations: index and search.

Indexing means storing one or more documents in an index: a similar concept to inserting records in a relational database.

In Lucene, the core engine of Elasticsearch, inserting or updating a document has the same cost: in Lucene and Elasticsearch, to update means to replace.
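As a sketch, an index call comes in two flavors: a PUT with an explicit ID, which creates the document or replaces an existing one with that ID, and a POST without an ID, which lets Elasticsearch generate one. The snippet assumes the _doc endpoint of Elasticsearch 7.x; the index name and the document are hypothetical.

```python
def index_document_request(index, document, doc_id=None):
    """Build (method, path, body) for indexing a document; nothing is sent."""
    if doc_id is None:
        # No ID given: Elasticsearch generates one.
        return "POST", f"/{index}/_doc", document
    # Explicit ID: an existing document with this ID is replaced.
    return "PUT", f"/{index}/_doc/{doc_id}", document


doc = {"name": "first item", "price": 4.0}  # hypothetical document
print(index_document_request("myindex", doc, doc_id="1"))
print(index_document_request("myindex", doc))
```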

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides...

Getting a document

After indexing a document, your application will probably need to retrieve it at some point during its life.

The GET REST call allows us to get a document in real time without the need for a refresh.
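The response to a GET call carries a found flag and, when the document exists, its content under _source. The sketch below builds the request and extracts the content from a response. The sample responses are hand-written and trimmed, and the _doc endpoint of Elasticsearch 7.x is assumed.

```python
def get_document_request(index, doc_id):
    """Build (method, path) for fetching a document by ID."""
    return "GET", f"/{index}/_doc/{doc_id}"


def source_or_none(response):
    """Return the document content, or None if it was not found."""
    return response.get("_source") if response.get("found") else None


# Hand-written sample responses, not captured from a real cluster.
hit = {"_index": "myindex", "_id": "1", "found": True, "_source": {"name": "first item"}}
miss = {"_index": "myindex", "_id": "2", "found": False}
print(source_or_none(hit))
print(source_or_none(miss))
```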

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, use the indexed document...

Deleting a document

Deleting documents in Elasticsearch is possible in two ways: using the DELETE call or the delete_by_query call, which we'll look at in the next chapter.
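The two calls can be sketched side by side: the first removes one document by ID, and the second removes every document that matches a query. The index name, the ID, and the query are hypothetical, and the _doc endpoint of Elasticsearch 7.x is assumed.

```python
def delete_document_request(index, doc_id):
    """Build (method, path, body) for deleting one document by ID."""
    return "DELETE", f"/{index}/_doc/{doc_id}", None


def delete_by_query_request(index, query):
    """Build (method, path, body) for deleting all documents matching query."""
    return "POST", f"/{index}/_delete_by_query", {"query": query}


print(delete_document_request("myindex", "1"))
print(delete_by_query_request("myindex", {"term": {"status": "obsolete"}}))
```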

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, use the indexed document in the Indexing a document recipe.

...

Updating a document

Documents stored in Elasticsearch can be updated during their lives. There are two available solutions for performing this operation in Elasticsearch: indexing a new document that replaces the old one, or using the update call.

The update call can work in two ways:

By providing a script that implements the update strategy
By providing a document that must be merged with the original one

The main advantage of an update over an index is the reduction in network traffic, because only the changes are sent rather than the whole document.
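Both ways of using the update call can be sketched as request bodies, together with a rough comparison of payload sizes that illustrates the saving. The snippet assumes the _update endpoint of Elasticsearch 7.x; the documents, the script, and its parameters are hypothetical.

```python
import json


def partial_update_request(index, doc_id, partial_doc):
    """Update by merging partial_doc into the stored document."""
    return "POST", f"/{index}/_update/{doc_id}", {"doc": partial_doc}


def scripted_update_request(index, doc_id, source, params):
    """Update by running a script against the stored document."""
    body = {"script": {"source": source, "params": params}}
    return "POST", f"/{index}/_update/{doc_id}", body


full_doc = {"id": "1", "name": "first item", "description": "long text " * 20, "price": 5.0}
partial = {"price": 5.0}

print(partial_update_request("myindex", "1", partial))
print(scripted_update_request("myindex", "1", "ctx._source.price += params.delta", {"delta": 1.0}))

# The partial update sends far fewer bytes than reindexing the full document.
full_size = len(json.dumps(full_doc))
partial_size = len(json.dumps({"doc": partial}))
print(full_size, partial_size)
```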

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman...

Speeding up atomic operations (bulk operations)

When we are inserting, deleting, or updating a large number of documents, the HTTP overhead is significant. To speed up this process, Elasticsearch allows the execution of CRUD calls in bulk.
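A bulk request is sent to the _bulk endpoint as newline-delimited JSON: one line describing the action, followed by one line with the document where the action needs it, and a final newline at the end of the body. The sketch below assembles such a body; the index name, the IDs, and the documents are hypothetical.

```python
import json


def bulk_body(actions):
    """Build the newline-delimited body of a POST /_bulk request.

    actions is a list of (action, metadata, source) tuples; source is None
    for actions such as delete that carry no document.
    """
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("index", {"_index": "myindex", "_id": "1"}, {"name": "first"}),
    ("delete", {"_index": "myindex", "_id": "2"}, None),
    ("update", {"_index": "myindex", "_id": "3"}, {"doc": {"name": "third"}}),
])
print(body, end="")
```

The three operations above travel in a single HTTP call instead of three.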

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

...

Speeding up GET operations (multi GET)

The standard GET operation is very fast, but if you need to fetch a lot of documents by ID, Elasticsearch provides the multi GET operation.
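A multi GET call sends the list of IDs in one request to the _mget endpoint and returns one entry per ID under docs. A sketch of the request and of reading the result follows; the sample response is hand-written and trimmed, and all names are hypothetical.

```python
def mget_request(index, ids):
    """Build (method, path, body) for fetching several documents by ID."""
    return "GET", f"/{index}/_mget", {"ids": list(ids)}


def found_sources(response):
    """Return {id: content} for the documents that were found."""
    return {
        doc["_id"]: doc["_source"]
        for doc in response.get("docs", [])
        if doc.get("found")
    }


print(mget_request("myindex", ["1", "2"]))

# Hand-written sample response, not captured from a real cluster.
sample = {
    "docs": [
        {"_index": "myindex", "_id": "1", "found": True, "_source": {"name": "first"}},
        {"_index": "myindex", "_id": "2", "found": False},
    ]
}
print(found_sources(sample))
```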

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute these commands, HTTP clients can be used, such as curl (https://curl.haxx.se/), postman (https://www.getpostman.com/), or others. I suggest using the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To execute the following commands correctly, use the indexed document we created in the...