
You're reading from ElasticSearch Cookbook

Product type: Book
Published in: Dec 2013
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781782166627
Edition: 1st Edition
Author: Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.

Chapter 4. Standard Operations

In this chapter, we will cover the following topics:

  • Creating an index

  • Deleting an index

  • Opening/closing an index

  • Putting a mapping in an index

  • Getting a mapping

  • Deleting a mapping

  • Refreshing an index

  • Flushing an index

  • Optimizing an index

  • Checking if an index or type exists

  • Managing index settings

  • Using index aliases

  • Indexing a document

  • Getting a document

  • Deleting a document

  • Updating a document

  • Speeding up atomic operations (bulk)

  • Speeding up GET

Introduction


This chapter covers how to manage indices and operations on documents. We'll start by discussing different operations on indices, such as create, delete, update, open and close.

After indices, we'll see how to manage mappings, completing the discussion started in the previous chapter and laying the groundwork for the next chapter, which is mainly centered on search.

In this chapter, a lot of space is given to CRUD (Create-Read-Update-Delete) operations on records. For improved performance in indexing, it's also important to understand bulk operations and avoid their common pitfalls.

This chapter doesn't cover operations involving queries, which are the main topic of the next chapter.

Cluster operations will be discussed in the chapter on monitoring, because they are mainly related to controlling and monitoring the cluster.

Creating an index


The first operation to do before starting indexing data in ElasticSearch is to create an index: the main container of our data.

An index is similar to the concept of a database in SQL.

Getting ready

You need a working ElasticSearch cluster.

How to do it...

The HTTP method to create an index is PUT (POST also works); the REST URL contains the index name, which is written as follows:

http://<server>/<index_name>

For creating an index, we need to perform the following steps:

  1. From the command line, we can execute a PUT call as follows:

    curl -XPUT http://127.0.0.1:9200/myindex -d '{
        "settings" : {
            "index" : {
                "number_of_shards" : 2,
                "number_of_replicas" : 1
            }
        }
    }'
    
  2. The result returned by ElasticSearch, if everything is all right, should be as follows:

    {"ok":true,"acknowledged":true}
  3. If the index already exists, a 400 error is returned:

    {"error":"IndexAlreadyExistsException[[myindex] Already exists]","status":400}
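The same request can be assembled from any HTTP client. The following is a minimal sketch using Python's standard library, mirroring the curl call above; the cluster address is the same assumption as in the example, and the request is only built and inspected here, not sent:

```python
import json
from urllib.request import Request, urlopen  # urlopen(req) would send it

ES = "http://127.0.0.1:9200"  # assumed cluster address, as in the curl example

def create_index_request(name, shards=2, replicas=1):
    """Build the PUT request that creates an index with the given settings."""
    body = {"settings": {"index": {"number_of_shards": shards,
                                   "number_of_replicas": replicas}}}
    return Request("%s/%s" % (ES, name), method="PUT",
                   data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"})

req = create_index_request("myindex")
print(req.get_method(), req.full_url)  # PUT http://127.0.0.1:9200/myindex
```

Calling `urlopen(req)` would perform the call and return the acknowledgement shown in step 2.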

How it works...

There...

Deleting an index


The counterpart of creating an index is deleting one.

Deleting an index means deleting its shards, mappings, and data. There are many common scenarios when we need to delete an index. Some of them are as follows:

  • Removing the index because the data that it contains is not needed anymore

  • Resetting an index for a restart from scratch

  • Deleting an index that has some missing shards, due to some failure, to bring the cluster back to a valid state

Getting ready

You need a working ElasticSearch cluster and the existing index created in the previous recipe.

How to do it...

The HTTP method used to delete an index is DELETE.

The URL contains only the index name, which is as follows:

http://<server>/<index_name>

For deleting an index, we need to perform the following steps:

  1. From the command line, we can execute a DELETE call as follows:

    curl -XDELETE http://127.0.0.1:9200/myindex
    
  2. The result returned by ElasticSearch, if everything is all right, should be as follows:

    {"ok":true,"acknowledged":true}
  3. If the...

Opening/closing an index


If you want to keep your data but save resources (memory/CPU), a good alternative to deleting an index is to close it.

ElasticSearch allows you to open/close an index, putting it in online or offline mode.

Getting ready

You need a working ElasticSearch cluster and the index created in the Creating an index recipe.

How to do it...

For opening/closing an index, we need to perform the following steps:

  1. From the command line, we can execute a POST call to close an index as follows:

    curl -XPOST http://127.0.0.1:9200/myindex/_close
    
  2. If the call is successfully made, the result returned by ElasticSearch should be as follows:

    {"ok":true,"acknowledged":true}
  3. To open an index from the command line, use the following command:

    curl -XPOST http://127.0.0.1:9200/myindex/_open
    
  4. If the call is successfully made, the result returned by ElasticSearch should be as follows:

    {"ok":true,"acknowledged":true}

How it works...

When an index is closed, there is no overhead on the cluster (except for metadata state...

Putting a mapping in an index


In the previous chapter, we saw how to build a mapping for our data. This recipe shows how to put a type in an index. This kind of operation can be considered the ElasticSearch version of the SQL CREATE TABLE command.

Getting ready

You need a working ElasticSearch cluster and the index created in the Creating an index recipe.

How to do it...

The HTTP method to put a mapping is PUT (POST also works).

The URL format for putting a mapping is as follows:

http://<server>/<index_name>/<type_name>/_mapping

For putting a mapping in an index, we need to perform the following steps:

  1. If we consider the type order of the previous chapter, the call will be as follows:

    curl -XPUT 'http://localhost:9200/myindex/order/_mapping' -d '{
        "order" : {
            "properties" : {
                "id" : {"type" : "string", "store" : "yes" , "index":"not_analyzed"},
                "date" : {"type" : "date", "store" : "no" , "index":"not_analyzed"},
                "customer_id"...
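The mapping JSON above is cut off in this extract; the full version continues in the book. As a hedged sketch covering only the id and date fields visible above, the same payload can be assembled and validated programmatically before the PUT:

```python
import json

def order_mapping():
    """Assemble the (partial) order mapping shown above as a Python dict."""
    return {
        "order": {
            "properties": {
                "id":   {"type": "string", "store": "yes", "index": "not_analyzed"},
                "date": {"type": "date",   "store": "no",  "index": "not_analyzed"},
                # ...the remaining fields of the full mapping would follow here
            }
        }
    }

# Serialized body for: PUT /myindex/order/_mapping
payload = json.dumps(order_mapping())
```

Building the dict first makes it easy to round-trip through `json.loads` and catch malformed JSON before sending it to the cluster.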

Getting a mapping


After having set our mappings for processing types, we sometimes need to check or analyze a mapping to prevent issues. Getting the mapping for a type helps us understand its structure, or how it has evolved due to merges and automatic type guessing.

Getting ready

You need a working ElasticSearch cluster and the mapping created in the previous recipe.

How to do it...

The HTTP method to get a mapping is GET.

The URL formats for getting mapping are as follows:

http://<server>/_mapping
http://<server>/<index_name>/_mapping
http://<server>/<index_name>/<type_name>/_mapping

For getting a mapping from an index, we need to perform the following steps:

  1. If we consider the type order of the previous chapter, the call will be as follows:

    curl -XGET 'http://localhost:9200/myindex/order/_mapping?pretty=true'
    
  2. The result returned by ElasticSearch should be as follows:

    {
        "order": {
            "properties": {
                "customer_id": {
      ...

Deleting a mapping


The last CRUD (Create, Read, Update, Delete) operation related to mapping is the delete one.

Deleting a mapping is a destructive operation and must be done with caution to prevent losing your data.

Getting ready

You need a working ElasticSearch cluster and the mapping created in the Putting a mapping in an index recipe.

How to do it...

The HTTP method to delete a mapping is DELETE.

The URL formats for deleting a mapping are as follows:

http://<server>/<index_name>/<type_name>
http://<server>/<index_name>/<type_name>/_mapping

For deleting a mapping from an index, we need to perform the following steps:

  1. If we consider the type order of the previous chapter, the call will be as follows:

    curl -XDELETE 'http://localhost:9200/myindex/order/'
    
  2. If the call is successfully made, the result returned by ElasticSearch should be an HTTP 200 status code and a message similar to the following one:

    {"ok":true}
  3. If the mapping/type is missing, the following...

Refreshing an index


ElasticSearch allows the user to control the state of the searcher by forcing a refresh on an index. If not forced, newly indexed documents will only be searchable after a fixed time interval (usually 1 second).

Getting ready

You need a working ElasticSearch cluster and the index created in the Creating an index recipe.

How to do it...

The HTTP method used for both operations is POST.

The URL format for refreshing one or more indices is as follows:

http://<server>/<index_name(s)>/_refresh

The URL format for refreshing all the indices in a cluster is as follows:

http://<server>/_refresh

For refreshing an index, we need to perform the following steps:

  1. If we consider the index created in the Creating an index recipe, the call will be as follows:

    curl -XPOST 'http://localhost:9200/myindex/_refresh'
    
  2. The result returned by ElasticSearch should be as follows:

    {"ok":true,"_shards":{"total":4,"successful":2,"failed":0}}

Tip

The refresh call (as the flush and optimize ones) affects only the...

Flushing an index


For performance reasons, ElasticSearch stores some data in memory and in a transaction log. If we want to free the memory, empty the transaction log, and be sure that our data is safely written on disk, we need to flush the index.

Getting ready

You need a working ElasticSearch cluster and the index created in the Creating an index recipe.

How to do it...

The HTTP method used for both operations is POST.

The URL format for flushing an index is as follows:

http://<server>/<index_name(s)>/_flush[?refresh=True] 

The URL format for flushing all the indices in a cluster is as follows:

http://<server>/_flush[?refresh=True] 

For flushing an index, we need to perform the following steps:

  1. If we consider the index created in the Creating an index recipe, the call will be as follows:

    curl -XPOST 'http://localhost:9200/myindex/_flush?refresh=True'
    
  2. If everything is all right, the result returned by ElasticSearch should be as follows:

    {"ok":true,"_shards":{"total":4,"successful":2,"failed...

Optimizing an index


The ElasticSearch core is based on Lucene, which stores the data in segments on disk. During an index's life, a lot of segments are created and changed. As the number of segments increases, search speed decreases due to the time required to read all of them. The optimize operation consolidates the index by reducing the number of segments, for faster search performance.

Getting ready

You need a working ElasticSearch cluster and the index created in the Creating an index recipe.

How to do it...

The HTTP method used is POST.

The URL format for optimizing one or more indices is as follows:

http://<server>/<index_name(s)>/_optimize

The URL format for optimizing all the indices in a cluster is as follows:

http://<server>/_optimize

For optimizing an index, we need to perform the following steps:

  1. If we consider the index created in the Creating an index recipe, the call will be as follows:

    curl -XPOST 'http://localhost:9200/myindex/_optimize'
    
  2. The result returned by ElasticSearch...

Checking if an index or type exists


During the startup of an application, it's often necessary to check whether an index or type exists; otherwise, we need to create it.

Getting ready

You need a working ElasticSearch cluster and the mapping available in the index as described in the previous recipes.

How to do it...

The HTTP method to check existence is HEAD. The URL format for checking an index is as follows:

http://<server>/<index_name>/

The URL format for checking a type is as follows:

http://<server>/<index_name>/<type>/

For checking if an index exists, we need to perform the following steps:

  1. If we consider the index created in the Creating an index recipe, the call will be as follows:

    curl -i -XHEAD 'http://localhost:9200/myindex/'
    
  2. If the index exists, an HTTP status code 200 is returned; if it's missing, a 404 is returned. For checking if a type exists, we need to perform the following steps:

    1. If we consider the mapping created in the Putting a mapping in an index recipe...
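Because a HEAD call carries no body, an application only has to branch on the status code. The following is a sketch using Python's standard library (the server address would be an assumption; the status-to-decision rule is factored out so it can be checked without a live cluster):

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError

def status_means_exists(code):
    """200 means the index/type exists; anything else means it does not."""
    return code == 200

def index_exists(server, index):
    """HEAD http://<server>/<index>/ and interpret the status code."""
    try:
        with urlopen(Request("%s/%s/" % (server, index), method="HEAD")) as resp:
            return status_means_exists(resp.status)
    except HTTPError as err:
        if err.code == 404:     # missing index answers with 404
            return False
        raise                   # other errors are real failures
```

A startup routine would call `index_exists(...)` and create the index only when it returns `False`.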

Managing index settings


Index settings are very important because they allow you to control several important ElasticSearch functionalities, such as sharding/replicas, caching, term management, routing, and analysis.

Getting ready

You need a working ElasticSearch cluster and the index created in the Creating an index recipe.

How to do it...

For managing the index settings, we need to perform the following steps:

  1. To retrieve the settings of your current index, the URL format is as follows:

    http://<server>/<index_name>/_settings
    
  2. We are reading information via the REST API, so the method will be GET; an example call, using the index created in the Creating an index recipe, is as follows:

    curl -XGET 'http://localhost:9200/myindex/_settings'
    
  3. The response will be something similar to the following one:

    {
        "myindex": {
            "settings": {
                "index.number_of_replicas": "1",
                "index.number_of_shards": "2",
                "index.version.created": "900199"
            }
        ...
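Some of these settings can also be changed at runtime with a PUT on the same _settings endpoint; for example, the replica count can be updated. A sketch of the request body (the value 2 is an arbitrary example):

```python
import json

def replica_settings_body(replicas):
    """Body for PUT /<index>/_settings that changes the replica count."""
    return json.dumps({"index": {"number_of_replicas": replicas}})

body = replica_settings_body(2)
# would be sent as:
# curl -XPUT 'http://localhost:9200/myindex/_settings' -d "$body"
```

Note that number_of_replicas can be changed on a live index, while number_of_shards is fixed at creation time.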

Using index aliases


Real-world applications often have a lot of indices, and queries that span multiple indices. This scenario requires defining all the index names on which we need to perform queries; aliases allow grouping them under a common name.

Some common scenarios of this usage are as follows:

  • Log indices divided by date (that is, log_YYMMDD) for which we want to create an alias for the last week, the last month, today, yesterday, and so on

  • Collecting website contents in several indices (New York Times, The Guardian, and so on) that we want to refer to with the index alias "sites"

Getting ready

You need a working ElasticSearch cluster.

How to do it...

The URL format for controlling aliases is as follows:

http://<server>/_aliases

For managing the index aliases, we need to perform the following steps:

  1. We are reading the alias status via the REST API, so the method will be GET; an example call is as follows:

    curl -XGET 'http://localhost:9200/_aliases'
    
  2. You will get a response similar to the...
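Aliases are modified by POSTing a list of actions to the same _aliases endpoint. The following sketch builds such a body; the alias and index names are illustrative, taken from the "sites" scenario above:

```python
import json

def alias_actions(add=(), remove=(), alias="sites"):
    """Build the POST /_aliases body: one add/remove action per index."""
    actions = [{"add": {"index": i, "alias": alias}} for i in add]
    actions += [{"remove": {"index": i, "alias": alias}} for i in remove]
    return json.dumps({"actions": actions})

body = alias_actions(add=["nytimes", "guardian"])
```

Because all the actions travel in one request, adding one index to an alias and removing another happens atomically on the cluster side.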

Indexing a document


In ElasticSearch there are two vital operations: index and search.

Indexing consists of putting one or more documents in an index; it is similar to inserting records in a relational database.

In Lucene, the core engine of ElasticSearch, inserting or updating a document has the same cost: in Lucene update means replace.

Getting ready

You need a working ElasticSearch cluster and the mapping created in the Putting a mapping in an index recipe.

How to do it...

For indexing a document, the following REST entry points can be used:

Method      URL
POST        http://<server>/<index_name>/<type>
PUT/POST    http://<server>/<index_name>/<type>/<id>
PUT/POST    http://<server>/<index_name>/<type>/<id>/_create

For indexing a document, we need to perform the following steps:

  1. If we consider the type order of the previous chapter, the call to index a document will be as follows:

    curl -XPOST 'http://localhost...

Getting a document


After having indexed a document, it will probably need to be retrieved at some point during your application's life.

The GET REST call allows getting a document in real time without the need for a refresh operation.

Getting ready

You need a working ElasticSearch cluster and the indexed document from the Indexing a document recipe.

How to do it...

The GET method allows returning a document given its index, type, and ID.

The REST API URL is as follows:

http://<server>/<index_name>/<type_name>/<id>

For getting a document, we need to perform the following steps:

  1. If we consider the document, which we had indexed in the previous recipe, the call will be as follows:

    curl -XGET http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw?pretty=true
    
  2. The result returned by ElasticSearch should be the indexed document and it should be as follows:

    {
    "_index":"myindex","_type":"order","_id":"2qLrAfPVQvCRMe7Ku8r0Tw","_version":1,"exists":true, "_source" : {
        "id" : "1234",
        "date" : "2013...

Deleting a document


Deleting documents in ElasticSearch is possible in two ways: using the delete call or the delete by query, which we'll see in the next chapter.

Getting ready

You need a working ElasticSearch cluster and the indexed document which we have discussed in the Indexing a document recipe.

How to do it...

The REST API URL is similar to that of GET calls, but the HTTP method is DELETE:

http://<server>/<index_name>/<type_name>/<id>

For deleting a document, we need to perform the following steps:

  1. If we consider the order indexed in the Indexing a document recipe, the call to delete a document will be as follows:

    curl -XDELETE 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw'
    
  2. The result returned by ElasticSearch will be as follows:

    {
        "_id": "2qLrAfPVQvCRMe7Ku8r0Tw",
        "_index": "myindex",
        "_type": "order",
        "_version": 2,
        "found": true,
        "ok": true
    }

    The result, apart from the well-known parameters starting with _, returns the ok status...

Updating a document


Documents stored in ElasticSearch can be updated during their lives. There are two available solutions to do this operation in ElasticSearch: repost the new document or use the update call.

The update call can work in the following two ways:

  • By providing a script that implements the update strategy

  • By providing a document that must be merged with the original one

Getting ready

You need a working ElasticSearch cluster and the indexed document which we discussed in the Indexing a document recipe.

How to do it...

As we are changing the state of the data, the HTTP method is POST, and the following is the REST URL:

http://<server>/<index_name>/<type_name>/<id>/_update

For updating a document, we need to perform the following steps:

  1. If we consider the type order of the previous recipe, the call to update a document will be as follows:

    curl -XPOST 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw/_update' -d '{
    "script" : "ctx._source.in_stock_items += count...
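The two update styles differ only in the request body. As a sketch, both bodies can be built as follows; the in_stock_items field and count parameter mirror the truncated script above, and the example values are assumptions:

```python
import json

def script_update_body(script, params=None):
    """Update via a script that manipulates ctx._source."""
    body = {"script": script}
    if params:
        body["params"] = params   # values referenced by name inside the script
    return json.dumps(body)

def doc_update_body(partial_doc):
    """Update by merging a partial document into the original one."""
    return json.dumps({"doc": partial_doc})

script_body = script_update_body("ctx._source.in_stock_items += count",
                                 {"count": 4})
doc_body = doc_update_body({"in_stock_items": 10})
```

The scripted form is handy for counters and conditional logic; the doc-merge form is simpler when you already know the new field values.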

Speeding up atomic operations (bulk)


When we are inserting, deleting, or updating a large number of documents, the HTTP overhead is significant. To speed up the process, ElasticSearch allows the execution of bulk calls.

Getting ready

You need a working ElasticSearch cluster.

How to do it...

As we are changing the state of the data, the HTTP method is POST, and the following is the REST URL:

http://<server>/<index_name>/_bulk

For executing a bulk action, we need to perform the following steps:

  1. We need to collect the create/index/delete/update commands in a structure made up of bulk JSON lines: an action line with metadata, followed by an optional line of data related to the action. Every line must end with a newline character, "\n".

    A bulk datafile should be as follows:

    { "index":{ "_index":"myindex", "_type":"order", "_id":"1" } }
    { "field1" : "value1",  "field2" : "value2"  }
    { "delete":{ "_index":"myindex", "_type":"order", "_id":"2" } }
    { "create":{ "_index":"myindex", "_type...
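A common pitfall is forgetting the trailing newline, or letting a JSON serializer pretty-print an action across several lines. The following sketch builds a bulk body that keeps every action and data entry on its own newline-terminated line (the index and type names are the ones used above):

```python
import json

def bulk_body(pairs):
    """Serialize (action, source) pairs to the bulk NDJSON format.

    Each action line is followed by its optional source line; every line,
    including the last, is newline-terminated as the bulk API requires.
    """
    lines = []
    for action, source in pairs:
        lines.append(json.dumps(action))   # json.dumps emits a single line
        if source is not None:             # delete actions have no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

body = bulk_body([
    ({"index": {"_index": "myindex", "_type": "order", "_id": "1"}},
     {"field1": "value1", "field2": "value2"}),
    ({"delete": {"_index": "myindex", "_type": "order", "_id": "2"}}, None),
])
```

The resulting string can be POSTed as-is to the _bulk endpoint, for example with curl's -d option.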

Speeding up GET


The standard GET operation is very fast, but if you need to fetch a lot of documents by ID, ElasticSearch provides the multi GET operation.

Getting ready

You need a working ElasticSearch cluster and the document indexed in the Indexing a document recipe.

How to do it...

The following are the multi GET REST URLs:

  • http://<server>/_mget

  • http://<server>/<index_name>/_mget

  • http://<server>/<index_name>/<type_name>/_mget

For executing a multi GET action, we need to perform the following steps:

  1. The method is GET, but it requires a body with the IDs, and the index/type if they are missing from the URL.

    The following is an example that uses the first URL, where we need to provide the index, type, and id parameters:

    curl 'localhost:9200/_mget' -d '{
        "docs" : [
            {
                "_index" : "myindex",
                "_type" : "order",
                "_id" : "2qLrAfPVQvCRMe7Ku8r0Tw"
            },
            {
                "_index" : "myindex",
                "_type" : "order",
        ...
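The docs array grows mechanically with the number of IDs, so it is worth generating it. A minimal sketch, using the index and type names from the previous recipes (the second ID is purely illustrative):

```python
import json

def mget_body(ids, index="myindex", doc_type="order"):
    """Build the POST /_mget body for a list of document IDs."""
    return json.dumps({"docs": [
        {"_index": index, "_type": doc_type, "_id": doc_id} for doc_id in ids
    ]})

body = mget_body(["2qLrAfPVQvCRMe7Ku8r0Tw", "another-example-id"])
```

When the index-scoped or type-scoped URLs are used instead, the corresponding _index/_type entries can be dropped from each item, or the simpler {"ids": [...]} form can be sent.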