
You're reading from ElasticSearch Cookbook

Product type: Book
Published in: Dec 2013
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781782166627
Edition: 1st Edition
Author: Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.

Chapter 4. Standard Operations

In this chapter, we will cover the following topics:

  • Creating an index

  • Deleting an index

  • Opening/closing an index

  • Putting a mapping in an index

  • Getting a mapping

  • Deleting a mapping

  • Refreshing an index

  • Flushing an index

  • Optimizing an index

  • Checking if an index or type exists

  • Managing index settings

  • Using index aliases

  • Indexing a document

  • Getting a document

  • Deleting a document

  • Updating a document

  • Speeding up atomic operations (bulk)

  • Speeding up GET

Introduction


This chapter covers how to manage indices and operations on documents. We'll start by discussing different operations on indices, such as create, delete, update, open and close.

After indices, we'll see how to manage mappings, completing the discussion started in the previous chapter and laying the groundwork for the next chapter, which is mainly centered on search.

In this chapter, a lot of space is given to CRUD (Create-Read-Update-Delete) operations on records. For improved performance in indexing, it's also important to understand bulk operations and avoid their common pitfalls.

This chapter doesn't cover operations involving queries, which are the main topic of the next chapter.

Cluster operations will be discussed in the chapter on monitoring, because they are mainly related to controlling and monitoring the cluster.

Creating an index


The first operation to do before starting indexing data in ElasticSearch is to create an index: the main container of our data.

An index is similar to the concept of a database in SQL.

Getting ready

You need a working ElasticSearch cluster.

How to do it...

The HTTP method to create an index is PUT (POST also works); the REST URL contains the index name, which is written as follows:

http://<server>/<index_name>

For creating an index, we need to perform the following steps:

  1. From the command line, we can execute a PUT call as follows:

    curl -XPUT http://127.0.0.1:9200/myindex -d '{
        "settings" : {
            "index" : {
                "number_of_shards" : 2,
                "number_of_replicas" : 1
            }
        }
    }'
    
  2. The result returned by ElasticSearch, if everything is all right, should be as follows:

    {"ok":true,"acknowledged":true}
  3. If the index already exists, a 400 error is returned:

    {"error":"IndexAlreadyExistsException[[myindex] Already exists]","status":400}
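The same request can be assembled from any HTTP client. The following is a minimal sketch using Python's standard library, mirroring the curl call above; the cluster address is the same assumption as in the example, and the request is only built and inspected here, not sent:

```python
import json
from urllib.request import Request, urlopen  # urlopen(req) would send it

ES = "http://127.0.0.1:9200"  # assumed cluster address, as in the curl example

def create_index_request(name, shards=2, replicas=1):
    """Build the PUT request that creates an index with the given settings."""
    body = {"settings": {"index": {"number_of_shards": shards,
                                   "number_of_replicas": replicas}}}
    return Request("%s/%s" % (ES, name), method="PUT",
                   data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"})

req = create_index_request("myindex")
print(req.get_method(), req.full_url)  # PUT http://127.0.0.1:9200/myindex
```

Calling `urlopen(req)` would perform the call and return the acknowledgement shown in step 2.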

How it works...

There...

Deleting an index


The counterpart of creating an index is deleting one.

Deleting an index means deleting its shards, mappings, and data. There are many common scenarios when we need to delete an index. Some of them are as follows:

  • Removing the index because the data that it contains is not needed anymore

  • Resetting an index for a restart from scratch

  • Deleting an index that has some missing shards, due to some failure, to bring the cluster back to a valid state

Getting ready

You need a working ElasticSearch cluster and the existing index created in the previous recipe.

How to do it...

The HTTP method used to delete an index is DELETE.

The URL contains only the index name, which is as follows:

http://<server>/<index_name>

For deleting an index, we need to perform the following steps:

  1. From the command line, we can execute a DELETE call as follows:

    curl -XDELETE http://127.0.0.1:9200/myindex
    
  2. The result returned by ElasticSearch, if everything is all right, should be as follows:

    {"ok":true,"acknowledged":true}
  3. If the...

Opening/closing an index


If you want to keep your data but save resources (memory/CPU), a good alternative to deleting an index is to close it.

ElasticSearch allows you to open/close an index, putting it in online or offline mode.

Getting ready

You need a working ElasticSearch cluster and the index created in the Creating an index recipe.

How to do it...

For opening/closing an index, we need to perform the following steps:

  1. From the command line, we can execute a POST call to close an index as follows:

    curl -XPOST http://127.0.0.1:9200/myindex/_close
    
  2. If the call is successfully made, the result returned by ElasticSearch should be as follows:

    {"ok":true,"acknowledged":true}
  3. To open an index from the command line, use the following command:

    curl -XPOST http://127.0.0.1:9200/myindex/_open
    
  4. If the call is successfully made, the result returned by ElasticSearch should be as follows:

    {"ok":true,"acknowledged":true}

How it works...

When an index is closed, there is no overhead on the cluster (except for metadata state...

Putting a mapping in an index


In the previous chapter, we saw how to build a mapping for our data. This recipe shows how to put a type in an index. This kind of operation can be considered the ElasticSearch version of the SQL CREATE TABLE command.

Getting ready

You need a working ElasticSearch cluster and the index created in the Creating an index recipe.

How to do it...

The HTTP method to put a mapping is PUT (POST also works).

The URL format for putting a mapping is as follows:

http://<server>/<index_name>/<type_name>/_mapping

For putting a mapping in an index, we need to perform the following steps:

  1. If we consider the type order of the previous chapter, the call will be as follows:

    curl -XPUT 'http://localhost:9200/myindex/order/_mapping' -d '{
        "order" : {
            "properties" : {
                "id" : {"type" : "string", "store" : "yes" , "index":"not_analyzed"},
                "date" : {"type" : "date", "store" : "no" , "index":"not_analyzed"},
                "customer_id"...
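The mapping JSON above is cut off in this extract; the full version continues in the book. As a hedged sketch covering only the id and date fields visible above, the same payload can be assembled and validated programmatically before the PUT:

```python
import json

def order_mapping():
    """Assemble the (partial) order mapping shown above as a Python dict."""
    return {
        "order": {
            "properties": {
                "id":   {"type": "string", "store": "yes", "index": "not_analyzed"},
                "date": {"type": "date",   "store": "no",  "index": "not_analyzed"},
                # ...the remaining fields of the full mapping would follow here
            }
        }
    }

# Serialized body for: PUT /myindex/order/_mapping
payload = json.dumps(order_mapping())
```

Building the dict first makes it easy to round-trip through `json.loads` and catch malformed JSON before sending it to the cluster.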

Getting a mapping


After having set our mappings for processing types, we sometimes need to check or analyze a mapping to prevent issues. Getting the mapping for a type helps us understand its structure, or how it has evolved due to merges and automatic type guessing.

Getting ready

You need a working ElasticSearch cluster and the mapping created in the previous recipe.

How to do it...

The HTTP method to get a mapping is GET.

The URL formats for getting mapping are as follows:

http://<server>/_mapping
http://<server>/<index_name>/_mapping
http://<server>/<index_name>/<type_name>/_mapping

For getting a mapping from an index, we need to perform the following steps:

  1. If we consider the type order of the previous chapter, the call will be as follows:

    curl -XGET 'http://localhost:9200/myindex/order/_mapping?pretty=true'
    
  2. The result returned by ElasticSearch should be as follows:

    {
        "order": {
            "properties": {
                "customer_id": {
      ...

Deleting a mapping


The last CRUD (Create, Read, Update, Delete) operation related to mapping is the delete one.

Deleting a mapping is a destructive operation and must be done with caution to prevent losing your data.

Getting ready

You need a working ElasticSearch cluster and the mapping created in the Putting a mapping in an index recipe.

How to do it...

The HTTP method to delete a mapping is DELETE.

The URL formats for deleting a mapping are as follows:

http://<server>/<index_name>/<type_name>
http://<server>/<index_name>/<type_name>/_mapping

For deleting a mapping from an index, we need to perform the following steps:

  1. If we consider the type order of the previous chapter, the call will be as follows:

    curl -XDELETE 'http://localhost:9200/myindex/order/'
    
  2. If the call is successfully made, the result returned by ElasticSearch should be an HTTP 200 status code and a message similar to the following one:

    {"ok":true}
  3. If the mapping/type is missing, the following...

Refreshing an index


ElasticSearch allows the user to control the state of the searcher by forcing a refresh on an index. If not forced, newly indexed documents will only be searchable after a fixed time interval (usually 1 second).

Getting ready

You need a working ElasticSearch cluster and the index created in the Creating an index recipe.

How to do it...

The HTTP method used for both operations is POST.

The URL format for refreshing one or more indices is as follows:

http://<server>/<index_name(s)>/_refresh

The URL format for refreshing all the indices in a cluster is as follows:

http://<server>/_refresh

For refreshing an index, we need to perform the following steps:

  1. If we consider the index created in the Creating an index recipe, the call will be as follows:

    curl -XPOST 'http://localhost:9200/myindex/_refresh'
    
  2. The result returned by ElasticSearch should be as follows:

    {"ok":true,"_shards":{"total":4,"successful":2,"failed":0}}

Tip

The refresh call (as the flush and optimize ones) affects only the...

Flushing an index


For performance reasons, ElasticSearch stores some data in memory and in a transaction log. If we want to free the memory, empty the transaction log, and be sure that our data is safely written on disk, we need to flush the index.

Getting ready

You need a working ElasticSearch cluster and the index created in the Creating an index recipe.

How to do it...

The HTTP method used for both operations is POST.

The URL format for flushing an index is as follows:

http://<server>/<index_name(s)>/_flush[?refresh=True] 

The URL format for flushing all the indices in a cluster is as follows:

http://<server>/_flush[?refresh=True] 

For flushing an index, we need to perform the following steps:

  1. If we consider the index created in the Creating an index recipe, the call will be as follows:

    curl -XPOST 'http://localhost:9200/myindex/_flush?refresh=True'
    
  2. If everything is all right, the result returned by ElasticSearch should be as follows:

    {"ok":true,"_shards":{"total":4,"successful":2,"failed...

Optimizing an index


The ElasticSearch core is based on Lucene, which stores the data in segments on disk. During an index's life, a lot of segments are created and changed. As the number of segments increases, search speed decreases due to the time required to read all of them. The optimize operation consolidates the index by reducing the number of segments, for faster search performance.

Getting ready

You need a working ElasticSearch cluster and the index created in the Creating an index recipe.

How to do it...

The HTTP method used is POST.

The URL format for optimizing one or more indices is as follows:

http://<server>/<index_name(s)>/_optimize

The URL format for optimizing all the indices in a cluster is as follows:

http://<server>/_optimize

For optimizing an index, we need to perform the following steps:

  1. If we consider the index created in the Creating an index recipe, the call will be as follows:

    curl -XPOST 'http://localhost:9200/myindex/_optimize'
    
  2. The result returned by ElasticSearch...

Checking if an index or type exists


During the startup of an application, it's often necessary to check whether an index or type exists; otherwise, we need to create it.

Getting ready

You need a working ElasticSearch cluster and the mapping available in the index as described in the previous recipes.

How to do it...

The HTTP method to check existence is HEAD. The URL format for checking an index is as follows:

http://<server>/<index_name>/

The URL format for checking a type is as follows:

http://<server>/<index_name>/<type>/

For checking if an index exists, we need to perform the following steps:

  1. If we consider the index created in the Creating an index recipe, the call will be as follows:

    curl -i -XHEAD 'http://localhost:9200/myindex/'
    
  2. If the index exists, an HTTP status code 200 is returned; if it's missing, a 404 is returned. For checking if a type exists, we need to perform the following steps:

    1. If we consider the mapping created in the Putting a mapping in an index recipe...
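Because a HEAD call carries no body, an application only has to branch on the status code. The following is a sketch using Python's standard library (the server address would be an assumption; the status-to-decision rule is factored out so it can be checked without a live cluster):

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError

def status_means_exists(code):
    """200 means the index/type exists; anything else means it does not."""
    return code == 200

def index_exists(server, index):
    """HEAD http://<server>/<index>/ and interpret the status code."""
    try:
        with urlopen(Request("%s/%s/" % (server, index), method="HEAD")) as resp:
            return status_means_exists(resp.status)
    except HTTPError as err:
        if err.code == 404:     # missing index answers with 404
            return False
        raise                   # other errors are real failures
```

A startup routine would call `index_exists(...)` and create the index only when it returns `False`.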

Managing index settings


Index settings are very important because they allow you to control several important ElasticSearch functionalities, such as sharding/replicas, caching, term management, routing, and analysis.

Getting ready

You need a working ElasticSearch cluster and the index created in the Creating an index recipe.

How to do it...

For managing the index settings, we need to perform the following steps:

  1. To retrieve the settings of your current index, the URL format is as follows:

    http://<server>/<index_name>/_settings
    
  2. We are reading information via the REST API, so the method will be GET; an example call, using the index created in the Creating an index recipe, is as follows:

    curl -XGET 'http://localhost:9200/myindex/_settings'
    
  3. The response will be something similar to the following one:

    {
        "myindex": {
            "settings": {
                "index.number_of_replicas": "1",
                "index.number_of_shards": "2",
                "index.version.created": "900199"
            }
        ...
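Some of these settings can also be changed at runtime with a PUT on the same _settings endpoint; for example, the replica count can be updated. A sketch of the request body (the value 2 is an arbitrary example):

```python
import json

def replica_settings_body(replicas):
    """Body for PUT /<index>/_settings that changes the replica count."""
    return json.dumps({"index": {"number_of_replicas": replicas}})

body = replica_settings_body(2)
# would be sent as:
# curl -XPUT 'http://localhost:9200/myindex/_settings' -d "$body"
```

Note that number_of_replicas can be changed on a live index, while number_of_shards is fixed at creation time.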

Using index aliases


Real-world applications often have a lot of indices, and queries that span multiple indices. This scenario requires defining all the index names on which we need to perform queries; aliases allow grouping them under a common name.

Some common scenarios of this usage are as follows:

  • Log indices divided by date (that is, log_YYMMDD) for which we want to create an alias for the last week, the last month, today, yesterday, and so on

  • Collecting website contents in several indices (New York Times, The Guardian, and so on) that we want to refer to with the index alias "sites"

Getting ready

You need a working ElasticSearch cluster.

How to do it...

The URL format for controlling aliases is as follows:

http://<server>/_aliases

For managing the index aliases, we need to perform the following steps:

  1. We are reading the alias status via the REST API, so the method will be GET; an example call is as follows:

    curl -XGET 'http://localhost:9200/_aliases'
    
  2. You will get a response similar to the...
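Aliases are modified by POSTing a list of actions to the same _aliases endpoint. The following sketch builds such a body; the alias and index names are illustrative, taken from the "sites" scenario above:

```python
import json

def alias_actions(add=(), remove=(), alias="sites"):
    """Build the POST /_aliases body: one add/remove action per index."""
    actions = [{"add": {"index": i, "alias": alias}} for i in add]
    actions += [{"remove": {"index": i, "alias": alias}} for i in remove]
    return json.dumps({"actions": actions})

body = alias_actions(add=["nytimes", "guardian"])
```

Because all the actions travel in one request, adding one index to an alias and removing another happens atomically on the cluster side.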

Indexing a document


In ElasticSearch there are two vital operations: index and search.

Indexing consists of putting one or more documents in an index; it is similar to inserting records in a relational database.

In Lucene, the core engine of ElasticSearch, inserting or updating a document has the same cost: in Lucene update means replace.

Getting ready

You need a working ElasticSearch cluster and the mapping created in the Putting a mapping in an index recipe.

How to do it...

For indexing a document, the following REST entry points can be used:

Method      URL
POST        http://<server>/<index_name>/<type>
PUT/POST    http://<server>/<index_name>/<type>/<id>
PUT/POST    http://<server>/<index_name>/<type>/<id>/_create

For indexing a document, we need to perform the following steps:

  1. If we consider the type order of the previous chapter, the call to index a document will be as follows:

    curl -XPOST 'http://localhost...

Getting a document


After having indexed a document, it will probably need to be retrieved at some point during your application's life.

The GET REST call allows getting a document in real time without the need for a refresh operation.

Getting ready

You need a working ElasticSearch cluster and the indexed document from the Indexing a document recipe.

How to do it...

The GET method allows returning a document given its index, type, and ID.

The REST API URL is as follows:

http://<server>/<index_name>/<type_name>/<id>

For getting a document, we need to perform the following steps:

  1. If we consider the document, which we had indexed in the previous recipe, the call will be as follows:

    curl -XGET http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw?pretty=true
    
  2. The result returned by ElasticSearch should be the indexed document and it should be as follows:

    {
    "_index":"myindex","_type":"order","_id":"2qLrAfPVQvCRMe7Ku8r0Tw","_version":1,"exists":true, "_source" : {
        "id" : "1234",
        "date" : "2013...

Deleting a document


Deleting documents in ElasticSearch is possible in two ways: using the delete call or the delete by query, which we'll see in the next chapter.

Getting ready

You need a working ElasticSearch cluster and the indexed document which we have discussed in the Indexing a document recipe.

How to do it...

The REST API URL is similar to that of GET calls, but the HTTP method is DELETE:

http://<server>/<index_name>/<type_name>/<id>

For deleting a document, we need to perform the following steps:

  1. If we consider the order indexed in the Indexing a document recipe, the call to delete a document will be as follows:

    curl -XDELETE 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw'
    
  2. The result returned by ElasticSearch will be as follows:

    {
        "_id": "2qLrAfPVQvCRMe7Ku8r0Tw",
        "_index": "myindex",
        "_type": "order",
        "_version": 2,
        "found": true,
        "ok": true
    }

    The result, apart from the well-known parameters starting with _, returns the ok status...

Updating a document


Documents stored in ElasticSearch can be updated during their lives. There are two available solutions to do this operation in ElasticSearch: repost the new document or use the update call.

The update call can work in the following two ways:

  • By providing a script that implements the update strategy

  • By providing a document that must be merged with the original one

Getting ready

You need a working ElasticSearch cluster and the indexed document which we discussed in the Indexing a document recipe.

How to do it...

As we are changing the state of the data, the HTTP method is POST, and the following is the REST URL:

http://<server>/<index_name>/<type_name>/<id>/_update

For updating a document, we need to perform the following steps:

  1. If we consider the type order of the previous recipe, the call to update a document will be as follows:

    curl -XPOST 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw/_update' -d '{
    "script" : "ctx._source.in_stock_items += count...
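The two update styles differ only in the request body. As a sketch, both bodies can be built as follows; the in_stock_items field and count parameter mirror the truncated script above, and the example values are assumptions:

```python
import json

def script_update_body(script, params=None):
    """Update via a script that manipulates ctx._source."""
    body = {"script": script}
    if params:
        body["params"] = params   # values referenced by name inside the script
    return json.dumps(body)

def doc_update_body(partial_doc):
    """Update by merging a partial document into the original one."""
    return json.dumps({"doc": partial_doc})

script_body = script_update_body("ctx._source.in_stock_items += count",
                                 {"count": 4})
doc_body = doc_update_body({"in_stock_items": 10})
```

The scripted form is handy for counters and conditional logic; the doc-merge form is simpler when you already know the new field values.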

Speeding up atomic operations (bulk)


When we are inserting, deleting, or updating a large number of documents, the HTTP overhead is significant. To speed up the process, ElasticSearch allows the execution of bulk calls.

Getting ready

You need a working ElasticSearch cluster.

How to do it...

As we are changing the state of the data, the HTTP method is POST, and the following is the REST URL:

http://<server>/<index_name>/_bulk

For executing a bulk action, we need to perform the following steps:

  1. We need to collect the create/index/delete/update commands in a structure made up of bulk JSON lines: an action line with metadata, followed by an optional line of data related to the action. Every line must end with a newline character, "\n".

    A bulk datafile should be as follows:

    { "index":{ "_index":"myindex", "_type":"order", "_id":"1" } }
    { "field1" : "value1",  "field2" : "value2"  }
    { "delete":{ "_index":"myindex", "_type":"order", "_id":"2" } }
    { "create":{ "_index":"myindex", "_type...
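A common pitfall is forgetting the trailing newline, or letting a JSON serializer pretty-print an action across several lines. The following sketch builds a bulk body that keeps every action and data entry on its own newline-terminated line (the index and type names are the ones used above):

```python
import json

def bulk_body(pairs):
    """Serialize (action, source) pairs to the bulk NDJSON format.

    Each action line is followed by its optional source line; every line,
    including the last, is newline-terminated as the bulk API requires.
    """
    lines = []
    for action, source in pairs:
        lines.append(json.dumps(action))   # json.dumps emits a single line
        if source is not None:             # delete actions have no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

body = bulk_body([
    ({"index": {"_index": "myindex", "_type": "order", "_id": "1"}},
     {"field1": "value1", "field2": "value2"}),
    ({"delete": {"_index": "myindex", "_type": "order", "_id": "2"}}, None),
])
```

The resulting string can be POSTed as-is to the _bulk endpoint, for example with curl's -d option.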

Speeding up GET


The standard GET operation is very fast, but if you need to fetch a lot of documents by ID, ElasticSearch provides the multi GET operation.

Getting ready

You need a working ElasticSearch cluster and the document indexed in the Indexing a document recipe.

How to do it...

The following are the multi GET REST URLs:

  • http://<server>/_mget

  • http://<server>/<index_name>/_mget

  • http://<server>/<index_name>/<type_name>/_mget

For executing a multi GET action, we need to perform the following steps:

  1. The method is GET, but it requires a body with the IDs, and the index/type if they are missing from the URL.

    The following is an example that uses the first URL, where we need to provide the index, type, and id parameters:

    curl 'localhost:9200/_mget' -d '{
        "docs" : [
            {
                "_index" : "myindex",
                "_type" : "order",
                "_id" : "2qLrAfPVQvCRMe7Ku8r0Tw"
            },
            {
                "_index" : "myindex",
                "_type" : "order",
        ...
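The docs array grows mechanically with the number of IDs, so it is worth generating it. A minimal sketch, using the index and type names from the previous recipes (the second ID is purely illustrative):

```python
import json

def mget_body(ids, index="myindex", doc_type="order"):
    """Build the POST /_mget body for a list of document IDs."""
    return json.dumps({"docs": [
        {"_index": index, "_type": doc_type, "_id": doc_id} for doc_id in ids
    ]})

body = mget_body(["2qLrAfPVQvCRMe7Ku8r0Tw", "another-example-id"])
```

When the index-scoped or type-scoped URLs are used instead, the corresponding _index/_type entries can be dropped from each item, or the simpler {"ids": [...]} form can be sent.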