
You're reading from  Elasticsearch 5.x Cookbook - Third Edition

Product type: Book
Published in: Feb 2017
ISBN-13: 9781786465580
Edition: 3rd Edition
Author: Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.

Chapter 10. Managing Clusters and Nodes

In this chapter, we will cover the following recipes:

  • Controlling cluster health via API

  • Controlling cluster state via API

  • Getting cluster nodes information via API

  • Getting node statistics via API

  • Using the task management API

  • Hot thread API

  • Managing the shard allocation

  • Monitoring segments with segment API

  • Cleaning the cache

Introduction


In the Elasticsearch ecosystem, it's important to monitor nodes and clusters to manage and improve their performance and state. There are several issues that can arise at cluster level, such as:

  • Node overheads: Some nodes can have too many shards allocated and become a bottleneck for the entire cluster

  • Node shutdown: This can happen for many reasons, for example, full disks, hardware failures, and power problems

  • Shard relocation problems or corruptions: Some shards cannot get an online status

  • Oversized shards: If a shard is too big, indexing performance decreases because of the massive Lucene segment merges it requires

  • Empty indices and shards: They waste memory and resources. Because every shard has many active threads, a huge number of unused indices and shards degrades the overall cluster performance

Detecting malfunctioning or poor performance can be done via an API or through some frontends, as we will see in Chapter 12, User Interfaces. These allow the readers to have...

Controlling cluster health via an API


In the Understanding cluster, replication and sharding recipe in Chapter 1, Getting Started, we discussed the Elasticsearch clusters and how to manage them in a red and yellow state.

Elasticsearch provides a convenient way to manage the cluster state, which is one of the first things to check if any problems occur.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For controlling the cluster health, we will perform the following steps:

  1. To view the cluster health, the HTTP method is GET and the curl command is as follows:

            curl -XGET 'http://localhost:9200/_cluster/health?pretty'
  2. The result will be as follows:

            { 
              "cluster_name" : "elasticsearch", 
              "status" : "yellow", 
              "timed_out" : false, 
      ...
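As a minimal sketch of how the health response above could be checked from a script, the following Python snippet fetches the same endpoint and flags anything that is not green. The helper names `fetch_health` and `needs_attention` are hypothetical; only the URL and the `status` and `timed_out` fields come from the recipe.

```python
import json
from urllib.request import urlopen

# Endpoint taken from the recipe; adjust host and port for your own cluster.
HEALTH_URL = "http://localhost:9200/_cluster/health"


def fetch_health(url=HEALTH_URL):
    """GET the cluster health document and decode it as JSON (needs a running node)."""
    with urlopen(url) as resp:
        return json.load(resp)


def needs_attention(health):
    """Return True when the status is not green or the health request timed out."""
    return health.get("status") != "green" or bool(health.get("timed_out"))


# The fields shown in the recipe's sample response.
sample = {"cluster_name": "elasticsearch", "status": "yellow", "timed_out": False}
print(needs_attention(sample))
```

Against a live cluster, `needs_attention(fetch_health())` would run the same check on the real response.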

Controlling cluster state via an API


The previous recipe returns information only about the health of the cluster. If you need more details on your cluster, you need to query its state.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

To check the cluster state, we will perform the following steps:

  1. To view the cluster state, the HTTP method is GET, and the curl command is as follows:

            curl -XGET 'http://localhost:9200/_cluster/state' 
    
  2. The result will contain the following data sections:

    General cluster information:

            {
             "cluster_name" : "es-cookbook",
             "version" : 13,
             "state_uuid" : "QANXXnzhS7aS5HxLlyNKsw",
             "master_node" : "7NwnFF1JTPOPhOYuP1AVNQ",
             "blocks" : { },

    Node address information:

            "nodes" : {
       ...
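The `master_node` value in the state is a node ID, which can be looked up in the `nodes` section to find the human-readable node name. A small sketch, assuming the response has already been decoded into a Python dictionary (the function name `master_node_name` is hypothetical, and the sample is trimmed to the fields used):

```python
def master_node_name(state):
    """Resolve the master node ID to its name using the "nodes" section."""
    master_id = state.get("master_node")
    node = state.get("nodes", {}).get(master_id, {})
    return node.get("name")


# Trimmed, illustrative state document; real responses carry many more sections.
sample_state = {
    "cluster_name": "es-cookbook",
    "master_node": "7NwnFF1JTPOPhOYuP1AVNQ",
    "nodes": {"7NwnFF1JTPOPhOYuP1AVNQ": {"name": "7NwnFF1"}},
}
print(master_node_name(sample_state))
```

The function returns `None` when the master ID is missing from the `nodes` section, which may itself be worth alerting on.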

Getting nodes information via API


The previous recipes return information at the cluster level; Elasticsearch also provides calls to gather information at the node level. In production clusters, it's very important to monitor nodes via this API to detect misconfiguration and problems relating to different plugins and modules.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For getting nodes information, we will perform the following steps:

  1. To retrieve the node information, the HTTP method is GET and the curl command is as follows:

            curl -XGET 'http://localhost:9200/_nodes' 
            curl -XGET 'http://localhost:9200/_nodes/<nodeId1>,<nodeId2>' 
    
  2. The result will contain a lot of information about the node. It's huge, so the repetitive parts have...

Getting node statistics via the API


The node statistics API is used to collect real-time metrics for your nodes, such as memory usage, thread usage, the number of indices, searches, and so on.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For getting nodes statistics, we will perform the following steps:

  1. To retrieve the node statistics, the HTTP method is GET, and the curl command is as follows:

            curl -XGET 'http://localhost:9200/_nodes/stats'
            curl -XGET 'http://localhost:9200/_nodes/<nodeId1>,<nodeId2>/stats'
  2. The result will be a long list of all the node statistics. The most significant parts of the results are as follows:

    A header describing the cluster name and the nodes section:

            { 
              "cluster_name" : "es-cookbook", 
       ...
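Because the statistics response is long, it can help to pull out a single metric per node. The sketch below, which assumes the response has been decoded into a dictionary, lists the nodes whose JVM heap usage is at or above a threshold. It reads the `jvm.mem.heap_used_percent` field of each node entry; the function name, the threshold, and the sample numbers are illustrative assumptions.

```python
def nodes_over_heap(stats, threshold=75):
    """Return (node name, heap percent) pairs at or above the threshold, highest first."""
    busy = []
    for node_id, node in stats.get("nodes", {}).items():
        pct = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if pct is not None and pct >= threshold:
            # Fall back to the node ID when the entry carries no name.
            busy.append((node.get("name", node_id), pct))
    return sorted(busy, key=lambda pair: -pair[1])


# Made-up values, trimmed to the fields the function reads.
sample_stats = {
    "cluster_name": "es-cookbook",
    "nodes": {
        "id-1": {"name": "node-a", "jvm": {"mem": {"heap_used_percent": 82}}},
        "id-2": {"name": "node-b", "jvm": {"mem": {"heap_used_percent": 41}}},
    },
}
print(nodes_over_heap(sample_stats))
```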

Using the task management API


Elasticsearch 5.x allows the definition of actions that can take some time to complete. The most common ones are as follows:

  • delete_by_query

  • update_by_query

  • reindex

When these actions are called, they create a server-side task that executes the job. The task management API allows you to control these actions.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For getting tasks information, we will perform the following steps:

  1. To retrieve the task information, the HTTP method is GET and the curl command is as follows:

            curl -XGET 'http://localhost:9200/_tasks'
            curl -XGET 'http://localhost:9200/_tasks?nodes=<nodeId1>,<nodeId2>'
            curl -XGET 'http://localhost:9200/_tasks?nodes=<nodeId1>,<nodeId2>&...
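The task list groups tasks by node, so finding a long-running reindex usually means walking every node entry. A minimal sketch, assuming the `_tasks` response has been decoded into a dictionary with a `nodes` map whose entries each hold a `tasks` map (the function name and the sample IDs are hypothetical):

```python
def tasks_matching(tasks_response, action_fragment):
    """Return the sorted IDs of tasks whose action name contains the given fragment."""
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if action_fragment in task.get("action", ""):
                found.append(task_id)
    return sorted(found)


# Illustrative response, trimmed to the fields used here.
sample_tasks = {
    "nodes": {
        "nodeA": {
            "name": "node-a",
            "tasks": {
                "nodeA:101": {"action": "indices:data/write/reindex"},
                "nodeA:102": {"action": "cluster:monitor/tasks/lists"},
            },
        }
    }
}
print(tasks_matching(sample_tasks, "reindex"))
```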

Hot thread API


Sometimes your cluster slows down due to massive CPU usage and you need to understand why.

Elasticsearch provides the ability to monitor hot threads to be able to understand where the problem is.

Note

In Java, hot threads are threads that are using a lot of CPU and take a long time to execute.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For getting hot thread information, we will perform the following steps:

  1. To retrieve the hot threads information, the HTTP method is GET and the curl command is as follows:

            curl -XGET 'http://localhost:9200/_nodes/hot_threads'
            curl -XGET 'http://localhost:9200/_nodes/{nodesIds}/hot_threads'
  2. The result will be something similar to the following:

            ::: {7NwnFF1}{7NwnFF1JTPOPhOYuP1AVNQ}{OL2uVn3BQ-qMAg32eq_ouQ...
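Unlike the other calls in this chapter, the hot threads result is plain text rather than JSON, with each node's report introduced by a line starting with `:::`. The sketch below splits such a report into one block per node. The function name and the sample text are assumptions made for illustration, and the body lines are placeholders rather than real thread dumps.

```python
def split_by_node(report):
    """Group the lines of a hot threads report under their ':::' node header."""
    sections = {}
    current = None
    for line in report.splitlines():
        if line.startswith(":::"):
            # Header line: everything after the marker identifies the node.
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


# Placeholder report with two nodes; real output lists stack traces per thread.
sample_report = (
    "::: {node-a}{idA}\n"
    "   first node report line\n"
    "::: {node-b}{idB}\n"
    "   second node report line\n"
)
for header, lines in split_by_node(sample_report).items():
    print(header, len(lines))
```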

Managing the shard allocation


During normal Elasticsearch usage, it is not necessary to change the shard allocation, because the default settings work very well with all standard scenarios. Sometimes, due to massive relocation, or due to nodes restarting, or some other cluster issues, it's necessary to monitor or define custom shard allocation.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For getting information about the current state of unassigned shard allocation, we will perform the following steps:

  1. To retrieve the cluster allocation information, the HTTP method is GET and the curl command is as follows:

            curl -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty'
  2. The result will be something similar to the following:

            { 
    ...

Monitoring segments with the segment API


Monitoring the index segments means monitoring the health of an index. The segment API returns information about the number of segments and the data stored in them.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For getting information about index segments, we will perform the following steps:

  1. To retrieve the index segments, the HTTP method is GET and the curl command is as follows:

        curl -XGET 'http://localhost:9200/test-index/_segments'
  2. The result will be something similar to the following:

        { 
          "_shards" : { ...truncated... }, 
          "indices" : { 
           "test-index" : { 
             "shards" : { 
                "0" : [ 
                  { 
                    "routing" : { 
                     "state...
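To track the number of segments over time, the nested response can be reduced to a count per shard copy. A minimal sketch, assuming the decoded response follows the `indices` → index name → `shards` → shard number → list of copies layout shown above, and that each copy reports a `num_search_segments` field (the function name and sample values are illustrative):

```python
def search_segments_per_shard(response, index):
    """Map each shard number to the search-segment counts of its copies."""
    shards = response["indices"][index]["shards"]
    return {
        shard_id: [copy.get("num_search_segments", 0) for copy in copies]
        for shard_id, copies in shards.items()
    }


# Illustrative response, trimmed to the fields used here.
sample_segments = {
    "indices": {
        "test-index": {
            "shards": {
                "0": [{"num_search_segments": 3}],
                "1": [{"num_search_segments": 7}],
            }
        }
    }
}
print(search_segments_per_shard(sample_segments, "test-index"))
```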

Cleaning the cache


During its execution, Elasticsearch caches data, such as results, items, and filter results, to speed up searching.

To free up memory, you can call the clean cache API.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For cleaning the cache, we will perform the following steps:

  1. We call the clean cache API on an index as follows:

            curl -XPOST 'http://localhost:9200/test-index/_cache/clear'
  2. The result returned by Elasticsearch, if everything is okay, should be as follows:

        { 
          "_shards" : { 
            "total" : 10, 
            "successful" : 5, 
            "failed" : 0 
          } 
        } 
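In the result above, `successful` is lower than `total` while `failed` is 0. A plausible reading, assuming a single-node setup where replica shards cannot be assigned, is that the call ran on every available shard copy and the remaining copies simply do not exist. The sketch below makes that distinction explicit; the function name and the returned keys are hypothetical.

```python
def summarize_shards(result):
    """Report whether any shard failed and how many copies were not reached."""
    shards = result["_shards"]
    return {
        "ok": shards["failed"] == 0,
        # Copies that neither succeeded nor failed, e.g. unassigned replicas.
        "not_executed": shards["total"] - shards["successful"] - shards["failed"],
    }


# The values from the recipe's sample result.
sample_result = {"_shards": {"total": 10, "successful": 5, "failed": 0}}
print(summarize_shards(sample_result))
```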

How it works...

The cache clean API frees the memory used to cache values in Elasticsearch.

Generally, it's not a good idea to clean...
