Monitoring Elasticsearch
By Dan Noble , Pulkit Agrawal , Mahmoud Lababidi
Book Jul 2016 180 pages 1st Edition
Product Details


Publication date : Jul 27, 2016
Length 180 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781784397807

Chapter 1. Introduction to Monitoring Elasticsearch

Elasticsearch is a distributed and horizontally scalable full-text search engine with built-in data redundancy. It is a powerful and incredibly useful tool. However, as with any distributed system, problems may arise as it scales with more nodes and more data.

The information provided by Elasticsearch monitoring tools can drastically improve your ability to solve cluster issues and greatly increase cluster reliability and performance as a result. This chapter gives an overview of Elasticsearch and talks about why and how to monitor a cluster.

Specifically, this chapter covers the following topics:

  • An overview of Elasticsearch

  • Monitoring Elasticsearch

  • Resourcefulness and problem solving

An overview of Elasticsearch


This section gives a high-level overview of Elasticsearch and discusses some related full-text search products.

Learning more about Elasticsearch

Elasticsearch is a free and open source full-text search engine that is built on top of Apache Lucene. Out of the box, Elasticsearch supports horizontal scaling and data redundancy. Released in 2010, Elasticsearch quickly gained recognition in the full-text search space. Its scalability features helped the tool gain market share against similar technologies such as Apache Solr.

Elasticsearch is a persistent document store and retrieval system, and it is similar to a database. However, it is different from relational databases such as MySQL, PostgreSQL, and Oracle in many ways:

  • Distributed: Elasticsearch stores data and executes queries across multiple data nodes. This improves scalability, reliability, and performance.

  • Fault tolerant: Data is replicated across multiple nodes in an Elasticsearch cluster, so if one node goes down, data is still available.

  • Full-text search: Elasticsearch is built on top of Lucene, a full-text search technology, allowing it to understand and search natural language text.

  • JSON document store: Elasticsearch stores documents as JSON instead of as rows in a table.

  • NoSQL: Elasticsearch uses a JSON-based query language as opposed to Structured Query Language (SQL).

  • Non-relational: Unlike relational databases, Elasticsearch doesn't support JOINs across tables.

  • Analytics: Elasticsearch has built-in analytical capabilities, such as word aggregations, geospatial queries, and scripting language support.

  • Dynamic Mappings: A mapping in Elasticsearch is analogous to a schema in the relational database world. If the data type for a document field isn't explicitly defined, Elasticsearch will dynamically assign a type to it.

Data distribution, redundancy, and fault tolerance

Figures 1.1 through 1.4 explain how Elasticsearch distributes data across multiple nodes and how it automatically recovers from node failures:

Figure 1.1: Elasticsearch Data Distribution

In this figure, we have an Elasticsearch cluster made up of three nodes: elasticsearch-node-01, elasticsearch-node-02, and elasticsearch-node-03. Our data index is broken into three pieces, called shards. These shards are labeled 0, 1, and 2. Each shard is replicated once; this means that there is a redundant copy of every shard. The cluster is colored green because it is in good health; all data shards and replicas are available.
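The layout described for Figure 1.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Elasticsearch's implementation: `shard_for` uses CRC32 as a stand-in for the hash Elasticsearch applies to a document's routing value, and `place_copies` is a hypothetical round-robin placement rather than Elasticsearch's shard allocator. The only idea it is meant to show is that a document maps to a shard by hashing, and that a shard's two copies sit on different nodes.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number by hashing it.

    CRC32 is a stand-in hash chosen for this sketch only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def place_copies(num_shards: int, nodes: list) -> dict:
    """Put each shard's primary and its single replica on different nodes.

    Hypothetical round-robin placement, for illustration only.
    """
    layout = {node: [] for node in nodes}
    for shard in range(num_shards):
        primary_node = nodes[shard % len(nodes)]
        replica_node = nodes[(shard + 1) % len(nodes)]
        layout[primary_node].append((shard, "primary"))
        layout[replica_node].append((shard, "replica"))
    return layout


nodes = ["elasticsearch-node-01", "elasticsearch-node-02", "elasticsearch-node-03"]
for node, copies in place_copies(3, nodes).items():
    print(node, copies)
```

With three shards and three nodes, this placement leaves elasticsearch-node-03 holding copies of shards 1 and 2, which is the situation the failure scenario that follows relies on.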

Let's say that the elasticsearch-node-03 host experiences a hardware failure and shuts down. The following figures show what happens to the cluster in this scenario:

Figure 1.2: Node failure

Figure 1.2 shows elasticsearch-node-03 experiencing a failure and the cluster entering a yellow state. This state means that at least one copy of each shard is active in the cluster, but not all shard replicas are active. In our case, copies of shards 1 and 2 were on the node that failed, elasticsearch-node-03. A yellow state also warns us that if there's another hardware failure, some data shards may become unavailable.

When elasticsearch-node-03 goes down, Elasticsearch will automatically start rebuilding redundant copies of shards 1 and 2 on the remaining nodes; in our case, these are elasticsearch-node-01 and elasticsearch-node-02. This is shown in the following figure:

Figure 1.3: Cluster recovering

Once Elasticsearch finishes rebuilding the data replicas, the cluster enters a green state once again. Now, all data and shards are available to query.

Figure 1.4: Cluster recovered

The cluster recovery process demonstrated in Figures 1.3 and 1.4 happens automatically in Elasticsearch. No extra configuration or user action is required.
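The green and yellow states walked through in Figures 1.1 to 1.4 can be modeled with a small function. This is a simplified sketch, not Elasticsearch's own logic: the `cluster_status` helper and the allocation dictionaries are hypothetical, and the model only counts how many copies of each shard sit on live nodes. It also returns red, a state the text above does not cover, which Elasticsearch reports when a shard has no active copy at all.

```python
def cluster_status(allocation, live_nodes, num_shards, copies_per_shard=2):
    """Return 'green', 'yellow', or 'red' from shard copies on live nodes.

    allocation maps a node name to the shard numbers it holds.
    """
    active = {shard: 0 for shard in range(num_shards)}
    for node, shards in allocation.items():
        if node in live_nodes:
            for shard in shards:
                active[shard] += 1
    if any(count == 0 for count in active.values()):
        return "red"      # some shard has no active copy
    if any(count < copies_per_shard for count in active.values()):
        return "yellow"   # all data available, but a replica is missing
    return "green"        # every primary and replica is active


# Figure 1.1: three nodes, each shard stored twice.
before = {
    "elasticsearch-node-01": [0, 2],
    "elasticsearch-node-02": [0, 1],
    "elasticsearch-node-03": [1, 2],
}
# Figure 1.4: replicas of shards 1 and 2 rebuilt on the two remaining nodes.
after = {
    "elasticsearch-node-01": [0, 1, 2],
    "elasticsearch-node-02": [0, 1, 2],
}
survivors = {"elasticsearch-node-01", "elasticsearch-node-02"}

print(cluster_status(before, set(before), 3))
print(cluster_status(before, survivors, 3))
print(cluster_status(after, survivors, 3))
```

The three calls correspond to the healthy cluster, the cluster just after elasticsearch-node-03 fails, and the cluster once the replicas have been rebuilt.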

Full-text search

Full-text search refers to running keyword queries against natural-language text documents. A document can be something such as a newspaper article, a blog post, a forum post, or a tweet. In fact, many popular newspapers, forums, and social media websites, such as The New York Times, Stack Overflow, and Foursquare, use Elasticsearch.

Assume that we were to store the following text string in Elasticsearch:

We demand rigidly defined areas of doubt and uncertainty!

A user can find this document by searching Elasticsearch using keywords, such as demand or doubt. Elasticsearch also supports word stemming. This means that if we searched for the word define, Elasticsearch would still find this document because the root word of defined is define.
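To make the idea of stemming concrete, here is a deliberately crude sketch. The `toy_stem` function is a hypothetical suffix stripper written for this example; it is not the algorithm Elasticsearch or Lucene uses, and real stemmers apply far more careful rules. What it shows is the principle: the query and the document are reduced to the same root forms before they are compared.

```python
import re


def toy_stem(word: str) -> str:
    """Strip one common English suffix. Illustrative only."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s", "e"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text: str) -> set:
    """Split text into words and reduce each one to its toy stem."""
    return {toy_stem(word) for word in re.findall(r"[A-Za-z]+", text)}


def matches(query: str, document: str) -> bool:
    """True when every stemmed query word appears in the document."""
    return stems(query) <= stems(document)


quote = "We demand rigidly defined areas of doubt and uncertainty!"
print(matches("define", quote))
print(matches("demand doubt", quote))
print(matches("certainty", quote))
```

Both "defined" and "define" reduce to the same stem here, so the first query matches even though the exact word never appears in the quote.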

This piece of text, along with some additional metadata, may be stored as follows in Elasticsearch in the JSON format:

{
    "text" : "We demand rigidly defined areas of doubt and uncertainty!",
    "author" : "Douglas Adams",
    "published" : "1979-10-12",
    "likes" : 583,
    "source" : "The Hitchhiker's Guide to the Galaxy",
    "tags" : ["science fiction", "satire"]
}

If we let Elasticsearch dynamically assign a mapping (think schema) to this document, it would look like this:

{
    "quote" : {
        "properties" : {
            "author" : {
                "type" : "string"
            },
            "likes" : {
                "type" : "long"
            },
            "published" : {
                "type" : "date",
                "format" : "strict_date_optional_time||epoch_millis"
            },
            "source" : {
                "type" : "string"
            },
            "tags" : {
                "type" : "string"
            },
            "text" : {
                "type" : "string"
            }
        }
    }
}

Note that Elasticsearch was able to pick up that the published field looked like a date.
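The kind of inference behind a dynamic mapping can be approximated as follows. This is a rough sketch, assuming a handful of simplified rules: `guess_type`, `guess_mapping`, and the date pattern are hypothetical names and choices made for this example, not Elasticsearch's detection code, and the type names follow the mapping shown above.

```python
import re

# Assumed pattern: ISO-style dates such as 1979-10-12, optionally with a time.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?)?$")


def guess_type(value):
    """Guess a field type from a single JSON value."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, list):
        return guess_type(value[0])  # an array takes the type of its elements
    if isinstance(value, str):
        return "date" if DATE_PATTERN.match(value) else "string"
    raise TypeError(f"unsupported value: {value!r}")


def guess_mapping(doc: dict) -> dict:
    """Build a mapping-like dictionary for a flat JSON document."""
    properties = {}
    for field, value in doc.items():
        if value is None or value == []:
            continue  # nothing to infer from yet
        properties[field] = {"type": guess_type(value)}
    return {"properties": properties}


quote = {
    "text": "We demand rigidly defined areas of doubt and uncertainty!",
    "author": "Douglas Adams",
    "published": "1979-10-12",
    "likes": 583,
    "source": "The Hitchhiker's Guide to the Galaxy",
    "tags": ["science fiction", "satire"],
}
print(guess_mapping(quote))
```

Note that `tags` is typed by its elements, which mirrors how the mapping above lists it as a plain string field even though the document stores an array.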

An Elasticsearch query that searches for this document looks like this:

{
    "query" : {
        "query_string" : {
            "query" : "demand rigidly"
        }
    },
    "size" : 10
}
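A query body like this one is sent to the cluster over the REST API. The sketch below builds such a request with Python's standard library. The base URL `http://localhost:9200` and the index name `quotes` are assumptions made for illustration; neither appears in the text above, so substitute your own host and index.

```python
import json
import urllib.request


def build_search_request(query_text, base_url="http://localhost:9200",
                         index="quotes", size=10):
    """Build (but do not send) a search request for a query_string query.

    base_url and index are placeholder values for this sketch.
    """
    body = {
        "query": {"query_string": {"query": query_text}},
        "size": size,
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("demand rigidly")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would execute the search against a running cluster; that step is left out here so the example runs without one.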

Specifics about Elasticsearch mappings and the Search API are beyond the scope of this book, but you can learn more about them through the official Elasticsearch documentation.

Note

Elasticsearch should not be your primary data store. It does not provide the Atomicity, Consistency, Isolation, and Durability (ACID) guarantees of a traditional SQL data store, nor the reliability guarantees of other NoSQL databases such as HBase or Cassandra. Even though Elasticsearch has built-in data redundancy and fault tolerance, it's best practice to archive your data in a separate data store so that you can re-index it into Elasticsearch if needed.

Similar technologies

This section explains a few of the many open source full-text search engines available, and discusses how they match up to Elasticsearch.

Apache Lucene

Apache Lucene (https://lucene.apache.org/core/) is an open source full-text search Java library. As mentioned earlier, Lucene is Elasticsearch's underlying search technology. Lucene also provides Elasticsearch's analytics features such as text aggregations and geospatial search. Using Apache Lucene directly is a good choice if you perform full-text search in Java on a small scale, or are building your own full-text search engine.

The benefits of using Elasticsearch over Lucene are as follows:

  • REST API instead of a Java API

  • JSON document store

  • Horizontal scalability, reliability, and fault tolerance

On the other hand, Lucene is much more lightweight and gives you more flexibility when building custom applications that need full-text search integrated from the ground up.

Note

Lucene.NET is a popular .NET port of the library, written in C#.

Solr

Solr is another full-text search engine built on top of Apache Lucene. It has similar search, analytic, and scaling capabilities to Elasticsearch. For most applications that need a full-text search engine, choosing between Solr and Elasticsearch comes down to personal preference.

Ferret

Ferret is a full-text search engine for Ruby. It's similar to Lucene, but it is not as feature-rich. It's generally better suited to Ruby applications that don't require the power (or complexity) of a search engine such as Elasticsearch or Solr.

Monitoring Elasticsearch


Monitoring distributed systems is difficult because as the number of nodes, the number of users, and the amount of data increase, problems will begin to crop up.

Furthermore, it may not be immediately obvious if there is an error. Often, the cluster will keep running and try to recover from the error automatically. As shown in Figures 1.2, 1.3, and 1.4 earlier, a node failed, but Elasticsearch brought itself back to a green state without any action on our part. Unless monitored, failures such as these can go unnoticed. This can have a detrimental impact on system performance and reliability. Fewer nodes means less processing power to respond to queries, and, as in the previous example, if another node fails, our cluster won't be able to return to a green state.

The aspects of an Elasticsearch cluster that we'll want to keep track of include the following:

  • Cluster health and data availability

  • Node failures

  • Elasticsearch JVM memory usage

  • Elasticsearch cache size

  • System utilization (CPU, Memory, and Disk)

  • Query response times

  • Query rate

  • Data index times

  • Data index rate

  • Number of indices and shards

  • Index and shard size

  • System configuration

In this book, we'll go over how to understand each of these variables in context and how understanding them can help diagnose, recover from, and prevent problems in our cluster. It's certainly not possible to preemptively stop all Elasticsearch errors. However, by proactively monitoring our cluster, we'll have a good idea of when things are awry and will be better positioned to take corrective action.
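As a small taste of what proactive monitoring looks like in code, the sketch below inspects a cluster health response and reports anything worth a closer look. The field names `status`, `number_of_nodes`, and `unassigned_shards` are ones the Elasticsearch cluster health API (`_cluster/health`) returns; the `health_warnings` helper, the expected node count, and the sample dictionaries are assumptions made for this example, not output captured from a real cluster.

```python
def health_warnings(health: dict, expected_nodes: int) -> list:
    """Turn a cluster health response into a list of warning messages."""
    warnings = []
    status = health.get("status")
    if status != "green":
        warnings.append(f"cluster status is {status}")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        warnings.append(f"only {nodes} of {expected_nodes} nodes are in the cluster")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned:
        warnings.append(f"{unassigned} shards are unassigned")
    return warnings


# Hand-written sample responses, trimmed to the fields used above.
recovered = {"status": "green", "number_of_nodes": 2, "unassigned_shards": 0}
degraded = {"status": "yellow", "number_of_nodes": 2, "unassigned_shards": 2}

for sample in (recovered, degraded):
    print(health_warnings(sample, expected_nodes=3))
```

The first sample is the case described earlier in this section: the cluster has recovered to green on its own, so a check on status alone would stay silent, while comparing the node count against what we expect still reveals the lost node.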

In the following chapters, we'll go over everything from web-based cluster monitoring tools to Unix command line tools and log file monitoring. Some of the specific tools this book covers are as follows:

  • Elasticsearch-head

  • Bigdesk

  • Marvel

  • Kopf

  • Kibana

  • Nagios

  • Unix command-line tools

These tools will give us the information we need to effectively diagnose, solve, and prevent problems with Elasticsearch.

Resourcefulness and problem solving


Monitoring tools do a great job of telling you what is going on in your cluster, and they can often point out if there is a problem. However, these tools won't give you a recipe for how to actually fix a problem. Resolving issues takes critical thinking, attention to detail, and persistence. Some of the problem-solving themes this book talks about are as follows:

  • Always try to recreate the problem

  • Be on the lookout for configuration and user errors

  • Only make one configuration change at a time before testing

This book also provides some real-world case studies that help you turn the information provided by monitoring tools into insights to resolve Elasticsearch issues.

Summary


This chapter gave you an overview of Elasticsearch and why it's important to proactively monitor a cluster. To summarize the points from the chapter:

  • Elasticsearch is an open source, scalable, fast, and fault-tolerant search engine

  • Elasticsearch is built on top of Apache Lucene, the same library that powers Apache Solr

  • Monitoring tools will help us get a better understanding of our cluster and will let us know when problems arise

  • As helpful as monitoring tools are, it's up to us to actually diagnose and fix cluster issues

In the next chapter, we'll cover how to get a simple Elasticsearch cluster running and loaded with data, and how to install several monitoring tools.


Key benefits

  • Understand common performance and reliability pitfalls in Elasticsearch
  • Use popular monitoring tools such as Elasticsearch-head, Bigdesk, Marvel, Kibana, and more
  • This is a step-by-step guide with lots of case studies on solving real-world Elasticsearch cluster issues

Description

Elasticsearch is a distributed search server similar to Apache Solr, with a focus on large datasets, a schema-less setup, and high availability. This schema-free architecture allows Elasticsearch to index and search unstructured content, making it suited to both small projects and large data warehouses holding petabytes of unstructured data. This book is a toolkit for keeping your cluster in good health and for diagnosing and treating unexpected issues along the way. You will start with an introduction to Elasticsearch and a look at some common performance issues that arise when using the system. You will then see how to install and configure Elasticsearch and the Elasticsearch monitoring plugins, and how to install and use the Marvel dashboard to monitor Elasticsearch. You will find out how to troubleshoot some of the common performance and reliability issues that come up when using Elasticsearch. Finally, you will analyze your cluster's historical performance and learn how to get to the bottom of, and recover from, system failures. The book guides you through several monitoring tools and uses real-world cases and dilemmas faced when using Elasticsearch to show you how to solve them simply, quickly, and cleanly.

What you will learn

  • Explore your cluster with Elasticsearch-head and Bigdesk
  • Access the underlying data of the Elasticsearch monitoring plugins using the Elasticsearch API
  • Analyze your cluster's performance with Marvel
  • Troubleshoot some of the common performance and reliability issues that come up when using Elasticsearch
  • Analyze a cluster's historical performance, and get to the bottom of and recover from system failures
  • Install and use other tools and plugins, such as Kibana and Kopf, that help monitor Elasticsearch

Table of Contents

15 Chapters
Monitoring Elasticsearch
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Introduction to Monitoring Elasticsearch
Installation and the Requirements for Elasticsearch
Elasticsearch-head and Bigdesk
Marvel Dashboard
System Monitoring
Troubleshooting Performance and Reliability Issues
Node Failure and Post-Mortem Analysis
Looking Forward
Index

