You're reading from  Mastering Elastic Stack

Product type: Book
Published in: Feb 2017
Publisher: Packt
ISBN-13: 9781786460011
Edition: 1st Edition
Authors (2):
Ravi Kumar Gupta

Ravi Kumar Gupta is an author, reviewer, and open source software evangelist. He pursued an MS degree in software systems at BITS Pilani and a B.Tech at LNMIIT, Jaipur. His technological forte is portal management and development. He is currently working with Azilen Technologies, where he acts as a Technical Architect and Project Manager. His previous assignment was as a lead consultant with CIGNEX Datamatics. He was a core member of the open source group at TCS, where he started working on Liferay and other UI technologies. During his career, he has been involved in building enterprise solutions using the latest technologies with rich user interfaces and open source tools. He loves to spend time writing, learning, and discussing new technologies. His interest in search engines and a small crawler project during his college days made him a technology lover. He is one of the authors of Test-Driven JavaScript Development, Packt Publishing. He is an active member of the Liferay forum. He also writes technical articles for his blog at TechD of Computer World (http://techdc.blogspot.in). He has been a Liferay trainer at TCS and CIGNEX, where he has provided training on Liferay 5.x and 6.x versions. He was also a reviewer for Learning Bootstrap, Packt Publishing. He can be reached on Skype at kravigupta, on Twitter at @kravigupta, and on LinkedIn at https://in.linkedin.com/in/kravigupta.

Yuvraj Gupta

Yuvraj Gupta is an author and a keen technologist with an interest in Big Data, Data Analytics, Data Visualization, and Cloud Computing. He has been working as a Big Data Consultant, primarily in the domain of Big Data Testing. He loves to spend time writing on various social platforms. He is an avid gadget lover, a foodie, and a sports enthusiast, and he loves to watch TV series and movies. He always keeps himself updated with the latest happenings in technology. He has authored a book titled Kibana Essentials with Packt Publishing. He can be reached at gupta.yuvraj@gmail.com or on LinkedIn at www.linkedin.com/in/guptayuvraj.

Chapter 11. Best Practices

In the previous chapter, we discussed the various components provided by X-Pack as part of Elastic Stack. We explored each component in detail, covering what it offers and the functionality it provides.

If you have been following the chapters, you probably have a lot of questions about how to start building a scalable system and which best practices to follow to make it efficient.

By the end of this chapter, you will understand some of the best practices, learned from other people's experiences, that can be used to make Elastic Stack production-ready.

In this chapter, we will cover the following topics:

  • Why do we require best practices?

  • Understanding your use case

  • Choosing the right set of hardware

  • Searching and indexing performance

  • Sizing the Elasticsearch cluster

  • Logstash configuration files

  • Re-indexing data

Why do we require best practices?


When we start to learn a tool, we usually run it on a standalone machine, where we can gain expertise and experiment freely. When we move from that small setup to a large one spanning multiple systems, many things can hinder performance or prevent us from getting the results we saw on the small setup. To achieve the best results, we need to understand and follow practices that have worked for others and have led to higher performance and fewer problems. In programming languages, we have the concept of best coding practices, which describe how to write code that is easy to read, understand, and maintain, and which reduce the effort required when someone else starts working on the same or a similar piece of code. For tools or components, however, there are no hard and fast rules. Best practices will depend on various factors such as the architecture designed, components...

Understanding your use case


This is the basic information that everyone should have before thinking about best practices, or even about using Elastic Stack. If you learn Elastic Stack and start connecting or processing random data without thinking about your use case, you will not be able to derive the information you need. You will also be left without an answer when asked why you chose this tool rather than another. Hence, it is immensely important to understand your use case beforehand.

Your use case answers many questions: which components to use, how critical the incoming log data is, whether high-availability clusters are required, and whether a centralized logging system is needed. If your use case consists of analyzing the logs of an application, you can decide accordingly whether a single Elasticsearch cluster is enough or whether you need to create a centralized logging system.

One of the biggest questions that arises...

Managing configuration files


Changes to configuration files are among the most basic yet most important ones to make for a production deployment. Let's have a look at the configuration files of the various components.

Elasticsearch - elasticsearch.yml

By default, Elasticsearch sets values for important cluster- and node-related properties such as the cluster name and the node name. While it is not necessary to set them, it is a good idea to customize the names. For example, we should specify node names so that we can easily track node statistics by the names we chose. A few such properties are explained below:

  • Change the name of the cluster by modifying the following property:

          # cluster.name: my-application  
          cluster.name: production-elasticstack 
    
  • Change the name of the node to easily identify the nodes joining in the cluster by modifying the following property:

          # node.name: node-1 
          node.name: elasticstack1 
    
  • Change the location...
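As an illustration of the naming advice above, the following is a minimal sketch of a pre-deployment check that reports whether `cluster.name` and `node.name` have been set explicitly. The helper names (`read_flat_settings`, `missing_custom_names`) are hypothetical, and the parser assumes a flat `key: value` layout like the snippets shown here; it is not a full YAML parser.

```python
# Minimal sketch: report naming properties that are not set explicitly.
# Assumption: the file uses flat "key: value" lines, as in the snippets above.


def read_flat_settings(text):
    """Parse flat 'key: value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def missing_custom_names(settings):
    """Return the naming properties that are absent or empty."""
    required = ("cluster.name", "node.name")
    return [key for key in required if not settings.get(key)]


# Sample content: the cluster name is customized, the node name is still commented out.
sample = """
# cluster.name: my-application
cluster.name: production-elasticstack
# node.name: node-1
"""

settings = read_flat_settings(sample)
print(missing_custom_names(settings))  # prints ['node.name']
```

A check like this could run as part of a deployment script, so that a node never joins the cluster with an auto-generated name.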

Choosing the right set of hardware


When installing Elastic Stack, the important things to understand are how much memory will be consumed, how much disk space is required, how I/O requests will be handled, how many CPUs or cores are needed, and how the network between the systems behaves. Whenever we install Elastic Stack, there is always confusion about the amount of resources used by each component. Let us try to break down the resource requirements for the various components.

Memory

Memory is one of the most important parameters affecting the performance of an application. If the memory provided is lower than required, the application can stop, fail, or throw an Out of Memory error.

Memory sizing is important because the OS also uses memory for its own tasks. Hence, it is essential to determine how much memory to give to the Elastic Stack components so that neither application nor OS performance suffers.
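To make the trade-off between application and OS memory concrete, here is a small sketch based on a guideline commonly given in the Elasticsearch documentation: give the JVM heap no more than about half of the physical RAM, and keep it below roughly 32 GB so that compressed object pointers stay in use. The function name and the exact 31 GB cap are assumptions chosen for illustration, not values taken from this chapter.

```python
# Sketch of a common heap-sizing guideline (an assumption, not from this chapter):
# at most half of physical RAM for the JVM heap, capped just under ~32 GB.

HEAP_CAP_GB = 31  # assumed cap, just below the ~32 GB compressed-pointer threshold


def suggested_heap_gb(total_ram_gb):
    """Return a suggested Elasticsearch heap size in whole gigabytes."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    half = total_ram_gb // 2  # leave the other half to the OS and file system cache
    return max(1, min(half, HEAP_CAP_GB))


for ram in (8, 64, 128):
    heap = suggested_heap_gb(ram)
    # -Xms and -Xmx are the standard JVM flags for initial and maximum heap size.
    print(f"{ram} GB RAM -> -Xms{heap}g -Xmx{heap}g")
```

The remaining memory is not wasted: the OS uses it for its own tasks and for caching the files that Lucene reads from disk.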

In the context of Elastic Stack, memory is crucial to decide if you should configure...

Searching and indexing performance


So far, we have covered some of the best practices in terms of resources. Memory, CPU, I/O, disks, and network play a big part in choosing the preferred system configuration, and we can also tweak a few settings to improve resource usage for searching and indexing in Elasticsearch and Lucene.

Filter cache

By default, the filters used in Elasticsearch queries are cached. When a query uses a filter, Elasticsearch finds the documents matching the filter and caches the result. Any later query that uses the same filter then returns results more quickly, because the filter is already cached in memory. As the cache uses memory internally, it is wise to set a property that limits its usage. Although each filter uses little memory, the JVM heap can take a hit if a large number of filters are used. The following property limits the amount of heap memory that can be used for the filter cache:

indices.cache...

Sizing the Elasticsearch cluster


It is important to understand how we can size the Elasticsearch cluster efficiently by choosing the right kind of node, determining the number of nodes in the cluster, determining the number of shards and replicas to use, and determining the number of indices to store. There are no fixed rules to follow in order to size the Elasticsearch cluster.

Choosing the right kind of node

In Elasticsearch, we have always dealt with nodes, but so far no clear distinction has been made between the different types of node that are available. Let's understand the different types of node that can be created in an Elasticsearch cluster.

Master and data node

This is the default node that is created in the Elasticsearch cluster whenever an Elasticsearch instance is started. This type of node acts as a master node and also stores data. If this node is not currently the master and the master node fails, this node is eligible to become the master. It performs operations...

Logstash configuration file


The configuration file is the key to running Logstash. As Logstash requires a configuration file to be created or updated, it is important to have an efficient and flexible configuration that can be changed easily as and when required. Let's have a look at some of the best practices that we should follow.

Categorizing multiple sources of data

When you have multiple sources of data from which you want to gather insights, it is best to categorize each source by adding a type to it.

We can take a look at the following example:

input {
   file {
      path => "/path/to/directory/"
      type => "datanode"
   }
   file {
      path => "/path/to/directory/"
      type => "hbase"
   }
   file {
      path => "/path/to/directory/"
      type => "yarn"
   }
}

When you add a type, you can use different filters and output based...

Re-indexing data


Re-indexing data in Elasticsearch is a challenge when you have changed the schema or the mappings of the fields. After changing the schema, you must either re-index all the documents of that index so that the mapping changes apply to previously stored documents, or leave the older documents as they are, in which case they become useless under the new schema.

Process of re-indexing data:

  1. Create a new index with the new mappings and settings.

  2. Take the documents from the old index and index them into the new index.
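The two steps above can be sketched as a pair of REST requests. The snippet below only builds the requests and does not send them. It assumes the `_reindex` API (available in recent Elasticsearch releases) is used for the copy step; the index names and the mapping are hypothetical, and the exact shape of the `mappings` body depends on the Elasticsearch version (older releases nest the properties under a type name).

```python
import json


def reindex_requests(old_index, new_index, new_mappings):
    """Return (method, path, body) tuples for the two re-indexing steps."""
    # Step 1: create the new index with the new mappings.
    create_index = ("PUT", f"/{new_index}", {"mappings": new_mappings})
    # Step 2: copy the documents from the old index into the new one.
    copy_documents = (
        "POST",
        "/_reindex",
        {"source": {"index": old_index}, "dest": {"index": new_index}},
    )
    return [create_index, copy_documents]


# Hypothetical index names and mapping, for illustration only.
steps = reindex_requests(
    "logs_v1",
    "logs_v2",
    {"properties": {"status": {"type": "keyword"}}},
)
for method, path, body in steps:
    print(method, path, json.dumps(body))
```

Each tuple can be sent with any HTTP client or pasted into a console; keeping the request construction separate from the transport makes the sequence easy to review before it touches a production cluster.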

To minimize the effect and downtime of changing the schema and re-indexing, use the following approach.

Using aliases

Aliases are a powerful feature that makes it possible to re-index the complete data of an index without any downtime. An alias can be considered a nickname for an index name, similar to a symbolic link.

Let us see how to use aliases:

  • Create an index with its mapping and settings

  • Create an alias to point to the index name

After updating the schema/mappings:

  • Create a new...
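One common way to move an alias from one index to another is a single request to the `_aliases` endpoint that removes the alias from the old index and adds it to the new one; because both actions travel in one request, clients that query the alias never see a moment without it. The sketch below builds such a request body. The function name, alias, and index names are hypothetical and chosen only for illustration.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build an _aliases request body that moves alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: applications query "logs", which pointed at "logs_v1".
swap = alias_swap_body("logs", "logs_v1", "logs_v2")
print("POST /_aliases")
print(json.dumps(swap, indent=2))
```

Since applications only ever refer to the alias, the underlying index can be replaced without changing any client configuration.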

Summary


Best practices should be followed, as they eliminate many of the problems faced while setting up, configuring, and using Elastic Stack. Several settings can be tuned as required to make the stack more stable.

Throughout this chapter, we encountered a number of ways to configure and use Elastic Stack. While this chapter tried to cover most of the important points, there can also be other settings that turn out to be best practices for specific requirements. The configurations and settings must be analyzed closely to avoid any loopholes. Remember that one poor setting may lead to a disaster.

In the next chapter, we will have a look at the case studies to explore how Elastic Stack can be utilized to meet end objectives.
