
You're reading from Mastering Elastic Stack

Product type: Book

Published in: Feb 2017

Publisher: Packt

ISBN-13: 9781786460011

Edition: 1st Edition


Authors (2):

Ravi Kumar Gupta

Yuvraj Gupta


Chapter 11. Best Practices

In the previous chapter, we discussed the various components that X-Pack provides as part of Elastic Stack. We explored each component in detail, covering what it offers and the functionality it provides.

If you have been following the chapters, you probably have many questions about how to start building a scalable system, and which best practices to adhere to in order to create an efficient one.

By the end of this chapter, you will understand some best practices, learned from other people's experiences, that can be used to make Elastic Stack production-ready.

In this chapter, we will cover the following topics:

Why do we require best practices?
Understanding your use case
Choosing the right set of hardware
Searching and indexing performance
Sizing the Elasticsearch cluster
Logstash configuration files
Re-indexing data

Why do we require best practices?

When we start to learn a tool, we usually try it out on a standalone machine, where we gain expertise and can experiment with various things. When we move from a small setup to a big setup spanning multiple systems, many things can hinder performance or prevent us from getting the results we got on the small setup. To achieve the best results, we need to understand and follow practices that others have found to lead to higher performance and fewer problems. In programming languages, we have the concept of best coding practices, which describe how to write code that is easy to read, understand, and maintain, reducing the effort required when someone else starts working on the same or a similar piece of code. For tools and components, however, there are no hard and fast rules for best practices. Best practices will depend on various factors such as the architecture designed, components...

Understanding your use case

This is the basic information that everyone should have before even thinking about best practices or using Elastic Stack. If you learn about Elastic Stack and start connecting or processing random data without thinking about your use case, you will not be able to derive the information you need. You will also be left without an answer when asked why you chose this tool rather than another software or tool. Hence, it is immensely important to understand your use case beforehand.

Your use case answers many questions regarding the components to be used, the criticality of the log data flowing in, the requirement for high-availability clusters, and the need for a centralized logging system. For example, if your use case consists of analyzing the logs of an application, you can decide accordingly whether a single Elasticsearch cluster is enough or whether you need to create a centralized logging system.

One of the biggest questions that arises...

Managing configuration files

Configuration file changes are the most basic, yet among the most important, changes to make for a production deployment. Let's have a look at the configuration files of the various components.

Elasticsearch - elasticsearch.yml

By default, Elasticsearch sets values for important cluster- and node-related properties, such as the cluster name and the node name. Although it is not necessary to set them, it is a good idea to customize the names. For example, we should specify node names so that we can remember them and keep track of node statistics by the names we chose. A few such properties are explained below:

Change the name of the cluster by modifying the following property:

```
# cluster.name: my-application
cluster.name: production-elasticstack
```

Change the name of the node to easily identify the nodes joining in the cluster by modifying the following property:
```
# node.name: node-1
node.name: elasticstack1
```
Change the location...
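Because these names are what you will later look for in node statistics, it can help to check them before a node is deployed. The following minimal Python sketch is not part of Elasticsearch. It reads the simple `key: value` lines of an `elasticsearch.yml` file and warns when `cluster.name` is left at Elasticsearch's default value of `elasticsearch` or when `node.name` is missing. The function names are hypothetical. It is assumed that settings are flat and free of inline comments; nested YAML would need a real YAML parser.

```python
from __future__ import annotations

# Elasticsearch's built-in default value for cluster.name.
DEFAULT_CLUSTER_NAME = "elasticsearch"


def parse_flat_yaml(text: str) -> dict[str, str]:
    """Collect uncommented 'key: value' pairs.

    Assumption: settings are flat, one per line, with no inline comments;
    nested YAML needs a real parser.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, commented-out defaults, and lines without a colon.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def naming_problems(text: str) -> list[str]:
    """Return warnings about cluster and node naming (empty list if fine)."""
    settings = parse_flat_yaml(text)
    problems = []
    if settings.get("cluster.name", DEFAULT_CLUSTER_NAME) == DEFAULT_CLUSTER_NAME:
        problems.append("cluster.name is not customized")
    if not settings.get("node.name"):
        problems.append("node.name is not set")
    return problems


if __name__ == "__main__":
    config = """
    # cluster.name: my-application
    cluster.name: production-elasticstack
    # node.name: node-1
    node.name: elasticstack1
    """
    print(naming_problems(config) or "naming looks fine")
```

Run against the two settings shown above, the check prints `naming looks fine`. If the file contains only the commented-out defaults, it reports both warnings.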

Choosing the right set of hardware

When installing Elastic Stack, it is important to understand how much memory will be consumed, how much disk space is required, how to keep I/O requests under control, how many CPU cores are needed, and what network is required between the systems. Whenever we install Elastic Stack, there is confusion about the amount of resources used by each component. Let us break down the resource requirements for the various components.

Memory

Memory is one of the most important parameters affecting the performance of an application. If less memory is provided than the application needs, it can stop, fail, or throw an Out of Memory error.

Memory sizing is important because the OS also uses memory for various tasks. Hence, it is essential to determine how much memory to give the Elastic Stack components so that neither application nor OS performance suffers.
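The sketch below makes the trade-off between the JVM heap and the memory left to the OS concrete. It applies a rule of thumb that Elastic's documentation commonly cites. The heap gets at most half of the machine's RAM, and the remainder goes to the OS file-system cache. The heap also stays below roughly 32 GB so that the JVM can keep using compressed object pointers. The 31 GB cap and the function names are assumptions made for illustration, so measure your own workload before settling on a value.

```python
from __future__ import annotations

# Assumed cap: stay safely below the ~32 GB compressed-oops threshold.
HEAP_CAP_GB = 31


def recommended_heap_gb(total_ram_gb: int) -> int:
    """Half of physical RAM, capped at HEAP_CAP_GB (whole gigabytes)."""
    if total_ram_gb < 2:
        raise ValueError("assumes at least 2 GB of RAM")
    return min(total_ram_gb // 2, HEAP_CAP_GB)


def jvm_heap_flags(total_ram_gb: int) -> list[str]:
    """Matching -Xms/-Xmx flags; min and max heap are set to the same size."""
    heap = recommended_heap_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


if __name__ == "__main__":
    for ram in (8, 16, 64, 128):
        print(f"{ram} GB RAM -> {' '.join(jvm_heap_flags(ram))}")
```

Setting the minimum and maximum heap to the same value avoids heap resizing at runtime. Past 62 GB of RAM, the cap takes over, and the extra memory benefits the OS cache rather than the heap.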

In the context of Elastic Stack, memory is crucial to decide if you should configure...

Searching and indexing performance

So far, we have covered some best practices in terms of resources. Memory, CPU, I/O, disks, and network play a big part in choosing the preferred system configuration. In addition, we can tweak a few settings to improve resource usage for searching and indexing in Elasticsearch and Lucene.

Filter cache

By default, filters used in Elasticsearch queries are cached. When a query uses a filter, Elasticsearch finds the documents that match the filter and keeps that result in memory. Any later query that uses the same filter returns results faster, because the matching documents are already cached. Since the cache lives in memory, it is wise to set a property that limits its size. Each cached filter uses little memory, but a large number of filters can take a toll on the JVM heap. The following property limits the amount of heap memory that can be used for the filter cache:

indices.cache...
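A toy model shows why the cache needs an upper bound. The Python sketch below is not Elasticsearch's implementation. It caches the set of document IDs that match each filter and serves repeated filters from memory. Once a budget, expressed as a percentage of heap, is reached, it evicts the least recently used entry. The class name, the `percent` parameter, and the flat 8-bytes-per-document cost are illustrative assumptions. Elasticsearch itself caches compact bitsets, and its cache is bounded by the property mentioned above.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Callable, Iterable


class FilterCache:
    """Toy LRU cache: filter key -> frozenset of matching document IDs.

    Assumptions: the budget is `percent` of a given heap size, and every
    cached document ID costs a flat `bytes_per_doc` bytes.
    """

    def __init__(self, heap_bytes: int, percent: int, bytes_per_doc: int = 8) -> None:
        self.max_bytes = heap_bytes * percent // 100
        self.bytes_per_doc = bytes_per_doc
        self.used = 0
        self.entries = OrderedDict()  # least recently used entry first
        self.hits = 0
        self.misses = 0

    def _cost(self, docs: frozenset[int]) -> int:
        return len(docs) * self.bytes_per_doc

    def get(self, key: str, compute: Callable[[], Iterable[int]]) -> frozenset[int]:
        """Return the docs matching `key`, running `compute` only on a miss."""
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        docs = frozenset(compute())
        cost = self._cost(docs)
        if cost > self.max_bytes:
            return docs  # larger than the whole budget: do not cache
        # Evict least recently used entries until the new result fits.
        while self.used + cost > self.max_bytes:
            _, evicted = self.entries.popitem(last=False)
            self.used -= self._cost(evicted)
        self.entries[key] = docs
        self.used += cost
        return docs


if __name__ == "__main__":
    levels = {1: "error", 2: "info", 3: "error", 4: "warn"}

    def matching(wanted: str) -> list[int]:
        return [doc for doc, level in levels.items() if level == wanted]

    cache = FilterCache(heap_bytes=240, percent=10)  # budget: 24 bytes
    cache.get("level:error", lambda: matching("error"))  # miss, computed
    cache.get("level:error", lambda: matching("error"))  # hit, from memory
    print(cache.hits, cache.misses)  # 1 1
```

Without the budget, every distinct filter would stay in memory forever. The bound trades occasional recomputation for a predictable memory footprint.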

Sizing the Elasticsearch cluster

It is important to understand how to size an Elasticsearch cluster efficiently. This means choosing the right kind of node and determining the number of nodes in the cluster, the number of shards and replicas to use, and the number of indices to store. There are no fixed rules for sizing an Elasticsearch cluster.
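Shard and replica counts translate directly into how many shard copies the cluster must host. The arithmetic below is a minimal sketch. Each primary shard gets `replicas` additional copies. Elasticsearch never allocates two copies of the same shard to the same node, so a cluster needs at least `1 + replicas` data nodes before every replica can be assigned. The function names are hypothetical, and the even spread across nodes is a simplifying assumption.

```python
from __future__ import annotations

import math


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every primary shard has `replicas` additional copies."""
    return primaries * (1 + replicas)


def min_data_nodes(replicas: int) -> int:
    """Copies of the same shard never share a node, so each needs its own."""
    return 1 + replicas


def max_copies_per_node(primaries: int, replicas: int, nodes: int) -> int:
    """Shard copies per node if they are spread as evenly as possible."""
    if nodes < min_data_nodes(replicas):
        raise ValueError("not enough nodes to assign every replica")
    return math.ceil(total_shard_copies(primaries, replicas) / nodes)


if __name__ == "__main__":
    # Elasticsearch 5.x defaults: 5 primary shards, 1 replica per index.
    print(total_shard_copies(5, 1), max_copies_per_node(5, 1, 3))  # 10 4
```

With the 5.x defaults, each index contributes 10 shard copies. Multiplying by the number of indices you plan to keep gives a first estimate of how many nodes are needed.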

Choosing the right kind of node

So far, we have dealt with nodes in Elasticsearch without clearly distinguishing between the different types of node that are available. Let's understand the different types of node that can be created in an Elasticsearch cluster.

Master and data node

This is the default node type, created in the Elasticsearch cluster whenever an Elasticsearch instance is started. This type of node is eligible to act as the master node and also stores data. If this node is not currently the master and the master node fails, this node is available to be elected as the new master. It performs operations...
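Electing a replacement master safely depends on a quorum. Elasticsearch 5.x and earlier use the `discovery.zen.minimum_master_nodes` setting, and a common practice is to set it to a strict majority of the master-eligible nodes. A network partition then cannot produce two elected masters, a situation known as split brain. Elasticsearch 7.x and later manage this quorum automatically. The Python sketch below only computes that majority. The function names are hypothetical.

```python
from __future__ import annotations


def minimum_master_nodes(master_eligible: int) -> int:
    """Strict majority (quorum) of the master-eligible nodes: N // 2 + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(reachable: int, master_eligible: int) -> bool:
    """A partition may elect a master only if it can see a quorum."""
    return reachable >= minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    # Three master-eligible nodes split 2 / 1 by a network partition:
    # only the side with two nodes reaches the quorum of 2.
    print(can_elect_master(2, 3), can_elect_master(1, 3))  # True False
```

This arithmetic is also why three master-eligible nodes are often preferred over two. With two nodes, the quorum is 2, so losing either node prevents a master from being elected.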

Logstash configuration file

The configuration file is the key to running Logstash. Because Logstash is driven by the configuration file you create and update, it is important to keep that configuration efficient and flexible, so that it can be easily changed as and when required. Let's have a look at some of the best practices that we should follow.

Categorizing multiple sources of data

When you have multiple sources of data that you want to gather and uncover insights from, it is best to categorize each source by adding a type to it.

We can take a look at the following example:

```
input {
   file {
      path => "/path/to/directory/"
      type => "datanode"
   }
   file {
      path => "/path/to/directory/"
      type => "hbase"
   }
   file {
      path => "/path/to/directory/"
      type => "yarn"
   }
}
```

When you add a type, you can use different filters and output based...
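Adding `type` pays off later in the pipeline. In Logstash, you can wrap filters and outputs in conditionals such as `if [type] == "hbase" { ... }`. The Python sketch below only models that dispatch to make the idea concrete, and it is not Logstash code. The handler functions and the `hbase-logs`/`yarn-logs` index names are hypothetical. Events whose type has no handler are collected under an assumed `_unrouted` key rather than dropped.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], dict]


def route_by_type(events: list[dict], handlers: dict[str, Handler]) -> dict[str, list[dict]]:
    """Send each event to the handler registered for its 'type' field.

    Events whose type has no handler are collected under "_unrouted".
    """
    routed: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        event_type = event.get("type")
        handler = handlers.get(event_type)
        if handler is None:
            routed["_unrouted"].append(event)
        else:
            routed[event_type].append(handler(event))
    return dict(routed)


if __name__ == "__main__":
    handlers = {
        # Hypothetical per-type processing, e.g. choosing a target index name.
        "hbase": lambda e: {**e, "index": "hbase-logs"},
        "yarn": lambda e: {**e, "index": "yarn-logs"},
    }
    events = [
        {"type": "hbase", "message": "region server started"},
        {"type": "yarn", "message": "container allocated"},
        {"type": "datanode", "message": "block received"},
    ]
    for event_type, batch in route_by_type(events, handlers).items():
        print(event_type, batch)
```

Keeping the type tag on every event means each source can get its own parsing and destination without separate Logstash pipelines.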

Re-indexing data

Re-indexing data in Elasticsearch is a challenge when you have changed the schema, or mappings, of fields. The mapping of an existing field cannot be changed in place. You must therefore either re-index all the existing documents so that they pick up the new mapping, or skip re-indexing the older documents. If you skip them, they no longer match the new schema and may become useless.

Process of re-indexing data:

Create a new index with the new mappings and settings.
Take the documents from the old index and index them into the new index.
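These two steps map onto two REST calls. The first creates the target index (`PUT /<index>`), and the second copies the documents with the Reindex API (`POST /_reindex`) found in Elasticsearch 5.x. The Python sketch below only builds the request descriptions and does not send them. The index names `logs_v1`/`logs_v2` and the `reindex_plan` helper are hypothetical. The mappings are passed through unchanged because their exact shape depends on the Elasticsearch version (5.x nests them under a mapping type).

```python
from __future__ import annotations

import json


def reindex_plan(old_index: str, new_index: str, settings: dict, mappings: dict) -> list[tuple[str, str, dict]]:
    """Return the two REST calls as (method, path, JSON body) tuples."""
    return [
        # Step 1: create the new index with the new settings and mappings.
        ("PUT", f"/{new_index}", {"settings": settings, "mappings": mappings}),
        # Step 2: copy every document from the old index into the new one.
        ("POST", "/_reindex", {"source": {"index": old_index}, "dest": {"index": new_index}}),
    ]


if __name__ == "__main__":
    plan = reindex_plan("logs_v1", "logs_v2", {"number_of_shards": 3}, {})
    for method, path, body in plan:
        print(method, path)
        print(json.dumps(body, indent=2))
```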

To minimize the effect and downtime of changing the schema and re-indexing, use the following approach.

Using aliases

Aliases are a powerful feature that lets you re-index the complete data of an index without any downtime. An alias can be thought of as a nickname for an index name, much like a symbolic link.

Let us see how to use aliases:

Create an index with its mapping and settings
Create an alias to point to the index name

After updating the schema/mappings:

Create a new...
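With the alias in place, clients always query the alias rather than a concrete index. Switching to the re-indexed data therefore comes down to repointing the alias. Elasticsearch's `POST /_aliases` endpoint accepts a list of `remove` and `add` actions and applies them atomically, so there is no moment when the alias points nowhere. The Python sketch below builds that request body and, to show the effect, applies it to a hypothetical in-memory alias table. The index names `logs_v1`/`logs_v2`, the alias `logs`, and the helper functions are illustrative assumptions, not the steps elided from the text above.

```python
from __future__ import annotations

import copy


def swap_alias_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases: both actions are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def apply_alias_actions(aliases: dict[str, set[str]], body: dict) -> dict[str, set[str]]:
    """In-memory model: build the new alias table on a copy and return it,
    so callers see either the old table or the new one, never a half state."""
    updated = copy.deepcopy(aliases)
    for action in body["actions"]:
        ((op, spec),) = action.items()
        indices = updated.setdefault(spec["alias"], set())
        if op == "add":
            indices.add(spec["index"])
        elif op == "remove":
            if spec["index"] not in indices:
                raise KeyError(f"{spec['alias']} does not point to {spec['index']}")
            indices.remove(spec["index"])
        else:
            raise ValueError(f"unsupported action: {op}")
    return updated


if __name__ == "__main__":
    table = {"logs": {"logs_v1"}}  # clients always query the alias "logs"
    table = apply_alias_actions(table, swap_alias_body("logs", "logs_v1", "logs_v2"))
    print(table)  # {'logs': {'logs_v2'}}
```

Once the alias has moved and the new index has been verified, the old index can be deleted without clients noticing.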

Summary

Best practices should be followed because they eliminate many of the problems faced while setting up, configuring, and using Elastic Stack. Several settings can be tuned to your requirements to make the stack more stable.

Throughout this chapter, we encountered a number of ways to configure and use Elastic Stack. While this chapter tried to cover the most important points, there may be other settings that turn out to be best practices for specific requirements. Configurations and settings must be analyzed closely to avoid any loopholes. Remember that one poor setting may lead to a disaster.

In the next chapter, we will look at some case studies to explore how Elastic Stack can be utilized to meet end objectives.


Authors (2)

Ravi Kumar Gupta

Ravi Kumar Gupta is an author, reviewer, and open source software evangelist. He pursued an MS degree in software system at BITS Pilani and a B.Tech at LNMIIT, Jaipur. His technological forte is portal management and development. He is currently working with Azilen Technologies, where he acts as a Technical Architect and Project Manager. His previous assignment was as a lead consultant with CIGNEX Datamatics. He was a core member of the open source group at TCS, where he started working on Liferay and other UI technologies. During his career, he has been involved in building enterprise solutions using the latest technologies with rich user interfaces and open source tools. He loves to spend time writing, learning, and discussing new technologies. His interest in search engines and that small project on crawler during college time made him a technology lover. He is one of the authors of Test-Driven JavaScript Development, Packt Publishing. He is an active member of the Liferay forum. He also writes technical articles for his blog at TechD of Computer World (http://techdc.blogspot.in). He has been a Liferay trainer at TCS and CIGNEX, where he has provided training on Liferay 5.x and 6.x versions. He was also a reviewer for Learning Bootstrap, Packt Publishing. He can be reached on Skype at kravigupta, on Twitter at @kravigupta, and on LinkedIn at https://in.linkedin.com/in/kravigupta.

Yuvraj Gupta

Yuvraj Gupta is an author and a keen technologist with an interest in Big Data, Data Analytics, Data Visualization, and Cloud Computing. He has been working as a Big Data Consultant, primarily in the domain of Big Data Testing. He loves to spend time writing on various social platforms. He is an avid gadget lover, a foodie, and a sports enthusiast, and loves to watch TV series and movies. He always keeps himself updated with the latest happenings in technology. He has authored a book titled Kibana Essentials with Packt Publishing. He can be reached at gupta.yuvraj@gmail.com or on LinkedIn at www.linkedin.com/in/guptayuvraj.
