Elasticsearch 5.x Cookbook - Third Edition

Product type: Book
Published in: Feb 2017
ISBN-13: 9781786465580
Pages: 696 pages
Edition: 3rd Edition
Author: Alberto Paro


Chapter 2. Downloading and Setup

In this chapter, we will cover the following recipes:

  • Downloading and installing Elasticsearch

  • Setting up networking

  • Setting up a node

  • Setting up for Linux systems

  • Setting up different node types

  • Setting up a client node

  • Setting up an ingest node

  • Installing plugins in Elasticsearch

  • Installing plugins manually

  • Removing a plugin

  • Changing logging settings

  • Setting up a node via Docker

Introduction


This chapter explains the installation process and the configuration, from a single developer machine to a big cluster, giving you hints on how to improve performance and avoid misconfiguration errors.

There are different options for installing Elasticsearch and setting up a working environment for development and production.

When testing out Elasticsearch for a development cluster, the tool requires almost no configuration. However, when moving to production, it is important to properly configure the cluster based on your data, use cases, and your product architecture. The setup step is very important because a bad configuration can lead to bad results and poor performance, and can even kill your servers.

In this chapter, the management of Elasticsearch plugins is also discussed: installing, configuring, updating, and removing.

Downloading and installing Elasticsearch


Elasticsearch has an active community and the release cycles are very fast.

Because Elasticsearch depends on many common Java libraries (Lucene, Guice, and Jackson are the most famous ones), the Elasticsearch community tries to keep them updated and fixes bugs that are discovered in them and in the Elasticsearch core. The large user base is also a source of new ideas and features for improving Elasticsearch use cases.

For these reasons, if possible, the best practice is to use the latest available release (usually the most stable one, with the fewest bugs).

Getting ready

You need an operating system supported by Elasticsearch (Linux/MacOSX/Windows) with a Java JVM 1.8 or above installed (the Oracle one is preferred: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html). A web browser is required to download the Elasticsearch binary release. At least 1GB of free disk space is required to install Elasticsearch.

How to do it...

For downloading...

Setting up networking


Correctly setting up networking is very important for your nodes and cluster.

There are a lot of different installation scenarios and networking issues: the first step for configuring the nodes to build a cluster is to correctly set the node discovery.

Getting ready

You need a working Elasticsearch installation, and you need to know your current networking configuration (that is, your IP address).

How to do it...

For configuring networking, we will perform the following steps:

  • Open the Elasticsearch configuration file with your favorite text editor.

  • With the standard Elasticsearch configuration in the config/elasticsearch.yml file, your node is configured to bind on all your machine interfaces and performs discovery by contacting the nodes listed in discovery.zen.ping.unicast.hosts. This means that it sends signals to the machines in the unicast list and waits for a response. If a node responds, the two can join in a cluster.

  • If another node is available in the same LAN, it joins the cluster.

    Note

    Only nodes...
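
As a minimal sketch of the unicast discovery settings described above, the following config/elasticsearch.yml fragment assumes a hypothetical two-node LAN. The cluster name and the IP addresses are placeholders chosen for illustration, not values taken from this recipe.

```yaml
# config/elasticsearch.yml (illustrative fragment; all values are assumptions)
cluster.name: my-cluster            # hypothetical name; nodes must share it to join
network.host: 192.168.1.10          # hypothetical address this node binds to
discovery.zen.ping.unicast.hosts:   # nodes to contact during discovery
  - 192.168.1.10:9300
  - 192.168.1.11:9300
```

Each node in the cluster would carry the same unicast list and its own network.host value.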

Setting up a node


Elasticsearch allows you to customize several parameters in an installation. In this recipe, we'll see the most commonly used ones, which define where to store our data and improve overall performance.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a simple text editor to change configuration files.

How to do it...

The steps required for setting up a simple node are as follows:

  • Open config/elasticsearch.yml with an editor of your choice.

  • Set up the directories that store your server data.

  • For Linux or Mac OS X, add the following entries:

        path.conf: /opt/data/es/conf
        path.data: /opt/data/es/data1,/opt2/data/data2
        path.work: /opt/data/work 
        path.logs: /opt/data/logs 
        path.plugins: /opt/data/plugins 
  • For Windows, add the following entries:

        path.conf: c:\Elasticsearch\conf 
        path.data: c:\Elasticsearch\data 
        path.work: c:\Elasticsearch\work 
  ...

Setting up for Linux systems


If you are using a Linux system, some extra setup is needed to improve performance or to resolve production problems with many indices.

This recipe covers two common errors that happen in production:

  • Too many open files, which can corrupt your indices and your data

  • Slow performance in search and indexing due to the garbage collector

Note

Another serious problem arises when you run out of disk space. In this scenario, some files can get corrupted. To protect your indices from corruption and possible data loss, it is best practice to monitor the available storage space.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in this chapter and a simple text editor to change configuration files.

How to do it...

To improve performance on Linux systems, we will perform the following steps:

  1. First you need to change the current limit for the user that runs the Elasticsearch server. In these examples...

Setting up different node types


Elasticsearch is natively designed for the cloud, so when you need to release a production environment with a huge number of records, and you need high availability and good performance, you need to aggregate more nodes in a cluster.

Elasticsearch allows you to define different types of nodes to balance and improve overall performance.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a simple text editor to change the configuration files.

How to do it...

For an advanced cluster setup, there are some parameters that must be configured to define different node types.

These parameters are in the config/elasticsearch.yml file and they can be set with the following steps:

  1. Set up whether the node can be master or not:

            node.master: true 
    
  2. Set up whether a node must contain data or not:

            node.data: true 
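
The two settings can be combined to give a node a single role. As a hedged sketch (the role split shown here is an illustration, not a recommendation made by this recipe), a dedicated master node that holds no data might be configured as follows:

```yaml
# config/elasticsearch.yml (illustrative fragment)
# Dedicated master: eligible for election, stores no shards.
node.master: true
node.data: false
# A data-only node would invert the two values:
# node.master: false
# node.data: true
```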
    

How it works...

The node.master parameter defines that the node can become master...

Setting up a client node


The master nodes that we have seen previously are the most important for cluster stability. To prevent the queries and aggregations from creating instability in your cluster, client nodes can be used to provide safe communication with the cluster.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in this chapter and a simple text editor to change configuration files.

How to do it...

For an advanced cluster setup, there are some parameters that must be configured to define different node types.

These parameters are in the config/elasticsearch.yml file, and a client node can be set up with the following steps:

  1. Set up the node as a non-master:

            node.master: false 
    
  2. Set up the node to not contain data:

            node.data: false 
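
Because every node is also an ingest node by default (see the Setting up an ingest node recipe), a node meant to act purely as a proxy may also need ingest disabled. The following fragment is a sketch under that assumption and is not part of the steps above:

```yaml
# config/elasticsearch.yml (illustrative fragment)
node.master: false   # never elected as master
node.data: false     # holds no shards
node.ingest: false   # assumption: also opt out of running ingest pipelines
```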
    

How it works...

The client node is a special node that works as a proxy/pass-through for the cluster.

Its main advantages are:

  • It can easily kill or remove the cluster...

Setting up an ingest node


The main goals of Elasticsearch are indexing, searching, and analytics, but it's often required to modify or enhance the documents before storing them in Elasticsearch.

The most common scenarios in this case are:

  • Preprocessing the log string to extract meaningful data

  • Enriching the content of some textual fields with Natural Language Processing (NLP) tools

  • Adding transformations during ingestion, such as converting an IP address to a geolocation or building custom fields at ingest time

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a simple text editor to change configuration files.

How to do it...

To set up an ingest node, you need to edit the config/elasticsearch.yml file and set the node.ingest property to true:

node.ingest: true

How it works...

The default configuration for Elasticsearch is to set the node as an ingest node (refer to Chapter 13, Ingest, for more information on the ingestion pipeline).

As the client node...

Installing plugins in Elasticsearch


One of the main features of Elasticsearch is the possibility to extend it with plugins. Plugins extend Elasticsearch features and functionalities in several ways.

In Elasticsearch 5.x, the plugins are native plugins: they are JAR files that contain application code. They are used for:

  • ScriptEngine (JavaScript, Python, Scala, and Ruby)

  • Custom Analyzers, tokenizers, and scoring

  • REST entry points

  • Ingestion pipeline stages

  • Supporting new storages (Hadoop)

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a prompt/shell to execute commands in Elasticsearch install directory.

How to do it...

Elasticsearch provides a script for the automatic download and installation of plugins, located in the bin/ directory and called elasticsearch-plugin.

The steps required to install a plugin are:

  • Call the elasticsearch-plugin install command with the plugin name reference.

  • For installing an administrative interface for Elasticsearch...

Installing plugins manually


Sometimes your plugin is not available online or the standard installation fails, so you need to install your plugin manually.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a prompt/shell to execute commands in Elasticsearch install directory.

How to do it...

We assume that your plugin is named awesome and it's packed in a file called awesome.zip.

The steps required to manually install a plugin are:

  • Copy your zip file to the plugins directory in your Elasticsearch home installation

  • If the directory named plugins doesn't exist, create it

  • Unzip the content of the plugin in the plugins directory

  • Remove the zip archive to clean up unused files

How it works...

Every Elasticsearch plugin is contained in a directory (usually named after the plugin). The plugin directory should contain one or more JAR files.

When Elasticsearch starts, it scans the plugins directory and loads the plugins it finds there.

Note

If...

Removing a plugin


You have installed some plugins and now you need to remove one because it's no longer required. Removing an Elasticsearch plugin is easy if everything goes right; otherwise, you need to remove it manually.

This recipe covers both cases.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a prompt/shell to execute commands in the Elasticsearch install directory. Before removing a plugin, it is safer to stop the Elasticsearch server to prevent errors due to the deletion of a plugin JAR.

How to do it...

The steps to remove a plugin are as follows:

  1. Stop your running node to prevent exceptions caused by the removal of a file.

  2. Use the Elasticsearch plugin manager, which comes with its script wrapper (elasticsearch-plugin).

    On Linux and MacOSX, type the following command:

            elasticsearch-plugin remove lang-python      

    On Windows, type the following command:

            elasticsearch-plugin.bat remove lang-python 
    
  3. Restart...

Changing logging settings


Standard logging settings work very well for general usage.

Changing the log level can be useful to check for bugs or to understand malfunctions due to bad configuration or strange plugin behaviors. A verbose log can also be used by the Elasticsearch community to diagnose problems.

If you need to debug your Elasticsearch server or change how the logging works (for example, to send events remotely), you need to change the log4j2.properties parameters.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a simple text editor to change configuration files.

How to do it...

In the config directory in your Elasticsearch install directory, there is a log4j2.properties file, which controls the logging settings.

The steps required for changing the logging settings are:

  1. To emit every kind of logging Elasticsearch has, you can change the current root level logging which is:

            rootLogger.level = info 
    
  2. This needs...

Setting up a node via Docker


Docker (https://www.docker.com/) has become a common way to deploy application servers for testing or production.

Docker is a container system that allows you to easily deploy replicable installations of server applications. With Docker, you don't need to set up a host, configure it, download the Elasticsearch server, unzip it, or start the server: everything is done automatically by Docker.

Getting ready

You need a working Docker installation to be able to execute docker commands (https://www.docker.com/products/overview).

How to do it...

  1. If you want to start a vanilla server, just execute:

            docker pull docker.elastic.co/elasticsearch/elasticsearch:5.1.1
  2. An output similar to the following screenshot will be shown:

  3. After downloading the Elasticsearch image, we can start a development instance via:

            docker run -p 9200:9200 -p 9300:9300 -e "http.host=0.0.0.0" -e    
            "transport.host=0.0.0.0"    
            docker.elastic.co/elasticsearch/elasticsearch:5.1...
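
The same port mappings and environment variables can also be kept in a file instead of on the command line. The following is a minimal sketch that assumes Docker Compose is installed. The file name, the service name, and the use of the 5.1.1 tag from step 1 are choices made for illustration and are not part of this recipe.

```yaml
# docker-compose.yml (hypothetical file; mirrors the docker run flags above)
version: "2"
services:
  elasticsearch:                      # service name is an arbitrary choice
    image: docker.elastic.co/elasticsearch/elasticsearch:5.1.1
    ports:
      - "9200:9200"                   # HTTP/REST port
      - "9300:9300"                   # transport port
    environment:
      - http.host=0.0.0.0
      - transport.host=0.0.0.0
```

With such a file in the current directory, running docker-compose up would start the container with these settings.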