You're reading from Elasticsearch 5.x Cookbook - Third Edition

Product type: Book
Published in: Feb 2017
ISBN-13: 9781786465580
Pages: 696
Edition: 3rd Edition
Concepts: Enterprise Search
Author: Alberto Paro

Chapter 2. Downloading and Setup

In this chapter, we will cover the following recipes:

Downloading and installing Elasticsearch
Setting up networking
Setting up a node
Setting up for Linux systems
Setting up different node types
Setting up a client node
Setting up an ingest node
Installing plugins in Elasticsearch
Installing plugins manually
Removing a plugin
Changing logging settings
Setting up a node via Docker

Introduction

This chapter explains the installation process and the configuration, from a single developer machine to a big cluster, giving you hints on how to improve performance and avoid misconfiguration errors.

There are several options for installing Elasticsearch and setting up a working environment for development and production.

When testing out Elasticsearch for a development cluster, the tool requires almost no configuration. However, when moving to production, it is important to properly configure the cluster based on your data, use cases, and product architecture. The setup step is very important because a bad configuration can lead to bad results and poor performance, and can even kill your servers.

In this chapter, the management of Elasticsearch plugins is also discussed: installing, configuring, updating, and removing.

Downloading and installing Elasticsearch

Elasticsearch has an active community and the release cycles are very fast.

Because Elasticsearch depends on many common Java libraries (Lucene, Guice, and Jackson are the most famous ones), the Elasticsearch community tries to keep them updated and fixes bugs that are discovered in them and in the Elasticsearch core. The large user base is also a source of new ideas and features for improving Elasticsearch use cases.

For these reasons, if possible, the best practice is to use the latest available release (usually the most stable one, with the fewest bugs).

Getting ready

You need a supported operating system (Linux, Mac OS X, or Windows) with a Java JVM 1.8 or above installed (the Oracle one is preferred: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html). A web browser is required to download the Elasticsearch binary release. At least 1 GB of free disk space is required to install Elasticsearch.

How to do it...

For downloading...

Setting up networking

Correctly setting up networking is very important for your nodes and cluster.

There are many different installation scenarios and networking issues. The first step in configuring the nodes to build a cluster is to set up node discovery correctly.

Getting ready

You need a working Elasticsearch installation, and you need to know your current networking configuration (that is, your IP addresses).

How to do it...

For configuring networking, we will perform the following steps:

Open the Elasticsearch configuration file with your favorite text editor.
With the standard configuration in the config/elasticsearch.yml file, your node is configured to bind on all your machine interfaces and performs discovery by sending unicast pings to the nodes listed in discovery.zen.ping.unicast.hosts. This means that it sends signals to each machine in the unicast list and waits for a response. If a node responds, the two nodes can join together in a cluster.
If another node is available in the same LAN, it joins the cluster.
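Before starting the nodes, it can help to confirm that every address you plan to list in discovery.zen.ping.unicast.hosts is actually reachable. The following is a minimal Python sketch and not part of Elasticsearch: the helper names parse_host_entry and can_reach are hypothetical, the default port 9300 is assumed to be the transport port of your nodes, and entries are assumed to be plain host or host:port strings (IPv6 literals are not handled).

```python
import socket

DEFAULT_TRANSPORT_PORT = 9300  # assumption: the transport port used by your nodes


def parse_host_entry(entry, default_port=DEFAULT_TRANSPORT_PORT):
    """Split a 'host' or 'host:port' entry into a (host, port) tuple."""
    host, sep, port = entry.strip().partition(":")
    return (host, int(port) if sep else default_port)


def can_reach(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, calling can_reach(*parse_host_entry("192.168.1.10")) from the machine that will run a new node tells you whether that (hypothetical) address accepts connections on the assumed port. A successful TCP connection only shows that something is listening there; it does not prove that the listener is an Elasticsearch node of the same cluster.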
Note
Only nodes...

Setting up a node

Elasticsearch allows you to customize several parameters in an installation. In this recipe, we'll see the most used ones, which define where to store our data and improve overall performance.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a simple text editor to change configuration files.

How to do it...

The steps required for setting up a simple node are as follows:

Open config/elasticsearch.yml with an editor of your choice.
Set up the directories that store your server data.
For Linux or Mac OS X, add the following path entries:

        path.conf: /opt/data/es/conf
        path.data: /opt/data/es/data1,/opt2/data/data2
        path.work: /opt/data/work 
        path.logs: /opt/data/logs 
        path.plugins: /opt/data/plugins

For Windows, add the following path entries:

        path.conf: c:\Elasticsearch\conf 
        path.data: c:\Elasticsearch\data 
        path.work: c:\Elasticsearch\work 
  ...

Setting up for Linux systems

If you are using a Linux system, you need some extra setup to improve performance or to resolve production problems with many indices.

This recipe covers two common errors that happen in production:

Too many open files, which can corrupt your indices and your data
Slow performance in search and indexing due to the garbage collector

Note

Another serious problem arises when you run out of disk space. In this scenario, some files can get corrupted. To protect your indices from corruption and possible data loss, it is best practice to monitor the available storage space.
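Both risks mentioned above, a low open-file limit and a full disk, can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the threshold of 65536 open files is an assumed target that you should adapt to your own setup, and the script has to be run as the same user that runs the Elasticsearch server for the limits to be meaningful.

```python
import resource
import shutil

RECOMMENDED_OPEN_FILES = 65536  # assumption: adjust to your own target


def open_file_limits():
    """Return the (soft, hard) limits on open files for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_too_low(limit, minimum=RECOMMENDED_OPEN_FILES):
    """Return True if a limit is finite and below the wanted minimum."""
    return limit != resource.RLIM_INFINITY and limit < minimum


def free_disk_ratio(path):
    """Return the fraction of free space on the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("open files: soft=%s hard=%s" % (soft, hard))
    if limit_is_too_low(soft):
        print("warning: the soft limit is below %d" % RECOMMENDED_OPEN_FILES)
    # "/" is a placeholder; point this at your own data directory.
    print("free disk space: %.1f%%" % (100 * free_disk_ratio("/")))
```

The resource module is only available on Unix-like systems, which matches the Linux focus of this recipe.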

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in this chapter and a simple text editor to change configuration files.

How to do it...

To improve performance on Linux systems, we will perform the following steps:

First you need to change the current limit for the user that runs the Elasticsearch server. In these examples...

Setting up different node types

Elasticsearch is natively designed for the cloud, so when you need to release a production environment with a huge number of records and you need high availability and good performance, you need to aggregate more nodes in a cluster.

Elasticsearch allows you to define different types of nodes to balance and improve overall performance.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a simple text editor to change the configuration files.

How to do it...

For an advanced cluster setup, there are some parameters that must be configured to define different node types.

These parameters are in the config/elasticsearch.yml file and they can be set with the following steps:

Set up whether the node can be master or not:
```
        node.master: true 
```
Set up whether a node must contain data or not:
```
        node.data: true 
```
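The two flags above combine into four possible node configurations. The following Python sketch simply names each combination; the function describe_node and the labels it returns are informal and chosen for illustration only, and the defaults assume that both settings are true when they are not specified.

```python
def describe_node(master=True, data=True):
    """Name the kind of node produced by the node.master and node.data flags."""
    if master and data:
        return "master-eligible data node"
    if master:
        return "dedicated master-eligible node"
    if data:
        return "data-only node"
    # Neither master-eligible nor holding data: it only routes requests.
    return "client node"


if __name__ == "__main__":
    for master in (True, False):
        for data in (True, False):
            print("node.master=%s node.data=%s -> %s"
                  % (master, data, describe_node(master, data)))
```

Listing the combinations this way makes it easier to decide, node by node, which values to write in config/elasticsearch.yml.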

How it works...

The node.master parameter defines that the node can become master...

Setting up a client node

The master nodes that we have seen previously are the most important for cluster stability. To prevent the queries and aggregations from creating instability in your cluster, client nodes can be used to provide safe communication with the cluster.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in this chapter and a simple text editor to change configuration files.

How to do it...

For an advanced cluster setup, there are some parameters that must be configured to define different node types.

These parameters are in the config/elasticsearch.yml file and they can set up a client node with the following steps:

Set up the node as non-master:
```
        node.master: false 
```
Set up the node to not contain data:
```
        node.data: false 
```

How it works...

The client node is a special node that works as a proxy/pass-through for the cluster.

Its main advantages are:

It can easily kill or remove the cluster...

Setting up an ingest node

The main goals of Elasticsearch are indexing, searching, and analytics, but it's often required to modify or enhance the documents before storing them in Elasticsearch.

The most common scenarios in this case are:

Preprocessing the log string to extract meaningful data.
Enriching the content of some textual fields with Natural Language Processing (NLP) tools.
Adding transformations during ingestion, such as converting an IP address to a geolocation or building custom fields at ingest time.
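To make these scenarios more concrete, the following Python sketch builds the JSON body of an ingest pipeline that parses a log line, resolves the client IP to a geolocation, and adds a custom field. It is an illustration under assumptions, not a recipe from this chapter: the field names message, clientip, and source_type are hypothetical, the grok, geoip, and set processors are assumed to be available on your nodes (geoip may require an additional ingest plugin), and the body would be registered through the ingest pipeline REST endpoint.

```python
import json


def build_log_pipeline():
    """Build the body of an ingest pipeline for web access logs."""
    return {
        "description": "Parse access logs and add geolocation (illustrative)",
        "processors": [
            # Extract structured fields from the raw log line.
            {"grok": {"field": "message", "patterns": ["%{COMBINEDAPACHELOG}"]}},
            # Look up the geolocation of the client IP extracted above.
            {"geoip": {"field": "clientip"}},
            # Build a custom field at ingest time.
            {"set": {"field": "source_type", "value": "access_log"}},
        ],
    }


if __name__ == "__main__":
    # Print the body so that it can be reviewed before it is registered.
    print(json.dumps(build_log_pipeline(), indent=2))
```

Processors run in the order in which they are listed, which is why the grok step that extracts the IP field comes before the geoip step that consumes it.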

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a simple text editor to change configuration files.

How to do it...

To set up an ingest node, you need to edit the config/elasticsearch.yml file and set the node.ingest property to true:

node.ingest: true

How it works...

By default, Elasticsearch configures every node as an ingest node (refer to Chapter 13, Ingest, for more information on ingestion pipelines).

As the client node...

Installing plugins in Elasticsearch

One of the main features of Elasticsearch is the possibility to extend it with plugins. Plugins extend Elasticsearch features and functionalities in several ways.

In Elasticsearch 5.x, the plugins are native plugins: they are JAR files that contain application code. They are used for:

ScriptEngine (JavaScript, Python, Scala, and Ruby)
Custom Analyzers, tokenizers, and scoring
REST entry points
Ingestion pipeline stages
Supporting new storages (Hadoop)

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a prompt/shell to execute commands in Elasticsearch install directory.

How to do it...

Elasticsearch provides a script in the bin/ directory, called elasticsearch-plugin, for automatically downloading and installing plugins.

The steps required to install a plugin are:

Call the elasticsearch-plugin install command with the plugin name reference.
For installing an administrative interface for Elasticsearch...

Installing plugins manually

Sometimes your plugin is not available online or the standard installation fails, so you need to install your plugin manually.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a prompt/shell to execute commands in Elasticsearch install directory.

How to do it...

We assume that your plugin is named awesome and it's packed in a file called awesome.zip.

The steps required to manually install a plugin are:

Copy your zip file into the plugins directory of your Elasticsearch home installation
If the directory named plugins doesn't exist, create it
Unzip the content of the plugin into the plugins directory
Remove the zip archive to clean up unused files
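The manual steps above can also be scripted. The following Python sketch is one possible way to do it with the standard library only; the function name install_plugin_manually is hypothetical, and it assumes that the archive holds the plugin files at its top level, so that they can be unpacked into a directory named after the plugin.

```python
import shutil
import zipfile
from pathlib import Path


def install_plugin_manually(zip_path, es_home, plugin_name):
    """Unpack a plugin archive into <es_home>/plugins/<plugin_name>."""
    plugins_dir = Path(es_home) / "plugins"
    # Create the plugins directory if it does not exist yet.
    plugins_dir.mkdir(parents=True, exist_ok=True)

    # Copy the archive into the plugins directory.
    copied_zip = Path(shutil.copy(zip_path, plugins_dir))

    # Unzip the content into a directory named after the plugin.
    target_dir = plugins_dir / plugin_name
    with zipfile.ZipFile(copied_zip) as archive:
        archive.extractall(target_dir)

    # Remove the copied archive to clean up unused files.
    copied_zip.unlink()
    return target_dir
```

With the names used in this recipe, the call would look like install_plugin_manually("awesome.zip", "/path/to/elasticsearch", "awesome"), where the home path is a placeholder for your own installation directory. The original archive is left untouched; only the copy inside the plugins directory is deleted.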

How it works...

Every Elasticsearch plugin is contained in a directory (usually named after the plugin). The plugin directory should contain one or more JAR files.

When Elasticsearch starts, it scans the plugins directory and loads the plugins it finds there.

Note

If...

Removing a plugin

You have installed some plugins and now you need to remove one because it's no longer required. Removing an Elasticsearch plugin is easy if everything goes right; otherwise, you need to remove it manually.

This recipe covers both cases.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a prompt/shell to execute commands in the Elasticsearch install directory. Before removing a plugin, it is safer to stop the Elasticsearch server to prevent errors due to the deletion of the plugin JAR.

How to do it...

The steps to remove a plugin are as follows:

Stop your running node to prevent exceptions caused by the removal of a file.
Use the Elasticsearch plugin manager, which comes with its script wrapper (elasticsearch-plugin).
On Linux and MacOSX, type the following command:
```
        elasticsearch-plugin remove lang-python      
```
On Windows, type the following command:
```
        elasticsearch-plugin.bat remove lang-python 
```
Restart...

Changing logging settings

Standard logging settings work very well for general usage.

Changing the log level can be useful for checking for bugs or understanding malfunctions due to bad configuration or strange plugin behavior. A verbose log can also be used by the Elasticsearch community to help solve problems.

If you need to debug your Elasticsearch server or change how the logging works (for example, to send events to a remote server), you need to change the log4j2.properties parameters.

Getting ready

You need a working Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe and a simple text editor to change configuration files.

How to do it...

In the config directory in your Elasticsearch install directory, there is a log4j2.properties file, which controls the logging settings.

The steps required for changing the logging settings are:

To emit every kind of logging Elasticsearch has, you can change the current root logger level, which is:
```
        rootLogger.level = info 
```
This needs...
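If you switch log levels often, the edit to the root logger line can be scripted. The following Python sketch rewrites the rootLogger.level entry in the text of a properties file; the function name set_root_level is hypothetical, the accepted level names are assumed to be the standard Log4j 2 levels, and reading and writing the actual file is left to the caller.

```python
import re

# Assumption: the standard Log4j 2 level names, in lower case.
VALID_LEVELS = {"trace", "debug", "info", "warn", "error", "fatal", "off", "all"}

ROOT_LEVEL_PATTERN = re.compile(
    r"^([ \t]*rootLogger\.level[ \t]*=[ \t]*)\S+", re.MULTILINE
)


def set_root_level(properties_text, level):
    """Return the properties text with rootLogger.level set to the given level."""
    if level not in VALID_LEVELS:
        raise ValueError("unknown log level: %s" % level)
    # Keep the key and the spacing, replace only the value.
    new_text, count = ROOT_LEVEL_PATTERN.subn(
        lambda match: match.group(1) + level, properties_text
    )
    if count == 0:
        raise ValueError("rootLogger.level entry not found")
    return new_text
```

Only the root logger line is touched; other entries that start with rootLogger, such as appender references, are left as they are.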

Setting up a node via Docker

Docker (https://www.docker.com/) has become a common way to deploy application servers for testing or production.

Docker is a container system that allows you to easily deploy replicable installations of server applications. With Docker, you don't need to set up a host, configure it, download the Elasticsearch server, unzip it, or start the server; everything is done automatically by Docker.

Getting ready

You need a working Docker installation to be able to execute docker commands (https://www.docker.com/products/overview).

How to do it...

If you want to start a vanilla server, just execute:

        docker pull docker.elastic.co/elasticsearch/elasticsearch:5.1.1

An output similar to the following screenshot will be shown:

After downloading the Elasticsearch image, we can start a development instance via:

        docker run -p 9200:9200 -p 9300:9300 -e "http.host=0.0.0.0" -e    
        "transport.host=0.0.0.0"    
        docker.elastic.co/elasticsearch/elasticsearch:5.1...