Reading a log file of a few megabytes, or keeping data of that size in databases or files, is easy, and making sense of it is straightforward. But a day comes when this data grows to terabytes or petabytes, and keeps growing even faster. As the volume increases, ordinary text editors and word-processing tools can no longer even open such large datasets, yet the raw data still needs to be analyzed to discover insights. You start looking for a tool for large-scale log management, or something that can index the data properly and make sense of it. If you Google this, you will stumble upon ELK Stack: Elasticsearch manages your data, Logstash reads the data from different sources, and Kibana makes a fine visualization of it.
Recently, ELK Stack has evolved into Elastic Stack. We will get to know more about it in this chapter, along with setting it up. The following points will be covered in this chapter:
Introduction to ELK Stack
The birth of Elastic Stack
Who uses the Stack
Stack competitors
Setting up Elastic Stack
X-Pack
It all began with Shay Banon, who started an open source project called Elasticsearch, the successor to Compass, which gained popularity as one of the top open source database engines. Later, building on its distributed model, Kibana was introduced to visualize the data present in Elasticsearch. In the early days, to put data into Elasticsearch we had Rivers, each of which provided a specific input via which data was inserted into Elasticsearch.
However, with growing popularity, this setup needed a tool through which data could be inserted into Elasticsearch with the flexibility to perform various transformations on it (to make unstructured data structured and to have full control over how it is processed). Based on this premise, Logstash was born and was incorporated into the Stack, and together the three tools, Elasticsearch, Logstash, and Kibana, were named ELK Stack.
The following diagram is a simple data pipeline using ELK Stack:

As we can see from the preceding figure, data is read using Logstash and indexed into Elasticsearch. Later, we can use Kibana to read the indices from Elasticsearch and visualize the data using charts and lists. Let's understand these components separately, and the role each plays in the making of the Stack.
As mentioned earlier, Rivers were initially used to put data into Elasticsearch before ELK Stack. In ELK Stack, Logstash is the entry point for all types of data. Logstash has many input plugins to read data from a number of sources, and many output plugins to submit data to a variety of destinations; one of those is the Elasticsearch output plugin, which sends data to Elasticsearch.
After Logstash became popular, Rivers were eventually deprecated, as they made clusters unstable and caused performance issues.
Logstash does not just ship data from one end to another; it helps us collect raw data and modify/filter it into something meaningful, formatted, and organized. The processed data is then sent to Elasticsearch. If there is no plugin available to read data from a specific source, write it to a particular location, or modify it in your own way, Logstash is flexible enough to let you write your own plugins.
Simply put, Logstash is open source, highly flexible, and rich with plugins; it can read data from a location of your choice, normalize it as per your defined configuration, and send it to a particular destination, as per your requirements.
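To give a first taste of what such a configuration looks like, the following is a minimal sketch of a Logstash pipeline; the file name simple-pipeline.conf and the localhost address are illustrative assumptions, not fixed names. It reads lines typed on the console and sends each one to a local Elasticsearch instance, echoing it to the console as well:
# simple-pipeline.conf - a minimal illustrative pipeline
input {
  stdin { }                        # read events from the console
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]    # assumes a local Elasticsearch node
  }
  stdout {
    codec => rubydebug             # pretty-print each event for inspection
  }
}
It can be run from the Logstash home directory with bin/logstash -f simple-pipeline.conf; a filter block would sit between input and output to transform the events.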
We will be learning more about Logstash in Chapter 3, Exploring Logstash and Its Plugins and Chapter 7, Customizing Elastic Stack.
All of the data read by Logstash is sent to Elasticsearch for indexing. Elasticsearch is not only used to index data; it is also a highly scalable, distributed, full-text search engine offering much more. Elasticsearch manages and maintains your data in the form of indices and lets you query, access, and aggregate the data using its APIs. Elasticsearch is based on Lucene, thus providing you all of the features Lucene does.
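For instance, with a local node listening on the default port 9200, a document can be indexed and then searched back using two simple HTTP calls; the index name books and the document fields below are made-up illustrations, not a fixed schema:
curl -X PUT 'http://localhost:9200/books/book/1' -d '{"title": "Learning Elastic Stack", "pages": 300}'
curl -X GET 'http://localhost:9200/books/_search?q=title:elastic'
The first call creates the books index on the fly and stores document 1 in it; the second runs a full-text query against the title field.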
We will be learning more about Elasticsearch in Chapter 2, Stepping into Elasticsearch, Chapter 7, Customizing Elastic Stack, and Chapter 8, Elasticsearch APIs.
Kibana uses the Elasticsearch APIs to read and query data from Elasticsearch indices and to visualize and analyze it in the form of charts, graphs, and tables. Kibana is a web application providing a highly configurable user interface that lets you query the data, create a number of visualizations, and make actual sense of the data stored.
We will be learning more about Kibana in Chapter 4, Kibana Interface and Chapter 7, Customizing Elastic Stack.
As time passed, a few important and complex demands arose around the now-robust ELK Stack, such as authentication, security, and notifications. This led to the development of a few other tools such as Watcher (alerting and notifications based on changes in data), Shield (authentication and authorization for securing clusters), Marvel (monitoring of cluster statistics), ES-Hadoop, Curator, and Graph, as requirements arose.
All data-reading jobs were once done using Logstash, but that is resource-intensive; since Logstash runs on the JVM, it consumes a good amount of memory. The community realized the need to make the pipelining process resource-friendly and lightweight. In 2015, Packetbeat was born: a project to build a network packet analyzer that could read data from different protocols, parse the data, and ship it to Elasticsearch. Being lightweight did the trick, and from it the concept of Beats was formed. Beats are written in the Go programming language. The project evolved, and the Stack was no longer just Elasticsearch, Logstash, and Kibana; Beats also became a significant component.
The pipeline now looked as follows:

A Beat reads data, parses it, and can ship it to either Elasticsearch or Logstash. The difference is that Beats are lightweight, serve a specific purpose, and are installed as agents. A few Beats, such as Metricbeat, Filebeat, and Packetbeat, are supported and provided by the Elastic team, and a good number of Beats have already been written by the community. If you have a specific requirement, you can write your own Beat using the libbeat library.
In simple words, Beats are very lightweight agents that ship data to either Logstash or Elasticsearch, and the libbeat library offers you the infrastructure to create your own Beats.
We will be learning more about Beats in Chapter 5, Using Beats and Chapter 7, Customizing Elastic Stack.
Together, Elasticsearch, Logstash, Kibana, and Beats became Elastic Stack, formerly known as ELK Stack. Elastic Stack did not just add Beats to the team; from now on, all components will also share the same version. The starting version of Elastic Stack is 5.0.0, and the same version applies to all of its components.
This versioning and release method applies not only to Elastic Stack, but to the other tools of the Elastic family as well. With so many tools, each having its own version and not every version being compatible with the others, unification had become a problem. To solve this, all of the tools are now built, tested, and released together.
All of these components play a significant role in creating a pipeline. While Beats and Logstash are used to collect, parse, and ship the data, Elasticsearch creates indices, which are finally used by Kibana to make visualizations. While Elastic Stack provides the pipeline, the other tools add security, notifications, monitoring, and similar capabilities to the setup.
In the past few years, adoption of Elastic Stack has been increasing rapidly. In this section, we will consider a few case studies to understand how Elastic Stack has helped organizations.
Salesforce developed a new plugin named ELF (Event Log Files) to collect Salesforce log data and enable auditing of user activities. The purpose was to analyze the data to understand user behavior and trends in Salesforce.
The plugin is available on GitHub at https://github.com/developerforce/elf_elk_docker.
This plugin simplifies the Stack configuration and allows you to download Event Log Files so that they get indexed, finally making sense of the data by visualizing it using Kibana. This implementation utilizes Elasticsearch, Logstash, and Kibana.
There is not just one use case in which Elastic Stack has helped CERN (the European Organization for Nuclear Research), but five. At CERN, Elastic Stack is used for the following:
Messaging
Data monitoring
Cloud benchmarking
Infrastructure monitoring
Job monitoring
Multiple Kibana dashboards are used by CERN for a number of visualizations.
Green Man Gaming is an online gaming platform where game providers publish their games. The website wanted to differentiate itself by providing better gameplay. They started using Elastic Stack to carry out log analysis and to search and analyze gameplay data.
They began by setting up Kibana dashboards to gain insights about gamer counts by country and by the currency gamers used. This helped them understand and streamline support in order to provide an improved response.
Apart from these case studies, Elastic Stack is used by a number of other companies to gain insights into the data they own. Not all of the components are always used; that is, a Beat is not always deployed, nor is Logstash always configured. Sometimes only a combination of Elasticsearch and Kibana is used.
If we look at the users within an organization, anyone expected to do big data analysis, business intelligence, data visualization, log analysis, and so on, such as data scientists and DevOps engineers, can utilize Elastic Stack in their technical forte.
Well, it would be wrong to speak of Elastic Stack competitors, because Elastic Stack has itself emerged as a strong competitor to many other tools on the market in recent years and is growing rapidly. A few of these are:
Open source:
Graylog: Visit https://www.graylog.org/ for more information
InfluxDB: Visit https://influxdata.com/ for more information
Others:
Logscape: Visit http://logscape.com/ for more information
Logsene: Visit http://sematext.com/logsene/ for more information
Splunk: Visit http://www.splunk.com/ for more information
Sumo Logic: Visit https://www.sumologic.com/ for more information
Kibana competitors:
Grafana: Visit http://grafana.org/ for more information
Graphite: Visit https://graphiteapp.org/ for more information
Elasticsearch competitors:
Lucene/Solr: Visit https://lucene.apache.org/ or http://lucene.apache.org/solr/ for more information
Sphinx: Visit http://sphinxsearch.com/ for more information
Most of these compete on log management, while Elastic Stack is much more than that: it offers the ability to analyze any type of data, not just logs.
In this section, we will install all four components of Elastic Stack on two popular operating systems: Microsoft Windows and Ubuntu. As a prerequisite for installing Elasticsearch or Logstash, Java must be installed. If you already have Java installed, you can skip the installation of Java section.
In this section, we install the JDK that Elasticsearch requires. Oracle Java 8 (Oracle JDK version 1.8.0_73 onwards) should be installed, as it is the recommended version for Elasticsearch 5.0.0 onwards.
Install Java 8 using the terminal and the apt package manager in the following manner:
Add the Oracle Java PPA (Personal Package Archive) to the apt repository list:
sudo add-apt-repository -y ppa:webupd8team/java
Note
In this case, we use a third-party repository. This does not violate Oracle's Java redistribution rules, because the PPA does not include the Java binaries itself; instead, it downloads the binaries directly from Oracle and installs them.
You will be prompted to enter a password after running the sudo command (unless you are logged in as root), and you will receive OK on successful addition, which indicates the repository has been imported.
Update the apt package database to include all the latest package metadata:
sudo apt-get update
Install the latest version of Oracle Java 8:
sudo apt-get -y install oracle-java8-installer
During installation, you will also be prompted to accept the license agreement, which pops up as shown in the following screenshot:
To check whether Java has successfully installed, type the following command into the terminal:
java -version
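If the installation succeeded, this prints version details similar to the following (the exact update and build numbers will differ on your machine):
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)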
Output like the preceding signifies that Java has been installed successfully.
We can install Java on Windows using the following steps:
Download the latest version of the Java JDK from the Oracle site using the following link:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
Upon opening the link, click on the JDK Download button.
You will be redirected to the download page; first click on the Accept License Agreement radio button, then click on your Windows version (x86 for 32-bit or x64 for 64-bit) to download the EXE file.
Double-click the downloaded file to launch the installer.
Read and accept the license agreement, then keep clicking Next until the installer reports that the JDK has been successfully installed.
To run Java on Windows, you need to set the Java path in the Windows environment variable settings. First, open the properties of My Computer. Select Advanced system settings, then click the Advanced tab, and finally click the Environment Variables option, as shown in the following screenshot:
After opening Environment Variables, click on New (under System variables) and set the variable name to JAVA_HOME and the variable value to C:\Program Files\Java\jdk1.8.0_74 (check where the JDK has been installed on your system and provide that path).
Then double-click the Path variable (under System variables), move to the end of the text box, insert a semicolon if one is not already there, and add the location of the JDK's bin folder: %JAVA_HOME%\bin. Click OK in all the open windows.
To validate whether Java is successfully installed, type the following command in the command prompt:
java -version
Output like the preceding signifies that Java has been installed successfully.
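Alternatively, if you prefer the command line over the graphical dialog, the same variables can be set from an administrator command prompt using setx; the JDK path below is the same assumption as above, and note that setx truncates values longer than 1024 characters, so the graphical method is safer for a long Path variable:
setx /M JAVA_HOME "C:\Program Files\Java\jdk1.8.0_74"
setx /M PATH "%PATH%;%JAVA_HOME%\bin"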
In this section, Elasticsearch v5.1.1 installation will be covered for Ubuntu and Windows separately.
In order to install Elasticsearch on Ubuntu, refer to the following steps:
Download Elasticsearch 5.1.1 as a Debian package using the terminal:
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.1.1.deb
Install the Debian package using the following command:
sudo dpkg -i elasticsearch-5.1.1.deb
Configure Elasticsearch to run automatically on boot. If your distribution uses SysV init, run the following command:
sudo update-rc.d elasticsearch defaults 95 10
The preceding command will print the following on screen:
Adding system startup for /etc/init.d/elasticsearch
Check the status of Elasticsearch using the following command:
sudo service elasticsearch status
Run Elasticsearch as a service using the following command:
sudo service elasticsearch start
Note
Elasticsearch may not start if you have any plugin installed that is not supported from ES 5.0.x onwards. As such plugins have been deprecated, any plugin left over from a prior version of ES must be uninstalled. To remove a plugin, go to the Elasticsearch home directory and run a command such as the following:
bin/elasticsearch-plugin remove head
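If you are unsure which plugins are present, you can list the installed ones first using the following command:
bin/elasticsearch-plugin list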
Usage of the Elasticsearch service command:
sudo service elasticsearch {start|stop|restart|force-reload|status}
If your distribution uses systemd, run the following commands:
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable elasticsearch.service
To verify the Elasticsearch installation, open http://localhost:9200 in a browser, or run the following command from the command line:
curl -X GET http://localhost:9200
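A healthy node replies with a JSON document similar to the following; the node name, cluster UUID, and build details will differ on your installation:
{
  "name" : "1MHbzLp",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "EsG9uX3lQ3K6n8Z0a1b2cw",
  "version" : {
    "number" : "5.1.1",
    "build_hash" : "5395e21",
    "build_date" : "2016-12-06T12:36:15.409Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}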
In order to install Elasticsearch on Windows, refer to the following steps:
Download Elasticsearch 5.1.1 from the Elastic website using the following link:
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.1.1.zip
Open the link and the ZIP package will be downloaded.
Extract the downloaded ZIP package using WinRAR, 7-Zip, or similar software (if you don't have one of these, download one). This will extract the files and folders into the directory.
Then open the extracted folder and navigate into the bin folder.
Click on the elasticsearch.bat file to run Elasticsearch.
To verify the Elasticsearch installation, open http://localhost:9200 in the browser:
This section covers the installation of Kibana 5.1.1 on Ubuntu and Windows separately. Before running Kibana, there are some prerequisites:
Elasticsearch should be installed and running on port 9200 (the default port).
Make sure the port on which Kibana runs is not being used by any other application. By default, Kibana runs on port 5601.
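Both prerequisites correspond to settings in Kibana's configuration file, kibana.yml (found under /etc/kibana/ when installed from the Debian package, or in the config folder of the extracted ZIP on Windows). The following sketch shows their default values; you only need to change them if your setup differs:
# kibana.yml - the two settings relevant to the prerequisites above
server.port: 5601                             # port Kibana's web server listens on
elasticsearch.url: "http://localhost:9200"    # Elasticsearch instance Kibana queries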
In order to install Kibana on Ubuntu, refer to the following steps:
Before installing Kibana, check whether your system is 32-bit or 64-bit, which can be done using the following command:
uname -m
If the output is x86_64, it is a 64-bit system; if it is i686, it is a 32-bit system.
Download Kibana 5.1.1 as a Debian package using the terminal:
For 64-bit system:
wget https://artifacts.elastic.co/downloads/kibana/kibana-5.1.1-amd64.deb
For 32-bit system:
wget https://artifacts.elastic.co/downloads/kibana/kibana-5.1.1-i386.deb
Install the Debian package using the following command:
For 64-bit system:
sudo dpkg -i kibana-5.1.1-amd64.deb
For 32-bit system:
sudo dpkg -i kibana-5.1.1-i386.deb
Configure Kibana to run automatically on boot. If your distribution uses SysV init, run the following command:
sudo update-rc.d kibana defaults 95 10
The preceding command will print the following on screen:
Adding system startup for /etc/init.d/kibana
Check the status of Kibana using the following command:
sudo service kibana status
Run Kibana as a service using the following command:
sudo service kibana start
Usage of the Kibana service command:
sudo service kibana {start|force-start|stop|force-stop|status|restart}
If your distribution uses systemd, run the following commands:
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable kibana.service
To verify the Kibana installation, open http://localhost:5601 in the browser:
In order to install Kibana on Windows, refer to the following steps:
Download Kibana version 5.1.1 from the Elastic website using the following link:
https://artifacts.elastic.co/downloads/kibana/kibana-5.1.1-windows-x86.zip
Open the link and the ZIP package will be downloaded.
Extract the downloaded ZIP package using WinRAR, 7-Zip, or similar software. This will extract the files and folders into the directory.
Then open the extracted folder and navigate into the bin folder.
Click on the kibana.bat file to run Kibana.
To verify the Kibana installation, open http://localhost:5601 in the browser:
This section covers the installation of Logstash 5.1.1 on Ubuntu and Windows separately.
In order to install Logstash on Ubuntu, refer to the following steps:
Download Logstash 5.1.1 as a Debian package using the terminal:
wget https://artifacts.elastic.co/downloads/logstash/logstash-5.1.1.deb
Install the Debian package using the following command:
sudo dpkg -i logstash-5.1.1.deb
Check the status of Logstash using the following command:
sudo initctl status logstash
Run Logstash as a service using the following command:
sudo initctl start logstash
In order to install Logstash on Windows, refer to the following steps:
Download Logstash 5.1.1 from the Elastic site using the following link:
https://artifacts.elastic.co/downloads/logstash/logstash-5.1.1.zip
Open the link and the ZIP package will be downloaded.
Extract the downloaded ZIP package using WinRAR, 7-Zip, or similar software. This will extract the files and folders into the directory.
Then open the extracted folder and navigate into the bin folder.
To validate whether Logstash is successfully installed, type the following command into the command prompt:
logstash --version
This will print the Logstash version installed.
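As a quick smoke test on either operating system, Logstash can also run a pipeline passed directly on the command line with the -e flag. The following one-liner echoes whatever you type back as a structured event (on Windows, wrap the pipeline in double quotes instead of single quotes; press Ctrl + C to stop):
bin/logstash -e 'input { stdin { } } output { stdout { } }'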
This section covers the installation of Filebeat 5.1.1 on Ubuntu and Windows separately.
In order to install Filebeat on Ubuntu, refer to the following steps:
Before installing Filebeat, check whether your system is 32-bit or 64-bit, which can be done using the following command:
uname -m
If the output is x86_64, it is a 64-bit system; if it is i686, it is a 32-bit system.
Download Filebeat 5.1.1 as a Debian package using the terminal:
For 64-bit system:
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-5.1.1-amd64.deb
For 32-bit system:
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-5.1.1-i386.deb
Install the Debian package using the following command:
For 64-bit system:
sudo dpkg -i filebeat-5.1.1-amd64.deb
For 32-bit system:
sudo dpkg -i filebeat-5.1.1-i386.deb
Configure Filebeat to run automatically on boot. If your distribution uses SysV init, run the following command:
sudo update-rc.d filebeat defaults 95 10
The preceding command will print the following on screen:
Adding system startup for /etc/init.d/filebeat
Check the status of Filebeat using the following command:
sudo service filebeat status
Run Filebeat as a service using the following command:
sudo service filebeat start
Usage of the Filebeat service command:
sudo service filebeat {start|stop|status|restart|force-reload}
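Once installed from the Debian package, Filebeat reads its configuration from /etc/filebeat/filebeat.yml. The following is a minimal sketch in the 5.x configuration format that tails every log under /var/log and ships the lines to a local Elasticsearch node; the paths and host are assumptions to adapt to your environment:
filebeat.prospectors:
- input_type: log              # tail plain log files
  paths:
    - /var/log/*.log           # which files to watch (adjust as needed)
output.elasticsearch:
  hosts: ["localhost:9200"]    # assumes a local Elasticsearch node
To ship to Logstash instead, replace the output.elasticsearch block with an output.logstash block pointing at your Logstash host.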
In order to install Filebeat on Windows, refer to the following steps:
Before installing Filebeat, check whether your system is 32-bit or 64-bit, which can be done using the following command in the command prompt:
wmic os get osarchitecture
The output will say 64-bit or 32-bit.
Download Filebeat 5.1.1 from the Elastic site using the following link (use the x86_64 package for 64-bit systems or the x86 package for 32-bit systems):
https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-5.1.1-windows-x86_64.zip
https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-5.1.1-windows-x86.zip
Open the link and the ZIP package will be downloaded.
Extract the downloaded ZIP package using WinRAR, 7-Zip, or similar software. This will extract the files and folders into the directory.
Open Windows PowerShell as an administrator (install if not present).
Navigate to the directory where Filebeat was extracted (such as C:\Users\username\Desktop) and run the following command in Windows PowerShell:
.\install-service-filebeat.ps1
Note
If script execution is disabled on your system, you need to set the execution policy for the current session to allow the script to run. For example:
PowerShell.exe -ExecutionPolicy UnRestricted -File .\install-service-filebeat.ps1
This will install Filebeat as a Windows service.
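Once the service is registered, it can be started and verified from the same PowerShell window using the standard Windows service cmdlets (filebeat is the service name the script registers):
Start-Service filebeat
Get-Service filebeat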
Along with Elastic Stack, a few more aspects need to be taken care of. These are sensitive areas such as security, monitoring, and alerting. X-Pack includes five such features:
Security
Alerting
Monitoring
Graph
Reporting
Security, alerting, and monitoring already existed under different names: Shield, Watcher, and Marvel, respectively. Now Graph and Reporting are also part of the team, and this team is named X-Pack. Just like the tools in Elastic Stack, these will also be developed, built, tested, and released together with the same version.
This chapter introduced Elastic Stack and its components. We learned how the Stack progressed, what changed, what was introduced, and how ELK Stack became Elastic Stack. We also got to know a few case studies in which these components helped organizations meet their requirements.
Later in the chapter, we set up Elasticsearch, Logstash, and Kibana, along with Filebeat as a service. Finally, this chapter introduced X-Pack, which will be covered later in this book.
In the next chapter, we will learn about Elasticsearch in detail: its APIs, Query DSL, and so on.