Chapter 2. Building Your First Data Pipeline with ELK

In the previous chapter, we got familiar with each component of the ELK Stack: Elasticsearch, Logstash, and Kibana. We installed and configured each component. In this chapter, we will build our first basic data pipeline using the ELK Stack. This will help us understand how easy it is to put the components of the ELK Stack together to build an end-to-end analytics pipeline.

While running the example in this chapter, we assume that you have already installed Elasticsearch, Logstash, and Kibana as described in Chapter 1, Introduction to ELK Stack.

Input dataset


For our example, the dataset that we are going to use is the daily Google (GOOG) stock quotes for the six-month period from July 1, 2014 to December 31, 2014. This is a good dataset for understanding how quickly a simple dataset like this can be analyzed with ELK.

Note

This dataset can be easily downloaded from the following source:

http://finance.yahoo.com/q/hp?s=GOOG

Data format for input dataset

The most significant fields of this dataset are Date, Open, High, Low, Close, Volume, and Adj Close.

The following table shows some sample rows from the dataset; the actual dataset is in the CSV format:

Date          Open    High    Low     Close   Volume     Adj Close
Dec 31, 2014  531.25  532.60  525.80  526.40  1,368,200  526.40
Dec 30, 2014  528.09  531.15  527.13  530.42  876,300    530.42
Dec 29, 2014  532.19  535.48  530.01  530.33  2,278,500  530.33
Dec 26, 2014  528.77  534.25  527.31  534.03  1,036,000  534.03
Dec 24, 2014  530.51  531.76  527.02  528.77  ...

Configuring Logstash input


As we already know, Logstash has a rich set of plugins for different types of inputs, outputs, and filters, which can read, parse, and filter data as per our needs. We will utilize the file input plugin to read the source file.

A file input plugin streams events from the input file, where each event is assumed to be a single line. It automatically detects and handles file rotation, keeps track of the position where it last stopped reading, and picks up new data automatically when configured correctly. It reads a file in a manner similar to the following command:

tail -0f 

In general, a file input plugin configuration will look as follows:

input {
  file {
    path => # string (path of the files) (required)
    start_position => # string (optional), default: "end"
    tags => # array (optional)
    type => # string (optional)
  }
}
  • path: The path field is the only required field in the file input plugin; it represents the path of the file from which input events have to be processed.

  • start_position: This specifies where Logstash should start reading the file: "beginning" or "end" (the default). For pre-existing data such as our CSV file, set it to "beginning" so that the file is read from the top, as shown in the sketch after this list.
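
To make this concrete, here is a minimal sketch of a file input for our quotes dataset. The path /opt/data/GOOG.csv is a hypothetical location; adjust it to wherever you saved the downloaded CSV:

input {
  file {
    # Hypothetical path; point this at your downloaded CSV file
    path => "/opt/data/GOOG.csv"
    # Read the file from the top, since this is historical data
    # rather than a live log that we are tailing
    start_position => "beginning"
    # An optional label that travels with every event
    type => "stock"
  }
}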

Filtering and processing input


Once we configure the input file, we need to filter the input so that we can identify the fields required for our analysis and process them accordingly.

A filter plugin performs intermediary processing on an input event. Filters can be applied conditionally based on certain fields.

Since our input file is a CSV file, we will use the csv filter. The csv filter takes an event field containing CSV data, parses it, and stores it as individual fields. It can also parse data with separators other than commas. A typical csv filter is as follows:

filter {
  csv {
    columns => # array of column names (optional)
    separator => # string (optional), default: ","
  }
}

The columns attribute takes the names of the fields in our CSV file and is optional. By default, the columns are named column1, column2, and so on.

The separator attribute defines the character that is used to separate the different columns in the file; it defaults to a comma. A filter matching our dataset is sketched below.
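
As a sketch, a csv filter matching our quotes file could look like the following; the column names are illustrative and simply mirror the dataset's header row:

filter {
  csv {
    # Field names for each column, in the order they appear in the file
    columns => ["date_of_record", "open", "high", "low", "close", "volume", "adj_close"]
    # Our file is comma-separated, so this matches the default
    separator => ","
  }
}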

Putting data to Elasticsearch


Now that we have set up the CSV file to be consumed by Logstash, and parsed and processed it according to the data types needed, we need to put the data into Elasticsearch so that the different fields are indexed and can be consumed later via the Kibana interface.

We will use Logstash's elasticsearch output plugin.

A typical elasticsearch plugin configuration looks like this:

output {
  elasticsearch {
    action => # string (optional), default: "index"
    cluster => # string (optional)
    host => # string (optional)
    document_id => # string (optional), default: nil
    index => # string (optional), default: "logstash-%{+YYYY.MM.dd}"
    index_type => # string (optional)
    port => # string (optional)
    protocol => # string, one of ["node", "transport", "http"] (optional)
  }
}
  • action: This specifies what action to perform on incoming documents. The default is "index", which indexes the document; another possible value is "delete", which deletes a document by its document_id. A complete output for our example is sketched after this list.
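
As a sketch, an elasticsearch output for our example could look like the following; it assumes Elasticsearch is running locally with default settings, and it leaves the index name at its default so that it matches the logstash-* pattern Kibana uses later:

output {
  elasticsearch {
    # Assumes a local, single-node Elasticsearch instance
    host => "localhost"
    protocol => "http"
    # index is left at its default, logstash-%{+YYYY.MM.dd},
    # which matches the logstash-* pattern configured in Kibana
  }
}

With the input, filter, and output sections combined into a single configuration file, say goog-pipeline.conf (a name chosen here for illustration), the pipeline can be run with bin/logstash -f goog-pipeline.conf from the Logstash installation directory.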

Visualizing with Kibana


Once you have verified that your data is indexed successfully in Elasticsearch, we can go ahead and look at the Kibana interface to get some useful analytics from the data.
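
If you want a quick check from the shell that the indexes exist and contain documents, Elasticsearch's _cat API can be queried directly; this assumes Elasticsearch is listening on its default port, 9200:

$ curl -XGET 'http://localhost:9200/_cat/indices?v'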

Running Kibana

As described in the previous chapter, we will start the Kibana service from the Kibana installation directory.

$ bin/kibana

Now, let's see Kibana up and running in the browser by going to the following URL:

http://localhost:5601

Kibana Discover page

As we have already set up Kibana to use the logstash-* indexes by default, it displays the indexed data as a histogram of counts, and the associated data as fields in the JSON format.

First of all, we need to set the date filter to our date range so that we can build the analysis on it. Since we took data from July 1, 2014 to December 31, 2014, we will configure the date filter accordingly.

Clicking on the Time Filter icon at the extreme top-right corner, we can set an Absolute Time Filter based on our date range.

Summary


In this chapter, we saw how to utilize different input, filter, and output plugins in Logstash to gather, parse, and index data into Elasticsearch, and how to use the Kibana interface to query and visualize the Elasticsearch indexes. We also built some visualizations and a dashboard using them. We successfully built our first data pipeline using the ELK Stack. In the coming chapters, we will look at the individual components in more detail.
