You're reading from Learning Elastic Stack 6.0

Product type: Book
Published in: Dec 2017
Publisher: Packt
ISBN-13: 9781787281868
Edition: 1st Edition
Authors (2):

Pranav Shukla

Pranav Shukla is the founder and CEO of Valens DataLabs, a technologist, husband, and father of two. He is a big data architect and software craftsman who uses JVM-based languages. Pranav has diverse experience of over 14 years in architecting enterprise applications for Fortune 500 companies and start-ups. His core expertise lies in building JVM-based, scalable, reactive, and data-driven applications using Java/Scala, the Hadoop ecosystem, Apache Spark, and NoSQL databases. He is a big data engineering, analytics, and machine learning enthusiast.

Sharath Kumar M N

Sharath Kumar M N did his master's in computer science at the University of Texas, Dallas, USA. He is currently working as a senior principal architect at Broadcom. Prior to this, he was working as an Elasticsearch solutions architect at Oracle. He has given several tech talks at conferences such as Oracle Code events. Sharath is a certified trainer (Elastic Certified Instructor), one of the few technology experts in the world certified by Elastic Inc. to deliver their official "from the creators of Elastic" training. He is also a data science and machine learning enthusiast. In his free time, he likes playing with his lovely niece, Monisha; nephew, Chirayu; and his pet, Milo.

Chapter 10. Building a Sensor Data Analytics Application

In the previous chapter, we saw how to take an Elastic Stack application to production. Armed with our knowledge of Elastic Stack and the techniques for taking applications to production, we are ready to apply these concepts to a real-world application. In this chapter, we will build one such application with Elastic Stack, one that can handle a large amount of data, by applying the techniques we have learned so far.

In this chapter, we will cover the following topics as we build a sensor data analytics application:

  • Introduction to the application
  • Modeling data in Elasticsearch
  • Setting up the metadata database
  • Building the Logstash data pipeline
  • Sending data to Logstash over HTTP
  • Visualizing the data in Kibana

Let's go through the topics.

Introduction to the application


The Internet of Things (IoT) has found a wide range of applications in recent years. It can be defined as follows:

The Internet of things (IoT) is the collective web of connected smart devices that can sense and communicate with each other by exchanging data via the Internet.

IoT devices are connected to the Internet; they sense and communicate. They are equipped with different types of sensors that collect the data they observe and transmit it over the Internet. This data can be stored, analyzed, and often acted upon in near-real time. The number of such connected devices is projected to rise rapidly; according to Wikipedia, there will be an estimated 30 billion connected devices by 2020. Since each device can capture the current value of a metric and transmit it over the Internet, this can result in massive amounts of data.

A plethora of types of sensors have emerged in recent times for temperature, humidity, light, motion, and airflow; these can be used in different...

Modeling data in Elasticsearch


We have seen the structure of the final record after enriching the data. That should help us model the data in Elasticsearch. Given that our data is time series data, we can apply some of the techniques mentioned in Chapter 9, Running Elastic Stack in Production, to model the data:

  • Defining an index template
  • Understanding the mapping

Let us look at the index template that we will define.

Defining an index template

Since we are going to be storing time series data that is immutable, we do not want to create one big monolithic index. We'll use the techniques discussed in the section Modeling time series data in Chapter 9, Running Elastic Stack in Production.
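To make the idea of time-based indices concrete, here is a minimal Python sketch that maps a reading's timestamp to a per-day index name. The daily granularity, the `YYYY.MM.dd` suffix, and the helper name `daily_index_name` are assumptions for illustration; the index template and pipeline configuration in the repository define the actual naming.

```python
from datetime import datetime, timezone


def daily_index_name(epoch_millis: int, prefix: str = "sensor_data-") -> str:
    """Return a per-day index name for a reading timestamp (epoch milliseconds, UTC).

    The prefix and the date suffix format are assumptions, not taken from the book.
    """
    day = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return prefix + day.strftime("%Y.%m.%d")


print(daily_index_name(1512102540000))  # sensor_data-2017.12.01
```

With a scheme like this, each day's readings land in their own index, so old data can be dropped or archived by removing whole indices rather than deleting individual documents.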

The source code of the application in this chapter is within the GitHub repository at https://github.com/pranav-shukla/learningelasticstack/tree/master/chapter-10. As we go through the chapter, we will perform the steps mentioned in the README.md file located at that path.

Please create the index template mentioned...

Setting up the metadata database


We need to have a database that has metadata about the sensors. This database will hold the tables that we discussed in the Introduction to the application section.

We are storing the data in a relational database, MySQL, but any other relational database would work equally well. Since we are using MySQL, we will use the MySQL JDBC driver to connect to the database. Please ensure that you have the following set up on your system:

  1. MySQL database community version 5.5, 5.6, or 5.7. You can use an existing database if you already have it on your system.
  2. Install the downloaded MySQL database and log in with the root user. Execute the script at this path: https://github.com/pranav-shukla/learningelasticstack/tree/master/chapter-10/files/create_sensor_metadata.sql.
  3. Log in to the newly created sensor_metadata database and verify that the three tables—sensor_type, locations, and sensors—exist in the database.

You can verify that the database was created and populated...

Building the Logstash data pipeline


Having set up the mechanism to automatically create the Elasticsearch index and also the metadata database, we can now focus on building the data pipeline using Logstash. What should our data pipeline do? It should perform the following steps:

  • Accept JSON requests over the web (over HTTP)
  • Enrich the JSON with the metadata we have in the MySQL database
  • Store the resulting documents in Elasticsearch

These three main functions that we want to perform correspond exactly to the Logstash data pipeline's input, filter, and output plugins respectively. The full Logstash configuration file for this data pipeline is in the code base at https://github.com/pranav-shukla/learningelasticstack/tree/master/chapter-10/files/logstash_sensor_data_http.conf.
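The three stages can be sketched in plain Python to show how data flows through them. This is only an illustration of the input, filter, and output roles, not the Logstash configuration itself: the in-memory `SENSOR_METADATA` dictionary stands in for the MySQL lookup, and its field names (`sensorType`, `locationName`), its values, and all function names are hypothetical.

```python
import json

# Hypothetical stand-in for the MySQL metadata lookup; field names and values are made up.
SENSOR_METADATA = {
    1: {"sensorType": "Temperature", "locationName": "Example Building"},
}


def parse_request(body: str) -> dict:
    """Input stage: decode the JSON body of one HTTP request."""
    return json.loads(body)


def enrich(event: dict, metadata: dict) -> dict:
    """Filter stage: merge the metadata for the event's sensor_id into the event."""
    extra = metadata.get(event["sensor_id"], {})
    return {**event, **extra}


def to_document(event: dict) -> str:
    """Output stage: serialize the enriched event as the document to be indexed."""
    return json.dumps(event, sort_keys=True)


body = '{"sensor_id":1,"time":1512102540000,"reading":16.24}'
print(to_document(enrich(parse_request(body), SENSOR_METADATA)))
```

In the real pipeline, each of these functions corresponds to a plugin block in the Logstash configuration file rather than to application code.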

Let us look at how to achieve the end goal of our data pipeline by following the aforementioned steps. We will start with accepting JSON requests over the web (over HTTP).

Accept JSON requests over the web

This function is achieved...

Sending data to Logstash over HTTP


At this point, sensors can start sending their readings to the Logstash data pipeline that we have created in the previous section. They just need to send data as follows:

curl -XPOST -u sensor_data:sensor_data --header "Content-Type: application/json" "http://localhost:8080/" -d '{"sensor_id":1,"time":1512102540000,"reading":16.24}'

Since we don't have real sensors, we will simulate the data by sending these types of requests. The simulated data and script that sends this data are incorporated in the code at https://github.com/pranav-shukla/learningelasticstack/tree/master/chapter-10/data.
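For readers who prefer to simulate a sensor from code rather than from the shell, the following Python sketch builds a request equivalent to the curl command above using only the standard library. The function names `build_reading_request` and `send` are illustrative and are not part of the repository's script; the URL and credentials are the ones shown in the curl example.

```python
import base64
import json
import urllib.request


def build_reading_request(sensor_id: int, time_ms: int, reading: float,
                          url: str = "http://localhost:8080/",
                          user: str = "sensor_data",
                          password: str = "sensor_data") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one sensor reading as JSON."""
    payload = json.dumps({"sensor_id": sensor_id, "time": time_ms, "reading": reading})
    # HTTP basic authentication, the equivalent of curl's -u user:password option.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status; requires the pipeline to be running."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling `send(build_reading_request(1, 1512102540000, 16.24))` would post the same reading as the curl command, provided the Logstash pipeline is listening on the given URL.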

If you are on Linux or macOS, open the terminal and change the directory to your Learning Elastic Stack workspace that was checked out from GitHub.

Note

If your machine has a Windows operating system, you will need a Linux-like shell that supports the curl command and basic BASH (Bourne Again SHell) commands. As you may already have a GitHub workspace checked out, you may...

Visualizing the data in Kibana


We have successfully set up the Logstash data pipeline and also loaded some data using the pipeline into Elasticsearch. It is time to explore the data and build a dashboard that will help us gain some insights into the data.

Let's start by doing a sanity check to see if the data is loaded correctly. We can do so by going to Kibana Dev Tools and executing the following query:

GET /sensor_data-*/_search?size=0
{
  "query": {"match_all": {}}
}

This query will search data across all indices matching the sensor_data-* pattern. There should be a good number of records in the index if the data was indexed correctly.
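The same sanity check can be scripted. The sketch below builds the equivalent search request and reads the total hit count from a response. The base URL `http://localhost:9200` and the function names are assumptions; the sample response at the end is made up purely to show the shape of the data.

```python
import json
import urllib.request


def build_count_request(base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a size=0 match_all search across all sensor_data-* indices."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sensor_data-*/_search?size=0",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # the _search endpoint accepts POST as well as GET
    )


def total_hits(response: dict) -> int:
    """Read the hit count; Elasticsearch 6.x returns an integer, 7.x and later an object."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


# Made-up response body, only to illustrate the 6.x shape.
print(total_hits({"hits": {"total": 1234, "hits": []}}))
```

A non-zero count indicates that documents reached Elasticsearch; it does not by itself confirm that the enrichment fields were populated.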

We will cover the following topics:

  • Set up an index pattern in Kibana
  • Build visualizations
  • Create a dashboard using the visualizations

Let us go through each step.

Set up an index pattern in Kibana

Before we can start building visualizations, we need to set up the index pattern for all indexes that we will potentially have for the Sensor Data Analytics application...

Summary


In this chapter, we built a sensor data analytics application that has a wide variety of applications, as it is related to the emerging IoT field. We understood the problem domain and the data model, including metadata related to sensors. We wanted to build an analytics application using only Elastic Stack components, without using any other tools and programming languages, to get a powerful tool that can handle large volumes of data.

We started at the very core by designing the data model for Elasticsearch. Then we designed a secured data pipeline that can accept data over the internet using HTTP. We enriched the incoming data using the metadata that we had in a relational database and stored the result in Elasticsearch. We sent some test data over HTTP, just as real sensors would over the internet. We built some meaningful visualizations that answer some typical questions. Then we put all the visualizations together in a powerful, interactive dashboard.

In Chapter 11, Monitoring...
