
You're reading from Mastering Elastic Stack

Product type: Book
Published in: Feb 2017
Publisher: Packt
ISBN-13: 9781786460011
Edition: 1st Edition
Authors (2):
Ravi Kumar Gupta

Ravi Kumar Gupta is an author, reviewer, and open source software evangelist. He pursued an MS degree in software systems at BITS Pilani and a B.Tech at LNMIIT, Jaipur. His technological forte is portal management and development. He is currently working with Azilen Technologies, where he acts as a Technical Architect and Project Manager. His previous assignment was as a lead consultant with CIGNEX Datamatics. He was a core member of the open source group at TCS, where he started working on Liferay and other UI technologies. During his career, he has been involved in building enterprise solutions using the latest technologies with rich user interfaces and open source tools. He loves to spend time writing, learning, and discussing new technologies. His interest in search engines, and a small crawler project during his college days, made him a technology lover. He is one of the authors of Test-Driven JavaScript Development, Packt Publishing. He is an active member of the Liferay forum. He also writes technical articles for his blog, TechD of Computer World (http://techdc.blogspot.in). He has been a Liferay trainer at TCS and CIGNEX, where he has provided training on Liferay 5.x and 6.x versions. He was also a reviewer for Learning Bootstrap, Packt Publishing. He can be reached on Skype at kravigupta, on Twitter at @kravigupta, and on LinkedIn at https://in.linkedin.com/in/kravigupta.

Yuvraj Gupta

Yuvraj Gupta is an author and a keen technologist with an interest in Big Data, Data Analytics, Data Visualization, and Cloud Computing. He has been working as a Big Data Consultant, primarily in the domain of Big Data Testing. He loves to spend time writing on various social platforms. He is an avid gadget lover, a foodie, and a sports enthusiast, and he loves to watch TV series and movies. He always keeps himself updated with the latest happenings in technology. He has authored a book titled Kibana Essentials with Packt Publishing. He can be reached at gupta.yuvraj@gmail.com or on LinkedIn at www.linkedin.com/in/guptayuvraj.

Chapter 2. Stepping into Elasticsearch

In the previous chapter, we learned the basics of Elasticsearch, Logstash, Kibana, and Beats, and how to install and configure them to set up the pipeline. We came to know the role of Elasticsearch and the way it works with the other components of the stack. This was just the tip of the iceberg. To get a better idea of how Elasticsearch works, we need to learn about the APIs, modules, and plugins it offers. These topics are divided into two chapters.

We're going to take a deep dive into Elasticsearch in this chapter. These are the topics that we are going to cover:

  • The beginning of Elasticsearch

  • Understanding the architecture

  • Elasticsearch APIs

  • Aggregation

  • A note for painless scripting

By the end of this chapter, you should have a good idea of how to use aggregations and of the power of the APIs. We will cover more about Elasticsearch in Chapter 8, Elasticsearch APIs.

The beginning of Elasticsearch


It all started with Lucene, a brilliant project supported by the Apache Software Foundation. Many projects are based on Lucene; to name a few: Apache Solr, Elasticsearch, Apache Nutch, Lucene.Net, DocFetcher, and many more. If you ever look for a search engine kind of solution, you will surely come across Lucene. It is available not only for Java, but also for Delphi, Perl, C#, C++, Python, Ruby, and PHP. A complete list of Lucene implementations is available at http://wiki.apache.org/lucene-java/LuceneImplementations.

Lucene is a full-text search engine library that creates indices over documents. Its data model is a simple hierarchy: in a paragraph or blob of text, every string is called a term; a named sequence of terms is called a field; a sequence of fields is called a document; and an index contains a sequence of documents. In other words, Lucene indexes data as documents.
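The following minimal Python sketch makes this model concrete by building an inverted index, the structure that maps each term to the documents containing it. It is an illustration under simplifying assumptions, not Lucene's implementation: the "analyzer" here only lowercases and splits on whitespace, and the sample documents, field names, and helper functions (tokenize, build_inverted_index, search) are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample documents: each document is a set of fields,
# and each field value is a blob of text that gets split into terms.
documents = {
    1: {"title": "Mastering Elastic Stack", "author": "Ravi Kumar Gupta"},
    2: {"title": "Kibana Essentials", "author": "Yuvraj Gupta"},
}


def tokenize(text):
    """Very naive analyzer: lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map (field, term) -> sorted list of IDs of documents containing that term."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            for term in tokenize(value):
                index[(field, term)].add(doc_id)
    return {key: sorted(ids) for key, ids in index.items()}


def search(index, field, term):
    """Look up a single term in a single field."""
    return index.get((field, term.lower()), [])


inverted = build_inverted_index(documents)
print(search(inverted, "author", "gupta"))   # [1, 2]
print(search(inverted, "title", "kibana"))   # [2]
```

Lookups never scan the documents themselves; they go straight from a term to its document list, which is what makes term-based search fast.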

Books usually have an index that lists all the keywords and helps us find where the actual content is. This type of index is...

Understanding the architecture


To understand how Elasticsearch works, we need to learn about its architecture.

To understand how indices, types, documents, and fields work together, let's refer to the following figure:

As seen in the preceding figure, an index contains one or more types. A type can be thought of as a table in a relational database. A type has one or more documents, and each document has one or more fields. Fields are key-value pairs.
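As a rough sketch of this hierarchy, the following Python snippet models the library index used in this chapter as nested dictionaries: type, then document ID, then fields as key-value pairs. The document IDs, field values, and the helper names document_path and field_names are hypothetical; document_path only shows the /index/type/id path shape that Elasticsearch 5.x uses to address a single document, and field_names is a crude stand-in for the mapping that Elasticsearch actually maintains per type.

```python
# The "library" index, holding one type ("book"), modeled as nested dicts.
library_index = {
    "book": {                                     # type, roughly like a table
        "1": {                                    # document ID
            "title": "Mastering Elastic Stack",   # fields are key-value pairs
            "author": "Ravi Kumar Gupta",
            "year": 2017,
        },
        "2": {
            "title": "Kibana Essentials",
            "author": "Yuvraj Gupta",
        },
    }
}


def document_path(index, doc_type, doc_id):
    """Build the REST path that addresses a single document in Elasticsearch 5.x."""
    return f"/{index}/{doc_type}/{doc_id}"


def field_names(index_data, doc_type):
    """Collect every field name used by documents of one type (a crude 'mapping')."""
    names = set()
    for doc in index_data[doc_type].values():
        names.update(doc.keys())
    return sorted(names)


print(document_path("library", "book", "1"))   # /library/book/1
print(field_names(library_index, "book"))      # ['author', 'title', 'year']
```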

A cluster has one or more nodes, and clusters are identified by their names. By default, the cluster name is elasticsearch. If you set up multiple Elasticsearch instances on the same network, give each cluster a different name; otherwise, all the nodes can end up joining the same cluster. Like a cluster, a node also has a name. We can assign a node its own name and the name of the cluster it should join. If we don't provide a cluster name, the node will search for and join the cluster named elasticsearch.
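The naming rule can be illustrated with a toy Python model in which each node's settings are a dictionary keyed by the real setting names cluster.name and node.name. It is only a sketch of the grouping rule: it assumes every node can discover every other node (real discovery is configured separately), and the node names, the cluster name library-dev, and the group_by_cluster helper are hypothetical.

```python
from collections import defaultdict

# Cluster name used when a node does not set cluster.name.
DEFAULT_CLUSTER_NAME = "elasticsearch"

# Hypothetical node settings, keyed by the real setting names.
nodes = [
    {"node.name": "node-1"},                                 # no cluster.name
    {"node.name": "node-2"},                                 # no cluster.name
    {"node.name": "node-3", "cluster.name": "library-dev"},  # explicit name
]


def group_by_cluster(node_settings):
    """Toy model: nodes that share a cluster name end up in the same cluster."""
    clusters = defaultdict(list)
    for settings in node_settings:
        name = settings.get("cluster.name", DEFAULT_CLUSTER_NAME)
        clusters[name].append(settings["node.name"])
    return dict(clusters)


print(group_by_cluster(nodes))
# {'elasticsearch': ['node-1', 'node-2'], 'library-dev': ['node-3']}
```

node-1 and node-2 never set a cluster name, so both fall into the default elasticsearch cluster, which is exactly the situation that distinct names avoid.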

If we...

Elasticsearch APIs


There are many APIs available for managing Elasticsearch. These APIs help us manage clusters and indices, run searches, and so on. In this section, we will look at each of these APIs in detail.

We can use these APIs from the command prompt, from Console in Kibana, or with any tool that can make calls to RESTful APIs.

Note

By default, Elasticsearch listens for HTTP requests on port 9200. Kibana uses the same port to connect to Elasticsearch. To learn more about Console, refer to the Exploring Dev tools section of Chapter 4, Kibana Interface.

Sense is a powerful plugin for Kibana that allows us to make calls to Elasticsearch APIs using a web interface. We will learn about Sense in Chapter 8, Elasticsearch APIs. For this chapter, we will use cURL, a command-line utility for making HTTP requests, to access the APIs.

A typical cURL request against ES contains a verb, URL, and message body:

$ curl -X{Verb} 'url' -d '{message-body}'
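To see how the three parts fit together, the hypothetical Python helper build_curl below assembles a command in the same shape as this template: it uppercases the verb, serializes the message body as JSON, and uses shlex.quote so the shell receives the URL and body intact. It only builds the command string; it does not contact Elasticsearch.

```python
import json
import shlex


def build_curl(verb, url, body=None):
    """Assemble: curl -X{Verb} 'url' -d '{message-body}' (body is optional)."""
    parts = ["curl", f"-X{verb.upper()}", shlex.quote(url)]
    if body is not None:
        # Serialize the message body as JSON and quote it for the shell.
        parts += ["-d", shlex.quote(json.dumps(body))]
    return " ".join(parts)


cmd = build_curl(
    "get",
    "http://localhost:9200/library/book/_search?pretty",
    {"query": {"term": {"author": "gupta"}}},
)
print(cmd)
# curl -XGET 'http://localhost:9200/library/book/_search?pretty' -d '{"query": {"term": {"author": "gupta"}}}'
```

Quoting each part separately matters because the URL contains ? and the body contains spaces, braces, and double quotes, all of which the shell would otherwise interpret.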

Verbs are GET, PUT, POST, DELETE, and HEAD...

Query DSL


With Query DSL, we provide the query in a request body along with the URI, just as we have been doing with the Document APIs. We can rewrite our author search query as follows:

$ curl -XGET 'http://localhost:9200/library/book/_search?pretty' -d '{
    "query" : {
      "term" : {"author" : "gupta"}
    }
  }'

This query returns the same result. The field and value that we passed with q= in the URI search are defined inside term here. To learn more about Query DSL, refer to https://www.elastic.co/guide/en/elasticsearch/reference/5.1/query-dsl.html.
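The relationship between the two forms can be sketched in Python. The snippet below (the BASE constant and variable names are illustrative) encodes the URI-search form with urlencode and builds the equivalent request body. For a single lowercase word such as gupta, the two should return the same hits; in general, though, q= uses query string syntax, whose input is analyzed, while a term query looks up the exact term without analysis.

```python
import json
from urllib.parse import urlencode

BASE = "http://localhost:9200/library/book/_search"

# URI search: the field and value are packed into the q= parameter.
uri_search = BASE + "?" + urlencode({"q": "author:gupta"})

# Query DSL: the same field and value move into a term query in the body.
dsl_body = {"query": {"term": {"author": "gupta"}}}

print(uri_search)  # http://localhost:9200/library/book/_search?q=author%3Agupta
print(json.dumps(dsl_body))
```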

Aggregations


The aggregations framework is a very important part of Elasticsearch. As the name suggests, it lets us aggregate the results of a search query and generate analytic information from them. Aggregations help us get better insight into the data. For example, for our library index, we can answer questions such as how many books were published in a specific year, which technologies they cover, how many books are published per year on average, and many more.
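For instance, the question "how many books were published in each year?" maps naturally to a terms aggregation. The Python sketch below only builds the request body, and it assumes a hypothetical numeric year field in the library index; the aggregation name books_per_year is our own choice. Setting size to 0 skips the search hits so that only the aggregation results come back.

```python
import json

# Assumes a hypothetical numeric "year" field in the library index.
books_per_year = {
    "size": 0,  # skip the search hits; return only aggregation results
    "aggs": {
        "books_per_year": {          # a name we choose for this aggregation
            "terms": {"field": "year"}
        }
    },
}

print(json.dumps(books_per_year))
# {"size": 0, "aggs": {"books_per_year": {"terms": {"field": "year"}}}}
```

The printed JSON would be the message body of a request such as curl -XGET 'http://localhost:9200/library/book/_search?pretty' -d '<printed JSON>'; each bucket in the response reports a year as its key together with a doc_count.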

Aggregations show their real power when it comes to gaining insight into system data on a dashboard, since most system dashboards present aggregated data in the form of charts. We will also use aggregations in later chapters, where they help Kibana generate useful visualizations.

There are two types of core aggregations: metrics and buckets. We will learn about these in this section.

Bucket

These aggregations create buckets of documents based on a criterion. These types of aggregations can also hold sub-aggregations. We will learn about sub-aggregations in this...
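To show what bucketing with a sub-aggregation means, this Python sketch reproduces the idea locally on a few hypothetical documents: one bucket per distinct year (like a terms aggregation), with an average page count computed inside each bucket (like an avg sub-aggregation). The year and pages fields and the terms_with_avg helper are assumptions for illustration, and the output shape is simplified compared with a real Elasticsearch response; equivalent_request shows how the same nesting is written in the aggregations DSL.

```python
from collections import defaultdict

# Hypothetical documents from the library index.
books = [
    {"title": "Book A", "year": 2016, "pages": 320},
    {"title": "Book B", "year": 2017, "pages": 250},
    {"title": "Book C", "year": 2017, "pages": 350},
]


def terms_with_avg(docs, bucket_field, metric_field):
    """Bucket docs by each distinct value of bucket_field (like a terms
    aggregation) and average metric_field inside each bucket (like an avg
    sub-aggregation)."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc)
    result = []
    for key, members in buckets.items():
        average = sum(d[metric_field] for d in members) / len(members)
        result.append({"key": key, "doc_count": len(members),
                       "avg_" + metric_field: average})
    # A terms aggregation orders buckets by document count, highest first, by default.
    result.sort(key=lambda b: b["doc_count"], reverse=True)
    return result


# The same nesting expressed in the aggregations DSL (request body only).
equivalent_request = {
    "size": 0,
    "aggs": {
        "by_year": {
            "terms": {"field": "year"},
            "aggs": {"avg_pages": {"avg": {"field": "pages"}}},
        }
    },
}

for bucket in terms_with_avg(books, "year", "pages"):
    print(bucket)
# {'key': 2017, 'doc_count': 2, 'avg_pages': 300.0}
# {'key': 2016, 'doc_count': 1, 'avg_pages': 320.0}
```

The sub-aggregation sits inside the bucket aggregation's own aggs key, which is why its metric is computed once per bucket rather than once over all documents.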

A note for painless scripting


We use scripts in many situations, such as updating data and defining scripted fields. Prior to version 5.x, Groovy was the default scripting language, and we did not even need to specify which language a script used. Since scripts were executed remotely, on the Elasticsearch nodes, security was a constant concern that the Elastic team had to address. This became the reason for designing Painless.

Painless is both secure and efficient. Its syntax is similar to Groovy's, so it is easy to learn and use. In most cases, you don't need to change your previously written scripts; all you need to add is a parameter called lang with the value painless.
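As a hedged example of setting lang, the Python snippet below builds a request body for the Update API that runs a small Painless script. It assumes a hypothetical numeric pages field and a hypothetical parameter named extra. In Elasticsearch 5.x the script text goes under inline (later versions use source instead), and the body would be sent to an endpoint such as /library/book/1/_update, where the document ID 1 is also illustrative.

```python
import json

# Hypothetical Update API body: add params.extra to a numeric "pages" field.
# Elasticsearch 5.x reads the script text from "inline"; later versions use "source".
update_body = {
    "script": {
        "lang": "painless",
        "inline": "def extra = params.extra; ctx._source.pages += extra",
        "params": {"extra": 10},
    }
}

print(json.dumps(update_body, indent=2))
```

Passing the value through params rather than hard-coding it in the script text keeps the script itself unchanged between calls.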

To define a variable in painless, simply use the following:

def myVar = 'my-value'; 

We don't need to specify a type: with def, the actual type of the variable is determined at runtime. Painless also supports Java's primitive types and a whitelisted subset of Java's reference types.

To define an array, use...

Summary


In this chapter, we learned about the Elasticsearch architecture and how Elasticsearch was born. Then we got familiar with the Elasticsearch APIs: Search, Indices, and Document. With the help of these APIs, we learned how to add documents to Elasticsearch, how to query those documents, and how to manage indices. At the end of the chapter, we saw how aggregations help us analyze the documents we search. We will practice these concepts with more examples in the next chapters.

In the next chapter, we will learn about Logstash, how to configure it for complex data types, and Logstash plugins.
