You're reading from Getting Started with Elastic Stack 8.0

Product type Book

Published in Mar 2022

Publisher Packt

ISBN-13 9781800569492

Pages 474 pages

Edition 1st Edition

Concepts

Enterprise Search

Author (1):

Asjad Athick

Table of Contents (18) Chapters

Preface

Section 1: Core Components

Chapter 1: Introduction to the Elastic Stack

Chapter 2: Installing and Running the Elastic Stack

Section 2: Working with the Elastic Stack

Chapter 3: Indexing and Searching for Data

Chapter 4: Leveraging Insights and Managing Data on Elasticsearch

Chapter 5: Running Machine Learning Jobs on Elasticsearch

Chapter 6: Collecting and Shipping Data with Beats

Chapter 7: Using Logstash to Extract, Transform, and Load Data

Chapter 8: Interacting with Your Data on Kibana

Chapter 9: Managing Data Onboarding with Elastic Agent

Section 3: Building Solutions with the Elastic Stack

Chapter 10: Building Search Experiences Using the Elastic Stack

Chapter 11: Observing Applications and Infrastructure Using the Elastic Stack

Chapter 12: Security Threat Detection and Response Using the Elastic Stack

Chapter 13: Architecting Workloads on the Elastic Stack

Other Books You May Enjoy

Chapter 3: Indexing and Searching for Data

Now that we have installed Elasticsearch (and other core components) on our operating system or platform of choice, this chapter dives deeper into Elasticsearch. As discussed in Chapter 2, Installing and Running the Elastic Stack, Elasticsearch is a distributed search engine and document store. With the ability to ingest terabytes of data a day and scale with that volume, Elasticsearch can be used to search, aggregate, and analyze data from any type of source. A single-node Elasticsearch deployment is easy to get up and running.

This chapter will explore some of the advanced functionality that you will need to understand to design and scale for more complex requirements around ingesting, searching, and managing large volumes of data. Upon completing this chapter, you will understand how indices work, how data can be mapped to an appropriate data type, and how data can be queried on Elasticsearch.

Specifically, we will...

Technical requirements

The code examples for this chapter can be found in the GitHub repository for this book: https://github.com/PacktPublishing/Getting-Started-with-Elastic-Stack-8.0/tree/main/Chapter3.

Start an instance of Elasticsearch and Kibana on your local machine to follow along with the examples in this chapter. Alternatively, you can use your preferred mode of running the components from Chapter 2, Installing and Running the Elastic Stack.

Use Dev Tools on Kibana to make interacting with Elasticsearch REST APIs more convenient. Dev Tools takes care of authentication, content headers, hostnames, and more so that you can focus on crafting and running your API calls. The Dev Tools app can be found under the Management section in the Kibana navigation sidebar. The same REST API calls can be performed directly against your Elasticsearch cluster using a tool such as curl or Postman if you prefer:

Figure 3.1 – Kibana Dev Tools console

This...
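As a rough illustration of what Dev Tools handles for you, the following Python sketch assembles the same kind of request by hand using only the standard library. The host `https://localhost:9200`, the `elastic` user, and the placeholder password are assumptions about a local setup, and `build_request` is a hypothetical helper rather than part of any Elastic client. The request is built but not sent, so the sketch runs without a cluster:

```python
import base64
import json
import urllib.request


def build_request(method, path, body=None,
                  host="https://localhost:9200",   # assumed local endpoint
                  username="elastic",              # assumed user
                  password="changeme"):            # placeholder password
    """Build (but do not send) an HTTP request for an Elasticsearch REST call."""
    # Basic authentication: Dev Tools normally attaches credentials for you.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}

    data = None
    if body is not None:
        # JSON bodies need an explicit content header when sent by hand.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode()

    url = f"{host.rstrip('/')}/{path.lstrip('/')}"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Roughly what typing `GET _cluster/health` in the Dev Tools console expands to.
req = build_request("GET", "_cluster/health")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. A cluster that uses a self-signed certificate would additionally need its CA certificate to be trusted by the client.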

Understanding the internals of an Elasticsearch index

When users want to store data (or documents) on Elasticsearch, they do so in an index. An index on Elasticsearch is a location to store and organize related documents. The documents don't all have to hold the same type of data, but they should generally be related to one another. In the SQL world, an index would be comparable to a database containing multiple tables (where each table is designed for a single type of data).

An index is made up of primary shards. Primary shards can be replicated into replica shards to achieve high availability. Each shard is an instance of a Lucene index with the ability to handle indexing and search requests. The primary shard can handle both read and write requests, while replica shards are read-only. When a document is indexed into Elasticsearch, it is indexed by the primary shard before being replicated to the replica shard. The indexing request is only acknowledged once the replica shard has been...
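To make the relationship between primary and replica shards concrete, here is a minimal Python sketch that builds a request body for creating an index and counts the resulting shard copies. `number_of_shards` and `number_of_replicas` are the index settings that control the number of primaries and the number of replicas per primary; the helper names and the values 3 and 1 are illustrative assumptions:

```python
def index_settings(primary_shards, replicas_per_primary):
    """Return a create-index request body with the given shard layout."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas_per_primary < 0:
        raise ValueError("replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_primary,
        }
    }


def total_shard_copies(primary_shards, replicas_per_primary):
    """Each primary is stored once, plus once per configured replica."""
    return primary_shards * (1 + replicas_per_primary)


body = index_settings(primary_shards=3, replicas_per_primary=1)
print(body)
print(total_shard_copies(3, 1))  # 3 primaries plus 3 replicas
```

A body like this could be sent with a `PUT` request to the name of the new index, for example from the Dev Tools console.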

Elasticsearch nodes

An Elasticsearch node is a single running instance of Elasticsearch. A single physical or virtual machine can run multiple instances or nodes of Elasticsearch, assuming it has sufficient resources to do so.

Elasticsearch nodes perform a variety of roles within the cluster. The roles that a node performs can be granularly controlled as required.

We will cover some common node roles in the following sections.

Master-eligible nodes

Master-eligible nodes take part in the master election process. At any point in time, a single node is elected to be the active master. The active master node keeps track of other nodes in the cluster, creation or deletion of indices, shards being allocated to nodes based on requirements/constraints, cluster settings being applied, and more.

The master role is generally not very resource-intensive and can be co-located on a node running other roles in smaller clusters. Running the master role on a dedicated host makes sense...
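Roles are assigned per node through the `node.roles` setting in `elasticsearch.yml`. The sketch below renders that line for a few layouts, such as a dedicated master or a small-cluster node that combines roles. The role names listed are the ones I am aware of for 8.x and should be checked against the documentation for your version; `render_node_roles` is a hypothetical helper:

```python
# Role names as understood for Elasticsearch 8.x (an assumption to verify
# against the documentation for the version you run).
KNOWN_ROLES = {
    "master", "voting_only",
    "data", "data_content", "data_hot", "data_warm", "data_cold", "data_frozen",
    "ingest", "ml", "remote_cluster_client", "transform",
}


def render_node_roles(roles):
    """Return the `node.roles` line for elasticsearch.yml."""
    unknown = sorted(set(roles) - KNOWN_ROLES)
    if unknown:
        raise ValueError(f"unknown role(s): {', '.join(unknown)}")
    # A voting-only node takes part in elections but must also be master-eligible.
    if "voting_only" in roles and "master" not in roles:
        raise ValueError("voting_only requires the master role as well")
    return "node.roles: [" + ", ".join(roles) + "]"


# Dedicated master-eligible node.
print(render_node_roles(["master"]))
# Smaller cluster: master co-located with data and ingest roles.
print(render_node_roles(["master", "data", "ingest"]))
# An empty list leaves the node with no explicit roles (coordinating only).
print(render_node_roles([]))
```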

Searching for data

Now that we understand some of the core aspects of Elasticsearch (shards, indices, index mappings/settings, nodes, and more), let's put it all together by ingesting a sample dataset and searching for data.

Indexing sample logs

Follow these steps to ingest some Apache web access logs into Elasticsearch:

Navigate to the Chapter3/searching-for-data directory in the code repository for this book. Inspect the web.log file to see the raw data that we are going to load into Elasticsearch for querying:
```
head web.log
```
A Bash script called load.sh has been provided for loading two items into your Elasticsearch cluster:

(a) An index template called web-logs-template that defines the index mappings and settings that are compliant with the Elastic Common Schema:

```
cat web-logs-template.json
```

(b) An ingest pipeline called web-logs-pipeline that parses and transforms logs from your dataset into the Elastic Common Schema:

```
cat web-logs-pipeline.json
```
...
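The actual contents of the two JSON files are in the book's repository and are not reproduced here. As a hedged sketch of what an index template and an ingest pipeline of this kind can look like, the Python below builds two illustrative request bodies. The index pattern `web-logs*`, the chosen mappings, and the processor list are assumptions made for illustration. The `clientip` and `timestamp` field names assume grok's legacy (non-ECS) output for the `COMBINEDAPACHELOG` pattern:

```python
import json

TEMPLATE_NAME = "web-logs-template"   # name taken from the text
PIPELINE_NAME = "web-logs-pipeline"   # name taken from the text


def build_index_template(index_pattern="web-logs*"):  # pattern is hypothetical
    """Illustrative body for PUT _index_template/<name>."""
    return {
        "index_patterns": [index_pattern],
        "template": {
            "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 1,
                # Run every document indexed here through the pipeline.
                "index.default_pipeline": PIPELINE_NAME,
            },
            "mappings": {
                "properties": {
                    # A few Elastic Common Schema fields with explicit types.
                    "@timestamp": {"type": "date"},
                    "source": {"properties": {"ip": {"type": "ip"}}},
                    "http": {"properties": {"response": {"properties": {
                        "status_code": {"type": "long"}}}}},
                }
            },
        },
    }


def build_ingest_pipeline():
    """Illustrative body for PUT _ingest/pipeline/<name>."""
    return {
        "description": "Parse Apache access logs (illustrative sketch)",
        "processors": [
            # Split the raw log line into named fields.
            {"grok": {"field": "message",
                      "patterns": ["%{COMBINEDAPACHELOG}"]}},
            # Convert the Apache timestamp into @timestamp.
            {"date": {"field": "timestamp",
                      "formats": ["dd/MMM/yyyy:HH:mm:ss Z"],
                      "target_field": "@timestamp"}},
            # Move the client address to its ECS field name.
            {"rename": {"field": "clientip",
                        "target_field": "source.ip",
                        "ignore_missing": True}},
        ],
    }


template = build_index_template()
pipeline = build_ingest_pipeline()
print(f"PUT _index_template/{TEMPLATE_NAME}")
print(json.dumps(template, indent=2))
print(f"PUT _ingest/pipeline/{PIPELINE_NAME}")
print(json.dumps(pipeline, indent=2))
```

Pointing `index.default_pipeline` at the pipeline is one way to tie the two objects together; whether the book's template does this is not stated in the text, so treat it as a design option rather than a description of `load.sh`.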

Summary

In this chapter, we briefly looked at three core aspects of Elasticsearch.

First, we looked at the internals of an index in Elasticsearch. We explored how settings can be applied to indices and learned how to configure mappings for document fields. We also looked at a range of different data types that are supported and how they can be leveraged for various use cases.

We then looked at how nodes on Elasticsearch host indices and data. We learned about the different roles a node can play as part of a cluster, as well as how data tiers take advantage of different hardware profiles on nodes, depending on how the data is used.

Lastly, we ingested some sample data and learned how to ask questions about our data using the search API.

In the next chapter, we will dive a little bit deeper into how to derive statistical insights, use ingest pipelines to transform data, create entity-centric indices by pivoting on incoming data, manage time series sources using...