Getting Started with Elastic Stack 8.0

Product type: Book
Published in: Mar 2022
Publisher: Packt
ISBN-13: 9781800569492
Pages: 474
Edition: 1st Edition
Author: Asjad Athick

Table of Contents (18 chapters)

Preface
Section 1: Core Components
Chapter 1: Introduction to the Elastic Stack
Chapter 2: Installing and Running the Elastic Stack
Section 2: Working with the Elastic Stack
Chapter 3: Indexing and Searching for Data
Chapter 4: Leveraging Insights and Managing Data on Elasticsearch
Chapter 5: Running Machine Learning Jobs on Elasticsearch
Chapter 6: Collecting and Shipping Data with Beats
Chapter 7: Using Logstash to Extract, Transform, and Load Data
Chapter 8: Interacting with Your Data on Kibana
Chapter 9: Managing Data Onboarding with Elastic Agent
Section 3: Building Solutions with the Elastic Stack
Chapter 10: Building Search Experiences Using the Elastic Stack
Chapter 11: Observing Applications and Infrastructure Using the Elastic Stack
Chapter 12: Security Threat Detection and Response Using the Elastic Stack
Chapter 13: Architecting Workloads on the Elastic Stack
Other Books You May Enjoy

Chapter 3: Indexing and Searching for Data

Now that we have installed Elasticsearch (and other core components) on our operating system or platform of choice, this chapter dives deeper into Elasticsearch. As discussed in Chapter 2, Installing and Running the Elastic Stack, Elasticsearch is a distributed search engine and document store. With the ability to ingest terabytes of data a day and scale accordingly, Elasticsearch can be used to search, aggregate, and analyze data from any type of source. Getting up and running with a single-node Elasticsearch deployment is straightforward.

This chapter will explore some of the advanced functionality that you will need to understand to design and scale for more complex requirements around ingesting, searching, and managing large volumes of data. Upon completing this chapter, you will understand how indices work, how data can be mapped to an appropriate data type, and how data can be queried on Elasticsearch.

Specifically, we will...

Technical requirements

The code examples for this chapter can be found in the GitHub repository for this book: https://github.com/PacktPublishing/Getting-Started-with-Elastic-Stack-8.0/tree/main/Chapter3.

Start an instance of Elasticsearch and Kibana on your local machine to follow along with the examples in this chapter. Alternatively, you can use your preferred mode of running the components from Chapter 2, Installing and Running the Elastic Stack.

Use Dev Tools on Kibana to make interacting with Elasticsearch REST APIs more convenient. Dev Tools takes care of authentication, content headers, hostnames, and more so that you can focus on crafting and running your API calls. The Dev Tools app can be found under the Management section in the Kibana navigation sidebar. The same REST API calls can be performed directly against your Elasticsearch cluster using a tool such as curl or Postman if you prefer:

Figure 3.1 – Kibana Dev Tools console

This...
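As a rough illustration of what Dev Tools handles on your behalf, the following Python sketch builds (but does not send) the kind of HTTP request you would otherwise assemble by hand. The host, username, and password are assumptions for a local setup and are not taken from the book; `build_request` is a hypothetical helper name.

```python
import base64
import json
import urllib.request

# Assumptions for illustration only: adjust host, user, and password to your setup.
ES_URL = "https://localhost:9200"
USERNAME = "elastic"
PASSWORD = "changeme"  # hypothetical placeholder, not a real credential


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    # Basic authentication header, which Dev Tools would add for you.
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    data = None
    if body is not None:
        # Content header and JSON encoding, also handled by Dev Tools.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        ES_URL + path, data=data, headers=headers, method=method
    )


# Equivalent of typing `GET /_cluster/health` in the Dev Tools console.
health = build_request("GET", "/_cluster/health")
print(health.get_method(), health.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a cluster using a self-signed certificate would also need a suitable SSL context, which is omitted here.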

Understanding the internals of an Elasticsearch index

When users want to store data (or documents) on Elasticsearch, they do so in an index. An index on Elasticsearch is a location to store and organize related documents. The documents don't all have to be the same type of data, but they should generally be related to one another. In the SQL world, an index would be comparable to a database containing multiple tables (where each table is designed for a single type of data).

An index is made up of primary shards. Primary shards can be replicated into replica shards to achieve high availability. Each shard is an instance of a Lucene index with the ability to handle indexing and search requests. The primary shard can handle both read and write requests, while replica shards are read-only. When a document is indexed into Elasticsearch, it is indexed by the primary shard before being replicated to the replica shard. The indexing request is only acknowledged once the replica shard has been...
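To make the relationship between primary and replica shards concrete, here is a minimal sketch of the settings body you might send when creating an index, along with the number of shard copies (each one a Lucene index) that results. The helper names and the counts of 3 primaries and 1 replica are illustrative assumptions.

```python
import json


def index_settings(primary_shards, replicas_per_primary):
    """Request body for creating an index with explicit shard counts."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_primary,
        }
    }


def total_shard_copies(primary_shards, replicas_per_primary):
    """Count every shard copy: each primary plus its replicas."""
    return primary_shards * (1 + replicas_per_primary)


# Hypothetical example: 3 primary shards, each with 1 replica.
body = index_settings(3, 1)
print(json.dumps(body, indent=2))
print(total_shard_copies(3, 1))  # 3 primaries + 3 replicas = 6
```

The body could be sent with a request such as `PUT /my-index` from Dev Tools, where `my-index` is a placeholder name.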

Elasticsearch nodes

An Elasticsearch node is a single running instance of Elasticsearch. A single physical or virtual machine can run multiple instances or nodes of Elasticsearch, assuming it has sufficient resources to do so.

Elasticsearch nodes perform a variety of roles within the cluster. The roles that a node performs can be granularly controlled as required.

We will cover some common node roles in the following sections.

Master-eligible nodes

Master-eligible nodes take part in the master election process. At any point in time, a single node is elected to be the active master. The active master node keeps track of other nodes in the cluster, creation or deletion of indices, shards being allocated to nodes based on requirements/constraints, cluster settings being applied, and more.
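As a small sketch of how roles determine master eligibility, the snippet below models a hypothetical three-node cluster and picks out the nodes that carry the `master` role. The node names and role assignments are made up for illustration; `roles_setting` simply renders the `node.roles` line as it might appear in a node's `elasticsearch.yml`. It does not model the election itself.

```python
# Hypothetical three-node cluster; node names and role choices are made up.
cluster = {
    "node-1": ["master", "data_hot", "ingest"],
    "node-2": ["master", "data_hot"],
    "node-3": ["data_warm"],
}


def master_eligible(nodes):
    """Return the names of nodes that can take part in master elections."""
    return sorted(name for name, roles in nodes.items() if "master" in roles)


def roles_setting(roles):
    """Render the node.roles line for a node's elasticsearch.yml."""
    return "node.roles: [ " + ", ".join(roles) + " ]"


print(master_eligible(cluster))
for name, roles in cluster.items():
    print(name, "->", roles_setting(roles))
```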

The master role is generally not very resource-intensive and can be co-located on a node running other roles in smaller clusters. Running the master role on a dedicated host makes sense...

Searching for data

Now that we understand some of the core aspects of Elasticsearch (shards, indices, index mappings/settings, nodes, and more), let's put it all together by ingesting a sample dataset and searching for data.

Indexing sample logs

Follow these steps to ingest some Apache web access logs into Elasticsearch:

  1. Navigate to the Chapter3/searching-for-data directory in the code repository for this book. Inspect the web.log file to see the raw data that we are going to load into Elasticsearch for querying:
    head web.log
  2. A Bash script called load.sh has been provided for loading two items into your Elasticsearch cluster:
    (a) An index template called web-logs-template that defines the index mappings and settings that are compliant with the Elastic Common Schema:
        cat web-logs-template.json
    (b) An ingest pipeline called web-logs-pipeline that parses and transforms logs from your dataset into the Elastic Common Schema:
        cat web-logs-pipeline.json...
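To give a feel for the kind of transformation an ingest pipeline performs on these logs, the following Python sketch parses one access log line into Elastic Common Schema field names. It is not the book's pipeline: the sample line is invented, the regular expression assumes the Apache combined log format, and `to_ecs` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumes the Apache combined log format; the real web.log may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) HTTP/(?P<version>[\d.]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Invented sample line for illustration only.
SAMPLE = (
    '192.168.1.10 - - [10/Mar/2022:13:55:36 +0000] '
    '"GET /products/42 HTTP/1.1" 200 2326 "-" "Mozilla/5.0"'
)


def to_ecs(line):
    """Map one access log line to a flat dict keyed by ECS field names."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None  # line does not match the assumed format
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "@timestamp": ts.isoformat(),
        "source.ip": m["ip"],
        "http.request.method": m["method"],
        "url.original": m["url"],
        "http.version": m["version"],
        "http.response.status_code": int(m["status"]),
        "http.response.body.bytes": 0 if m["bytes"] == "-" else int(m["bytes"]),
        "user_agent.original": m["agent"],
        "event.original": line,
    }


doc = to_ecs(SAMPLE)
print(doc["source.ip"], doc["http.request.method"], doc["http.response.status_code"])
```

In Elasticsearch itself, this work would typically be done by processors inside the ingest pipeline rather than by client code; the sketch only mirrors the idea of turning a raw line into structured fields.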

Summary

In this chapter, we briefly looked at three core aspects of Elasticsearch.

First, we looked at the internals of an index in Elasticsearch. We explored how settings can be applied to indices and learned how to configure mappings for document fields. We also looked at a range of different data types that are supported and how they can be leveraged for various use cases.

We then looked at how nodes on Elasticsearch host indices and data. We learned about the different roles a node can play as part of a cluster, as well as how data tiers take advantage of different hardware profiles on nodes, depending on how the data is used.

Lastly, we ingested some sample data and learned how to ask questions about our data using the search API.

In the next chapter, we will dive a little bit deeper into how to derive statistical insights, use ingest pipelines to transform data, create entity-centric indices by pivoting on incoming data, manage time series sources using...
