You're reading from Advanced Elasticsearch 7.0

Product typeBook

Published inAug 2019

Reading LevelBeginner

PublisherPackt

ISBN-139781789957754

Edition1st Edition

Languages

Java

Tools

Elasticsearch

Concepts

Enterprise Search

Author (1)

Wai Tak Wong

Preface

Building enterprise-grade distributed applications and executing systematic search operations calls for a strong understanding of Elasticsearch and expertise in using its core APIs and latest features. This book will help you master the advanced functionalities of Elasticsearch and learn how to develop a sophisticated real-time search engine confidently. In addition to this, you'll also learn how to run machine learning jobs in Elasticsearch to speed up routine tasks.

You'll get started by learning how to use Elasticsearch features on Hadoop and Spark and make search results faster, thereby improving the speed of queries and enhancing customer experience. You'll then get up to speed with analytics by building a metrics pipeline, defining queries, and using Kibana for intuitive visualizations that help provide decision makers with better insights. The book will later guide you through using Logstash to collect, parse, and enrich logs before indexing them into Elasticsearch.

By the end of this book, you will have comprehensive knowledge of advanced topics such as Apache Spark support, machine learning using Elasticsearch and scikit-learn, and real-time analytics, along with the expertise you need to increase business productivity, perform analytics, and get the very best out of Elasticsearch.

You will do the following:

Pre-process documents before indexing in ingest pipelines
Learn how to model your data in the real world
Get to grips with using Elasticsearch for exploratory data analysis
Understand how to build analytics and RESTful services
Use Kibana, Logstash, and Beats for dashboard applications
Get up to speed with Spark and Elasticsearch for real-time analytics
Explore the Java high/low-level REST client and learn how to index, search, and query in a Spring application

Who this book is for

The book is aimed at beginners with no prior experience with Elasticsearch, and gradually introduces intermediate and advanced topics. The chapters walk through the most important aspects to help audiences to build and master the powerful search engine. Search engine data engineers, software engineers, and database engineers who want to take their basic knowledge of Elasticsearch to the next level can use it to its optimum level in their daily core tasks.

What this book covers

Chapter 1, Overview of Elasticsearch 7, takes beginners through some basic features in minutes. We just take a few steps to launch the new version of the Elasticsearch server. An architectural overview and a core concept introduction will make it easy to understand the workflow in Elasticsearch.

Chapter 2, Index APIs, discusses how to use index APIs to manage individual indices, index settings, aliases, and templates. It also involves monitoring statistics for operations that occur on an index. Index management operations including refreshing, flushing, and clearing the cache are also discussed.

Chapter 3, Document APIs, begins with the basic information about a document and its life cycle. Then we learn how to access it. After that, we look at accessing multiple documents with the bulk API. Finally, we discuss migrating indices from the old version to version 7.0.

Chapter 4, Mapping APIs, introduces the schema in Elasticsearch. The mapping rules for both dynamic mappings and explicit static mappings will be discussed. It also provides the idea and details of creating static mapping for an index. We also step into the details of the meta fields and field data types in index mapping.

Chapter 5, Anatomy of an Analyzer, drills down in to the anatomy of the analyzer and in-depth practice different analyzers. We will discuss different character filters, tokenizers, and token filters in order to understand the building blocks of the analyzer. We also practice how to create a custom analyzer and use it in the analyze API.

Chapter 6, Search APIs, covers different types of searches, from terms-based to full-text, from exact search to fuzzy search, from single-field search to multi-search, and then to compound search. Additional information about Query DSL and search-related APIs such as tuning, validating, and troubleshooting will be discussed.

Chapter 7, Modeling Your Data in the Real World, discusses data modeling with Elasticsearch. It focuses on some common issues users may encounter when working with different techniques. It helps you understand some of the conventions and contains insights from real-world examples involving denormalizing complex objects and using nested objects to handle relationships.

Chapter 8, Aggregation Framework, discusses data analytics using the aggregation framework. We learn how to perform aggregations with examples and delve into most of the types of aggregations. We also use IEX ETF historical data to plot a graph for different types of moving averages, including forecasted data supported by the model.

Chapter 9, Preprocessing Documents in Ingest Pipelines, discusses the preprocessing of a document through predefined pipeline processors before the actual indexing operation begins. We also learn about data accessing to documents through the pipeline processor. Finally, we cover exception handling when an error occurs during pipeline processing.

Chapter 10, Using Elasticsearch for Exploratory Data Analysis, uses the aggregation framework to perform data analysis. We first discuss a comprehensive analysis of exploratory data and simple financial analysis of business strategies. In addition, we provide step-by-step instructions for calculating Bollinger Bands using daily operational data. Finally, we will conduct a brief survey of sentiment analysis using Elasticsearch.

Chapter 11, Elasticsearch from Java Programming, focuses on the basics of two supported Java REST clients. We explore the main features and operations of each approach. A sample project is provided to demonstrate the high-level and low-level REST clients integrated with Spring Boot programming.

Chapter 12, Elasticsearch from Python Programming, introduces the Python Elasticsearch client. We learn about two Elasticsearch client packages, elasticsearch-py and elasticsearch-dsl-py. We learn how the clients work and incorporate them into a Python application. We implement Bollinger Bands by using elasticsearch-dsl-py.

Chapter 13, Using Kibana, Logstash, and Beats, outlines the components of the Elastic Stack, including Kibana, Logstash, and Beats. We learn how to use Logstash to collect and parse log data from system log files. In addition, we use Filebeat to extend the use of Logstash to a central log processing center. All work will be run on official supported Elastic Stack Docker images.

Chapter 14, Working with Elasticsearch SQL, introduces Elasticsearch SQL. With Elasticsearch SQL, we can access full-text search using familiar SQL syntax. We can even obtain results in tabular view format. We perform search and aggregation using different approaches, such as using the SQL REST API interface, the command-line interface, and JDBC.

Chapter 15, Working with Elasticsearch Analysis Plugins, introduces built-in Analysis plugins. We practice using the ICU Analysis plugin, the Smart Chinese Analysis plugin, and the IK Analysis plugin to analyze Chinese texts. We also add a new custom dictionary to improve word segmentation to make it generate better results.

Chapter 16, Machine Learning with Elasticsearch, discusses the machine learning feature supported by Elasticsearch. This feature automatically analyzes time series data by running a metric job. This type of job contains one or more detectors (the analyzed fields). We also introduce the Python scikit-learn library and the unsupervised learning algorithm K-means clustering and use it for comparison.

Chapter 17, Spark and Elasticsearch for Real-Time Analytics, focuses on ES-Hadoop's Apache Spark support. We practice reading data from the Elasticsearch index, performing some computations using Spark, and then writing the results back to Elasticsearch through ES-Hadoop. We build a real-time anomaly detection routine based on the K-means model created from past data by using the Spark ML library.

Chapter 18, Building Analytics RESTful Services, explains how to construct a project providing a search analytics REST service powered by Elasticsearch. We combine lots of material and source code from different chapters to build a real-world end-to-end project and present the result on a Kibana Visualize page.

To get the most out of this book

Readers should have a basic knowledge of Linux, Java, Python, Virtualenv, SQL, Spark, and Docker.

All installation steps are described in detail in each relevant chapter.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packt.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Advanced-Elasticsearch-7.0. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781789957754_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

html, body, #map {
 height: 100%; 
 margin: 0;
 padding: 0
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

The rest of the chapter is locked

You have been reading a chapter from

Advanced Elasticsearch 7.0

Published in: Aug 2019Publisher: PacktISBN-13: 9781789957754

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Author (1)

Wai Tak Wong

Wai Tak Wong is a faculty member in the Department of Computer Science at Kean University, NJ, USA. He has more than 15 years professional experience in cloud software design and development. His PhD in computer science was obtained at NJIT, NJ, USA. Wai Tak has served as an associate professor in the Information Management Department of Chung Hua University, Taiwan. A co-founder of Shanghai Shellshellfish Information Technology, Wai Tak acted as the Chief Scientist of the R&D team, and he has published more than a dozen algorithms in prestigious journals and conferences. Wai Tak began his search and analytics technology career with Elasticsearch in the real estate market and later applied this to data management and FinTech data services.
Read more about Wai Tak Wong

Other recommended products

Related to this chapter

Elasticsearch 7 Quick Start Guide

Elasticsearch is one of the most popular tools for distributed search. This book will help you in understanding all about the new features of Elasticsearch 7, and how to use them efficiently for searching, aggregating and indexing data with speed and accuracy.

BookOct 2019186 pages

Learning Elasticsearch

Elasticsearch is a Lucene-based search and analytics engine for distributed search and analytics. This book will be your hands-on guide as you explore and put to use the features of Elasticsearch 5.x.

BookJun 2017404 pages

Mastering Elasticsearch 5.x

This book will help you leverage Elasticsearch, guiding you through everything from writing and creating customized plugins to extend Elasticsearch to tackling challenges while handling relational data in Elasticsearch. You’ll learn with the help of practical examples in a step-by-step way.

BookFeb 2017428 pages

Elasticsearch 7.0 Cookbook

This book is your one-stop guide to master Elasticsearch. It provides numerous problem-solution based recipes through which you can implement Elasticsearch in your enterprise applications in a very simple, hassle-free way.

BookApr 2019724 pages

Elasticsearch 5.x Cookbook

BookFeb 2017696 pages

Mastering Kibana 6.x

Mastering Kibana 6.x provides a rundown explanation required for data visualization and analysis such as X-Pack features, Beats, and machine learning. You will be expert in creating analytics-driven visualizations from a web application. You will be a maestro in creating custom monitoring dashboard using Beats with various examples

BookJul 2018376 pages

Learning Elastic Stack 7.0

This book teaches you about every component of the Elastic Stack - including Elasticsearch, Kibana, Logstash, and X-pack - with new and the updated features that are released with the 7.0 version. With the help of this book, you will be able to develop enterprise-grade distributed search and analytics applications for your data without any hassle.

BookMay 2019474 pages

Learning Elastic Stack 6.0

This book will give you a fundamental understanding of what the stack is all about, and how to use it efficiently to build powerful real-time data processing applications. It provide in-depth coverage of the different components of the Elastic Stack, and how to use them all together.

BookDec 2017434 pages

Learning Kibana 7

This book will introduce you to Kibana 7, and will show you how it fits into the Elastic stack. You will build a pure metric analytics architecture and visualize it using Timelion. You will also learn how to build relationships between documents using Graph visualization. You will also learn to build powerful Elastic dashboards using Kibana.

BookJul 2019280 pages

Mastering Elastic Stack

BookFeb 2017526 pages

Machine Learning with the Elastic Stack

Machine Learning with the Elastic Stack, Second Edition, provides a comprehensive overview of Elastic Stack's machine learning features for both time series data analysis as well as for supervised learning and unsupervised learning that helps make machine learning truly operational for you.

BookMay 2021450 pages

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages