Elasticsearch Server: Second Edition


eBook (PDF, PacktLib, ePub, and Mobi formats): $25.49 (list price $29.99, save 15%)
Print + free eBook + free PacktLib access to the book: $49.99 (list price $79.98, save 37%; print cover price $49.99)
Free shipping to the UK, US, Europe, and selected countries in Asia.
  • Learn about Elasticsearch functionality such as data indexing, data analysis, and dynamic mapping
  • Fine-tune Elasticsearch, understand its metrics using its API and available tools, and see how it behaves in complex searches
  • A hands-on tutorial that walks you through the features of Elasticsearch in an easy-to-understand way, with examples that will help you become an expert in no time

Book Details

Language : English
Paperback : 428 pages [ 235mm x 191mm ]
Release Date : April 2014
ISBN : 1783980524
ISBN 13 : 9781783980529
Author(s) : Marek Rogoziński, Rafał Kuć
Topics and Technologies : All Books, Networking and Servers, Open Source


Table of Contents

Preface
Chapter 1: Getting Started with the Elasticsearch Cluster
Chapter 2: Indexing Your Data
Chapter 3: Searching Your Data
Chapter 4: Extending Your Index Structure
Chapter 5: Make Your Search Better
Chapter 6: Beyond Full-text Searching
Chapter 7: Elasticsearch Cluster in Detail
Chapter 8: Administrating Your Cluster
Index
  • Chapter 1: Getting Started with the Elasticsearch Cluster
    • Full-text searching
      • The Lucene glossary and architecture
      • Input data analysis
        • Indexing and querying
      • Scoring and query relevance
    • The basics of Elasticsearch
      • Key concepts of data architecture
        • Index
        • Document
        • Document type
        • Mapping
      • Key concepts of Elasticsearch
        • Node and cluster
        • Shard
        • Replica
        • Gateway
      • Indexing and searching
    • Installing and configuring your cluster
      • Installing Java
      • Installing Elasticsearch
      • Installing Elasticsearch from binary packages on Linux
        • Installing Elasticsearch using the RPM package
        • Installing Elasticsearch using the DEB package
      • The directory layout
      • Configuring Elasticsearch
      • Running Elasticsearch
      • Shutting down Elasticsearch
      • Running Elasticsearch as a system service
        • Elasticsearch as a system service on Linux
        • Elasticsearch as a system service on Windows
    • Manipulating data with the REST API
      • Understanding the Elasticsearch RESTful API
      • Storing data in Elasticsearch
      • Creating a new document
        • Automatic identifier creation
      • Retrieving documents
      • Updating documents
      • Deleting documents
      • Versioning
        • An example of versioning
        • Using the version provided by an external system
    • Searching with the URI request query
      • Sample data
      • The URI request
        • The Elasticsearch query response
        • Query analysis
        • URI query string parameters
      • The Lucene query syntax
    • Summary
  • Chapter 2: Indexing Your Data
    • Elasticsearch indexing
      • Shards and replicas
      • Creating indices
        • Altering automatic index creation
        • Settings for a newly created index
    • Mappings configuration
      • Type determining mechanism
        • Disabling field type guessing
      • Index structure mapping
        • Type definition
        • Fields
        • Core types
        • Multifields
        • The IP address type
        • The token_count type
        • Using analyzers
      • Different similarity models
        • Setting per-field similarity
        • Available similarity models
      • The postings format
        • Configuring the postings format
      • Doc values
        • Configuring the doc values
        • Doc values formats
    • Batch indexing to speed up your indexing process
      • Preparing data for bulk indexing
      • Indexing the data
      • Even quicker bulk requests
    • Extending your index structure with additional internal information
      • Identifier fields
      • The _type field
      • The _all field
      • The _source field
        • Exclusion and inclusion
      • The _index field
      • The _size field
      • The _timestamp field
      • The _ttl field
    • Introduction to segment merging
      • Segment merging
      • The need for segment merging
      • The merge policy
      • The merge scheduler
      • The merge factor
      • Throttling
    • Introduction to routing
      • Default indexing
      • Default searching
      • Routing
      • The routing parameters
      • Routing fields
    • Summary
  • Chapter 3: Searching Your Data
    • Querying Elasticsearch
      • The example data
      • A simple query
      • Paging and result size
      • Returning the version value
      • Limiting the score
      • Choosing the fields that we want to return
        • The partial fields
      • Using the script fields
        • Passing parameters to the script fields
    • Understanding the querying process
      • Query logic
      • Search types
      • Search execution preferences
      • The Search shards API
    • Basic queries
      • The term query
      • The terms query
      • The match_all query
      • The common terms query
      • The match query
        • The Boolean match query
        • The match_phrase query
        • The match_phrase_prefix query
      • The multi_match query
      • The query_string query
        • Running the query_string query against multiple fields
      • The simple_query_string query
      • The identifiers query
      • The prefix query
      • The fuzzy_like_this query
      • The fuzzy_like_this_field query
      • The fuzzy query
      • The wildcard query
      • The more_like_this query
      • The more_like_this_field query
      • The range query
      • The dismax query
      • The regular expression query
    • Compound queries
      • The bool query
      • The boosting query
      • The constant_score query
      • The indices query
    • Filtering your results
      • Using filters
      • Filter types
        • The range filter
        • The exists filter
        • The missing filter
        • The script filter
        • The type filter
        • The limit filter
        • The identifiers filter
        • If this is not enough
        • Combining filters
        • Named filters
      • Caching filters
    • Highlighting
      • Getting started with highlighting
      • Field configuration
      • Under the hood
      • Configuring HTML tags
      • Controlling the highlighted fragments
      • Global and local settings
      • Require matching
      • The postings highlighter
    • Validating your queries
      • Using the validate API
    • Sorting data
      • Default sorting
      • Selecting fields used for sorting
      • Specifying the behavior for missing fields
      • Dynamic criteria
      • Collation and national characters
    • Query rewrite
      • An example of the rewrite process
      • Query rewrite properties
    • Summary
  • Chapter 4: Extending Your Index Structure
    • Indexing tree-like structures
      • Data structure
      • Analysis
    • Indexing data that is not flat
      • Data
      • Objects
      • Arrays
      • Mappings
        • Final mappings
      • Sending the mappings to Elasticsearch
      • To be or not to be dynamic
    • Using nested objects
      • Scoring and nested queries
    • Using the parent-child relationship
      • Index structure and data indexing
        • Parent mappings
        • Child mappings
        • The parent document
        • The child documents
      • Querying
        • Querying data in the child documents
        • Querying data in the parent documents
      • The parent-child relationship and filtering
      • Performance considerations
    • Modifying your index structure with the update API
      • The mappings
      • Adding a new field
      • Modifying fields
    • Summary
  • Chapter 5: Make Your Search Better
    • An introduction to Apache Lucene scoring
      • When a document is matched
      • Default scoring formula
      • Relevancy matters
    • Scripting capabilities of Elasticsearch
      • Objects available during script execution
      • MVEL
      • Using other languages
      • Using our own script library
        • Using native code
    • Searching content in different languages
      • Handling languages differently
      • Handling multiple languages
      • Detecting the language of the documents
      • Sample document
      • The mappings
      • Querying
        • Queries with the identified language
        • Queries with unknown languages
        • Combining queries
    • Influencing scores with query boosts
      • The boost
      • Adding boost to queries
      • Modifying the score
        • The constant_score query
        • The boosting query
        • The function_score query
        • Deprecated queries
    • When does index-time boosting make sense?
      • Defining field boosting in input data
      • Defining boosting in mapping
    • Words with the same meaning
      • The synonym filter
        • Synonyms in the mappings
        • Synonyms stored in the filesystem
      • Defining synonym rules
        • Using Apache Solr synonyms
        • Using WordNet synonyms
      • Query- or index-time synonym expansion
    • Understanding the explain information
      • Understanding field analysis
      • Explaining the query
    • Summary
  • Chapter 6: Beyond Full-text Searching
    • Aggregations
      • General query structure
      • Available aggregations
        • Metric aggregations
        • Bucketing
      • Nesting aggregations
      • Bucket ordering and nested aggregations
      • Global and subsets
        • Inclusions and exclusions
    • Faceting
      • The document structure
      • Returned results
      • Using queries for faceting calculations
      • Using filters for faceting calculations
      • Terms faceting
      • Ranges based faceting
        • Choosing different fields for an aggregated data calculation
      • Numerical and date histogram faceting
        • The date_histogram facet
      • Computing numerical field statistical data
      • Computing statistical data for terms
      • Geographical faceting
      • Filtering faceting results
      • Memory considerations
    • Using suggesters
      • Available suggester types
      • Including suggestions
        • The suggester response
      • The term suggester
        • The term suggester configuration options
        • Additional term suggester options
      • The phrase suggester
      • The completion suggester
    • Percolator
      • The index
      • Percolator preparation
      • Getting deeper
        • Getting the number of matching queries
        • Indexed documents percolation
    • Handling files
      • Adding additional information about the file
    • Geo
      • Mappings preparation for spatial search
      • Example data
      • Sample queries
        • Distance-based sorting
        • Bounding box filtering
        • Limiting the distance
      • Arbitrary geo shapes
        • Point
        • Envelope
        • Polygon
        • Multipolygon
        • An example usage
        • Storing shapes in the index
    • The scroll API
      • Problem definition
      • Scrolling to the rescue
    • The terms filter
      • Terms lookup
        • The terms lookup query structure
        • Terms lookup cache settings
    • Summary
  • Chapter 7: Elasticsearch Cluster in Detail
    • Node discovery
      • Discovery types
      • The master node
        • Configuring the master and data nodes
        • The master-election configuration
      • Setting the cluster name
        • Configuring multicast
        • Configuring unicast
      • Ping settings for nodes
    • The gateway and recovery modules
      • The gateway
      • Recovery control
        • Additional gateway recovery options
    • Preparing Elasticsearch cluster for high query and indexing throughput
      • The filter cache
      • The field data cache and circuit breaker
        • The circuit breaker
      • The store
      • Index buffers and the refresh rate
        • The index refresh rate
      • The thread pool configuration
      • Combining it all together – some general advice
        • Choosing the right store
        • The index refresh rate
        • Tuning the thread pools
        • Tuning your merge process
        • The field data cache and breaking the circuit
        • RAM buffer for indexing
        • Tuning transaction logging
        • Things to keep in mind
    • Templates and dynamic templates
      • Templates
        • An example of a template
        • Storing templates in files
      • Dynamic templates
        • The matching pattern
        • Field definitions
    • Summary
  • Chapter 8: Administrating Your Cluster
    • The Elasticsearch time machine
      • Creating a snapshot repository
      • Creating snapshots
        • Additional parameters
      • Restoring a snapshot
      • Cleaning up – deleting old snapshots
    • Monitoring your cluster's state and health
      • The cluster health API
        • Controlling information details
        • Additional parameters
      • The indices stats API
        • Docs
        • Store
        • Indexing, get, and search
        • Additional information
      • The status API
      • The nodes info API
      • The nodes stats API
      • The cluster state API
      • The pending tasks API
      • The indices segments API
      • The cat API
        • Limiting returned information
    • Controlling cluster rebalancing
      • Rebalancing
      • Cluster being ready
      • The cluster rebalance settings
        • Controlling when rebalancing will start
        • Controlling the number of shards being moved between nodes concurrently
        • Controlling the number of shards initialized concurrently on a single node
        • Controlling the number of primary shards initialized concurrently on a single node
        • Controlling types of shards allocation
        • Controlling the number of concurrent streams on a single node
    • Controlling the shard and replica allocation
      • Explicitly controlling allocation
        • Specifying node parameters
        • Configuration
        • Index creation
        • Excluding nodes from allocation
        • Requiring node attributes
        • Using IP addresses for shard allocation
        • Disk-based shard allocation
      • Cluster wide allocation
      • Number of shards and replicas per node
      • Moving shards and replicas manually
        • Moving shards
        • Canceling shard allocation
        • Forcing shard allocation
        • Multiple commands per HTTP request
    • Warming up
      • Defining a new warming query
      • Retrieving the defined warming queries
      • Deleting a warming query
      • Disabling the warming up functionality
      • Choosing queries
    • Index aliasing and using it to simplify your everyday work
      • An alias
      • Creating an alias
      • Modifying aliases
      • Combining commands
      • Retrieving all aliases
      • Removing aliases
      • Filtering aliases
      • Aliases and routing
    • Elasticsearch plugins
      • The basics
      • Installing plugins
      • Removing plugins
    • The update settings API
    • Summary

Marek Rogoziński

Marek Rogoziński is a software architect and consultant with more than 10 years of experience. He has specialized in solutions based on open source search engines such as Solr and Elasticsearch, and also the software stack for Big Data analytics including Hadoop, HBase, and Twitter Storm.

He is a cofounder of the solr.pl site, which publishes information and tutorials about Solr and the Lucene library, and a coauthor of several books published by Packt Publishing.

Currently, he holds the position of the Chief Technology Officer in a new company, designing architecture for a set of products that collect, process, and analyze large streams of input data.


Rafał Kuć

Rafał Kuć is a born team leader and software developer. He currently works as a consultant and a software engineer at Sematext Group, Inc., where he concentrates on open source technologies such as Apache Lucene and Solr, Elasticsearch, and the Hadoop stack. He has more than 12 years of experience in various branches of software, from banking software to e-commerce products. He focuses mainly on Java but is open to every tool and programming language that will make achieving his goals easier and faster. Rafał is also one of the founders of the solr.pl site, where he tries to share his knowledge and help people with the problems they face with Solr and Lucene. He has also been a speaker at various conferences around the world, such as Lucene Eurocon, Berlin Buzzwords, ApacheCon, and Lucene Revolution.

Rafał began his journey with Lucene in 2002, and it wasn't love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then, Solr came along and this was it. He started working with Elasticsearch in the middle of 2010. Currently, Lucene, Solr, Elasticsearch, and information retrieval are his main points of interest.

Rafał is also the author of Apache Solr 3.1 Cookbook, and the update to it, Apache Solr 4 Cookbook. Also, he is the author of the previous edition of this book and Mastering ElasticSearch. All these books have been published by Packt Publishing.

Errata

- 4 submitted: last submission 23 Jul 2014

Errata type: Code | Page number: 23

The line: "curl -XGET http://localhost:9200/_cluster/nodes/"

should be:

"curl -XGET http://localhost:9200/_cluster/state/nodes/"

Errata type: Technical | Page number: 30

THE FOLLOWING PARAGRAPH:

"There is one more thing about document updates; if your script uses a field value from a document that is to be updated, you can set a value that will be used if the document doesn't have that value present. For example, if you want to increment the counter field of the document and it is not present, you can use the upsert section in your request to provide the default value that will be used. For example, look at the following lines of command:

curl -XPOST http://localhost:9200/blog/article/1/_update -d '{
 "script": "ctx._source.counter += 1",
 "upsert": {
   "counter" : 0
 }
}'

If you execute the preceding example, Elasticsearch will add the counter field with the value of 0 to our example document. This is because our document does not have the counter field present and we've specified the upsert section in the update request."

SHOULD BE:

"There is one more thing about document updates; if your script uses a field value from a document that doesn't exist, you can set a value that will be used as the default one. For example, if you would like to increment the counter field of the document and the document is not present, you can use the upsert section in your request to provide the default value that is going to be used for that field. For example, look at the following command:

curl -XPOST http://localhost:9200/blog/article/1/_update -d '{
 "script": "ctx._source.counter += 1",
 "upsert": {
   "counter" : 0
 }
}'

If you execute the preceding example and the document is not present, Elasticsearch will add the document with the counter field equal to 0.
"
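The corrected behavior can be modeled with a small in-memory sketch. This is only an illustration in Python, not Elasticsearch code: the `store` dictionary stands in for the index, and `update_with_upsert` and `increment_counter` are hypothetical names introduced here. It assumes the default update semantics, in which the upsert body is stored as-is when the document is missing and the script runs only when the document already exists.

```python
from typing import Any, Callable, Dict

Doc = Dict[str, Any]


def update_with_upsert(
    store: Dict[str, Doc],
    doc_id: str,
    script: Callable[[Doc], None],
    upsert: Doc,
) -> Doc:
    """Hypothetical in-memory model of a scripted update with an upsert section."""
    if doc_id not in store:
        # Document is missing: store a copy of the upsert body; the script is not run.
        store[doc_id] = dict(upsert)
    else:
        # Document exists: apply the script to its source in place.
        script(store[doc_id])
    return store[doc_id]


def increment_counter(source: Doc) -> None:
    # Plays the role of the script "ctx._source.counter += 1".
    source["counter"] += 1


store: Dict[str, Doc] = {}
# First call: document "1" does not exist, so it is created from the upsert body.
first = dict(update_with_upsert(store, "1", increment_counter, {"counter": 0}))
# Second call: the document now exists, so the script increments the counter.
second = dict(update_with_upsert(store, "1", increment_counter, {"counter": 0}))
print(first, second)
```

In this model, the first call creates the document from the upsert body and the second call runs the script against it, which mirrors the distinction the corrected paragraph draws between a missing document and a missing field.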

Errata type: Code | Page numbers: 4, 50, 51, 61, 62, 64, 68, 85, 89

The line: "  "precision_step":"0"  "

should be:

"  "precision_step":"8"  "

Errata type: Technical | Page number: 2

"...you'll need a command that allows you to send HTTP requests such as cURL."

Should be:

"...you'll need a command line tool that allows you to send HTTP requests such as cURL."

What you will learn from this book

  • Configure and create your own index
  • Set up an analysis chain and handle multilingual data
  • Use the Elasticsearch query DSL to make all kinds of queries
  • Utilize filters efficiently and ensure they do not affect performance
  • Implement autocomplete functionality
  • Employ faceting, the aggregations framework, and similar functionalities to get more from your search and improve your clients' search experience
  • Monitor your cluster state and health by using Elasticsearch APIs as well as third-party monitoring solutions
  • Learn what gateway and discovery modules are, and how to properly configure them
  • Control primary shards and replica rebalancing

In Detail

This book begins by introducing the most commonly used Elasticsearch server functionality, starting with creating your own index structure, moving through querying, faceting, and aggregations, and ending with cluster monitoring and problem diagnosis. As you progress through the book, you will cover topics such as starting Elasticsearch, creating a new index, and designing its proper structure. After that, you'll read about the query API that Elasticsearch exposes, as well as about filtering capabilities, aggregations, and faceting. Last but not least, you will get to know how to find documents similar to a given one and how to implement application alerts by using the prospective search functionality called percolator. Some advanced topics such as shard allocation control, gateway configuration, and how to use the discovery module will also be discussed. This book will also show you the possibilities of cluster state and health monitoring as well as how to use third-party tools.

Approach

This book is a detailed, practical, hands-on guide packed with real-life scenarios and examples that show you how to implement an Elasticsearch search engine on your own websites.

Who this book is for

If you are a web developer or a user who wants to learn more about Elasticsearch, then this is the book for you. You do not need to know anything about Elasticsearch, Java, or Apache Lucene in order to use this book, though basic knowledge of databases and queries is required.
