You're reading from Learning Elasticsearch

Product type Book

Published in Jun 2017

Publisher Packt

ISBN-13 9781787128453

Pages 404 pages

Edition 1st Edition

Concepts

Enterprise Search

Author (1):

Abhishek Andhavarapu

More Than a Search Engine (Geofilters, Autocomplete, and More)

In the previous chapter, we discussed different ways of querying Elasticsearch and, for a given query, how to control the relevance score so that the most relevant results come back at the top. In this chapter, we will discuss how to deal with typos and spelling mistakes, and how to autocomplete a query before the user finishes typing it. We will talk about how to handle relationships and joins using nested and parent-child mappings and discuss the advantages and disadvantages of each. We will also discuss how to include geolocation in your queries. You'll learn about the percolate query, which is one of the most popular features of Elasticsearch. The main functionality of the percolate query is reverse search; we will explore why it is important and the...

Sample data

To better explain the various concepts in this chapter, we will use an e-commerce site as an example. We will create a simple index called chapter7 with the type product, containing a list of products. Our sample data looks like the following:

PUT chapter7/product/1
{ "product_name": "Apple iPhone 7"}

PUT chapter7/product/2
{ "product_name": "Apple iPhone Lightning Cable" }

PUT chapter7/product/3
{ "product_name": "Apple iPhone 6"}

PUT chapter7/product/4
{ "product_name": "Samsung Galaxy S7" }

PUT chapter7/product/5
{ "product_name": "Samsung Galaxy S6" }

As we progress through the chapter, we will recreate the chapter7 index with different configurations.

Correcting typos and spelling mistakes

In the previous chapter, we discussed different ways to query documents based on the user's search input. However, the search input might contain typos and spelling mistakes, and automatically correcting them improves the overall search experience. The term and match queries that we discussed in the previous chapter only look for the exact term in the inverted index. In this section, we will discuss the different types of queries Elasticsearch provides to correct typos.

Fuzzy query

The fuzzy query looks for terms that are close to the original term. It looks for terms in the inverted index that are similar to the query term, based on the edit distance...
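
To make the notion of edit distance concrete, here is a minimal Python sketch of the classic Levenshtein distance. It is an illustration only, not Elasticsearch's implementation: the function names `edit_distance` and `fuzzy_candidates` and the list of indexed terms are hypothetical. This simplified version counts a swap of two adjacent characters as two edits.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn source into target."""
    # previous[j] is the distance between the already processed prefix
    # of source and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, target_char in enumerate(target, start=1):
            substitution_cost = 0 if source_char == target_char else 1
            current.append(min(
                previous[j] + 1,                      # delete source_char
                current[j - 1] + 1,                   # insert target_char
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_candidates(term, indexed_terms, max_edits=2):
    """Return the indexed terms within max_edits of the (possibly misspelled) term."""
    return [t for t in indexed_terms if edit_distance(term, t) <= max_edits]


# Hypothetical terms, as they might appear in the inverted index.
print(fuzzy_candidates("samsng", ["apple", "iphone", "samsung", "galaxy"]))
```

A misspelled input such as "samsng" is one edit away from "samsung", so it would still be considered a match when one or more edits are allowed.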

Making suggestions based on the user input

In the previous section, we discussed using the fuzzy query to fix typos automatically. In this section, we will discuss the suggest API, which can provide word or phrase suggestions to the user based on the input query. The fuzzy query automatically matches terms that are close to the input, whereas the suggest API simply returns suggestions. The suggest API supports the following:

Term and phrase suggester: You can use the term or phrase suggester to make suggestions based on the existing documents in case of typos or spelling mistakes.
Completion suggester: You can use the completion suggester to predict the query term before the user finishes typing. Helping the user with the right search phrases improves the overall experience and decreases the load on the servers.

Implementing...

Highlighting

Elasticsearch supports highlighting the parts of the response that caused the match. In the following query, the matches in the product_name field are highlighted:

#Highlighting
POST chapter7/_search
{
  "query": {
    "match": {
      "product_name": {
        "query": "samsung"
      }
    }
  },
  "highlight": {
    "fields": {
      "product_name": {}
    }
  }
}

The response to the preceding query is as follows:

{
   ....
   "hits": {
     "total": 2,
     "max_score": 0.7590336,
     "hits": [
       {
         "_index": "chapter7",
         "_type": "product",
         "_id": "AVsC7GDlF21JdiUIl1Q-",
         "_score": 0.7590336,
         "_source"...

Handling document relations using parent-child

In Chapter 3, Modeling Your Data and Document Relations, we described how to set the mapping and index parent-child documents. In this section, we will discuss how to query them. Elasticsearch provides parent-child and nested mappings to manage relationships. The two differ in how the documents are stored: parent-child documents are costly to query, whereas nested documents are costly to index. We discussed the differences in detail in Chapter 3, Modeling Your Data and Document Relations.

In the previous sections, we indexed product documents for the e-commerce store. In this section, we will use the parent-child relation to index the reviews for the products as the child documents. The product document is the parent document. A new review can...

Handling document relations using nested

In the Document Relations section in Chapter 3, Modeling Your Data and Document Relations, we described how to store nested documents. In this section, we will discuss how to query them. To better explain querying nested documents, we will change the mappings of the chapter7 index to store the variations of a product as nested documents. For example, an iPhone is available in several storage options, such as 32GB, 64GB, and so on. Each variation has a different price. Each variation of a product will be stored as a nested document. A product can have one or more variations. In the following query, we will change the mapping of the type product to include the variations as nested documents:

#Delete existing index
DELETE chapter7
 
#Set mappings
PUT chapter7
 {
   "settings": {},
   "mappings": {
     "product": {
  ...

Scripting

Scripting is one of the most powerful features of Elasticsearch. So far in this chapter, we have discussed the different types of queries Elasticsearch supports. If these queries are not enough, Elasticsearch also provides the script query. Scripting allows you to run user-defined scripts to determine whether a document should be filtered out or not. Along with the script query, script-based fields and script-based sorting are also supported. Elasticsearch 5.0 introduced Painless, a new scripting language that is both secure and fast. Along with Painless, special-purpose languages such as Expression, Mustache, and Java are also supported.

Script Query

Script query can be used to evaluate documents against a user...

Post Filter

You can tell Elasticsearch to run an expensive query, such as a script or geolocation query, as a post filter. The query in the post filter is executed only after the main query, so the expensive query runs on as few documents as possible. In the following query, we will run the script query as a post filter:

POST chapter7/product/_search
 {
   "query": {
     "match": {
       "product_name": "iphone"
     }
   },
   "post_filter": {
     "script": {
       "script": {
         "lang": "painless",
         "inline": "params._source.containsKey('variations') && params._source.variations.length > params.num_of_variations",
         "params": {
           "num_of_variations": 1
         }
       }
     }
 ...

Reverse search using the percolate query

The percolate query is one of the popular features of Elasticsearch. Until now, we indexed documents and used the search API to query them. The percolate query is a reverse search: the queries themselves are stored in an index, and we percolate a document to get the queries that match it. In other words, the percolate query checks whether a document matches any of the predefined criteria. Common use cases include alerts and monitoring.
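
The idea of reverse search can be sketched in a few lines of Python. This is a conceptual illustration under simplifying assumptions, not how Elasticsearch stores or evaluates percolator queries: each stored query is reduced to a set of terms that must all appear in the document, and the query IDs and the helper names `tokenize` and `percolate` are hypothetical.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


# Hypothetical stored queries: query ID -> terms that must all be present.
stored_queries = {
    "apple_products": {"apple"},
    "samsung_phones": {"samsung", "galaxy"},
    "iphone_accessories": {"iphone", "cable"},
}


def percolate(document, queries, field="product_name"):
    """Return the IDs of the stored queries that match the given document."""
    tokens = tokenize(document.get(field, ""))
    # A query matches when every one of its terms occurs in the document.
    return sorted(query_id for query_id, terms in queries.items() if terms <= tokens)


print(percolate({"product_name": "Apple iPhone Lightning Cable"}, stored_queries))
```

Note the direction of the lookup: the document is the input, and the result is a list of queries rather than a list of documents.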

For example, suppose we want to classify the products in an e-commerce store. First, we will add predefined queries to the chapter7 index and then use the percolate query to check whether a product matches any of them. The following example will make this clearer. To use the percolate query, we first need to add a mapping with the percolator type. In the following command, we are adding...

Geo and Spatial Filtering

In modern applications, spatial or location-based filtering is a very common requirement, not just for filtering the results based on a location, but also as one of the driving factors of relevance. The results that are closer to the user's location appear at the top of the list; results that are farther away are not removed but simply placed lower in the list. Elasticsearch makes it very easy to work with geographical data by combining full-text search and location-based filtering. Sorting the results based on the distance from the current user location is also supported.
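
As a rough illustration of distance-based ordering, the following Python sketch sorts a few locations by their great-circle distance from a user, using the haversine formula. Elasticsearch performs this kind of computation itself; the sketch only shows the idea. The function names, the sample coordinates, and the Earth radius constant are assumptions made for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def sort_by_distance(user_location, places):
    """Order places from nearest to farthest; nothing is filtered out."""
    user_lat, user_lon = user_location
    return sorted(
        places,
        key=lambda place: haversine_km(user_lat, user_lon, place["lat"], place["lon"]),
    )


# Hypothetical store locations (approximate coordinates).
stores = [
    {"name": "Los Angeles", "lat": 34.05, "lon": -118.24},
    {"name": "Oakland", "lat": 37.80, "lon": -122.27},
    {"name": "San Jose", "lat": 37.34, "lon": -121.89},
]
nearest_first = sort_by_distance((37.77, -122.42), stores)
print([store["name"] for store in nearest_first])
```

All three stores remain in the result; only their order changes with the user's location.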

To use geolocation queries, the location information should be indexed using a special mapping type. The geolocation can be stored using the geo_point mapping type if you want to store the location data in the form of latitude/longitude pairs. If you want to store...

Multi search

Multi search allows us to group search requests together. It is similar to multi-get and the other bulk APIs. By grouping the requests together, we save network round trips, and the queries in the request are executed in parallel. We can also control how many of the requests are executed in parallel. A simple multi search request is shown here:

#Multi Search
GET chapter7/_msearch
{"type" : "product"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"type" : "product_review"}
{"query" : {"match_all" : {}}}
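
The request body above is newline-delimited JSON: each search contributes a header line followed by a body line, and the last line must also end with a newline. A minimal Python sketch of how an application might assemble such a body is shown here; the helper name `build_msearch_body` is hypothetical, and sending the body to the cluster is left out.

```python
import json


def build_msearch_body(searches):
    """Build a multi search body from a list of (header, body) dict pairs."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))  # for example {"type": "product"}
        lines.append(json.dumps(body))    # the search request itself
    # One JSON object per line, terminated by a final newline.
    return "\n".join(lines) + "\n"


msearch_body = build_msearch_body([
    ({"type": "product"}, {"query": {"match_all": {}}, "from": 0, "size": 10}),
    ({"type": "product_review"}, {"query": {"match_all": {}}}),
])
print(msearch_body)
```

Building the body with a JSON serializer rather than by string concatenation keeps each object on a single line, which the format requires.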

The response of the multi search query is a list of responses. Each request is executed independently, and the failure of one request will not affect the others. The response to the preceding query will contain responses to two queries as shown next:

{
...

Search templates

Search templates are very similar to stored procedures in a relational database. Commonly used queries can be defined as templates, and applications using Elasticsearch can simply refer to a query by its ID. A template accepts parameters, which can be specified at runtime. Search templates are stored on the server side and can be modified without changes to the client code. Templates are expressed using the Mustache template engine. For more information on Mustache, please visit http://mustache.github.io/mustache.5.html.

Let's start by defining a template query to find all the products by their name. The query is as follows:

#Define Template
 POST _search/template/find_product_by_name
 {
   "query" : {
     "match" : {
       "product_name": "{{ product_name }}"
     }
   }
 }
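
To illustrate what happens to a placeholder such as {{ product_name }} at runtime, the following Python sketch substitutes parameters into a template. It is a deliberately simplified stand-in for Mustache that handles only plain variable tags (no sections, partials, or other Mustache features), and the names `render_template` and `PLACEHOLDER` are hypothetical.

```python
import json
import re

# Matches simple variable tags such as {{ product_name }} or {{product_name}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template, params):
    """Return a copy of the template with every placeholder replaced by its parameter."""
    source = json.dumps(template)

    def substitute(match):
        # Simplification: a missing parameter raises KeyError here.
        value = str(params[match.group(1)])
        # Escape quotes and backslashes so the rendered text stays valid JSON.
        return json.dumps(value)[1:-1]

    return json.loads(PLACEHOLDER.sub(substitute, source))


template = {"query": {"match": {"product_name": "{{ product_name }}"}}}
rendered = render_template(template, {"product_name": "iphone"})
print(json.dumps(rendered, indent=2))
```

The rendered result is an ordinary match query on product_name, which is why the client only needs to supply the template ID and the parameter values.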

Once the template is defined, you...

Querying Elasticsearch from Java application

In Chapter 4, Indexing and Updating Your Data, we discussed different types of Java clients and how to use them in your application for indexing. In this section, we will discuss how to use a Java client to query Elasticsearch. Let's take a simple match query as shown here:

#Match Query
 POST chapter7/_search
 {
   "query": {
     "match": {
       "product_name": "iphone"
     }
   }
 }

All the queries available via the REST API are also made available via the transport client. To execute the query using the Java client, first set up the client as shown next:

TransportAddress node1 = new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300);

Settings setting = Settings.builder().put("cluster...

Summary

In this chapter, we discussed how to implement autocomplete, highlighting, and the correction of user typos. Elasticsearch doesn't support traditional SQL joins, and you learned how to use parent-child and nested mappings to handle relationships between different document types. We discussed filtering based on geolocation and how to use location as one of the factors driving the relevance score. We also discussed using the Painless scripting language to query based on user-defined scripts. Finally, we covered search templates and how to query Elasticsearch from your application.

In the next chapter, we will discuss aggregations and how to use them to slice and dice your data.