You're reading from Learning Elasticsearch

Product type Book

Published in Jun 2017

Publisher Packt

ISBN-13 9781787128453

Pages 404 pages

Edition 1st Edition

Concepts

Enterprise Search

Author (1):

Abhishek Andhavarapu

Modeling Your Data and Document Relations

In the previous chapter, we learned how to set up and configure Elasticsearch. Once the cluster is up and running, you can start indexing your data. Elasticsearch will automatically figure out the schema of the documents when you index, which works great for getting started. But for all practical purposes, you have to tell Elasticsearch the schema of your data to avoid any surprises. Modeling your data is one of the most important steps before using Elasticsearch. In this chapter, you'll learn how to model your data and handle relations between different document types. We will cover the following:

Configure mappings
Dynamic mapping
Core data types
Complex data types
Geo location
Modeling relations in Elasticsearch

Mapping

Mapping is the process of defining the schema or the structure of the documents. It describes the properties of the fields in the document. The properties of the field include the data type (for example, string, integer, and so on) and the metadata. In the previous chapters, we discussed how the documents are converted into the inverted index when we index them. During indexing, the mappings of the fields define how the fields are indexed and stored in the inverted index.

Just like you would define a table schema in SQL, it is important to set the mappings of the index before you index any data. As we discussed before, a type in Elasticsearch is like an SQL table, which groups documents of a similar nature (you would define one type for users, one for orders). Each type has its own mapping. Needing a different mapping can also be a reason to define a new type.
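As a minimal sketch of what setting the mappings up front can look like, the following Python snippet builds the kind of JSON body that is sent when creating an index in Elasticsearch 5.x. The type name user and the fields name, email, and age are hypothetical and only illustrate the structure:

```python
import json

# Hypothetical index body with one type, "user", and three illustrative fields.
index_body = {
    "mappings": {
        "user": {
            "properties": {
                "name": {"type": "text"},      # analyzed for full-text search
                "email": {"type": "keyword"},  # stored as-is for exact match
                "age": {"type": "integer"},
            }
        }
    }
}

# Print the JSON that would be used as the request body.
print(json.dumps(index_body, indent=2))
```

Defining the field types explicitly in this way avoids relying on whatever schema would otherwise be inferred from the first documents indexed.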

Apache...

Difference between full-text search and exact match

In this section, we will describe analyzers and why they are necessary for text search. Let's say we have a document containing the following information:

{
   "date": "2017/02/01",
   "desc": "It will be raining in yosemite this weekend"
 }

If we want to search for the documents that contain the word yosemite, we could run an SQL query as shown here:

select * from news where desc like '%yosemite%'

This functionality is very limited and is rarely sufficient for real-world text-search queries. For example, a user looking for the weather forecast in Yosemite would phrase the query in human language, using something such as rain in yosemite. Since the SQL query can only match the exact text, and the document contains raining but not the word rain, the query will not come back with...
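The difference can be sketched in a few lines of Python. The analyzer below is a toy written for this illustration (lowercasing, splitting into words, dropping a small hand-picked stopword list, and stripping a trailing "ing"); it is not the analyzer Elasticsearch uses, but it shows why matching on analyzed terms finds the document when a plain substring match does not:

```python
import re

# Small hand-picked stopword list, an assumption for this example only.
STOPWORDS = {"in", "it", "be", "this", "will", "the", "a"}

def analyze(text):
    """Toy analyzer: lowercase, split into words, drop stopwords, strip a trailing 'ing'."""
    terms = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOPWORDS:
            continue
        if token.endswith("ing") and len(token) > 5:
            token = token[:-3]  # crude stemming: "raining" becomes "rain"
        terms.append(token)
    return terms

doc = "It will be raining in yosemite this weekend"
query = "rain in yosemite"

# Substring matching, roughly what LIKE '%...%' does: the whole phrase must appear.
substring_match = query in doc

# Term matching after analysis: every query term must be among the document terms.
doc_terms = set(analyze(doc))
full_text_match = all(term in doc_terms for term in analyze(query))

print(substring_match, full_text_match)
```

Because both the document and the query pass through the same analysis, raining and rain reduce to the same term, which is the essence of full-text matching.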

Core data types

In this section, we will discuss the core data types supported by Elasticsearch. You can set the mapping using the Mapping API.

Text

Starting with Elasticsearch 5.0, the string data type is deprecated and replaced by the text and keyword data types. If you want to perform a full-text search as we discussed in the previous section, you should use the text data type. If you only want an exact match, you should use the keyword data type. We will discuss the keyword data type in the next section.

Let's take the same example we used in Chapter 1, Introduction to Elasticsearch. We have a document containing the following fields:

{
   "date": "2017-01-01",
   "description": "Yosemite national...

Complex data types

In the previous section, we talked about simple data types. In this section, we will talk about how to set mapping for arrays, objects, and nested objects.

Array

There is no special data type for an array. A field can contain one or more values of the same data type. Let's look at an example where we have two documents, as shown next:

Document 1:

{ "keyword_field" : "keyword1" }

Document 2:

{
   "keyword_field" : ["keyword2", "keyword3"]
}

The mapping for keyword_field is defined as shown next:

{
   "properties": {
     "keyword_field": {
       "type": "keyword"
     }
   }
 }

No special handling is required for arrays...
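To make the idea concrete, here is a small Python sketch of how a single-valued and a multi-valued field can be treated uniformly for exact matching. The helper names field_values and term_matches are hypothetical and model the behavior only; they are not part of any Elasticsearch API:

```python
def field_values(document, field):
    """Return the values of a field as a list, whether it holds one value or several."""
    value = document.get(field, [])
    return value if isinstance(value, list) else [value]

def term_matches(document, field, term):
    """Exact-match check: true if any value of the field equals the term."""
    return term in field_values(document, field)

doc1 = {"keyword_field": "keyword1"}
doc2 = {"keyword_field": ["keyword2", "keyword3"]}

print(term_matches(doc1, "keyword_field", "keyword1"))
print(term_matches(doc2, "keyword_field", "keyword3"))
```

Both documents fit the same keyword mapping because a single value is simply treated as a list of one.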

Geo data type

In the previous sections, we discussed the simple and complex data types Elasticsearch supports. In this section, we will discuss how to store location-based data. Elasticsearch makes it very easy to work with location-based queries, such as querying within a radius, aggregations based on location, sorting by location, and so on. With the rapid growth of mobile, location is one of the key factors driving the search results. To run location-based queries, you have to set the field data type to geo.

Elasticsearch supports two data types to store location-based data:

geo_point: This is used to store the latitude and longitude of a location.
geo_shape: This is used to store geo shapes, such as circles and polygons.

In this section, we will only discuss how to set the mapping for the geo_point data type. The geo_shape data type is for storing geo shapes. To know more about geo-shape...
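As a rough sketch, the snippet below shows a geo_point mapping fragment, a document that supplies the point as an object with lat and lon keys, and a haversine distance check that illustrates what a "within a radius" query has to decide. The field name location, the coordinates, and the helper functions are assumptions made for this example; the distance calculation is a plain Python illustration, not how Elasticsearch evaluates geo queries internally:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

# Mapping fragment: "location" is a hypothetical field name.
geo_mapping = {"properties": {"location": {"type": "geo_point"}}}

# A document can supply the point as an object with "lat" and "lon" keys.
doc = {"name": "Example place", "location": {"lat": 37.0, "lon": -119.0}}

def haversine_km(a, b):
    """Great-circle distance in kilometers between two {"lat", "lon"} points."""
    dlat = radians(b["lat"] - a["lat"])
    dlon = radians(b["lon"] - a["lon"])
    h = (sin(dlat / 2) ** 2
         + cos(radians(a["lat"])) * cos(radians(b["lat"])) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def within_radius(point, center, radius_km):
    """True if the point lies within radius_km of the center."""
    return haversine_km(point, center) <= radius_km

center = {"lat": 37.5, "lon": -119.0}
print(within_radius(doc["location"], center, 100))
```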

Specialized data type

Elasticsearch supports the following specialized data types:

IP: This is used to store IP addresses
Completion: This is used to support the auto-complete feature
Percolator: This is used to support reverse search

We will discuss the IP data type in the next section. Completion and percolator are best explained with examples, and we will discuss them in detail in Chapter 7, More than a search engine.

IP

In the previous section, we discussed the geo data type, which is used to store location-based data. In this section, we will discuss the IP data type, which is used to store IP addresses. Both IPv4 and IPv6 addresses are supported. For example, we have a login history document, and we want to store the IP address...
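The sketch below shows a mapping fragment for an ip field together with a client-side check, using Python's standard ipaddress module, of the kind of question an IP field lets you ask (is this address valid, and does it fall inside a given network?). The field name ip, the sample login document, and the helper in_network are hypothetical:

```python
import ipaddress

# Mapping fragment: "ip" is a hypothetical field name for the login address.
ip_mapping = {"properties": {"ip": {"type": "ip"}}}

# Hypothetical login history document.
login = {"username": "user1", "ip": "192.168.1.10"}

def in_network(address, cidr):
    """True if the address belongs to the network given in CIDR notation."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)

# Both IPv4 and IPv6 strings parse; an invalid string raises ValueError.
print(ipaddress.ip_address(login["ip"]).version)
print(ipaddress.ip_address("2001:db8::1").version)
print(in_network(login["ip"], "192.168.0.0/16"))
```

Validating addresses before indexing is a design choice in this sketch; it surfaces malformed values early instead of at indexing time.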

Mapping the same field with different mappings

Sometimes you want to index the same field with different mappings. For example, you want to index the title field both as text and as keyword. You can use the keyword field for an exact match and the text field for text search. You can do this by defining two fields, one with a keyword mapping and the other with a text mapping, as shown next:

{
   "properties": {
     "title_text": {
       "type": "text"
     },
     "title_keyword": {
       "type": "keyword"
     }
   }
 }

You can index the document as follows:

{
 "title_text"    : "Learning Elasticsearch",
 "title_keyword" : "Learning Elasticsearch"
}

While indexing, the same value is used for both the title_text and title_keyword fields. The document source will now have two fields with...
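An alternative that avoids sending the value twice is the fields mapping parameter (multi-fields), which indexes one source field in more than one way. The following Python sketch builds such a mapping; the sub-field name exact is a hypothetical choice made for this example:

```python
import json

# One source field, "title", indexed as text and, via a sub-field, as keyword.
# The sub-field name "exact" is an assumption for illustration.
multi_field_mapping = {
    "properties": {
        "title": {
            "type": "text",
            "fields": {
                "exact": {"type": "keyword"}
            },
        }
    }
}

# The document carries the value only once.
document = {"title": "Learning Elasticsearch"}

print(json.dumps(multi_field_mapping, indent=2))
```

With a mapping like this, full-text queries would target title, while exact-match queries would target the sub-field using dot notation, here title.exact.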

Handling relations between different document types

In the relational world, data is often divided into multiple tables and is linked using foreign keys. To get the data, a join is used to combine data from one or more tables. But in the NoSQL world, data is usually denormalized and stored as one big document. However, it is often advantageous to store these documents separately. Data in Elasticsearch is immutable. An update to an existing document means fetching the old document, applying the change, and re-indexing it as a new document. This makes an update an expensive operation, so we should keep updates to a minimum where possible.

For example, a blog article can have one or more comments, and an order can have one or more line items. If we can separate the article and comment documents, we don't have to update the article when there is a new comment. Elasticsearch provides...
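The cost of updates can be illustrated with a toy in-memory store. The ToyIndex class below is written purely for this example and does not reflect Elasticsearch internals; it only models the fetch, change, and re-index cycle, and shows that keeping comments as separate documents leaves the article untouched:

```python
import copy

class ToyIndex:
    """In-memory stand-in for an index: stored documents are never modified in place."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source):
        """Store a new copy of the document and bump its version."""
        version = self._docs[doc_id][0] + 1 if doc_id in self._docs else 1
        self._docs[doc_id] = (version, copy.deepcopy(source))
        return version

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, copy.deepcopy(source)

    def update(self, doc_id, changes):
        """Fetch the old document, apply the change, and re-index the result."""
        _, source = self.get(doc_id)
        source.update(changes)
        return self.index(doc_id, source)

store = ToyIndex()
store.index("article-1", {"title": "Hello", "comments": []})

# Embedded comments: every new comment rewrites the whole article.
_, article = store.get("article-1")
article_version = store.update(
    "article-1", {"comments": article["comments"] + ["Nice post"]}
)

# Separate comment documents: the article is not re-indexed.
comment_version = store.index(
    "comment-1", {"article_id": "article-1", "text": "Nice post"}
)
print(article_version, comment_version)
```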

Routing

We discussed before that an index contains one or more shards. During indexing, the document ID is used to determine which shard the document belongs to, using a simple formula as follows:

hash(document_id) % no_of_shards

To retrieve a document using the document ID, the same formula is used to determine the shard the document belongs to, and the document is retrieved from that shard.

When executing a search query, the node that receives the request is known as the coordinating node. The coordinating node sends the query to all the shards of the index, aggregates the results, and sends them back to the client.

By default, a query has to be executed on all the shards of the index. But if you have a way to group similar data together, routing can be used to send the requests to a single shard instead of all the shards in the index.
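The formula and the effect of routing can be sketched in Python as follows. CRC32 is used here only as a convenient, deterministic stand-in for the hash function; the shard count, the document IDs, and the routing value user-42 are hypothetical:

```python
import zlib

NO_OF_SHARDS = 5  # hypothetical number of shards in the index

def shard_for(routing_value, no_of_shards):
    """hash(routing_value) % no_of_shards, with CRC32 standing in for the real hash."""
    return zlib.crc32(routing_value.encode("utf-8")) % no_of_shards

order_ids = ("order-1", "order-2", "order-3")

# Default behavior: the document ID decides the shard, so related
# documents may end up on different shards.
by_id = {doc_id: shard_for(doc_id, NO_OF_SHARDS) for doc_id in order_ids}

# Custom routing: a shared value (here a hypothetical user ID) decides
# the shard, so all of that user's documents land on the same shard.
by_user = {doc_id: shard_for("user-42", NO_OF_SHARDS) for doc_id in order_ids}

print(by_id)
print(by_user)
```

Because every document routed with the same value lands on one shard, a query that supplies that routing value only needs to be executed on that shard.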

For example, you want to use Elasticsearch...

Summary

In this chapter, you learned about various simple and complex data types Elasticsearch supports. You also learned how to handle unstructured data using dynamic mappings. We discussed how full-text search works and the difference between exact match and full-text search. We discussed how to manage document relations. We also covered routing and how it works.

In the next chapter, we will discuss how to index and update your data.