You're reading from  Elasticsearch 5.x Cookbook - Third Edition

Product type: Book
Published in: Feb 2017
ISBN-13: 9781786465580
Edition: 3rd Edition
Author: Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.

Chapter 3. Managing Mappings

In this chapter, we will cover the following recipes:

  • Using explicit mapping creation

  • Mapping base types

  • Mapping arrays

  • Mapping an object

  • Mapping a document

  • Using dynamic templates in document mapping

  • Managing nested objects

  • Managing a child document

  • Adding a field with multiple mappings

  • Mapping a GeoPoint field

  • Mapping a GeoShape field

  • Mapping an IP field

  • Mapping an attachment field

  • Adding metadata to a mapping

  • Specifying a different analyzer

  • Mapping a completion field

Introduction


Mapping is a very important concept in Elasticsearch, as it defines how the search engine should process a document.

Search engines perform two main operations:

  • Indexing: This is the action of receiving a document, processing it, and storing it in an index

  • Searching: This is the action of retrieving data from the index

These two parts are strictly connected. An error in the indexing step leads to unwanted or missing search results.

Elasticsearch has explicit mapping on an index/type level. When indexing, if a mapping is not provided, a default one is created, guessing the structure from the data fields that compose the document; then, this new mapping is automatically propagated to all cluster nodes.

The default type mapping has sensible default values, but when you want to change their behavior or you want to customize several other aspects of indexing (storing, ignoring, completion, and so on), you need to provide a new mapping definition.

In this chapter, we'll see all the possible types...

Using explicit mapping creation


If we consider the index as a database in the SQL world, the mapping is similar to the table definition.

Elasticsearch is able to understand the structure of the document that you are indexing (reflection) and create the mapping definition automatically (explicit mapping creation).
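
As a rough illustration of this guessing step, the following Python sketch infers a mapping type from each JSON value in a document. It is a simplified model, not Elasticsearch's actual implementation: the function names guess_field_type and guess_mapping are hypothetical, and the rules only approximate the default behavior (integers become long, floating-point numbers become float, strings that look like a yyyy-MM-dd date become date, and other strings become text).

```python
from datetime import datetime


def guess_field_type(value):
    """Return a rough guess of the mapping type for a JSON value (simplified model)."""
    # bool must be tested before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first element; an empty array tells us nothing
        return guess_field_type(value[0]) if value else None
    if isinstance(value, str):
        try:
            # Assumption: only one date format is checked in this sketch
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "text"
    # null values carry no type information
    return None


def guess_mapping(document):
    """Guess a type for every top-level field of a document."""
    return {name: guess_field_type(value) for name, value in document.items()}


sample = {
    "name": "Paul",
    "age": 35,
    "price": 9.5,
    "active": True,
    "joined": "2017-02-01",
    "tags": ["a", "b"],
}
guessed = guess_mapping(sample)
```

A mapping guessed in this way is convenient to start with, but a wrong guess (for example, a numeric-looking identifier mapped as long) is the kind of issue that a manually defined mapping avoids.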

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To better understand examples and code in this recipe, basic knowledge of JSON is required.

How to do it...

You can explicitly create a mapping by adding a new document in Elasticsearch. We will perform the following steps:

  1. Create an index:

            curl -XPUT http://127.0.0.1:9200/test

    The answer will be as follows:

             {"acknowledged":true}
    
  2. Put a document in the index:

            curl -XPUT http://127.0.0.1:9200/test/mytype/1...

Mapping base types


Using explicit mapping lets you start inserting data quickly with a schemaless approach, without worrying about field types. However, to achieve better results and better indexing performance, you need to define a mapping manually.

Fine-tuning mapping brings some advantages such as:

  • Reducing the index size on the disk (disabling functionalities for custom fields)

  • Indexing only interesting fields (general speed up)

  • Precooking data for fast search or real-time analytics (such as aggregations)

  • Correctly defining whether a field must be analyzed in multiple tokens or considered as a single token

Elasticsearch allows using base fields with a wide range of configurations.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To execute this recipe's examples, you...

Mapping arrays


Array or multivalue fields are very common in data models (such as multiple phone numbers, addresses, names, aliases, and so on), but they are not natively supported in traditional SQL solutions.

In SQL, multivalue fields require the creation of accessory tables that must be joined to gather all the values, leading to poor performance when the cardinality of records is huge.

Elasticsearch, which works natively in JSON, provides support for multivalue fields transparently.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

Every field is automatically managed as an array. For example, to store tags for a document, the mapping will be:

{
    "document" : {
        "properties" : {
            "name" : {"type" : "keyword"},
            "tag" : {"type" : "keyword...

Mapping an object


The object is the base structure (analogous to a record in SQL). Elasticsearch extends the traditional use of objects by allowing recursive embedded objects.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

We can rewrite the mapping of the order type from the Mapping base types recipe using an array of items:

{
    "order" : {
        "properties" : {
            "id" : {"type" : "keyword"},
            "date" : {"type" : "date"},
            "customer_id" : {"type" : "keyword", "store" : "yes"},
            "sent" : {"type" : "boolean"},
            "item" : {
                "type" : "object",
                "properties" : {
                    "name" : {"type" : "text"},
                    "quantity" : {"type" : "integer"},
                    "vat" : {"type" : "double"}
 ...

Mapping a document


The document is also referred to as the root object. It has special parameters that control its behavior, mainly used internally for special processing, such as routing or time-to-live of documents.

In this recipe, we'll take a look at these special fields and learn how to use them.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

We can extend the preceding order example by adding some of the special fields, for example:

{
  "order": {
    "_id": {
      "index": true
    },
    "_type": {
      "store": "yes"
    },
    "_source": {
      "store": "yes"
    },
    "_all": {
      "enabled": false
    },
    "_routing": {
      "required": true
    },
    "_index": {
      "enabled": true
    },
    "_size...

Using dynamic templates in document mapping


In the Using explicit mapping creation recipe, we have seen how Elasticsearch is able to guess the field type using reflection. In this recipe, we'll see how to help it to improve its guessing capabilities via dynamic templates.

The dynamic templates feature is very useful, for example, if you need to create several indices with similar types, because it moves the definition of mappings from hand-coded initialization routines to automatic index and document creation. A typical usage is to define types for Logstash log indices.
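
To make the idea concrete, the following Python sketch models how a list of templates can assign a mapping to a new field based on its name, with the first matching template winning. It is only an illustration under stated assumptions: the template names, the patterns, and the resolve_mapping helper are hypothetical, and fnmatchcase from the Python standard library stands in for the pattern matching that Elasticsearch performs (it accepts a few wildcards beyond the simple * used here).

```python
from fnmatch import fnmatchcase

# Hypothetical templates, checked in order: the first one that matches wins
templates = [
    {"name": "dates", "match": "date_*", "mapping": {"type": "date"}},
    {"name": "ids", "match": "*_id", "mapping": {"type": "keyword"}},
    {"name": "default_strings", "match": "*", "mapping": {"type": "text"}},
]


def resolve_mapping(field_name, templates):
    """Return the mapping of the first template whose pattern matches the field name."""
    for template in templates:
        if fnmatchcase(field_name, template["match"]):
            return template["mapping"]
    # No template matched: the default type guessing would apply
    return None


resolved = {
    name: resolve_mapping(name, templates)
    for name in ("customer_id", "date_created", "notes")
}
```

Because the templates are evaluated in order, the catch-all pattern is placed last; putting it first would hide the more specific rules.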

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

We can extend the previous mapping by adding document-related settings:

 { 
    "order" : { 
    "dynamic_date_formats":["yyyy-MM-dd", "dd-MM-yyyy"],
...

Managing nested objects


There is a special type of embedded object, the nested one. This resolves a problem related to Lucene indexing architecture, in which all the fields of embedded objects are viewed as a single object. During search, in Lucene, it is not possible to distinguish values between different embedded objects in the same multivalued array.

If we consider the previous order example, it's not possible to match an item name together with its own quantity in the same query, as Lucene puts all the items in the same Lucene document object. We need to index them in different documents and then join them. This entire round trip is managed by nested objects and nested queries.
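
The problem can be reproduced with a small Python sketch. It is a simplified model, not how Lucene stores data: the sample order, the helper names (flatten, match_flat, match_nested), and the exact-value matching are all assumptions made for illustration. The flattened form loses the link between a name and its quantity, so a query for a name and a quantity that belong to different items still matches.

```python
# Hypothetical order with two items
order = {
    "id": "1234",
    "item": [
        {"name": "tshirt", "quantity": 10},
        {"name": "jeans", "quantity": 5},
    ],
}


def flatten(doc, field):
    """Collapse an array of objects into one list of values per subfield."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def match_flat(doc, field, criteria):
    """Match against the flattened view: each criterion is checked independently."""
    flat = flatten(doc, field)
    return all(
        value in flat.get(f"{field}.{key}", []) for key, value in criteria.items()
    )


def match_nested(doc, field, criteria):
    """Match against each object separately: all criteria must hold for the same item."""
    return any(
        all(obj.get(key) == value for key, value in criteria.items())
        for obj in doc[field]
    )


# "tshirt" and 5 exist in the order, but never in the same item
criteria = {"name": "tshirt", "quantity": 5}
flat_hit = match_flat(order, "item", criteria)
nested_hit = match_nested(order, "item", criteria)
```

Here the flattened match succeeds even though no single item is a tshirt with quantity 5, while the per-object match correctly fails.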

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

A nested object is defined like a standard object, but with the type set to nested.

From the example in the Mapping an object recipe, we can change the type...

Managing a child document


In the previous recipe, we have seen how it's possible to manage relations between objects with the nested object type. The disadvantage of nested objects is their dependence on their parent. If you need to change a value of a nested object, you need to reindex the parent (this brings a potential performance overhead if the nested objects change too quickly). To solve this problem, Elasticsearch allows defining child documents.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

We can modify the mapping of the order example by indexing the items as separate child documents.

We need to extract the item object and create a new document type, item, with the _parent property set.

{ 
    "order": { 
        "properties": { 
            "id": { 
                "type": "keyword", 
                "store": "yes" 
...

Adding a field with multiple mappings


Often a field must be processed with several core types or in different ways. For example, a string field may need to be tokenized for search and left untokenized for sorting. To do this, we need to define the special multifield property, fields.

The fields property is a very powerful feature of mappings because it allows you to use the same field in different ways.
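
The following Python sketch shows why keeping two representations of the same value is useful. It is a toy model rather than Elasticsearch behavior: the sample names and the index_multifield helper are hypothetical, and a lowercase whitespace split stands in for a real analyzer.

```python
def index_multifield(value):
    """Keep two views of one value: tokens for searching, the raw string for sorting."""
    return {
        "analyzed": value.lower().split(),  # stand-in for an analyzed (tokenized) subfield
        "raw": value,                       # stand-in for a not-tokenized subfield
    }


names = ["Blue Jeans", "Red Tshirt", "Apple Watch"]
indexed = [index_multifield(name) for name in names]

# Searching uses the tokens, so a single word finds the whole value
search_hits = [entry["raw"] for entry in indexed if "jeans" in entry["analyzed"]]

# Sorting uses the raw value, so entries are ordered by the full original string
sorted_names = sorted(entry["raw"] for entry in indexed)
```

Sorting on the tokenized view instead would order entries by individual words rather than by the complete name, which is rarely what is wanted.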

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

To define a multifield property, we need to define a dictionary containing the subfields called fields. The subfield with the same name as the parent field is the default one.

If we consider the item of our order example, we can index the name in this way:

       "name": { 
                "type": "keyword", 
                "fields": { 
                "name": {                        
       ...

Mapping a GeoPoint field


Elasticsearch natively supports the use of geolocation types: special types that allow localizing your document in geographic coordinates (latitude and longitude) around the world.

There are two main types used in the geographic world: the point and the shape. In this recipe, we'll look at the geo point, the base element of geolocation.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

The type of the field must be set to geo_point to define a GeoPoint.

We can extend the order example by adding a new field that stores the location of a customer. This will be the result:

{ 
    "order": { 
        "properties": { 
            "id": { 
                "type": "keyword", 
                "store": "yes" 
            }, 
            "date": { 
                "type": "date" 
            }, 
        ...

Mapping a GeoShape field


An extension to the concept of point is the shape. Elasticsearch provides a type that facilitates the management of arbitrary polygons: the GeoShape.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To be able to use advanced shape management, Elasticsearch requires two JAR libraries in its classpath (usually the lib directory):

  • Spatial4J (v0.3)

  • JTS (v1.13)

How to do it...

To map a geo_shape type, a user must explicitly provide some parameters:

  • tree: This is the name of the PrefixTree implementation: geohash for GeohashPrefixTree and quadtree for QuadPrefixTree (default geohash)

  • precision: This is used instead of tree_levels to provide a more human-readable value for the tree level. The precision number can be followed by the unit, that is, 10m, 10km, 10miles, and so on

  • tree_levels: This is the maximum number of layers to be used in the prefix tree

  • distance_error_pct...

Mapping an IP field


Elasticsearch is used in many systems that collect and search logs, such as Kibana (https://www.elastic.co/products/kibana) and Logstash (https://www.elastic.co/products/logstash). To improve search in these scenarios, it provides the ip type, which can be used to store IPv4 and IPv6 addresses in an optimized way.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

You need to define the type of the field that contains an IP address as ip.

Using the preceding order example, we can extend it by adding the customer IP as follows:

 "customer_ip": {
     "type": "ip",
     "store": "yes"
 }

The IP must be in the standard dotted notation form, that is:

"customer_ip":"19.18.200.201" 

How it works...

When Elasticsearch is processing a document, if a field is an IP one, it tries to convert its value to a numerical form and generate...
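
The conversion to a numerical form can be sketched in Python with the standard ipaddress module. This is an illustration of the idea only, not the internal representation Elasticsearch uses; the helper names ip_to_number and in_range and the address block in the example are hypothetical.

```python
import ipaddress


def ip_to_number(value):
    """Validate an IP string and return its integer form (raises ValueError if invalid)."""
    return int(ipaddress.ip_address(value))


def in_range(value, low, high):
    """Check whether an address falls between two others by comparing numbers."""
    return ip_to_number(low) <= ip_to_number(value) <= ip_to_number(high)


customer_ip = "19.18.200.201"
customer_number = ip_to_number(customer_ip)

# A range check becomes a simple numeric comparison
is_in_block = in_range(customer_ip, "19.18.0.0", "19.18.255.255")
```

Once addresses are numbers, range filters (for example, all customers in a given address block) reduce to ordinary numeric comparisons, and malformed values are rejected up front.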

Mapping an attachment field


Elasticsearch allows extending its core types to cover new requirements with native plugins that provide new mapping types. One of the most used custom field types is the attachment type.

It allows indexing and searching the contents of common document file formats, such as Microsoft Office formats, OpenDocument formats, PDF, ePub, and many others.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup, with the ingest attachment plugin installed.

It can be installed from the command line with the following command:

 bin/elasticsearch-plugin install ingest-attachment

How to do it...

To map a field as an attachment, set its type to attachment.

Internally, the attachment field defines the fields property as a multi-field that takes binary data (Base64-encoded) and extracts useful information such as author, content, title, date, and so on...

Adding metadata to a mapping


Sometimes, when we are working with our mapping, we need to store some additional data to be used for display purposes, ORM facilities, permissions, or simply to track them in the mapping.

Elasticsearch allows storing every kind of JSON data we want in the mapping with the special field _meta.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

  1. The _meta mapping field can be populated with any data we want. Consider the following example:

            {
                "order": {
                    "_meta": {
                        "attr1": ["value1", "value2"],
                        "attr2": {
                            "attr3": "value3"
                        }
                    }
                }
            }
    

How it works...

When Elasticsearch processes a new mapping and finds a _meta field...

Specifying a different analyzer


In the previous recipes, we have seen how to map different fields and objects in Elasticsearch, and we have described how easy it is to change the standard analyzer with the analyzer and search_analyzer properties.

In this recipe, we will see several analyzers and how to use them to improve the indexing and searching quality.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

Every core type field allows you to specify a custom analyzer for indexing and for searching as field parameters.

For example, if we want the name field to use a standard analyzer for indexing and a simple analyzer for searching, the mapping will be as follows:

{
    "name": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "simple"
    }
}
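
To see why the choice of analyzer on each side matters, the following Python sketch approximates the two analyzers with regular expressions. Both functions are rough stand-ins with hypothetical names: the real standard analyzer follows Unicode text segmentation rules, and the real simple analyzer splits on anything that is not a letter. The only point shown is that the simple analyzer discards digits while the standard one keeps them.

```python
import re


def simple_analyze(text):
    """Approximate the simple analyzer: lowercase, keep only runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def standard_like_analyze(text):
    """Rough approximation of the standard analyzer: lowercase, keep letters and digits."""
    return re.findall(r"\w+", text.lower())


# Indexing side uses the standard-like analyzer
index_tokens = standard_like_analyze("Order 66 shipped")

# Searching side uses the simple analyzer, so the number is dropped from the query
search_tokens = simple_analyze("order 66")

# Every remaining query token is present in the index
all_found = all(token in index_tokens for token in search_tokens)
```

In this sketch the query still matches on the word order, but the number 66 no longer takes part in the search, which shows how mixing analyzers changes what a query can find.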

How it works...

The concept...

Mapping a completion field


When providing search functionality to users, one of the most common requirements is to offer text suggestions for their queries.

Elasticsearch provides a helper for achieving this functionality via a special type mapping called completion.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

The definition of a completion field is similar to the previous core type fields. For example, to provide suggestions for a name with an alias, we can write a mapping similar to the following:

{ 
    "name": {"type": "string", "copy_to":["suggest"]}, 
    "alias": {"type": "string", "copy_to":["suggest"]}, 
    "suggest": { 
        "type": "completion", 
        "payloads": true, 
        "analyzer": "simple", 
        "search_analyzer": "simple" 
    } 
} 

In this example, we have defined two string fields name...
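
The effect of copying both fields into a single suggest field can be modeled with a few lines of Python. This is a naive prefix scan written for illustration, not the data structure Elasticsearch uses for completion; the sample documents and the helper names build_suggest_inputs and suggest are hypothetical.

```python
def build_suggest_inputs(doc):
    """Mimic copy_to: gather the name and alias values into one list of suggestions."""
    return [doc[field] for field in ("name", "alias") if field in doc]


def suggest(docs, prefix):
    """Return every collected value that starts with the prefix (case-insensitive)."""
    prefix = prefix.lower()
    hits = []
    for doc in docs:
        for text in build_suggest_inputs(doc):
            if text.lower().startswith(prefix):
                hits.append(text)
    return sorted(hits)


docs = [
    {"name": "Elasticsearch", "alias": "ES"},
    {"name": "Elastic Stack", "alias": "ELK"},
]

by_name = suggest(docs, "ela")
by_alias = suggest(docs, "es")
```

Because both name and alias feed the same list, a user typing either the full name or the short alias gets a suggestion from one lookup.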
