You're reading from Elasticsearch 5.x Cookbook - Third Edition

Product type: Book

Published in: Feb 2017

ISBN-13: 9781786465580

Edition: 3rd Edition

Tools: Elasticsearch

Concepts: Enterprise Search

Author: Alberto Paro

Chapter 3. Managing Mappings

In this chapter, we will cover the following recipes:

Using explicit mapping creation
Mapping base types
Mapping arrays
Mapping an object
Mapping a document
Using dynamic templates in document mapping
Managing nested objects
Managing a child document
Adding a field with multiple mappings
Mapping a GeoPoint field
Mapping a GeoShape field
Mapping an IP field
Mapping an attachment field
Adding metadata to a mapping
Specifying different analyzers
Mapping a completion field

Introduction

Mapping is a very important concept in Elasticsearch, as it defines how the search engine should process a document.

Search engines perform two main operations:

Indexing: This is the action of receiving a document and storing, indexing, and processing it in an index
Searching: This is the action of retrieving the data from the index

These two parts are strictly connected. An error in the indexing step leads to unwanted or missing search results.

Elasticsearch has explicit mapping on an index/type level. When indexing, if a mapping is not provided, a default one is created, guessing the structure from the data fields that compose the document; then, this new mapping is automatically propagated to all cluster nodes.

The default type mapping has sensible default values, but when you want to change their behavior or you want to customize several other aspects of indexing (storing, ignoring, completion, and so on), you need to provide a new mapping definition.

In this chapter, we'll see all the possible types...

Using explicit mapping creation

If we consider the index as a database in the SQL world, the mapping is similar to the table definition.

Elasticsearch is able to understand the structure of the document that you are indexing (reflection) and create the mapping definition automatically (explicit mapping creation).
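To make the idea of type guessing concrete, here is a toy Python sketch of how a mapping could be inferred from the values of a document. It is only an illustration and not Elasticsearch's implementation: the function names guess_field_mapping and infer_mapping and the sample document are hypothetical, and real dynamic mapping covers more cases (for example, date detection in strings).

```python
import json


def guess_field_mapping(value):
    """Toy type guessing for a single JSON value (illustrative only)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings become full-text fields with an exact-match subfield.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return infer_mapping(value)
    if isinstance(value, list) and value:
        # Arrays take the type of their first element.
        return guess_field_mapping(value[0])
    raise ValueError(f"cannot guess a type for {value!r}")


def infer_mapping(document):
    """Build a 'properties' mapping for a whole document."""
    return {
        "properties": {
            name: guess_field_mapping(value) for name, value in document.items()
        }
    }


# Hypothetical document, not the one used in the recipe.
sample = {"name": "Paul", "age": 35, "active": True, "tags": ["a", "b"]}
print(json.dumps(infer_mapping(sample), indent=2))
```

The order of the isinstance checks matters in Python, because True and False would otherwise be classified as integers.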

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To better understand examples and code in this recipe, basic knowledge of JSON is required.

How to do it...

You can explicitly create a mapping by adding a new document in Elasticsearch. We will perform the following steps:

Create an index:

        curl -XPUT http://127.0.0.1:9200/test

The answer will be as follows:

         {"acknowledged":true}

Put a document in the index:

        curl -XPUT http://127.0.0.1:9200/test/mytype/1...

Mapping base types

Using explicit mapping creation lets you start inserting data quickly with a schemaless approach, without worrying about field types. However, to achieve better results and indexing performance, you need to define a mapping manually.

Fine-tuning the mapping brings several advantages, such as:

Reducing the index size on disk (by disabling functionalities for custom fields)
Indexing only the fields of interest (a general speedup)
Precooking data for fast search or real-time analytics (such as facets)
Correctly defining whether a field must be analyzed into multiple tokens or considered as a single token

Elasticsearch allows using base fields with a wide range of configurations.
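As a minimal sketch of a hand-written mapping with base types, the following Python snippet builds a mapping body as a dictionary and prints it as JSON. The field names follow the order example used in the later recipes of this chapter, but the exact settings chosen here (for example, disabling indexing on vat) are assumptions made for illustration.

```python
import json

# Hypothetical "order" mapping built from base types.
order_mapping = {
    "order": {
        "properties": {
            "id": {"type": "keyword"},            # single token, exact match
            "date": {"type": "date"},
            "customer_id": {"type": "keyword"},
            "sent": {"type": "boolean"},
            "name": {"type": "text"},             # analyzed into multiple tokens
            "quantity": {"type": "integer"},
            # Assumption: vat is never searched, so it is not indexed.
            "vat": {"type": "double", "index": False},
        }
    }
}

# The JSON text is what would be sent as the body of a put-mapping call.
print(json.dumps(order_mapping, indent=2))
```

Choosing keyword for identifiers and text for free-form names is the kind of decision that automatic guessing cannot make on our behalf.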

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To execute this recipe's examples, you...

Mapping arrays

Array or multivalue fields are very common in data models (for example, multiple phone numbers, addresses, names, aliases, and so on), but they are not natively supported in traditional SQL solutions.

In SQL, multivalue fields require the creation of accessory tables that must be joined to gather all the values, leading to poor performance when the cardinality of records is huge.

Elasticsearch, which works natively in JSON, provides support for multivalue fields transparently.
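To illustrate what transparent multivalue support means for client code, here is a small Python sketch. The two documents and the helper as_list are hypothetical: the point is only that the same field may carry one value or several, so code that reads documents back may want to normalize it.

```python
import json


def as_list(value):
    """Return a field value as a list, whether it held one value or many."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Hypothetical documents: "tag" holds a single value in one and an array in the other.
doc_single = {"name": "document1", "tag": "awesome"}
doc_multi = {"name": "document2", "tag": ["cool", "awesome", "amazing"]}

for doc in (doc_single, doc_multi):
    print(json.dumps(doc), "->", as_list(doc["tag"]))
```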

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

Every field is automatically managed as an array. For example, to store tags for a document, the mapping will be:

{
    "document" : {
        "properties" : {
            "name" : {"type" : "keyword"},
            "tag" : {"type" : "keyword...

Mapping an object

The object is the base structure (analogous to a record in SQL). Elasticsearch extends the traditional use of objects by allowing recursive embedded objects.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

We can rewrite the mapping of the order type from the Mapping base types recipe using an array of items:

{
    "order" : {
        "properties" : {
            "id" : {"type" : "keyword"},
            "date" : {"type" : "date"},
            "customer_id" : {"type" : "keyword", "store" : "yes"},
            "sent" : {"type" : "boolean"},
            "item" : {
                "type" : "object",
                "properties" : {
                    "name" : {"type" : "text"},
                    "quantity" : {"type" : "integer"},
                    "vat" : {"type" : "double"}
 ...
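Fields of an embedded object are addressed with dotted paths such as item.name. The following Python sketch walks a mapping and lists those paths. It uses only the fields visible in the excerpt above, and the helper name field_paths is hypothetical.

```python
def field_paths(properties, prefix=""):
    """Yield the dotted path of every leaf field in a mapping's properties."""
    for name, definition in properties.items():
        path = prefix + name
        if "properties" in definition:  # embedded object: recurse into it
            yield from field_paths(definition["properties"], path + ".")
        else:
            yield path


# Only the fields visible in the excerpt; the real mapping may contain more.
order_properties = {
    "id": {"type": "keyword"},
    "date": {"type": "date"},
    "customer_id": {"type": "keyword"},
    "sent": {"type": "boolean"},
    "item": {
        "type": "object",
        "properties": {
            "name": {"type": "text"},
            "quantity": {"type": "integer"},
            "vat": {"type": "double"},
        },
    },
}

paths = list(field_paths(order_properties))
print(paths)
```

Because the function is recursive, it also handles objects embedded inside other objects.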

Mapping a document

The document is also referred to as the root object. It has special parameters to control its behavior, mainly used internally to do special processing, such as routing or time-to-live of documents.

In this recipe, we'll take a look at these special fields and learn how to use them.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

We can extend the preceding order example by adding some of the special fields, for example:

{
  "order": {
    "_id": {
      "index": true
    },
    "_type": {
      "store": "yes"
    },
    "_source": {
      "store": "yes"
    },
    "_all": {
      "enabled": false
    },
    "_routing": {
      "required": true
    },
    "_index": {
      "enabled": true
    },
    "_size...

Using dynamic templates in document mapping

In the Using explicit mapping creation recipe, we have seen how Elasticsearch is able to guess the field type using reflection. In this recipe, we'll see how to help it to improve its guessing capabilities via dynamic templates.

The dynamic template feature is very useful if, for example, you need to create several indices with similar types, because it moves the definition of mappings from hand-coded initialization routines to automatic index-document creation. A typical usage is to define types for Logstash log indices.
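As a minimal sketch of what dynamic templates can look like, the following Python snippet defines two templates and a toy matcher that shows the first-match-wins behavior. The template names, the patterns, and the function first_matching_template are hypothetical, and the matcher only handles the match and match_mapping_type options (Elasticsearch supports more, such as unmatch and path_match).

```python
from fnmatch import fnmatchcase

# Hypothetical templates for the "order" type; order in the list matters.
order_mapping = {
    "order": {
        "dynamic_templates": [
            # New fields whose name ends in "_id" become keywords.
            {"ids_as_keywords": {
                "match": "*_id",
                "mapping": {"type": "keyword"},
            }},
            # Any other new string field becomes a text field.
            {"strings_as_text": {
                "match_mapping_type": "string",
                "mapping": {"type": "text"},
            }},
        ]
    }
}


def first_matching_template(templates, field_name, detected_type):
    """Toy matcher: return the name of the first template that applies."""
    for entry in templates:
        name, rule = next(iter(entry.items()))
        if "match" in rule and not fnmatchcase(field_name, rule["match"]):
            continue
        if "match_mapping_type" in rule and rule["match_mapping_type"] != detected_type:
            continue
        return name
    return None


templates = order_mapping["order"]["dynamic_templates"]
print(first_matching_template(templates, "customer_id", "string"))
print(first_matching_template(templates, "note", "string"))
```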

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

We can extend the previous mapping by adding document-related settings:

 { 
    "order" : { 
    "dynamic_date_formats":["yyyy-MM-dd", "dd-MM-yyyy"],...

Managing nested objects

There is a special type of embedded object, the nested one. This resolves a problem related to Lucene indexing architecture, in which all the fields of embedded objects are viewed as a single object. During search, in Lucene, it is not possible to distinguish values between different embedded objects in the same multivalued array.

If we consider the previous order example, it's not possible to match an item name together with its own quantity in the same query, as Lucene puts all the items in the same Lucene document object. We need to index them in different documents and then join them. This entire round trip is managed by nested objects and nested queries.
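The problem can be reproduced in a few lines of Python. This is a simplified model, not Lucene code: the helper names and the sample items are hypothetical. It flattens an array of embedded objects into one list of values per field, which is enough to show how the link between a name and its quantity is lost.

```python
def flatten(items, prefix="item"):
    """Collapse an array of objects into one list of values per field."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"{prefix}.{key}", []).append(value)
    return flat


def flat_match(flat, name, quantity):
    """Match against the flattened view: values from different items can mix."""
    return name in flat["item.name"] and quantity in flat["item.quantity"]


def nested_match(items, name, quantity):
    """Match against each item separately, as nested objects allow."""
    return any(i["name"] == name and i["quantity"] == quantity for i in items)


# Hypothetical items of a single order.
items = [
    {"name": "tshirt", "quantity": 10},
    {"name": "socks", "quantity": 2},
]
flat = flatten(items)

print(flat)
print(flat_match(flat, "tshirt", 2))     # True: a false positive
print(nested_match(items, "tshirt", 2))  # False: no single item matches both
```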

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

A nested object is defined like a standard object, but with the type nested.

From the example in the Mapping an object recipe, we can change the type...
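A minimal sketch of such a mapping, together with a nested query that targets it, could look like the following Python dictionaries. The field list repeats the item fields shown in the Mapping an object recipe, while the query values are hypothetical.

```python
import json

# The item object from the order example, declared as nested.
item_mapping = {
    "item": {
        "type": "nested",
        "properties": {
            "name": {"type": "text"},
            "quantity": {"type": "integer"},
            "vat": {"type": "double"},
        },
    }
}

# Hypothetical search: both conditions must hold inside the same item.
nested_query = {
    "query": {
        "nested": {
            "path": "item",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"item.name": "tshirt"}},
                        {"term": {"item.quantity": 10}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(item_mapping, indent=2))
print(json.dumps(nested_query, indent=2))
```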

Managing a child document

In the previous recipe, we have seen how it's possible to manage relations between objects with the nested object type. The disadvantage of nested objects is their dependence on their parent. If you need to change a value of a nested object, you need to reindex the parent (this brings a potential performance overhead if the nested objects change too quickly). To solve this problem, Elasticsearch allows defining child documents.
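As a rough sketch of the idea, the following Python snippet declares a parent type and a child type linked through the _parent property, and builds the URL path used to index a child under a given parent. The reduced field lists, the index name test, and the helper child_index_path are assumptions made for illustration.

```python
import json

# Hypothetical parent/child mappings; only "_parent" is the point here.
mappings = {
    "order": {
        "properties": {
            "id": {"type": "keyword"},
            "date": {"type": "date"},
        }
    },
    "item": {
        "_parent": {"type": "order"},  # every item belongs to one order
        "properties": {
            "name": {"type": "text"},
            "quantity": {"type": "integer"},
        },
    },
}


def child_index_path(index, doc_type, doc_id, parent_id):
    """URL path for indexing a child document under a given parent."""
    return f"/{index}/{doc_type}/{doc_id}?parent={parent_id}"


print(json.dumps(mappings, indent=2))
print(child_index_path("test", "item", 10, 1))
```

Because the child is a separate document, it can be updated without reindexing its parent.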

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

We can modify the mapping of the order example by indexing the items as separate child documents.

We need to extract the item object and create a new document type, item, with the _parent property set.

{ 
    "order": { 
        "properties": { 
            "id": { 
                "type": "keyword", 
                "store": "yes" ...

Adding a field with multiple mappings

Often a field must be processed with several core types or in different ways. For example, a string field may need to be tokenized for search but not tokenized for sorting. To do this, we need to define the special multifield property fields.

The fields property is a very powerful feature of mappings because it allows you to use the same field in different ways.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

To define a multifield property, we need to define a dictionary containing the subfields called fields. The subfield with the same name as the parent field is the default one.

If we consider the item of our order example, we can index the name in this way:

       "name": {
           "type": "keyword",
           "fields": {
               "name": {
       ...
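As a separate minimal sketch of the tokenized-for-search and not-tokenized-for-sorting case, the following Python dictionaries define a text field with a keyword subfield and a search body that uses both. The subfield name raw and the query value are hypothetical, and name is treated here as a top-level field for brevity.

```python
import json

# Hypothetical multifield: full-text search on "name", sorting on "name.raw".
name_mapping = {
    "name": {
        "type": "text",
        "fields": {
            "raw": {"type": "keyword"},  # untokenized copy of the same value
        },
    }
}

# Search on the analyzed field, sort on the untokenized subfield.
search_body = {
    "query": {"match": {"name": "tshirt"}},
    "sort": [{"name.raw": "asc"}],
}

print(json.dumps(name_mapping, indent=2))
print(json.dumps(search_body, indent=2))
```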

Mapping a GeoPoint field

Elasticsearch natively supports the use of geolocation types: special types that allow localizing your document in geographic coordinates (latitude and longitude) around the world.

There are two main types used in the geographic world: the point and the shape. In this recipe, we'll see the geo point, the base element of geolocation.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

The type of the field must be set to geo_point to define a GeoPoint.

We can extend the order example by adding a new field that stores the location of a customer. This will be the result:

{ 
    "order": { 
        "properties": { 
            "id": { 
                "type": "keyword", 
                "store": "yes" 
            }, 
            "date": { 
                "type": "date" 
            }, 
        ...
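A geo_point value can be written in several equivalent forms. The following Python sketch shows three of them for the same point. The field name customer_location and the coordinates are hypothetical, since the mapping above is truncated before the location field appears.

```python
import json

# Hypothetical field definition for a customer location.
location_mapping = {"customer_location": {"type": "geo_point"}}

lat, lon = 45.61752, 9.08363  # hypothetical coordinates

as_object = {"lat": lat, "lon": lon}  # explicit keys
as_string = f"{lat},{lon}"            # "lat,lon" string
as_array = [lon, lat]                 # array form uses [lon, lat] order

for value in (as_object, as_string, as_array):
    print(json.dumps({"customer_location": value}))
```

The array form follows the GeoJSON convention of longitude first, which is an easy source of swapped coordinates.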

Mapping a GeoShape field

An extension to the concept of point is the shape. Elasticsearch provides a type that facilitates the management of arbitrary polygons: the GeoShape.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To be able to use advanced shape management, Elasticsearch requires two JAR libraries in its classpath (usually the lib directory):

Spatial4J (v0.3)
JTS (v1.13)

How to do it...

To map a geo_shape type, a user must explicitly provide some parameters:

tree: This is the name of the PrefixTree implementation: geohash for GeohashPrefixTree and quadtree for QuadPrefixTree (default geohash)
precision: This is used instead of tree_levels to provide a more human-readable value for the tree level. The precision number can be followed by a unit, for example, 10m, 10km, 10miles, and so on
tree_levels: This is the maximum number of layers to be used in the prefix tree
distance_error_pct...
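A minimal sketch of a geo_shape mapping and of a polygon value that could be stored in it is shown below in Python. The field name shape, the parameter values, and the coordinates are hypothetical, and the helper is_closed_ring is an illustrative client-side check, not part of Elasticsearch.

```python
import json

# Hypothetical geo_shape field using some of the parameters listed above.
shape_mapping = {
    "shape": {
        "type": "geo_shape",
        "tree": "quadtree",
        "precision": "1m",
    }
}

# A polygon is a list of rings; each ring is a list of [lon, lat] pairs.
polygon = {
    "type": "polygon",
    "coordinates": [
        [[9.0, 45.0], [9.5, 45.0], [9.5, 45.5], [9.0, 45.5], [9.0, 45.0]]
    ],
}


def is_closed_ring(ring):
    """A valid ring has at least four points and ends where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


print(json.dumps(shape_mapping, indent=2))
print(all(is_closed_ring(ring) for ring in polygon["coordinates"]))
```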

Mapping an IP field

Elasticsearch is used in many systems that collect and search logs, together with tools such as Kibana (https://www.elastic.co/products/kibana) and Logstash (https://www.elastic.co/products/logstash). To improve search in these scenarios, it provides the ip type, which can store IPv4 and IPv6 addresses in an optimized way.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

You need to define the type of the field that contains an IP address as ip.

Using the preceding order example, we can extend it by adding the customer IP as follows:

 "customer_ip": {
     "type": "ip",
     "store": "yes"
 }

The IP must be in the standard dotted notation form, for example:

"customer_ip":"19.18.200.201"

How it works...

When Elasticsearch is processing a document, if a field is an IP one, it tries to convert its value to a numerical form and generate...
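The idea of a numerical form can be illustrated with Python's standard ipaddress module. This is only a sketch of the concept: the helper names are hypothetical, and the encoding that Elasticsearch uses internally may differ.

```python
import ipaddress


def ip_to_number(ip):
    """Convert a dotted IPv4 (or IPv6) address into a single integer."""
    return int(ipaddress.ip_address(ip))


def in_range(ip, low, high):
    """Range check on the numerical form, the kind of query this enables."""
    return ip_to_number(low) <= ip_to_number(ip) <= ip_to_number(high)


number = ip_to_number("19.18.200.201")
print(number)
print(ipaddress.ip_address(number))  # back to dotted notation
print(in_range("19.18.200.201", "19.18.200.0", "19.18.200.255"))
```

Once addresses are numbers, range and subnet lookups become plain numeric comparisons instead of string matching.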

Mapping an attachment field

Elasticsearch allows extending its core types to cover new requirements with native plugins that provide new mapping types. One of the most used custom field types is attachment.

It allows indexing and searching the contents of common document files, such as Microsoft Office formats, OpenDocument formats, PDF, ePub, and many others.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup with the ingest attachment plugin installed.

It can be installed from the command line with the following command:

 bin/elasticsearch-plugin install ingest-attachment

How to do it...

To map a field as an attachment, you must set its type to attachment.

Internally, the attachment field defines the fields property as a multi-field that takes some binary data (encoded base64) and extracts useful information such as author, content, title, date, and so on...
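The base64 step can be sketched in Python as follows. The sample bytes, the field name data, and the pipeline description are hypothetical. The pipeline body assumes the attachment processor provided by the ingest-attachment plugin installed in the Getting ready section, which is one way of feeding such binary data to Elasticsearch.

```python
import base64
import json

# Hypothetical stand-in for the bytes of a real file (PDF, DOCX, and so on).
raw_bytes = b"Hello Elasticsearch attachment"

# Binary content must be sent as base64 text inside the JSON document.
encoded = base64.b64encode(raw_bytes).decode("ascii")
document = {"data": encoded}

# Assumed ingest pipeline body using the plugin's attachment processor.
pipeline = {
    "description": "Extract attachment information",
    "processors": [{"attachment": {"field": "data"}}],
}

print(json.dumps(document))
print(json.dumps(pipeline, indent=2))
```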

Adding metadata to a mapping

Sometimes, when we are working with our mapping, we need to store some additional data for display purposes, ORM facilities, permissions, or simply to track it in the mapping.

Elasticsearch allows storing any kind of JSON data we want in the mapping with the special field _meta.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

The _meta mapping field can be populated with any data we want. Consider the following example:

        {
            "order": {
                "_meta": {
                    "attr1": ["value1", "value2"],
                    "attr2": {
                        "attr3": "value3"
                    }
                }
            }
        }
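On the client side, the metadata can be read back from a get-mapping response. The following Python sketch assumes a response shaped as index, then mappings, then type. The index name test and the helper get_meta are hypothetical.

```python
# Hypothetical get-mapping response for an index called "test".
mapping_response = {
    "test": {
        "mappings": {
            "order": {
                "_meta": {
                    "attr1": ["value1", "value2"],
                    "attr2": {"attr3": "value3"},
                }
            }
        }
    }
}


def get_meta(response, index, doc_type):
    """Return the _meta dictionary of a type, or an empty dict if absent."""
    return response[index]["mappings"][doc_type].get("_meta", {})


meta = get_meta(mapping_response, "test", "order")
print(meta["attr1"])
print(meta["attr2"]["attr3"])
```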

How it works...

When Elasticsearch processes a new mapping and finds a _meta field...

Specifying a different analyzer

In the previous recipes, we have seen how to map different fields and objects in Elasticsearch and we have described how easy it is to change the standard analyzer with the analyzer and search_analyzer properties.

In this recipe, we will see several analyzers and how to use them to improve the indexing and searching quality.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

Every core type field allows you to specify a custom analyzer for indexing and for searching as field parameters.

For example, if we want the name field to use a standard analyzer for indexing and a simple analyzer for searching, the mapping will be as follows:

{
    "name": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "simple"
    }
}
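To give a feeling for what an analyzer does, here is a rough Python approximation of a simple-style analyzer, which splits text on anything that is not a letter and lowercases the tokens. The function simple_analyze is hypothetical and is not the Elasticsearch implementation.

```python
import re


def simple_analyze(text):
    """Rough approximation: split on non-letters, then lowercase each token."""
    # [^\W\d_]+ matches runs of letters only (no digits, no underscores).
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


print(simple_analyze("The 2 QUICK Brown-Foxes"))
```

Digits and punctuation are dropped by this analyzer, so a search for "2" would find nothing, which is why the choice of analyzer affects both indexing and searching.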

How it works...

The concept...

Mapping a completion field

To provide search functionality for our users, one of the most common requirements is to offer text suggestions for their queries.

Elasticsearch provides a helper for achieving this functionality via a special type mapping called completion.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

The definition of a completion field is similar to the previous core type fields. For example, to provide suggestions for a name with an alias, we can write a mapping such as this:

{
    "name": {"type": "text", "copy_to": ["suggest"]},
    "alias": {"type": "text", "copy_to": ["suggest"]},
    "suggest": {
        "type": "completion",
        "payloads": true,
        "analyzer": "simple",
        "search_analyzer": "simple"
    }
}
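A request that asks the completion field for suggestions can be sketched as a Python dictionary. The suggestion name name-suggest, the helper build_suggest_request, and the sample prefix are hypothetical. The field name suggest comes from the mapping above.

```python
import json


def build_suggest_request(prefix, field="suggest", size=5):
    """Build a search body that asks a completion field for suggestions."""
    return {
        "suggest": {
            "name-suggest": {            # arbitrary label for this suggestion
                "prefix": prefix,        # text typed so far by the user
                "completion": {"field": field, "size": size},
            }
        }
    }


print(json.dumps(build_suggest_request("al"), indent=2))
```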

In this example, we have defined two string fields name...


Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.