
You're reading from Elasticsearch 8.x Cookbook - Fifth Edition
Published in: May 2022
Publisher: Packt
ISBN-13: 9781801079815
Edition: 5th Edition
Author: Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.

Mapping base types

A schemaless approach makes it possible to quickly start ingesting data without being concerned about field types. However, to achieve better results and better indexing performance, it's necessary to define a mapping manually.

Fine-tuning mapping brings some advantages, such as the following:

  • Reducing the index size on disk (by disabling functionalities for some fields)
  • Indexing only the interesting fields (a general speed-up)
  • Precooking data for fast search or real-time analytics (such as aggregations)
  • Correctly defining whether a field must be analyzed in multiple tokens or considered as a single token
  • Defining mapping types such as geo point, suggester, vectors, and so on

Elasticsearch allows you to use base fields with a wide range of configurations.

Getting ready

You will need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. I suggest using the Kibana console, which provides code completion and better character escaping for Elasticsearch.

To execute this recipe's examples, you will need to create an index named test, where you can put mappings, as explained in the Using explicit mapping creation recipe.
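
If you have not already created it, the test index can be created with a bare request in the Kibana console (a minimal sketch; any settings from the Using explicit mapping creation recipe can be added to the request body):

    PUT test

Elasticsearch will answer with "acknowledged": true if the index was created successfully.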

How to do it...

Let's use a semi-real-world example of a shop order for our eBay-like shop:

  1. First, we must define an order:

Figure 2.1 – Example of an order

  2. Our order record must be converted into an Elasticsearch mapping definition, as follows:
    PUT test/_mapping
    {
      "properties" : {
        "id" : {"type" : "keyword"},
        "date" : {"type" : "date"},
        "customer_id" : {"type" : "keyword"},
        "sent" : {"type" : "boolean"},
        "name" : {"type" : "keyword"},
        "quantity" : {"type" : "integer"},
        "price" : {"type" : "double"},
        "vat" : {"type" : "double", "index": false}
      }
    }

Now, the mapping is ready to be put in the index. We will learn how to do this in the Putting a mapping in an index recipe of Chapter 3, Basic Operations.
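
With the mapping in place, a hypothetical order can be indexed against it (the document ID and all field values here are illustrative, not from the book):

    PUT test/_doc/1
    {
      "id" : "order-0001",
      "date" : "2022-05-01T10:00:00Z",
      "customer_id" : "cust-42",
      "sent" : true,
      "name" : "earbuds",
      "quantity" : 2,
      "price" : 19.99,
      "vat" : 4.40
    }

Note that because vat is mapped with "index": false, its value is returned with the document but cannot be used in a search query.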

How it works...

Field types must be mapped to one of the Elasticsearch base types, and options on how the field must be indexed need to be added.

The following table is a reference for the mapping types:

Figure 2.2 – Base type mapping

Depending on the data type, it's possible to give explicit directives to Elasticsearch when you're processing the field for better management. The most used options are as follows:

  • store (default false): This marks the field to be stored in a separate index fragment for fast retrieval. Storing a field consumes disk space but reduces the computation needed to extract its value from a document (that is, in scripting and aggregations). The possible values for this option are true and false. Stored field values are always returned as an array for consistency.

Stored fields are faster to retrieve in aggregations and scripts than values that must be extracted from the document source.

  • index: This defines whether or not the field should be indexed (the default is true). The possible values for this parameter are true and false. Fields that are not indexed are not searchable.
  • null_value: This defines a default value if the field is null.
  • boost: This is used to change the importance of a field (the default is 1.0).

boost works on a term level only, so it's mainly used in term, terms, and match queries.

  • search_analyzer: This defines the analyzer to be used during searches. If it's not defined, the analyzer of the field is used (the default is null).
  • analyzer: This sets the default analyzer to be used (the default is null).
  • norms: This controls the Lucene norms. This parameter is used to score queries better. If the field is only used for filtering, it's a best practice to disable it to reduce resource usage (true for analyzed fields and false for not_analyzed ones).
  • copy_to: This allows you to copy the content of a field to another one to achieve functionalities, similar to the _all field.
  • ignore_above: This allows you to skip the indexing string if it's bigger than its value. This is useful for processing fields for exact filtering, aggregations, and sorting. It also prevents a single term token from becoming too big and prevents errors due to the Lucene term's byte-length limit of 32,766. The maximum suggested value is 8191 (https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html).
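
As an illustrative sketch, several of these options can be combined in a single mapping (the index name test-options and the field names here are made up for the example):

    PUT test-options
    {
      "mappings": {
        "properties": {
          "code":     {"type": "keyword", "store": true, "null_value": "NA"},
          "title":    {"type": "text", "copy_to": "all_text"},
          "tag":      {"type": "keyword", "ignore_above": 256},
          "all_text": {"type": "text"}
        }
      }
    }

Here, code is stored for fast retrieval and indexed as "NA" when null, the content of title is copied into the catch-all all_text field, and tag values longer than 256 characters are skipped during indexing.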

There's more...

From Elasticsearch version 6.x onward, as shown in the Using explicit mapping creation recipe, the type that is automatically inferred for a string is a multifield mapping:

  • The default processing is text. This mapping allows textual queries (that is, term, match, and span queries). In the example provided in the Using explicit mapping creation recipe, this was name.
  • The keyword subfield is used for keyword mapping. This field can be used for exact term matching and aggregation and sorting. In the example provided in the Using explicit mapping creation recipe, the referred field was name.keyword.
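
The inferred multifield mapping for a string field such as name looks like the following sketch (this reflects the dynamic mapping defaults, where keyword subfields are capped at 256 characters via ignore_above):

    "name" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    }

With this layout, full-text queries target name, while sorting, aggregations, and exact matches target name.keyword.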

Another important parameter, available only for text mapping, is term_vector (the vector of terms that compose a string). Please refer to the Lucene documentation for further details at https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/index/Terms.html.

term_vector can accept the following values:

  • no: This is the default value; the term vector is skipped.
  • yes: This stores the term vector.
  • with_offsets: This stores the term vector with token offsets (the start and end positions in a block of characters).
  • with_positions: This stores the position of the token in the term vector.
  • with_positions_offsets: This stores all the term vector data.
  • with_positions_payloads: This stores the position and payloads of the token in the term vector.
  • with_positions_offsets_payloads: This stores all the term vector data with payloads.

Term vectors allow fast highlighting but consume disk space due to storing additional text information. It's a best practice to only activate it in fields that require highlighting, such as title or document content.
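
For instance, a hypothetical content field intended for fast highlighting could be mapped like this (the index name test-highlight is made up for the example):

    PUT test-highlight
    {
      "mappings": {
        "properties": {
          "content": {
            "type": "text",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }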

See also

You can refer to the following sources for further details on the concepts of this chapter:

  • The online documentation on Elasticsearch provides a full description of all the properties for the different mapping fields at https://www.elastic.co/guide/en/elasticsearch/reference/master/mapping-params.html.
  • The Specifying a different analyzer recipe at the end of this chapter shows alternative analyzers to the standard one.
  • For newcomers who want to explore the concepts of tokenization, I would suggest reading the official Elasticsearch documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html.
