
You're reading from Elasticsearch 8.x Cookbook - Fifth Edition
Published in: May 2022
Publisher: Packt
ISBN-13: 9781801079815
Edition: 5th Edition
Author: Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.

Mapping base types

A schemaless approach makes it possible to quickly start ingesting data without being concerned about field types. However, to achieve better results and better indexing performance, it's necessary to define a mapping manually.

Fine-tuning mapping brings some advantages, such as the following:

  • Reducing the index size on disk (by disabling functionalities for some fields)
  • Indexing only the interesting fields (a general speed-up)
  • Precooking data for fast search or real-time analytics (such as aggregations)
  • Correctly defining whether a field must be analyzed in multiple tokens or considered as a single token
  • Defining mapping types such as geo point, suggester, vectors, and so on

Elasticsearch allows you to use base fields with a wide range of configurations.

Getting ready

You will need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. I suggest using the Kibana console, which provides code completion and better character escaping for Elasticsearch.

To execute this recipe's examples, you will need to create an index named test, where you can put mappings, as explained in the Using explicit mapping creation recipe.
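
If you have not already created it, the test index can be created with a bare request in the Kibana console (a minimal sketch; any settings from the Using explicit mapping creation recipe can be added to the request body):

    PUT test

Elasticsearch will answer with "acknowledged": true if the index was created successfully.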

How to do it...

Let's use a semi-real-world example of a shop order for our eBay-like shop:

  1. First, we must define an order:

Figure 2.1 – Example of an order

  2. Our order record must be converted into an Elasticsearch mapping definition, as follows:
    PUT test/_mapping
    {
      "properties" : {
        "id" : {"type" : "keyword"},
        "date" : {"type" : "date"},
        "customer_id" : {"type" : "keyword"},
        "sent" : {"type" : "boolean"},
        "name" : {"type" : "keyword"},
        "quantity" : {"type" : "integer"},
        "price" : {"type" : "double"},
        "vat" : {"type" : "double", "index": false}
      }
    }

Now, the mapping is ready to be put in the index. We will learn how to do this in the Putting a mapping in an index recipe of Chapter 3, Basic Operations.
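
With the mapping in place, a hypothetical order can be indexed against it (the document ID and all field values here are illustrative, not from the book):

    PUT test/_doc/1
    {
      "id" : "order-0001",
      "date" : "2022-05-01T10:00:00Z",
      "customer_id" : "cust-42",
      "sent" : true,
      "name" : "earbuds",
      "quantity" : 2,
      "price" : 19.99,
      "vat" : 4.40
    }

Note that because vat is mapped with "index": false, its value is returned with the document but cannot be used in a search query.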

How it works...

Field types must be mapped to one of the Elasticsearch base types, and options on how the field must be indexed need to be added.

The following table is a reference for the mapping types:

Figure 2.2 – Base type mapping

Depending on the data type, it's possible to give explicit directives to Elasticsearch when you're processing the field for better management. The most used options are as follows:

  • store (default false): This marks the field to be stored in a separate index fragment for fast retrieval. Storing a field consumes disk space but reduces the computation needed to extract its value from a document (that is, in scripting and aggregations). The possible values for this option are true and false. Stored field values are always returned as an array for consistency.

Stored fields are faster to retrieve in aggregations and scripts than values that must be extracted from the document source.

  • index: This defines whether or not the field should be indexed (the default is true). The possible values for this parameter are true and false. Fields that are not indexed are not searchable.
  • null_value: This defines a default value if the field is null.
  • boost: This is used to change the importance of a field (the default is 1.0).

boost works on a term level only, so it's mainly used in term, terms, and match queries.

  • search_analyzer: This defines the analyzer to be used during searches. If it's not defined, the analyzer of the field is used (the default is null).
  • analyzer: This sets the default analyzer to be used (the default is null).
  • norms: This controls the Lucene norms. This parameter is used to score queries better. If the field is only used for filtering, it's a best practice to disable it to reduce resource usage (true for analyzed fields and false for not_analyzed ones).
  • copy_to: This allows you to copy the content of a field to another one to achieve functionalities, similar to the _all field.
  • ignore_above: This allows you to skip the indexing string if it's bigger than its value. This is useful for processing fields for exact filtering, aggregations, and sorting. It also prevents a single term token from becoming too big and prevents errors due to the Lucene term's byte-length limit of 32,766. The maximum suggested value is 8191 (https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html).
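
As an illustrative sketch, several of these options can be combined in a single mapping (the index name test-options and the field names here are made up for the example):

    PUT test-options
    {
      "mappings": {
        "properties": {
          "code":     {"type": "keyword", "store": true, "null_value": "NA"},
          "title":    {"type": "text", "copy_to": "all_text"},
          "tag":      {"type": "keyword", "ignore_above": 256},
          "all_text": {"type": "text"}
        }
      }
    }

Here, code is stored for fast retrieval and indexed as "NA" when null, the content of title is copied into the catch-all all_text field, and tag values longer than 256 characters are skipped during indexing.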

There's more...

From Elasticsearch version 6.x onward, as shown in the Using explicit mapping creation recipe, the type that is automatically inferred for a string is a multifield mapping:

  • The default processing is text. This mapping allows textual queries (that is, term, match, and span queries). In the example provided in the Using explicit mapping creation recipe, this was name.
  • The keyword subfield is used for keyword mapping. This field can be used for exact term matching and aggregation and sorting. In the example provided in the Using explicit mapping creation recipe, the referred field was name.keyword.
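
The inferred multifield mapping for a string field such as name looks like the following sketch (this reflects the dynamic mapping defaults, where keyword subfields are capped at 256 characters via ignore_above):

    "name" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    }

With this layout, full-text queries target name, while sorting, aggregations, and exact matches target name.keyword.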

Another important parameter, available only for text mapping, is term_vector (the vector of terms that compose a string). Please refer to the Lucene documentation for further details at https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/index/Terms.html.

term_vector can accept the following values:

  • no: This is the default value; the term vector is skipped.
  • yes: This stores the term vector.
  • with_offsets: This stores the term vector with token offsets (the start and end positions in a block of characters).
  • with_positions: This stores the position of the token in the term vector.
  • with_positions_offsets: This stores all the term vector data.
  • with_positions_payloads: This stores the position and payloads of the token in the term vector.
  • with_positions_offsets_payloads: This stores all the term vector data with payloads.

Term vectors allow fast highlighting but consume disk space due to storing additional text information. It's a best practice to only activate it in fields that require highlighting, such as title or document content.
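
For instance, a hypothetical content field intended for fast highlighting could be mapped like this (the index name test-highlight is made up for the example):

    PUT test-highlight
    {
      "mappings": {
        "properties": {
          "content": {
            "type": "text",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }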

See also

You can refer to the following sources for further details on the concepts of this chapter:

  • The online documentation on Elasticsearch provides a full description of all the properties for the different mapping fields at https://www.elastic.co/guide/en/elasticsearch/reference/master/mapping-params.html.
  • The Specifying a different analyzer recipe at the end of this chapter shows alternative analyzers to the standard one.
  • For newcomers who want to explore the concepts of tokenization, I would suggest reading the official Elasticsearch documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html.
