You're reading from Advanced Elasticsearch 7.0

Product type: Book
Published in: Aug 2019
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781789957754
Edition: 1st Edition
Author: Wai Tak Wong

Wai Tak Wong is a faculty member in the Department of Computer Science at Kean University, NJ, USA. He has more than 15 years of professional experience in cloud software design and development. He obtained his PhD in computer science at NJIT, NJ, USA. Wai Tak has served as an associate professor in the Information Management Department of Chung Hua University, Taiwan. A co-founder of Shanghai Shellshellfish Information Technology, Wai Tak acted as the Chief Scientist of the R&D team, and he has published more than a dozen algorithms in prestigious journals and conferences. Wai Tak began his search and analytics technology career with Elasticsearch in the real estate market and later applied this to data management and FinTech data services.

Mapping APIs

In the last chapter, we learned about the document life cycle of Elasticsearch and executed single and multiple document APIs. We introduced a commission-free ETF from TD Ameritrade and used this information to practice document APIs. We also reindexed the documents from the multiple mapping types index for migration.

In this chapter, we are going to learn about mapping APIs. In Elasticsearch, mapping is a data model that describes the structure of a document. It allows you to specify fields, field types, relationships between documents, data conversion rules, and so on. Schema-less only means that documents can be indexed without specifying the schema in advance; the schema is derived dynamically from the structure of the first indexed document, based on the built-in mapping rules in Elasticsearch. If you have a good search database design plan, you should use explicit...

Dynamic mapping

The official definition of dynamic mapping is to detect the datatypes of new fields in documents during indexing, instead of first creating an index, defining the fields, and establishing their datatype mapping. Therefore, document indexing operations can be executed on the fly without predefining the field mapping. Newly added fields can be picked up during indexing at any time. We'll first review the mapping rules and then practice with a sample document to see the indexing result.

Mapping rules

The Elasticsearch document is in JavaScript Object Notation (JSON) format. In JSON, the valid datatypes are string, number, JSON object, array, Boolean, and null. The JSON data value and the mapping...
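
As a rough illustration of how JSON datatypes relate to mapped types, the following Python sketch guesses the field type that dynamic mapping would assign to a JSON value under the default settings. It is a simplification, not Elasticsearch's own logic: the function name guess_dynamic_type and the sample document are hypothetical, and date detection on strings is deliberately left out.

```python
import json


def guess_dynamic_type(value):
    """Approximate the type dynamic mapping picks for a parsed JSON value."""
    if value is None:
        return None  # a null value does not add a field
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return guess_dynamic_type(item)
        return None
    if isinstance(value, str):
        # By default a string becomes text with a keyword sub-field;
        # date detection is ignored in this sketch.
        return "text"
    raise TypeError(f"not a JSON value: {value!r}")


# Hypothetical sample document, for illustration only.
sample = json.loads(
    '{"symbol": "ABC", "rating": 3, "ratio": 0.35, "active": true,'
    ' "tags": [null, "etf"], "note": null, "fund": {"family": "XYZ"}}'
)
guessed = {name: guess_dynamic_type(value) for name, value in sample.items()}
print(guessed)
```

Comparing such a guess with the response of a GET mapping request on a dynamically mapped index is a quick way to check your understanding of the rules.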

Meta fields in mapping

Some meta fields can be defined in the mapping to customize the indexing operation. We'll list some of the important ones and describe their usage:

  • _meta: This is for user-defined metadata. Developers can use it to store application-specific metadata.
  • _routing: You can mark routing as required in the mapping, so that all subsequent related operations must specify a routing value.
  • _source: By default, _source is enabled and the source document is stored. If a user disables _source, the source document is not retained. The benefit is that this saves some storage space. The drawback is that reindexing, updating, and search-time highlighting from the source are no longer supported for the document. Think carefully before deciding.

_meta and _routing can be updated, but _source cannot be updated. Let us use the following...
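
As a minimal sketch of how these meta fields sit in a mapping, the Python snippet below builds a request body that sets all three. The metadata keys, the symbol field, and the variable name index_body are assumptions made for illustration; only the _meta, _routing, and _source settings themselves come from Elasticsearch.

```python
import json

# Hypothetical request body for creating an index; values are illustrative only.
index_body = {
    "mappings": {
        # Free-form, application-specific metadata.
        "_meta": {"owner": "search-team", "mapping_version": "1.0"},
        # Every index, get, update, and delete operation must supply a routing value.
        "_routing": {"required": True},
        # Saves storage, but the original document is no longer retained.
        "_source": {"enabled": False},
        "properties": {"symbol": {"type": "keyword"}},
    }
}

print(json.dumps(index_body, indent=2))
```

The printed JSON is the kind of body that would accompany a create index request in version 7.0.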

Field datatypes

In addition to the mapped types we learned about in the previous Mapping rules section, there are many other datatypes that can be mapped to the fields in a document. Field datatypes are defined in the static mapping. In the Dynamic mapping section, we showed the data structure from the response of the GET mapping request of default_mappings_index. The syntax to set the datatype of a field in static mapping is as follows:

"properties": {
"field_1": {"type": "field_type_1"},
"field_2": {"type": "field_type_2", "mapping_parameter2a":"parameter2a_value",..}
}
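
To make the template above concrete, here is a small Python sketch that fills it in with a few common datatypes. The field names, the helper name field, and the choice of datatypes are hypothetical; they are not the book's sample mapping.

```python
import json


def field(field_type, **mapping_parameters):
    """Build one entry of "properties": a type plus optional mapping parameters."""
    return {"type": field_type, **mapping_parameters}


# Hypothetical fields, chosen only to show the syntax.
properties = {
    "symbol": field("keyword"),
    "fund_name": field("text"),
    "inception_date": field("date", format="yyyy-MM-dd"),
    "expense_ratio": field("float"),
}

mapping_body = {"mappings": {"properties": properties}}
print(json.dumps(mapping_body, indent=2))
```

Each entry has the same shape as field_1 and field_2 in the syntax: a required type, followed by any mapping parameters that the datatype accepts.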

Let us examine the description and the example usage for each mapping datatype in the following table:

Mapping datatype | Description | Example

Text | Full text value analyzed...

Mapping parameters

The mapping parameters define how document fields are stored, how they are indexed, which field information is exposed, what additional processing is performed, and how the mapped data is analyzed at index and search time. The following tables explain the supported mapping parameters:

Mapping parameter | Description | Example

How it stores

store | Whether to store the field value separately from _source. The default value is false. | "field_1":{"type":"type_1", "store":true}

doc_values | It stores the same values as _source, but in a column-oriented manner. It is designed for sorting and aggregations. The default value is true for the datatypes that support it; text fields do not support it. | "field_1":{"type":"keyword", "doc_values":false}

term_vector | Whether to store the information of the terms...
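
The storage parameters above can be combined in a single mapping. The Python sketch below is one hedged example: the field names and the helper text_fields_with_doc_values are hypothetical, and the helper only checks the one rule that doc_values is not accepted on text fields.

```python
import json

# Hypothetical fields; only the parameter names and values are Elasticsearch's.
properties = {
    "summary": {
        "type": "text",
        "store": True,  # keep a separately stored copy of the value
        "term_vector": "with_positions_offsets",
    },
    "symbol": {
        "type": "keyword",
        "doc_values": False,  # give up sorting and aggregations on this field
    },
}


def text_fields_with_doc_values(props):
    """Return the names of text fields that set doc_values, which is not supported."""
    return [
        name
        for name, spec in props.items()
        if spec.get("type") == "text" and "doc_values" in spec
    ]


print(json.dumps({"mappings": {"properties": properties}}, indent=2))
print(text_fields_with_doc_values(properties))
```

A check like this is only a convenience before sending the mapping; Elasticsearch itself rejects unsupported parameters when the mapping is created.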

Refreshing mapping changes for static mapping

In Chapter 3, Document APIs, we introduced the Update by Query API for documents and noted that it also applies to static mappings. If static mapping is used and dynamic mapping is turned off, any field not specified in the mappings is neither indexed nor searchable; however, it is still stored in _source. If we need to search on such an unmapped field later, we only need to update the static mappings and then let the already indexed documents pick up the change. No document reindexing step is required. Assume that the original mappings for our sample documents do not have the exchange field, that the document has been indexed, and that we now want the field to be searchable:

  1. First, we need to update the mapping, as shown in the following screenshot:
  2. The next step is to issue an Update by Query request to put the static mapping changes into effect, as shown in the following screenshot:
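
The two steps above can be sketched in Python as a pair of request descriptions. This is a sketch under stated assumptions rather than the book's exact requests: the index name etf_index, the keyword type for the exchange field, and the helper build_refresh_requests are hypothetical, and nothing is sent over the network.

```python
import json


def build_refresh_requests(index, field_name, field_type):
    """Describe the two requests: update the mapping, then update by query."""
    put_mapping = (
        "PUT",
        f"/{index}/_mapping",
        {"properties": {field_name: {"type": field_type}}},
    )
    # An Update by Query request without a script re-indexes each document in
    # place from _source, so the new mapping applies to existing documents.
    update_by_query = ("POST", f"/{index}/_update_by_query?conflicts=proceed", None)
    return [put_mapping, update_by_query]


# Hypothetical index name and field type, for illustration only.
requests_to_send = build_refresh_requests("etf_index", "exchange", "keyword")
for method, path, body in requests_to_send:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

The conflicts=proceed parameter tells Update by Query to continue past version conflicts instead of aborting, which is usually what you want when documents may change during the run.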

In the...

Typeless APIs working with old custom index types

Prior to version 7.0, indexes could contain a custom type. In version 6.x, only a single index type is allowed for an index. In version 6.7, a dummy index type, _doc, is used if no index type is specified. APIs in 7.0 are typeless, although the type name _doc is still valid. A new URL parameter, include_type_name, was introduced in version 6.7 to support indexes with an old custom type.

The default value is true in version 6.7, since the type name is still required there. In version 7.0, the default value is false. To work with an index that has an old custom type, you can attach the include_type_name=false URL parameter and skip the custom type in the URL, since the typeless GET API is used. An example of retrieving index mappings without specifying the index type in the URL is shown in the following screenshot:
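
In text form, such a request path could be built roughly as in the Python sketch below. The index name old_type_index and the helper mapping_url are hypothetical; only the _mapping endpoint and the include_type_name parameter come from Elasticsearch.

```python
from urllib.parse import urlencode


def mapping_url(index, include_type_name=None):
    """Build the path of a typeless GET mapping request."""
    path = f"/{index}/_mapping"
    if include_type_name is None:
        return path  # rely on the version's default value
    query = urlencode({"include_type_name": str(include_type_name).lower()})
    return f"{path}?{query}"


# Hypothetical index that was created with a custom type.
print("GET", mapping_url("old_type_index", include_type_name=False))
print("GET", mapping_url("old_type_index"))
```

In version 7.0 both paths behave the same way because false is the default; spelling the parameter out only matters when the same request must also work against version 6.7.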

In the response body, no index...

Summary

Time flies! We have completed the chapter. Now, we understand mappings and know how to index documents. We have worked with the mapping APIs and created our own mapping design to index the sample documents. We also dealt with the remaining task from Chapter 3, Document APIs: refreshing the mapping changes of a static mapping in order to avoid document reindexing.

In the next chapter, we'll delve into the analyzer. With a custom analyzer, you can define how document fields are processed before they are indexed and later at search time. You have great control over how document fields are used in your queries to make your search more accurate and efficient. We'll first learn about tokenizers, and then filters. Then, after reviewing the built-in analyzers, we'll write our own analyzer.
