You're reading from  Elasticsearch 8.x Cookbook - Fifth Edition

Product type: Book
Published in: May 2022
Publisher: Packt
ISBN-13: 9781801079815
Edition: 5th Edition
Author (1)
Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.

Managing a child document with a join field

In the previous recipe, we saw how to manage relationships between objects with the nested object type. The disadvantage of nested objects is their dependence on their parent: to change the value of a nested object, you must reindex the whole parent document, which can become a performance overhead if the nested objects change frequently. To solve this problem, Elasticsearch allows you to define child documents.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. I suggest using the Kibana console, which provides code completion and better character escaping for Elasticsearch.

How to do it…

In the following example, we have two related objects: an Order and an Item.

Their UML representation is as follows:

Figure 2.3 – UML example of an Order/Item relationship

The final mapping should merge the field definitions of both Order and Item, and add a special field of the join type (join_field, in this example) that holds the parent/child relationship.

To use join_field, follow these steps:

  1. First, we must define the mapping, as follows:
    PUT test1/_mapping
    { "properties": {
        "join_field": {
          "type": "join", "relations": { "order": "item" }
        },
        "id": { "type": "keyword" },
        "date": { "type": "date" },
        "customer_id": { "type": "keyword" },
        "sent": { "type": "boolean" },
        "name": { "type": "text" },
        "quantity": { "type": "integer" },
        "vat": { "type": "double" }
    } }

The preceding mapping is very similar to the one in the previous recipe.

  2. If we want to store the joined records, we will need to save the parent first and then the children, like so:
    PUT test1/_doc/1?refresh
    { "id": "1", "date": "2018-11-16T20:07:45Z", "customer_id": "100", "sent": true, "join_field": "order" }
    PUT test1/_doc/c1?routing=1&refresh
    { "name": "tshirt", "quantity": 10, "price": 4.3, "vat": 8.5,
      "join_field": { "name": "item", "parent": "1" } }

The child item requires special handling: the request must carry a routing parameter set to the ID of the parent (1 in the preceding example). Furthermore, the join_field object of the child must specify the relation name of the child itself (item) and the ID of its parent.
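As a minimal sketch of the two requests above, the following Python snippet builds the method, path, and JSON body for a parent and a child without sending anything. The helper names index_request and child_body are hypothetical and introduced only for illustration; the index name test1 is taken from the mapping step.

```python
import json
from urllib.parse import urlencode


def index_request(index, doc_id, body, routing=None):
    """Return (method, path, json_body) for an index call; nothing is sent."""
    params = {"refresh": "true"}
    if routing is not None:
        # A child is routed by its parent's ID so that it lands on the parent's shard.
        params["routing"] = routing
    path = f"/{index}/_doc/{doc_id}?{urlencode(params)}"
    return "PUT", path, json.dumps(body)


def child_body(fields, relation, parent_id):
    """Add the join metadata a child needs: its own relation name and the parent ID."""
    return {**fields, "join_field": {"name": relation, "parent": parent_id}}


# The parent only names its relation; no routing parameter is needed.
parent = index_request("test1", "1", {"id": "1", "join_field": "order"})

# The child names its relation, points at parent "1", and is routed by "1".
child = index_request(
    "test1",
    "c1",
    child_body({"name": "tshirt", "quantity": 10}, "item", "1"),
    routing="1",
)

print(parent[1])
print(child[1])
```

Keeping the routing value and the parent ID in the same call site makes it harder to forget one of them, which is the most common mistake when indexing children by hand.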

How it works…

When several related object types share the same index, the mapping must be the union of the fields of all those types.

The relationship between objects must be defined in join_field.

A mapping can contain only one field of the join type; if you need several relationships, define them all in the relations object of that field.
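A sketch of how several relationships might be declared inside one join field follows. The helper join_mapping and the second child type, shipment, are hypothetical and not part of the recipe; the mapping is only built and printed, not sent to a cluster.

```python
import json


def join_mapping(field_name, relations):
    """Build the properties entry for a single join field.

    relations maps each parent relation name to one child name
    or to a list of child names.
    """
    return {field_name: {"type": "join", "relations": relations}}


# "shipment" is a hypothetical second child type, used only for illustration.
mapping = {
    "properties": join_mapping("join_field", {"order": ["item", "shipment"]})
}

print(json.dumps(mapping, indent=2))
```

Both child types hang off the same join_field, so the one-join-field-per-mapping limit is respected.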

The child document must be indexed in the same shard as its parent, so an extra parameter, routing, must be passed when indexing it (we'll learn how to do this in the Indexing a document recipe in Chapter 3, Basic Operations).
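To see why the routing parameter matters, consider this simplified sketch of shard selection. It assumes the shard is chosen by hashing the routing value (which defaults to the document ID) modulo the number of primary shards. The zlib.crc32 hash is only a stand-in: Elasticsearch uses its own Murmur3-based function, so the actual shard numbers will differ.

```python
import zlib


def shard_for(routing, num_shards):
    """Pick a shard from a routing value (simplified model, stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


NUM_SHARDS = 5  # hypothetical number of primary shards

# The parent has no explicit routing, so its own ID ("1") is hashed.
parent_shard = shard_for("1", NUM_SHARDS)

# Without routing, the child would be hashed by its own ID ("c1"),
# which may or may not land on the parent's shard.
child_default_shard = shard_for("c1", NUM_SHARDS)

# With routing=1, the child is hashed by the parent's ID and is
# therefore guaranteed to land on the parent's shard.
child_routed_shard = shard_for("1", NUM_SHARDS)

print(parent_shard, child_default_shard, child_routed_shard)
```

Whatever the real hash function is, the same routing value always maps to the same shard, which is the guarantee the join relies on.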

Changing the values of a child document does not require reindexing its parent. Consequently, children are fast to index, reindex (update), and delete.
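The following sketch builds a partial update aimed at the child alone, assuming the document IDs used in this recipe and the index name test1 from the mapping step. The helper child_update_request is hypothetical; it only assembles the request and sends nothing. Note that the update must carry the same routing value that was used when the child was indexed.

```python
import json


def child_update_request(index, child_id, parent_id, changes):
    """Return (method, path, json_body) for a partial update of a child."""
    # Route by the parent ID so the request reaches the shard holding the child.
    path = f"/{index}/_update/{child_id}?routing={parent_id}"
    return "POST", path, json.dumps({"doc": changes})


# Only the child c1 is touched; the parent document 1 is not reindexed.
request = child_update_request("test1", "c1", "1", {"quantity": 20})

print(request[0], request[1])
print(request[2])
```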

There's more...

In Elasticsearch, we have different ways to manage relationships between objects, as follows:

  • Embedding with type=object: This is implicitly managed by Elasticsearch, which treats the embedded object as part of the main document. It's fast, but you need to reindex the main document to change the value of the embedded object.
  • Nesting with type=nested: This allows you to accurately search and filter the parent by using nested queries on its children. Everything works as it does for the embedded object, except for querying (you must use a nested query to search the nested objects).
  • External children documents: Here, the children are separate documents, with a join_field property that binds them to the parent. They must be indexed in the same shard as the parent. The join with the parent is a bit slower than with nested objects, because nested objects are stored in the same data block as the parent in the Lucene index and are loaded with it, whereas child documents require additional read operations.

Choosing how to model the relationship between objects depends on your application scenario.

Tip

There is also another approach, decoupling the join relationship, but it performs poorly on large datasets. You do the join in two steps: first, collect the IDs of the children (or other related documents), and then search for those IDs in a field of their parent.
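The two-step join described in this tip could be sketched as follows. The parent field item_ids, which is assumed to hold the IDs of the related children, and both helper functions are hypothetical. The first search response is a hand-written stand-in, so nothing here talks to a cluster.

```python
def collect_ids(response):
    """Step 1: extract the _id of every hit from a search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def parents_by_child_ids(child_ids, field="item_ids"):
    """Step 2: build a terms query on the parent field that stores child IDs."""
    return {"query": {"terms": {field: child_ids}}}


# Hand-written stand-in for the response to a first search on the children.
first_response = {"hits": {"hits": [{"_id": "c1"}, {"_id": "c2"}]}}

child_ids = collect_ids(first_response)
second_query = parents_by_child_ids(child_ids)

print(second_query)
```

The list of IDs passed to the second query grows with the number of matching children, which is one reason this approach degrades on large datasets.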

See also

Please refer to the Using the has_child query, Using the top_children query, and Using the has_parent query recipes of Chapter 6, Relationships and Geo Queries, for more details on child/parent queries.
