Chapter 4. Extending Your Index Structure
In the previous chapter, we learned many things about querying Elasticsearch. We saw how to choose the fields that will be returned and learned how querying works in Elasticsearch. In addition to that, we now know the basic queries that are available and how to filter our data. What's more, we saw how to highlight the matches in our documents and how to validate our queries. In the end, we saw the compound queries of Elasticsearch and learned how to sort our data.
By the end of this chapter, you will have learned the following topics:
Indexing tree-like structured data
Indexing data that is not flat
Modifying your index structure when possible
Indexing data with relationships by using nested documents
Indexing related data by using the parent-child functionality
Indexing tree-like structures
Trees are everywhere. If you develop a shop application, you would probably have categories. If you look at the filesystem, the files and directories are arranged in tree-like structures. This book can also be represented as a tree: chapters contain topics and topics are divided into subtopics. As you can imagine, Elasticsearch is also capable of indexing tree-like structures. Let's check how we can navigate through this type of data using path_analyzer.
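To make the idea concrete, here is a minimal Python sketch of how a path-hierarchy style tokenizer turns one path into a token per ancestor level. It assumes that path_analyzer is a custom analyzer built on the Elasticsearch path_hierarchy tokenizer; the function name and the sample path are hypothetical, and the sketch only emulates the default behavior for simple paths with a single-character delimiter.

```python
from typing import List


def path_hierarchy_tokens(path: str, delimiter: str = "/") -> List[str]:
    """Emulate, for simple inputs, the tokens a path-hierarchy tokenizer emits."""
    tokens = []
    # Skip a leading delimiter so "/cars/..." starts with "/cars", not "".
    start = len(delimiter) if path.startswith(delimiter) else 0
    idx = path.find(delimiter, start)
    while idx != -1:
        tokens.append(path[:idx])  # every prefix that ends right before a delimiter
        idx = path.find(delimiter, idx + len(delimiter))
    if path:
        tokens.append(path)  # the full path is always the last token
    return tokens


if __name__ == "__main__":
    # Hypothetical category path, used only for illustration.
    for token in path_hierarchy_tokens("/cars/passenger/sport"):
        print(token)
```

Because every ancestor path becomes its own term, an exact-term search for /cars can match documents filed anywhere below that category.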
First, let's create a simple index structure by using the following lines of code:
Indexing data that is not flat
Not all data is flat like the data we have been using so far in this book. If we are building the system that Elasticsearch will be a part of, we can often create a structure that is convenient for Elasticsearch. However, the structure can't always be flat, because not all use cases allow that. Let's see how to create mappings that use fully structured JSON objects.
Let's assume that we have the following data (we will store it in the file named structured_data.json):
As you can see in the preceding code, the data is not flat; it contains arrays and nested objects. If we would like to create mappings...
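As a rough illustration of what "not flat" means for indexing, the following Python sketch shows how object fields end up addressed by dotted paths, with arrays of objects contributing several values to the same path. The document below is hypothetical and is not the content of structured_data.json; the flatten function is an illustrative helper, not part of Elasticsearch.

```python
from typing import Any, Dict, List


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, List[Any]]:
    """Collect leaf values under dotted field paths, merging arrays of objects."""
    fields: Dict[str, List[Any]] = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        # Treat a single value as a one-element array so both cases share one loop.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields.setdefault(sub_path, []).extend(sub_values)
            else:
                fields.setdefault(path, []).append(item)
    return fields


# Hypothetical document with a nested object and an array of objects.
book = {
    "title": "Example Book",
    "author": {"name": {"firstName": "Jane", "lastName": "Doe"}},
    "characters": [{"name": "Alice"}, {"name": "Bob"}],
    "year": 2000,
}

if __name__ == "__main__":
    for field, values in flatten(book).items():
        print(field, values)
```

In this model, the field author.name.firstName holds one value, while characters.name holds the names of all characters together.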
Using nested objects
Nested objects can come in handy in certain situations. Basically, with nested objects, Elasticsearch allows us to connect multiple documents together: one main document and multiple dependent ones. The main document and the nested ones will be indexed together and they will be placed in the same segment of the index (actually, in the same block), which guarantees the best performance we can get for this data structure. The same goes for changing the document; unless you are using the update API, you need to index the parent document and all the other nested documents at the same time.
Now, let's get to our example use case. Imagine that we have a clothing shop and we store the size and color of each t-shirt. Our standard, non-nested mappings will look similar...
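The reason nested documents matter for a case like this can be sketched in a few lines of Python. The snippet below is a simplified model, not Elasticsearch code: it assumes that plain object arrays pool the values of each field across all array elements, while nested documents keep each element separate. The t-shirt data and both function names are hypothetical.

```python
from typing import Dict

# Hypothetical t-shirt: it is available as XXL/red and as S/green only.
tshirt = {
    "name": "Test shirt",
    "variation": [
        {"size": "XXL", "color": "red"},
        {"size": "S", "color": "green"},
    ],
}


def matches_flat(doc: Dict, size: str, color: str) -> bool:
    """Object-style matching: values of all variations are pooled per field."""
    sizes = {v["size"] for v in doc["variation"]}
    colors = {v["color"] for v in doc["variation"]}
    return size in sizes and color in colors


def matches_nested(doc: Dict, size: str, color: str) -> bool:
    """Nested-style matching: both conditions must hold in the same variation."""
    return any(v["size"] == size and v["color"] == color for v in doc["variation"])


if __name__ == "__main__":
    # There is no XXL/green shirt, yet the pooled model reports a match.
    print(matches_flat(tshirt, "XXL", "green"))    # True
    print(matches_nested(tshirt, "XXL", "green"))  # False
```

The pooled model finds a combination that does not exist in the data, which is the kind of false match that keeping each size and color pair as a separate nested document avoids.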
Using the parent-child relationship
In the previous section, we discussed the ability to index nested documents along with the parent one. Even though the nested documents are indexed as separate documents in the index, we can't change a single nested document (unless we use the update API). Elasticsearch, however, also allows us to have a real parent-child relationship, and we will look at it in the following section.
Index structure and data indexing
Let's use the same example that we used when discussing nested documents: the hypothetical clothing store. However, what we would like to have is the ability to update sizes and colors without the need to index the whole document after each change.
The only field we need to have in our parent document is name. We don't need anything more than that. So, in order to create our cloth type in the shop index, we will run the following commands:
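As a rough sketch of what such requests could contain, the following Python snippet builds the mapping bodies and the request path for a child document without sending anything to a server. It assumes an Elasticsearch version that still supports multiple mapping types and the _parent meta-field (later versions replaced this mechanism with the join field). The child type name variation, its size and color fields, the document identifiers, and the child_index_request helper are all assumptions made for illustration.

```python
import json
from typing import Dict, Tuple

# Parent type: only the name field, as described in the text.
cloth_mapping = {"cloth": {"properties": {"name": {"type": "string"}}}}

# Hypothetical child type; "_parent" names the parent type it belongs to.
variation_mapping = {
    "variation": {
        "_parent": {"type": "cloth"},
        "properties": {
            "size": {"type": "string", "index": "not_analyzed"},
            "color": {"type": "string", "index": "not_analyzed"},
        },
    }
}


def child_index_request(child_id: str, parent_id: str, body: Dict) -> Tuple[str, str]:
    """Build the request path and JSON body for indexing one child document."""
    # The parent parameter ties the child document to its parent document.
    path = f"/shop/variation/{child_id}?parent={parent_id}"
    return path, json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    print(json.dumps(variation_mapping, indent=2))
    # Hypothetical identifiers: child 10001 belongs to parent document 1.
    print(child_index_request("10001", "1", {"size": "XXL", "color": "red"}))
```

With this layout, a single size and color variation can be indexed or replaced on its own, while the parent document that holds name stays untouched.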
Modifying your index structure with the update API
In the previous chapters, we discussed how to create index mappings and index the data. But what if you already have the mappings created and data indexed, but want to modify the structure of the index? This is possible to some extent. For example, by default, if we index a document with a new field, Elasticsearch will add that field to the index structure. Let's now look at how to modify the index structure manually.
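The phrase "to some extent" can be illustrated with a small Python model of a mapping update. This is a deliberately simplified sketch and not how Elasticsearch is implemented: it assumes that new fields can always be added and that any change to the definition of an existing field is rejected, whereas the real server accepts a few parameter changes on existing fields. The MappingConflict exception, the merge_properties function, and the sample fields are hypothetical.

```python
from typing import Dict


class MappingConflict(Exception):
    """Raised when an update would change an existing field's definition."""


def merge_properties(existing: Dict[str, Dict], update: Dict[str, Dict]) -> Dict[str, Dict]:
    """Return a copy of existing with new fields added; reject changed fields."""
    merged = dict(existing)
    for field, definition in update.items():
        if field in merged and merged[field] != definition:
            raise MappingConflict(f"cannot change definition of existing field '{field}'")
        merged[field] = definition
    return merged


# Hypothetical current mapping with a single field.
current = {"name": {"type": "string"}}

if __name__ == "__main__":
    # Adding a new field is accepted.
    print(merge_properties(current, {"phone": {"type": "string"}}))
    # Changing the type of an existing field is rejected.
    try:
        merge_properties(current, {"name": {"type": "integer"}})
    except MappingConflict as err:
        print("rejected:", err)
```

In practice, a change that is rejected in this way usually means creating a new index with the desired mappings and indexing the data again.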
Let's assume that we have the following mappings for our users index stored in the user.json file:
As you can see, it is very simple. It just has a single property that will hold the username. Now, let's create an index called users, and use the previous mappings to create our own type. To do that, we will run the following commands:
If everything...
Summary
In this chapter, we learned how to index tree-like structures using Elasticsearch. In addition to that, we indexed data that is not flat and modified the structure of already-created indices. Finally, we learned how to handle relationships by using nested documents and by using the Elasticsearch parent-child functionality.
In the next chapter, we'll focus on making our search even better. We will see how Apache Lucene scoring works and why it matters so much. We will learn how to use the Elasticsearch function score query to adjust the importance of our documents using different functions, and we'll leverage the provided scripting capabilities. We will search the content in different languages and discuss when index-time boosting makes sense. We'll use synonyms to match words with the same meaning and we'll learn how to check why a given document was found by a query. Finally, we'll influence queries with boosts, and we will learn how to understand the score calculation done by Elasticsearch...