You're reading from Elasticsearch Server: Second Edition

Product type Book

Published in Apr 2014

ISBN-13 9781783980529

Pages 428 pages

Edition 1st Edition

Languages

Java

Concepts

Enterprise Search

Table of Contents (18) Chapters

Elasticsearch Server Second Edition

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Preface

Getting Started with the Elasticsearch Cluster

Indexing Your Data

Searching Your Data

Extending Your Index Structure

Make Your Search Better

Beyond Full-text Searching

Elasticsearch Cluster in Detail

Administrating Your Cluster

Index

Chapter 4. Extending Your Index Structure

In the previous chapter, we learned many things about querying Elasticsearch. We saw how to choose fields that will be returned and learned how querying works in Elasticsearch. In addition to that, we now know the basic queries that are available and how to filter our data. What's more, we saw how to highlight the matches in our documents and how to validate our queries. In the end, we saw the compound queries of Elasticsearch and learned how to sort our data. By the end of this chapter, you will have learned the following topics:

Indexing tree-like structured data
Indexing data that is not flat
Modifying your index structure when possible
Indexing data with relationships by using nested documents
Indexing related data by using the parent-child functionality

Indexing tree-like structures

Trees are everywhere. If you develop a shop application, you will probably have categories. If you look at a filesystem, the files and directories are arranged in a tree-like structure. This book can also be represented as a tree: chapters contain topics, and topics are divided into subtopics. As you can imagine, Elasticsearch is also capable of indexing tree-like structures. Let's check how we can navigate through this type of data using path_analyzer, a custom analyzer based on the path_hierarchy tokenizer.

Data structure

First, let's create a simple index structure by using the following lines of code:

curl -XPUT 'localhost:9200/path' -d '{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "path_analyzer" : { "tokenizer" : "path_hierarchy" }
        }
      }
    }
  },
  "mappings" : {
    "category" : {
      "properties" : {
        "category" : {
          "type" : "string",
          "fields" : {
            "name" : { "type" : "string","index" : "not_analyzed" },
      ...
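
To make the role of the path_hierarchy tokenizer more concrete, the following is a minimal plain-Python sketch of the tokens it produces for a category path when the default / delimiter is used. It does not call Elasticsearch; the function name path_hierarchy_tokens and the sample path are assumptions made only for illustration.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Emulate the tokens emitted for a path: one token per ancestor level."""
    tokens = []
    current = ""
    for i, part in enumerate(path.split(delimiter)):
        # Rebuild the path one level at a time, keeping the delimiter.
        current = part if i == 0 else current + delimiter + part
        if current:  # skip the empty prefix produced by a leading delimiter
            tokens.append(current)
    return tokens


# Hypothetical category path, used only for illustration.
tokens = path_hierarchy_tokens("/cars/passenger/sport")
# tokens == ["/cars", "/cars/passenger", "/cars/passenger/sport"]
```

Because every ancestor path becomes its own token, a search for /cars can match a document stored under /cars/passenger/sport.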

Indexing data that is not flat

Not all data is flat like the data we have been using so far in this book. If we are building the system that Elasticsearch will be a part of, we can create a structure that is convenient for Elasticsearch. However, the structure can't always be flat, because not all use cases allow that. Let's see how to create mappings that use fully structured JSON objects.

Data

Let's assume that we have the following data (we will store it in the file named structured_data.json):

{
  "book" : {
    "author" : {
      "name" : {
        "firstName" : "Fyodor",
        "lastName" : "Dostoevsky"
      }
    },
    "isbn" : "123456789",
    "englishTitle" : "Crime and Punishment",
    "year" : 1886,
    "characters" : [
      {
        "name" : "Raskolnikov"
      }, 
      {
        "name" : "Sofia"
      }
    ],
    "copies" : 0
  }
}

As you can see in the preceding code, the data is not flat; it contains arrays and nested objects. If we would like to create mappings...
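
As a rough illustration of how such a document can be addressed, the plain-Python sketch below flattens the preceding data into dotted field paths such as book.author.name.firstName. It is a simplified model, not Elasticsearch code: the helper name flatten is an assumption, and the sketch only mirrors the general idea that fields of inner objects are reachable by their full path and that arrays of objects contribute several values to the same path.

```python
def flatten(value, prefix="", out=None):
    """Collect every leaf value under its dotted path (arrays become multi-valued)."""
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(value, list):
        for item in value:
            flatten(item, prefix, out)  # array items share the same path
    else:
        out.setdefault(prefix, []).append(value)
    return out


# The same document as in structured_data.json above.
document = {
    "book": {
        "author": {"name": {"firstName": "Fyodor", "lastName": "Dostoevsky"}},
        "isbn": "123456789",
        "englishTitle": "Crime and Punishment",
        "year": 1886,
        "characters": [{"name": "Raskolnikov"}, {"name": "Sofia"}],
        "copies": 0,
    }
}

fields = flatten(document)
# fields["book.characters.name"] == ["Raskolnikov", "Sofia"]
```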

Using nested objects

Nested objects can come in handy in certain situations. Basically, with nested objects, Elasticsearch allows us to connect multiple documents together: one main document and multiple dependent ones. The main document and the nested ones are indexed together and placed in the same segment of the index (actually, in the same block), which gives the best performance we can get for this data structure. The same goes for changing the document; unless you are using the update API, you need to index the parent document and all the nested documents at the same time.

Note

If you would like to read more about how nested objects work on the Lucene level, there is a very good blog post by Mike McCandless at http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html.

Now, let's get to our example use case. Imagine that we have a clothes shop and we store the size and color of each t-shirt. Our standard, non-nested mappings will look similar...
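
The reason nested objects matter for this use case can be sketched in plain Python. When an array of objects is not nested, its values are effectively pooled per field, so a query for one size and one color can match even if no single variation has both. The document below, the field name variation, and both helper functions are hypothetical and only model the matching behavior; they do not call Elasticsearch.

```python
# Hypothetical t-shirt document; the values are illustrative only.
shirt = {
    "name": "Test shirt",
    "variation": [
        {"size": "XXL", "color": "red"},
        {"size": "XL", "color": "black"},
    ],
}


def matches_flattened(doc, size, color):
    """Non-nested view: sizes and colors are pooled across all variations."""
    sizes = {v["size"] for v in doc["variation"]}
    colors = {v["color"] for v in doc["variation"]}
    return size in sizes and color in colors


def matches_nested(doc, size, color):
    """Nested view: both conditions must hold within the same variation."""
    return any(v["size"] == size and v["color"] == color for v in doc["variation"])


# There is no XXL black shirt, yet the flattened view reports a match.
flat_result = matches_flattened(shirt, "XXL", "black")
nested_result = matches_nested(shirt, "XXL", "black")
```

Here flat_result is True and nested_result is False, which is the kind of false match that nested objects are meant to avoid.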

Using the parent-child relationship

In the previous section, we discussed the ability to index nested documents along with the parent one. Even though the nested documents are indexed as separate documents in the index, we can't change a single nested document (unless we use the update API). However, Elasticsearch also allows us to have a real parent-child relationship, and we will look at it in this section.

Index structure and data indexing

Let's use the same example that we used when discussing the nested documents: the hypothetical clothing store. However, what we would like to have is the ability to update sizes and colors without the need to index the whole document after each change.

Parent mappings

The only field we need to have in our parent document is name. We don't need anything more than that. So, in order to create our cloth type in the shop index, we will run the following commands:

curl -XPOST 'localhost:9200/shop'
curl -XPUT 'localhost:9200/shop/cloth/_mapping' -d '...
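
The goal stated above, changing sizes and colors without reindexing the parent, can be modeled with a small plain-Python sketch. Parents and children are kept as separate documents, and each child only stores a reference to its parent. The identifiers, field values, and helper names are hypothetical, and the _parent key is used here only as a dictionary key that loosely mirrors the idea of a parent reference; nothing in this sketch talks to Elasticsearch.

```python
# In-memory stand-ins for the two document types; all values are hypothetical.
parents = {"1": {"name": "Test shirt"}}
children = {
    "10": {"_parent": "1", "size": "XXL", "color": "red"},
    "11": {"_parent": "1", "size": "XL", "color": "black"},
}


def update_child(child_id, **changes):
    """Change a single child document; the parent document is not touched."""
    children[child_id].update(changes)


def parents_having_child(size, color):
    """Return the names of parents with at least one child matching both values."""
    parent_ids = {
        child["_parent"]
        for child in children.values()
        if child["size"] == size and child["color"] == color
    }
    return sorted(parents[pid]["name"] for pid in parent_ids)


before = parents_having_child("XL", "black")
update_child("11", color="green")  # only the child with ID 11 changes
after = parents_having_child("XL", "black")
```

After the update, the parent no longer matches XL and black, while the parent document itself was never rewritten.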

Modifying your index structure with the update API

In the previous chapters, we discussed how to create index mappings and index the data. But what if you already have the mappings created and data indexed, but want to modify the structure of the index? This is possible to some extent. For example, by default, if we index a document with a new field, Elasticsearch will add that field to the index structure. Let's now look at how to modify the index structure manually.
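
The default behavior mentioned above, where a previously unseen field is added to the index structure, can be pictured with the following plain-Python sketch. It is a deliberately simplified model: the helper names, the sample mapping, and the sample document are assumptions, and the type inference covers only a few basic JSON value types (whole numbers as long, floating-point numbers as double, everything else as string).

```python
def infer_type(value):
    """Guess a field type from a JSON value (simplified)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def index_document(mapping, doc):
    """Add any field missing from the mapping; existing fields are left unchanged."""
    for field, value in doc.items():
        mapping["properties"].setdefault(field, {"type": infer_type(value)})
    return mapping


# Hypothetical mapping and document, used only for illustration.
mapping = {"properties": {"title": {"type": "string"}}}
index_document(mapping, {"title": "Sample", "pages": 428, "available": True})
```

After the call, the mapping contains the new pages and available fields, while the existing title field keeps its original definition.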

The mappings

Let's assume that we have the following mappings for our users index stored in the user.json file:

{
  "user" : {
    "properties" : {
      "name" : {"type" : "string"}
    }
  }
}

As you can see, it is very simple. It just has a single property that will hold the username. Now, let's create an index called users, and use the previous mappings to create our own type. To do that, we will run the following commands:

curl -XPOST 'localhost:9200/users'
curl -XPUT 'localhost:9200/users/user/_mapping' -d @user.json

If everything...

Summary

In this chapter, we learned how to index tree-like structures using Elasticsearch. In addition to that, we indexed data that is not flat and modified the structure of already-created indices. Finally, we learned how to handle relationships by using nested documents and by using the Elasticsearch parent-child functionality.

In the next chapter, we'll focus on making our search even better. We will see how Apache Lucene scoring works and why it matters so much. We will learn how to use the Elasticsearch function_score query to adjust the importance of our documents using different functions, and we'll leverage the provided scripting capabilities. We will search content in different languages and discuss when index-time boosting makes sense. We'll use synonyms to match words with the same meaning, and we'll learn how to check why a given document was found by a query. Finally, we'll influence queries with boosts, and we will learn how to understand the score calculation done by Elasticsearch...