
You're reading from Solr Cookbook - Third Edition

Product type: Book
Published in: Jan 2015
Reading level: Intermediate
ISBN-13: 9781783553150
Edition: 1st Edition
Author (1)
Rafal Kuc

Rafał Kuć is a software engineer, trainer, speaker, and consultant. He works as a consultant and software engineer at Sematext Group Inc., where he concentrates on open source technologies such as Apache Lucene, Solr, and Elasticsearch. He has more than 14 years of experience in various software domains, from banking software to e-commerce products. He is mainly focused on Java; however, he is open to every tool and programming language that might help him achieve his goals easily and quickly. Rafał is also one of the founders of the solr.pl site, where he tries to share his knowledge and help people solve their Solr and Lucene problems. He is also a speaker at various conferences around the world, such as Lucene Eurocon, Berlin Buzzwords, ApacheCon, Lucene/Solr Revolution, Velocity, and DevOps Days. Rafał began his journey with Lucene in 2002; however, it wasn't love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then Solr came and that was it. He started working with Elasticsearch in the middle of 2010. At present, Lucene, Solr, Elasticsearch, and information retrieval are his main areas of interest. Rafał is also the author of the Solr Cookbook series, ElasticSearch Server and its second edition, and the first and second editions of Mastering ElasticSearch, all published by Packt Publishing.

Chapter 5. Faceting

In this chapter, we will cover the following topics:

  • Getting the number of documents with the same field value

  • Getting the number of documents with the same value range

  • Getting the number of documents matching the query and subquery

  • Removing filters from faceting results

  • Using decision tree faceting

  • Calculating faceting for relevant documents in groups

  • Improving faceting performance for low cardinality fields

Introduction


One of the advantages of Solr is its ability to calculate statistics from your data. Solr's faceting mechanism helps with several everyday tasks: counting documents that share a field value (for example, companies from the same city), date and range faceting, and even autocomplete features built on top of faceting. This chapter will show you how to handle some of the common tasks when using the faceting mechanism.

Getting the number of documents with the same field value


Imagine that, alongside the search results, you have to return the number of documents that share the same field value. For example, your application lets users search for companies in Europe, and your client wants to see, for each city in which the matched companies are located, the number of companies in that city. You could of course run several queries to do this, but Solr provides a mechanism called faceting that can do it for you. This recipe will show you how to use it.
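
As a rough illustration of what such a request can look like, the following Python sketch builds a select URL with Solr's `facet` and `facet.field` parameters and converts the flat facet list returned by the JSON response writer into a dictionary. The endpoint, the core name, the query, and the `city` field are assumptions made for this sketch, and the sample list is invented only to show the shape of the data.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port, and core name are placeholders, not from the recipe.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def field_facet_params(query, field):
    """Request parameters for a search plus per-value counts of `field`."""
    return [
        ("q", query),
        ("facet", "true"),       # enable faceting
        ("facet.field", field),  # count matching documents per value of this field
    ]


def pair_counts(flat):
    """Turn a flat facet list [value, count, value, count, ...] into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


params = field_facet_params("name:company", "city")
url = SOLR_SELECT + "?" + urlencode(params)
print(url)

# Invented sample of one facet_fields entry, only to show the list layout.
sample = ["London", 2, "New York", 1]
print(pair_counts(sample))
```

The counts come back in the same response as the search results, so no extra query per city is needed.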

How to do it...

  1. Let's start by assuming that we have the following fields present in the schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="city" type="string" indexed="true" stored="true" />
  2. The next step is to index the following example data:

    <add>
     <doc>
      <field name="id...

Getting the number of documents with the same value range


Imagine that you have an application where users can search the index to find a car for rent. One of the requirements of the application is to show a navigation panel, where the user can choose the price range for the cars they are interested in. To do this in an efficient way, we will use range faceting and this recipe will show you how to do it.
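
A minimal sketch of a range faceting request, written in Python, might look as follows. It uses Solr's `facet.range`, `facet.range.start`, `facet.range.end`, and `facet.range.gap` parameters; the endpoint, the query, and the concrete bounds (0 to 400 in steps of 200) are illustrative assumptions rather than values taken from the recipe.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port, and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def range_facet_params(query, field, start, end, gap):
    """Request parameters asking Solr to count documents per fixed-width range."""
    return [
        ("q", query),
        ("facet", "true"),
        ("facet.range", field),             # numeric (or date) field to bucket
        ("facet.range.start", str(start)),  # lower bound of the first bucket
        ("facet.range.end", str(end)),      # upper bound of the bucketed interval
        ("facet.range.gap", str(gap)),      # width of each bucket
    ]


# Illustrative values only: two buckets of width 200 on the price field.
params = range_facet_params("*:*", "price", 0, 400, 200)
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

Because Solr computes every bucket in one pass, the navigation panel can be filled from a single request.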

How to do it...

Let's begin with the following index structure:

  1. Add the following field definitions to our schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="price" type="float" indexed="true" stored="true" />
  2. The example data that we will use looks as follows:

    <add>
     <doc>
      <field name="id">1</field>
      <field name="name">Super Mazda</field>
      <field name="price">50</field>
     </doc>
     <doc>...

Getting the number of documents matching the query and subquery


Imagine that you have an application with a search feature for cars. One of the requirements is not only to show the search results, but also to show the number of cars that fall within the price range chosen by the user. There is one more thing: these queries must be fast because of the number of queries that will be running. Can Solr handle this? The answer is yes. This recipe will show you how to do it.
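
One way to sketch this kind of request is with Solr's `facet.query` parameter, which returns the number of matching documents for an arbitrary subquery. The Python snippet below is an illustration under assumptions: the endpoint, the main query, and the two price ranges are made up for the example.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port, and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def query_facet_params(query, field, ranges):
    """Request parameters with one facet.query per (low, high) range on `field`."""
    params = [("q", query), ("facet", "true")]
    for low, high in ranges:
        # Each facet.query is a regular Solr query; here an inclusive range.
        params.append(("facet.query", f"{field}:[{low} TO {high}]"))
    return params


# Illustrative ranges only; a real application would pass the user's choice.
params = query_facet_params("name:car", "price", [(10, 80), (90, 300)])
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

Each `facet.query` produces its own count in the response, so several user-defined ranges can be answered by one request.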

How to do it...

Let's start by creating an index with a very simple structure that looks as follows:

  1. Add the following definition to your schema.xml:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="price" type="float" indexed="true" stored="true" />
  2. Now, let's index the following sample data:

    <add>
     <doc>
      <field name="id">1</field>
      <field name="name"...

Removing filters from faceting results


Let's assume, for the purpose of this recipe, that you have an application that can search for companies within a city and a state. However, the requirements say that you should show not only the search results, but also the number of companies in each city and the number of companies in each state (in Solr terms, you want to exclude the filter query from the faceting results). Can Solr do this efficiently? Sure it can, and this recipe will show you how to do it.
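
The idea of excluding a filter from faceting can be sketched with Solr's local parameters: the filter query is labeled with `{!tag=...}` and the facet that should ignore it refers to the same label with `{!ex=...}`. In the Python snippet below, the endpoint, the `city` and `state` field names, the tag name `stateTag`, and the example state are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port, and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def tagged_filter_params(query, state, tag="stateTag"):
    """Filter by state, but compute the state facet as if that filter were absent."""
    return [
        ("q", query),
        ("fq", f'{{!tag={tag}}}state:"{state}"'),  # filter query labeled with a tag
        ("facet", "true"),
        ("facet.field", f"{{!ex={tag}}}state"),    # facet that ignores the tagged filter
        ("facet.field", "city"),                   # facet that still honors the filter
    ]


params = tagged_filter_params("name:company", "New York")
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

With this layout, the search results and the city counts are restricted to the chosen state, while the state counts still cover all states matching the main query.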

Getting ready

Before you start reading this recipe, take a look at the Getting the number of documents with the same field value recipe in this chapter.

How to do it...

  1. As usual, we start with a very simple index structure that contains four fields. We do this by adding the following section to the schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name...

Using decision tree faceting


Imagine that in our store we have products divided into categories. In addition to this, we store information about the stock of the items. Now, we want to show our crew how many of the products in the categories are in stock and how many are missing. The first thing that comes to mind is to use the faceting mechanism and some additional calculation. But why bother, when Solr 4.0 and later can do that calculation for us with the use of so-called pivot faceting? This recipe will show you how to use it.
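
A pivot faceting request can be sketched as follows. The snippet uses Solr's `facet.pivot` parameter with a comma-separated list of fields, so that counts for the second field are nested under each value of the first. The endpoint and the match-all query are assumptions; the `category` and `stock` field names follow the index structure used in this recipe.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port, and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def pivot_facet_params(query, fields):
    """Request parameters for nested (pivot) facet counts over `fields`, in order."""
    return [
        ("q", query),
        ("facet", "true"),
        ("facet.pivot", ",".join(fields)),  # e.g. category first, then stock within it
    ]


params = pivot_facet_params("*:*", ["category", "stock"])
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

The order of the fields matters: listing `category` first gives, for each category, the breakdown into in-stock and out-of-stock products.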

How to do it...

  1. We start by defining a simple index structure. We do this by adding the following field definitions to the schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="category" type="string" indexed="true" stored="true" />
    <field name="stock" type="boolean" indexed="true" stored="true" />...

Calculating faceting for relevant documents in groups


If you have ever used the field-collapsing functionality of Solr, you might be wondering whether it can be combined with faceting. It can, but the default faceting behavior stays in place, so facet counts are calculated on the basis of individual documents and not document groups. In this recipe, we will learn how to query Solr so that it returns facets calculated for the most relevant document in each group.
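
One possible shape of such a request is sketched below, using Solr's `group`, `group.field`, and `group.truncate` parameters; with `group.truncate=true`, facet counts are based on the most relevant document of each group. The endpoint, the query, and the `category` and `stock` field names are hypothetical, because the index structure for this recipe is not shown here.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port, and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def truncated_group_facet_params(query, group_field, facet_field):
    """Group results by one field and facet on the top document of each group."""
    return [
        ("q", query),
        ("group", "true"),             # enable result grouping (field collapsing)
        ("group.field", group_field),  # field whose values define the groups
        ("group.truncate", "true"),    # facet on the most relevant doc per group
        ("facet", "true"),
        ("facet.field", facet_field),
    ]


# Hypothetical field names, used only to make the sketch concrete.
params = truncated_group_facet_params("*:*", "category", "stock")
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

Leaving out `group.truncate` keeps the default behavior described above, where every matching document contributes to the facet counts.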

Getting ready

Before reading this recipe, take a look at the Grouping documents by the field value, Grouping documents by the query value, and Grouping documents by the function value recipes in Chapter 8, Using Additional Functionalities. Also, if you are not familiar with the faceting functionality, read the first three recipes of this chapter.

How to do it...

  1. In the first step, we need to create an index. For the purpose of this recipe, let's assume that we have the following index structure ...

Improving faceting performance for low cardinality fields


Although Solr faceting is very fast, there are times when the default configuration does not give the best possible performance. In a few cases, such as fields with a low number of distinct values, we can tune Solr's faceting mechanism to make it work faster. This recipe will show you how to tune the faceting mechanism.
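
One tuning knob that is commonly considered for fields with few distinct values is Solr's `facet.method` parameter, whose `enum` value enumerates the terms of the field instead of using the default method. The sketch below shows how the parameter can be set globally or for a single field; the endpoint, the match-all query, and the assumption that this is the setting the recipe relies on are ours, and any gain should be measured on your own data.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port, and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def enum_facet_params(field, per_field=False):
    """Facet on `field` with the enum method, globally or only for that field."""
    # f.<field>.facet.method limits the setting to one field.
    method_key = f"f.{field}.facet.method" if per_field else "facet.method"
    return [
        ("q", "*:*"),
        ("rows", "0"),            # only facet counts are of interest here
        ("facet", "true"),
        ("facet.field", field),
        (method_key, "enum"),
    ]


params = enum_facet_params("tag")
per_field_params = enum_facet_params("tag", per_field=True)
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

Comparing query times with and without the parameter on the same index is a simple way to check whether the change helps.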

Getting ready

Before you start reading this recipe, take a look at the Getting the number of documents with the same field value recipe in this chapter.

How to do it...

For the purpose of this recipe, we will assume that we have the following index structure:

  1. Add the following section to your schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="tag" type="string" indexed="true" stored="true" />
  2. We've used the following shell script to index the data (note that we are indexing two million documents here and sending them one by one, so it might take a long time to index the data):

    #!/bin/sh
    URL=http:/...