Chapter 8. Using Additional Functionalities

In this chapter, we will cover the following topics:

  • Finding similar documents

  • Highlighting fragments found in documents

  • Efficient highlighting

  • Using versioning

  • Retrieving information about the index structure

  • Altering the index structure on a live collection

  • Grouping documents by the field value

  • Grouping documents by the query value

  • Grouping documents by the function value

  • Efficient documents grouping using the post filter

Introduction


There are many features of Solr that we don't use every day. Highlighting matched words, ignoring certain words, or computing statistics might not be needed in day-to-day work, but they can come in handy in many situations. In this chapter, I'll show you how to overcome some typical problems using these Solr functionalities. In addition to this, we will see how to use the Solr grouping mechanism in order to get documents that have some field values in common.

Finding similar documents


Imagine a situation where you want to show documents similar to those that were returned by Solr. For example, let's assume that we run an e-commerce library and want to show users books similar to the ones they found while using our application. Of course, we could use machine learning and one of the collaborative filtering algorithms, but we can also use Solr for that. This recipe will show you how to do this.

How to do it...

  1. Let's start with the following index structure (just add this to your schema.xml file):

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" termVectors="true" />
  2. Next, let's index the following test data:

    <add>
     <doc>
      <field name="id">1</field>
      <field name="name">Solr Cookbook first edition</field>
     </doc>
     <doc>
      <field name="id">2</field>
      <field name="name">...

Highlighting fragments found in documents


Imagine a situation where you want to show your users which words from their query matched in the documents shown on the results list. For example, you want to show which words in the book name were matched and display them to the user. Do you have to store the documents and do the matching on the application side? The answer is no: we can make Solr do this for us, and this recipe will show you how.

How to do it...

  1. We will begin with creating the following index structure (just add the following fields to your schema.xml file):

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
  2. For the purpose of this recipe, we will use the following test data:

    <add>
     <doc>
      <field name="id">1</field>
      <field name="name">Solr Cookbook first edition</field>
     </doc>
     <doc>
      <field name="id">...

Efficient highlighting


In certain situations, the standard highlighting mechanism might not perform as well as you would like it to. For example, you might have long text fields and want the highlighting mechanism to work with them efficiently. In such cases, another, more efficient highlighter is needed. Thankfully, Solr provides one, and this recipe will show you how to use it.

How to do it...

  1. We begin with the index structure configuration, which looks as follows (just add the following section to your schema.xml file):

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true"  termVectors="true" termPositions="true" termOffsets="true" />
  2. The next step is to index the data. We will use the following test data for the purpose of this recipe:

    <add>
     <doc>
      <field name="id">1</field>
      <field name="name">Solr Cookbook first edition</field...

Using versioning


When working with NoSQL solutions such as Solr, we usually don't have the notion of transactions, and we can't predict the order in which documents will be received and indexed by Solr, especially when indexing is done from multiple threads and machines. However, in certain cases, such functionality is needed, at least to some degree. For example, we don't want to run an update on a document that was modified between the time we read it and the time we sent the update. This recipe will show you how to avoid such situations.

Getting ready

This recipe uses the functionality discussed in the Updating document fields recipe from Chapter 2, Indexing Your Data. Read that recipe before proceeding.

How to do it...

For the purpose of this recipe, we assume that we have an e-commerce library. When updating prices of the books, we need to read the document to get the current price, update it in the UI, and index the document. However, it can happen that the same book is being updated...
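
The description above is truncated in this excerpt. The idea behind the recipe is Solr's optimistic concurrency: every document carries a _version_ field, and an update that includes a _version_ value is accepted only if it still matches the version stored in the index. A minimal sketch, assuming a book with the identifier 1 and a previously read version value (the price field, the version number, and the price below are only illustrative), could look as follows:

    curl 'http://localhost:8983/solr/cookbook/update?commit=true' -H 'Content-Type: application/xml' --data-binary '
    <add>
     <doc>
      <field name="id">1</field>
      <field name="price" update="set">24.99</field>
      <field name="_version_">1634723191296623616</field>
     </doc>
    </add>'

If another client has modified the book in the meantime, the stored version no longer matches and Solr rejects the update with a version conflict (HTTP 409) instead of silently overwriting the newer data.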

Retrieving information about the index structure


Until Solr 4.2, we had to look at the schema.xml file to see the full structure of the document. With the release of Solr 4.2, we got the ability to use the so-called Schema API to read the schema of collections that are running inside the cluster or on a node. In this recipe, we will take a look at the possibilities of reading the Solr schema.

How to do it...

The actual schema.xml file that we will read doesn't really matter, as we will not focus on the index structure itself, but on the API and how to get particular pieces of information from Solr.

Note

We assume that we are using a collection named cookbook.

  1. We will start with retrieving all the fields defined in our schema.xml file. To do this, we will run the following query:

    http://localhost:8983/solr/cookbook/schema/fields
    

    The response to the preceding command will be as follows:

    {
      "responseHeader":{
        "status":0,
        "QTime":2},
      "fields":[{
          "name":"_version_",
          "type"...

Altering the index structure on a live collection


The ability to push a new index structure definition (the schema.xml file) to ZooKeeper is nice, but it requires the collection to be reloaded. The same goes for Solr working in noncloud mode: we need to reload a core for Solr to see the changes. This is also not very convenient when you would like to change the index structure from outside Solr. That is why, starting with Solr 4.3, the Schema API allows you to alter the index structure using simple HTTP-based requests. In this recipe, we will take a look at how to use the Schema API to alter our index structure.

Getting ready

Before continuing with this recipe, read the Retrieving information about the index structure recipe discussed earlier in this chapter, as it shows how to read the index structure using the Solr Schema API.

How to do it...

For the purpose of this recipe, let's assume that we have a very basic index structure that we want to add a field...
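
The rest of this recipe is cut off in this excerpt. As a rough sketch of what the Schema API allows (not the book's exact example), and assuming the collection is configured with a managed, mutable schema (the ManagedIndexSchemaFactory with mutable set to true in solrconfig.xml), a new field can be added by posting its definition to the fields endpoint; the field name count used below is purely illustrative, and the type must be one that is already defined in the schema:

    curl -X POST 'http://localhost:8983/solr/cookbook/schema/fields' -H 'Content-Type: application/json' --data-binary '[
     {
      "name": "count",
      "type": "string",
      "indexed": true,
      "stored": true
     }
    ]'

After such a request, the new field is visible through the read side of the Schema API described in the previous recipe, without reloading the collection by hand.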

Grouping documents by the field value


Imagine a situation where your dataset is divided into different categories, subcategories, price ranges, and things like that. What if you would like to not only get information about counts in such groups (with the use of faceting), but also show only the most relevant document in each of the groups? In such cases, the Solr grouping mechanism comes in handy. This recipe will show you how to group your documents on the basis of a field value.

How to do it...

  1. Let's start with the index structure. Let's assume that we have the following fields in our index (just add the following section to the schema.xml file):

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="category" type="string" indexed="true" stored="true" />
    <field name="price" type="tfloat" indexed="true" stored="true" />
  2. The example data, which...
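
The example data listing is truncated in this excerpt. Assuming the category field from step 1 is what we group on, a query of the kind this recipe describes could look as follows:

    http://localhost:8983/solr/cookbook/select?q=*:*&group=true&group.field=category

Solr then returns one group for every distinct value of the category field, each holding its most relevant document; the group.limit parameter can be raised if more documents per group are needed.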

Grouping documents by the query value


Sometimes, grouping results on the basis of field values is not enough. For example, imagine that we would like to group documents into some kind of price brackets: show the most relevant document for the price range of 1.0 to 19.99, another one for the price range of 20.00 to 50.0, and so on. Solr allows you to group results on the basis of query results. This recipe will show you how to do that.

Getting ready

In this recipe, we will use the same index structure and test data as in the Grouping documents by the field value recipe in this chapter. Read it before we continue.

How to do it...

  1. Because we are reusing the data and index structure from the Grouping documents by the field value recipe, we can start with the query. In order to group our documents on the basis of query results, we can send the following query:

    http://localhost:8983/solr/cookbook/select?q=*:*&group=true&group.query=price:[20.0+TO+50.0]&...
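
The query above is cut off in this excerpt. A hypothetical complete version covering both price brackets mentioned in the introduction could look like this (the ranges themselves are just the example values used above):

    http://localhost:8983/solr/cookbook/select?q=*:*&group=true&group.query=price:[1.0+TO+19.99]&group.query=price:[20.0+TO+50.0]

Each group.query parameter produces its own group in the response, containing the most relevant documents among those that match that query.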

Grouping documents by the function value


Imagine that you would like to group results not by using queries or field contents, but by a value returned by a function query. An example use case is grouping documents on the basis of their distance from a point. Sounds nice, right? Solr allows that, and in this recipe, we will see how to use a simple function query to group results.

Getting ready

In this recipe, we will use the knowledge that we've gained in the Grouping documents by the field value recipe in this chapter. Read the mentioned recipe before we continue.

How to do it...

  1. Let's start with the following index structure (just add the following fields definition to your schema.xml file):

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="geo" type="location" indexed="true" stored="true" />
    <dynamicField name="*_coordinate"  type...
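
The schema listing is cut off in this excerpt. As an illustrative sketch rather than the book's exact query, documents with a location stored in the geo field could be grouped into rough distance buckets from a given point by grouping on a function value:

    http://localhost:8983/solr/cookbook/select?q=*:*&group=true&group.func=rint(div(geodist(),10))&sfield=geo&pt=51.11,17.03

Here, geodist() returns the distance in kilometers between the point passed in the pt parameter and the value of the field named in sfield, and dividing and rounding that distance turns it into a small number of bucket values that the results can be grouped on; the point coordinates are, of course, made up.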

Efficient documents grouping using the post filter


Sometimes, the standard field collapsing provided by Solr is not enough when it comes to performance. This is especially true when we want to perform field collapsing on fields that result in a large number of unique groups in the results, so mostly on high-cardinality fields. For such use cases, Solr provides an efficient post filter approach to field collapsing, and this recipe will show you how to use it.

Getting ready

In this recipe, we will use the same index structure and test data as in the Grouping documents by the field value recipe of this chapter. Read it before we continue.

How to do it...

  1. Let's start with the index structure. Let's assume that we have the following fields in our index (just add the following section to the schema.xml file):

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <...
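
The rest of the listing is not included in this excerpt. The post filter that the recipe title refers to is Solr's CollapsingQParserPlugin, which collapses the result set to a single document per group inside a filter query. Assuming the same category field as in the field-value grouping recipe, a query could look as follows:

    http://localhost:8983/solr/cookbook/select?q=*:*&fq={!collapse field=category}

Because the collapsing is done as a post filter, it scales much better than standard grouping on high-cardinality fields, and it can be combined with the expand=true parameter (the ExpandComponent) to also retrieve the remaining documents of each group.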