Chapter 4. Querying Solr

In this chapter, we will cover the following topics:

  • Understanding and using the Lucene query language

  • Using position-aware queries

  • Using boosting with autocomplete

  • Phrase queries with shingles

  • Handling user queries without errors

  • Handling hierarchies with nested documents

  • Sorting data on the basis of a function value

  • Controlling the number of terms needed to match

  • Affecting document score using function queries

  • Using simple nested queries

  • Using the Solr document's query join functionality

  • Handling typos with n-grams

  • Rescoring query results

Introduction


Creating a simple query is not a hard task, but creating a complex one, with faceting, local params, parameter dereferencing, and phrase queries, can be challenging. In addition, you must remember to write your queries with performance in mind. This is why something that seems simple at first sight, such as writing a good, complex query, can turn out to be far more demanding. This chapter will try to guide you through some of the tasks you might encounter during your everyday work with Solr.
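
To give a taste of two of these features, here is a minimal, hedged sketch of local params combined with parameter dereferencing (the qq parameter name is only an illustration):

    q={!edismax qf='title description' v=$qq}&qq=solr cookbook

The {!edismax ...} local params select and configure the query parser, while v=$qq tells Solr to read the actual query text from the qq request parameter.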

Understanding and using the Lucene query language


As you know, Solr is built using the Apache Lucene library. Because of this, some of the query parsers available in Solr allow us to fully leverage the Lucene query language, which gives us great flexibility and insight into how our queries work and which documents they match. In this recipe, we will discuss an example usage of the Lucene query language by looking at a book search site that lets its users define complex Boolean queries containing phrases.

How to do it...

Let's perform the following steps to achieve this:

  1. The first step is to prepare our index to handle data. To do this, we add the following entries to the schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text_general" indexed="true" stored="true" />
    <field name="description" type="text_general" indexed="true" stored="true" />
    <field name="published" type="int" indexed...

Using position-aware queries


Most of the queries exposed by Lucene and Solr are not position-aware, which means that the query doesn't care about where in the document the matched words occur. Of course, we have phrase queries that we can use for phrase searching, and we can even introduce a phrase slop, but this is not always enough. Sometimes, we might want to search for words based on their positions in the matched documents. Let's assume that we allow our users to search in book titles and descriptions and to specify how the searched words should be positioned relative to each other. Solr provides us with such functionality, and this recipe will show you how to use it.

How to do it...

Let's start with a simple index structure. For the purpose of this recipe, we will use the following fields:

  1. Add the following sections to the schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text_general" indexed="true" stored="true...

Using boosting with autocomplete


Autocomplete is very good when it comes to the user search experience. It is especially useful for showing users the data that we want to promote or the data that is of the most value to them. In general, in e-commerce, deploying the autocomplete functionality means more profit. However, there are situations where we want to promote certain products or documents, for example, the currently top-selling books or the most important financial reports. This recipe will show you how to boost certain documents when using the n-gram-based autocomplete functionality.

How to do it...

Let's perform the following steps to boost certain documents using the n-gram-based autocomplete function:

  1. We start with creating the index structure for our use case; we just put the following section to the schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text_general" indexed="true" stored...

Phrase queries with shingles


Imagine that you have an application that searches within millions of documents generated by a law company. One of the requirements is to boost the documents that have either the whole search phrase or a part of the phrase in their title. So, is it possible to achieve this using Solr? Yes, and this recipe will show you how to do this.

How to do it...

Let's follow these steps to achieve this:

  1. Let's start with our index structure; we configure it by adding the following section to the schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text_general" indexed="true" stored="true" />
  2. The second step is to create example data that looks like this:

    <doc>
      <field name="id">1</field>
      <field name="title">Financial report 2014</field>
     </doc>
     <doc>
      <field name="id">2</field>
      <field name="title">Financial marketing report 2014...

Handling user queries without errors


When building an application that uses Solr, we usually pass the query that the user entered to Solr. Sometimes, we even allow users to send complex queries that contain Lucene special characters. Because of this, there are situations where a user provides a malformed query, and Solr throws an exception when running it. We can alter this behavior by using a relatively new query parser called simple. This recipe will show you how to do this.

Getting ready

Before continuing to read this recipe, I suggest reading the Understanding and using the Lucene query language recipe from this chapter.

How to do it...

Let's look into how to handle user queries without errors using the following steps:

  1. We start by creating a simple index structure that will allow us to easily illustrate the example. To do this, we place the following section in the schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title"...

Handling hierarchies with nested documents


In the real world, data is not flat; it contains many hierarchies that we need to handle. Sometimes it is not possible to flatten the data, yet we still want to avoid cross matches and false matches. For example, let's assume that we have articles and comments on these articles, as on news sites or blogs. Imagine that we want to search for articles and comments at the same time. To do this, we will use Solr nested documents, and this recipe will show you how.

How to do it...

To handle hierarchies with nested documents, follow these steps:

  1. We start by defining the index structure. To do this, we add the following fields to our schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="content" type="text_general" indexed="true" stored="true"/>
    <field name="author" type="text_general" indexed="true" stored...

Sorting data on the basis of a function value


Suppose we have a search application that stores information about companies. Every company is described by a name and two floating-point numbers that represent its geographical location. One day, your boss comes to your room and says that he wants the search results to be sorted by the distance from the user's location. What's more, he wants the search engine to return the distance from the user's location to each of the returned companies. This recipe will show you how to achieve this requirement.

How to do it...

Let's perform the following steps to sort data on the basis of a function value:

  1. For this recipe, we will begin with the following index structure (add the following entries to your schema.xml file):

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true"/>
    <field name="location" type="location" indexed="true" stored...

Controlling the number of terms needed to match


Imagine a situation where you have an e-commerce bookstore and you want to build a search algorithm that brings the best results to your customers. However, you notice that many of your customers tend to enter queries with too many words, which results in an empty result list. So, you decide to make a query that requires only two of the words the user entered to be matched. This recipe will show you how to do it.

Getting ready

Before we continue, it is crucial to mention that the following method can only be used with the dismax or edismax query parser. For the list of available query parsers, refer to http://wiki.apache.org/solr/QueryParser.

How to do it...

Follow these steps to control the number of terms needed to match:

  1. Let's begin with creating our index structure. For our simple use case, we will only have documents with the identifier (the id field) and title (the title field). We define the index structure...
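
The key here is the mm (minimum should match) parameter of the dismax and edismax parsers; a minimal sketch of a query requiring only two of the entered words to match could look like this:

    q=solr cookbook third edition recipes&defType=edismax&qf=title&mm=2

The mm parameter also accepts conditional expressions such as 3<75%, which apply different requirements depending on how many terms the user typed.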

Affecting document score using function queries


There are many situations where you would like to influence how the score of the documents is calculated. For example, as an e-commerce bookstore, you would like to boost books on the basis of how many times they were purchased. You still want relevant results, but you would also like to influence them by adding yet another factor to their score. Is this possible? Yes, and this recipe will show you how to do it.

How to do it...

Let's see how the document score is affected using function queries and the following steps:

  1. Let's start by defining the index structure by adding the following section to the schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text_general" indexed="true" stored="true" />
    <field name="sold" type="int" indexed="true" stored="true" />
  2. The second step will be the example data, which looks like this:

    <add>
     <...
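
One way to add such a factor, sketched below under the assumption that we query with edismax, is the bf (boost function) parameter, which adds the value of a function query to each document's score:

    q=solr&defType=edismax&qf=title&bf=log(sum(sold,1))

The sum(sold,1) part guards against taking the logarithm of zero for books that haven't sold yet; edismax also offers the multiplicative boost parameter if you prefer to scale the score instead of adding to it.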

Using simple nested queries


Imagine a situation where you need a query nested inside another query. For example, you want to run a query using the standard request handler, but you need to embed a query that is parsed by the dismax query parser inside it. Specifically, we would like to find all the books that have a certain phrase in their title and boost the ones that have a part of that phrase present. This recipe will show you how to do this.

How to do it...

Let's start with a simple index that has the following structure:

  1. You need to put the following section to the schema.xml file:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text_general" indexed="true" stored="true" />
  2. The next step is data indexing. Our example data looks as follows:

    <add>
     <doc>
      <field name="id">1</field>
      <field name="title">Revised solrcookbook</field>
     </doc>
     <doc>
      <field name="id">2</field...

Using the Solr document's query join functionality


When using Solr, you are probably used to a flat structure of documents without any relationships. However, there are situations where flattening relationships comes at a cost we can't bear. Because of this, Solr 4.0 comes with a join functionality that lets us use some basic relationships. For example, imagine that our index consists of books and workbooks, and we want to use this relationship. This recipe will show you how to do this.

How to do it...

Let's perform the following steps:

  1. First of all, let's assume that we have the following index structure (just place the following entries in your schema.xml file):

    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="name" type="text_general" indexed="true" stored="true" multiValued="false"/>
    <field name="type" type="string" indexed="true" stored="true"/>
    <field name="book" type="string" indexed="true" stored="true...

Handling typos with n-grams


Sometimes, there are situations where you would like your users to get search results even though they made a typo, perhaps even more than one. In Solr, there are multiple ways to do this: use the spellchecker component and try to correct the user's mistake, use fuzzy queries, or use the n-gram approach. This recipe will concentrate on the third approach and show you how to use n-grams to handle user typos.

How to do it...

For this recipe, let's assume that our index is built of four fields: identifier, name, description, and description_ngram, which will be processed with the n-gram filter.

  1. So, let's start with the definition of our index structure that can look like this (we will place the following entries in the schema.xml file):

    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="name" type="text_general" indexed="true" stored="true"/...

Rescoring query results


Imagine a situation in which your score calculation is affected by numerous function queries, which makes the calculation very CPU-intensive. This is not a problem for small result sets, but it is for large ones. Starting from Solr 4.9, this great search engine gives us the possibility of reranking results. This means that Solr will fetch some results for our initial query and apply another query to those results only; the query that is applied modifies the score of the documents. This recipe will show you how this can be done.

How to do it...

Let's say that we have a use case where we want to show the latest books added to our index and boost them on the basis of some additional query. To do this, we will need to take the following steps:

  1. Let's start with a simple index structure. Our index will be built of three fields that look as follows (please put the following entries to the schema.xml file):

    <field name="id" type="string" indexed="true" stored="true...