You're reading from Mastering Apache Solr 7.x

Product type: Book
Published in: Feb 2018
Reading level: Expert
Publisher: Packt
ISBN-13: 9781788837385
Edition: 1st Edition
Languages: Java
Tools: Solr
Concepts: Enterprise Search

Authors (3):

Sandeep Nair

Chintan Mehta

Dharmesh Vasoya

Chapter 7. Advanced Queries – Part II

In the previous chapter, we started with the concept of relevance and its two measures, precision and recall. We then learned about the various query parsers and response writers, their parameters, and how to configure them. We also looked at the Velocity search UI. Next, we covered faceting parameters and faceting types, such as range faceting, pivot faceting, and interval faceting. Finally, we saw the Solr highlighting mechanism, its parameters, the various highlighters, and boundary scanners.

In this chapter, we will learn about more search functionalities such as spellchecking, suggester, pagination, result grouping and clustering, and spatial search. Let's start with the spellchecking feature of Solr.

Spellchecking

We have seen that Solr provides rich support for searching: a strong index-building mechanism, flexible search configurations, and the ability to return results in the expected format by applying various transformation steps to the query output. Spellchecking is a useful feature for users who make typing mistakes or enter incorrect or inappropriate input. We often see this when searching on Google. If we enter sokcer, Google provides a hint: Did you mean: soccer? Sometimes, typing socer directly shows results for soccer without displaying any hint.

Likewise, there are some scenarios where we need to be careful about the input word:

If a user enters a misspelled search term and no matching document is available, we can use the Solr spellcheck feature to display a message suggesting a search for soccer instead of socer, which will give the user a hassle-free experience of...
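
As a rough illustration of the "did you mean" flow described above, the following sketch builds the request parameters for a spellcheck-enabled query. It assumes a spellcheck component is already attached to the request handler in solrconfig.xml; the `/select` path and the helper name `spellcheck_params` are hypothetical and used only for this example.

```python
from urllib.parse import urlencode


def spellcheck_params(user_query, max_suggestions=5):
    """Build request parameters that ask Solr for spelling suggestions."""
    return {
        "q": user_query,
        "spellcheck": "true",                 # turn spellchecking on for this request
        "spellcheck.count": max_suggestions,  # suggestions to return per term
        "spellcheck.collate": "true",         # also ask for a rewritten query
    }


params = spellcheck_params("socer")
# Hypothetical handler path; the real one depends on solrconfig.xml.
print("/select?" + urlencode(params))
```

The application can then read the suggestions from the response and decide whether to show a hint or rerun the search with the corrected term.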

Suggester

In the preceding section, we saw how Solr handles incorrectly spelled terms and returns the correct output for them. Now let's go one step further: even when the user spells everything correctly, we want to offer a list of suggestions based on whatever the user has typed so far. This can be achieved with a Solr component called the suggester, which suggests terms as the user types. When implementing a suggester, we need to keep two things in mind:

It must be very fast, as suggestions need to be displayed as the user types each character
The suggestions should be ranked and ordered by term frequency
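
To make the second requirement concrete, here is a toy, in-memory sketch of prefix suggestions ranked by term frequency. It is not how Solr implements its suggester (Solr's lookup implementations use specialized data structures to meet the speed requirement); the function names and sample data are invented for illustration, and the linear scan is only acceptable for tiny inputs.

```python
from collections import Counter


def build_term_counts(texts):
    """Count how often each lowercase term appears across the texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def suggest(prefix, term_counts, limit=5):
    """Return up to `limit` terms starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    matches = [(term, n) for term, n in term_counts.items() if term.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in matches[:limit]]


counts = build_term_counts([
    "soccer ball", "soccer shoes", "sock drawer", "soccer socks", "sofa",
])
print(suggest("so", counts))  # ['soccer', 'sock', 'socks', 'sofa']
```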

To configure any suggester, SuggestComponent needs to be configured in solrconfig.xml. Here is a simple configuration of a suggester:

<searchComponent name="suggest" class="solr.SuggestComponent">
 <lst name="suggester">
 <str name="name">mySuggester</str>
 <str name="lookupImpl">FuzzyLookupFactory...

Pagination

A query may match a large number of results. Returning all of them at once and displaying them on a single page is not an ideal approach for any search application. It is better to return the top N matching results first (sorted on some fields). Solr supports pagination, whereby we return a limited number of results rather than all of them and display these on the first page. If we can't find what we are looking for on the first page, we can fetch the next page by sending a subsequent request with pagination parameters. Pagination helps performance because each request returns only a specific number of results instead of every match, so the response is quick. Pagination also shows us how many requests users need before they find what they expected from a search, so we can tune relevance accordingly.
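
The arithmetic behind paging can be sketched as follows. Solr's standard `start` (offset of the first result) and `rows` (number of results to return) request parameters express a page, and the `numFound` value in the response gives the total number of matches. The helper names below are hypothetical.

```python
import math


def page_params(page, page_size):
    """Translate a 1-based page number into Solr's `start` and `rows`."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"start": (page - 1) * page_size, "rows": page_size}


def total_pages(num_found, page_size):
    """Number of pages needed to show `num_found` matching documents."""
    return math.ceil(num_found / page_size)


print(page_params(3, 10))   # {'start': 20, 'rows': 10}
print(total_pages(95, 10))  # 10
```

Note that very large `start` values make each request more expensive, since the server still has to collect and skip all the preceding results.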

How to implement pagination...

Result grouping

Result grouping is a useful feature in Solr; it returns an optimal mix of search results for a query. Result grouping can be performed based on field values, functions, or queries.

Sometimes, we have multiple similar documents for a single search term, for example, multiple locations for the same hospital, recipes for a specific food, plans for term insurance, and so on. Normally, a search for one such term returns all the similar documents and we have to display all of them on the same page. With result grouping, we can display only a single document (or the top few) for each unique value, and provide a link with meaningful text and the total number of results found for that query. Clicking on that link expands the full search result list. This is similar to the expand and collapse features of a search application. Result grouping is just as capable as expand and collapse; additionally it removes duplicate documents...
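
The idea of showing only the top few documents per unique value, together with a total count, can be sketched in memory as below. In Solr itself this is requested with parameters such as `group=true`, `group.field`, and `group.limit`; the Python function, the field name `hospital`, and the sample documents are assumptions made up for this illustration.

```python
def group_results(docs, field, limit=1):
    """Group ranked docs by `field`, keeping the top `limit` docs per value."""
    groups = {}
    for doc in docs:  # docs are assumed to be sorted by relevance already
        entry = groups.setdefault(doc[field], {"total": 0, "docs": []})
        entry["total"] += 1
        if len(entry["docs"]) < limit:
            entry["docs"].append(doc)
    return groups


results = [
    {"id": 1, "hospital": "City Care", "branch": "North"},
    {"id": 2, "hospital": "City Care", "branch": "South"},
    {"id": 3, "hospital": "Green Valley", "branch": "Central"},
    {"id": 4, "hospital": "City Care", "branch": "East"},
]

for name, group in group_results(results, "hospital").items():
    shown = [d["id"] for d in group["docs"]]
    print(f"{name}: showing {shown} of {group['total']} results")
```

The `total` value is what a "show all N results" link would display for each group.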

Result clustering

So far, we have seen Solr search by the keywords used in the search query. Result clustering is an advanced search component of Solr: it first identifies similarities between documents and then uses those similarities to find related documents. The identified similarities do not need to be present in the query or the document.

The clustering component first retrieves the results of a search query and identifies similar terms or phrases within them. A clustering algorithm then discovers relationships across all the documents in the search result and groups them under meaningful cluster labels. Solr comes with several clustering algorithms.
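
As a minimal sketch, a clustering request could be assembled from the `clustering`, `clustering.engine`, and `clustering.results` parameters described in this chapter's parameter table. The handler path `/clustering`, the engine name passed in the usage line, and the helper name are assumptions; they depend on how the clustering component is declared in solrconfig.xml.

```python
from urllib.parse import urlencode


def clustering_params(user_query, engine=None, rows=100):
    """Build request parameters that ask Solr to cluster the search results."""
    params = {
        "q": user_query,
        "rows": rows,                  # clustering works on the returned results
        "clustering": "true",          # enable the clustering component
        "clustering.results": "true",  # cluster the search results
    }
    if engine is not None:
        # If omitted, Solr uses the first declared engine.
        params["clustering.engine"] = engine
    return params


# Hypothetical handler path and engine name.
print("/clustering?" + urlencode(clustering_params("solr", engine="lingo")))
```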

Result clustering parameters

Spatial search

Location-based search is an important requirement nowadays, for example, finding the distance from a place or searching for a house within a given radius. Solr supports searching location-based data; this is called spatial or geospatial search.

This can be implemented by indexing a field in each document that contains a geographical point (a latitude and longitude); then, at query time, we can find and sort documents by distance from a given geolocation. The matching listings (results) can be displayed on an interactive map, on which we can zoom in and out and move the center point to find nearby listings using spatial search. Besides points, Solr also allows us to index geographical shapes (polygons), which are used to find documents that intersect geographical regions. Spatial search implemented by indexing a field (latitude and longitude) helps to search from a specific point, while implementing by indexing geographical shapes helps...
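
To illustrate what "find and sort documents by distance" means, the following sketch filters points within a radius and orders them nearest first using the haversine great-circle formula. Solr performs this server-side (for example, through its `geofilt` filter and `geodist()` function); the code below is only an in-memory approximation, and the field names `lat` and `lon`, the sample listings, and the Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(docs, center, radius_km):
    """Return (distance, doc) pairs inside the radius, nearest first."""
    lat0, lon0 = center
    hits = [(haversine_km(lat0, lon0, d["lat"], d["lon"]), d) for d in docs]
    hits = [h for h in hits if h[0] <= radius_km]
    hits.sort(key=lambda h: h[0])
    return hits


listings = [
    {"id": "a", "lat": 0.0, "lon": 0.5},
    {"id": "b", "lat": 0.0, "lon": 0.1},
    {"id": "c", "lat": 0.0, "lon": 2.0},
]
for dist, doc in within_radius(listings, (0.0, 0.0), 100):
    print(doc["id"], round(dist, 1))
```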

Summary

In this chapter, we explored and understood various searching functionalities such as spellchecking, suggester, pagination, result grouping, and result clustering. Finally, we looked at spatial search.

For each feature, we saw its configuration and ran examples using its various parameters. Now let's move to the next chapter, where we will see how to configure Solr for production and learn fine-tuning methodologies for better performance. We will explore how to secure Solr and how to take backups. We will configure logging and get an overview of SolrCloud.

Authors (3)

Sandeep Nair

Sandeep has been working in Liferay technology for more than 8 years and has more than 10 years of overall experience in Java and Java EE technologies. He has executed projects using Liferay across various verticals such as construction, financial, and medical domains, providing solutions for collaboration, enterprise content management, and web content management systems. He has created a free and open source Google Chartlet plugin for Liferay, which has been downloaded and used by people across 90 countries according to SourceForge statistics. Besides development, consulting, and implementing solutions, he has also been involved in giving training on Liferay in other countries. Before he jumped into Liferay, he had experience in Java and Java EE technologies. He has authored "Liferay Beginner's Guide" and "Instant Liferay Portal 6 Starter" with Packt Publishing. When he is not coding, he loves to read books and travel.

Chintan Mehta

Chintan Mehta is a co-founder of KNOWARTH Technologies and heads the cloud/RIMS/DevOps team. He has rich, progressive experience in server administration of Linux, AWS Cloud, DevOps, RIMS, and on open source technologies. He is also an AWS Certified Solutions Architect. Chintan has authored MySQL 8 for Big Data, Mastering Apache Solr 7.x, MySQL 8 Administrator's Guide, and Hadoop Backup and Recovery Solutions. Also, he has reviewed Liferay Portal Performance Best Practices and Building Serverless Web Applications.

Dharmesh Vasoya

Dharmesh Vasoya is a Liferay 6.2 certified developer. He has 5.5 years of experience in application development with technologies such as Java, Liferay, Spring, Hibernate, Portlet, and JSF. He has successfully delivered projects in various domains, such as healthcare, collaboration, communication, and enterprise CMS, using Liferay. Dharmesh has good command of the configuration setup of servers such as Solr, Tomcat, JBOSS, and Apache Web Server. He has good experience of clustering, load balancing and performance tuning. He completed his MCA at Ahmedabad University.

Parameter	Behavior	Default value
`clustering`	Enable/disable clustering.	`true`
`clustering.engine`	Specifies which clustering engine to use. If not specified, the first declared engine will become the default one.	first in a list
`clustering.results`	When true, the component will run...