You're reading from  Mastering Apache Solr 7.x

Product type: Book
Published in: Feb 2018
Reading level: Expert
Publisher: Packt
ISBN-13: 9781788837385
Edition: 1st Edition
Authors (3):
Sandeep Nair

Sandeep has been working in Liferay technology for more than 8 years and has more than 10 years of overall experience in Java and Java EE technologies. He has executed projects using Liferay across various verticals such as construction, financial, and medical domains, providing solutions for collaboration, enterprise content management, and web content management systems. He has created a free and open source Google Chartlet plugin for Liferay, which has been downloaded and used by people across 90 countries according to SourceForge statistics. Besides development, consulting, and implementing solutions, he has also been involved in giving training on Liferay in other countries. Before he moved to Liferay, he had experience in Java and Java EE technologies. He has authored "Liferay Beginner's Guide" and "Instant Liferay Portal 6 Starter" with Packt Publishing. When he is not coding, he loves to read books and travel.

Chintan Mehta

Chintan Mehta is a co-founder of KNOWARTH Technologies and heads the cloud/RIMS/DevOps team. He has rich, progressive experience in server administration of Linux, AWS Cloud, DevOps, RIMS, and on open source technologies. He is also an AWS Certified Solutions Architect. Chintan has authored MySQL 8 for Big Data, Mastering Apache Solr 7.x, MySQL 8 Administrator's Guide, and Hadoop Backup and Recovery Solutions. Also, he has reviewed Liferay Portal Performance Best Practices and Building Serverless Web Applications.

Dharmesh Vasoya

Dharmesh Vasoya is a Liferay 6.2 certified developer. He has 5.5 years of experience in application development with technologies such as Java, Liferay, Spring, Hibernate, Portlet, and JSF. He has successfully delivered projects in various domains, such as healthcare, collaboration, communication, and enterprise CMS, using Liferay. Dharmesh has good command of the configuration setup of servers such as Solr, Tomcat, JBOSS, and Apache Web Server. He has good experience of clustering, load balancing and performance tuning. He completed his MCA at Ahmedabad University.

Chapter 6. Advanced Queries – Part I

In the previous chapter, we learned how to build indexes using various methods. In this chapter, we will see how Solr's search works. Solr comes with a large toolkit of search features; by configuring them, we can give users a rich search experience with relevant results and a helpful interface.

The following search features make Solr a compelling choice among search engines:

  • Highlighting
  • Spell checking
  • Reranking
  • Transformation of results
  • Suggested words
  • Pagination on results
  • Expand and collapse
  • Grouping and clustering
  • Spatial search
  • More Like This
  • Autocomplete

We will look at some of these features in detail later in this chapter, but first let's understand each component that plays an important role during a search.

Search relevance


Relevance is a measure of how well the response to a search query satisfies the user. It depends entirely on the context of the search. The same document can be searched for by different groups of people in different contexts. For example, the search query higher tax payer in India could be run by:

  • An income tax department in the context of their duty
  • Chartered accountants in the context of their professional interest
  • Students in the context of gaining knowledge

How comprehensive a response needs to be depends on the context of the search. Sometimes the need is high, such as when searching for legal information; sometimes it is low, such as when someone is searching for specific dance steps. We need to take this into account when configuring Solr.

There are two terms that play an important role in relevance:

  • Precision: Precision is the percentage of documents in the returned results that are relevant.
  • Recall: Recall is the percentage of relevant...
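
The two measures can be illustrated with a small calculation. This is only a sketch: it assumes recall follows its standard information-retrieval definition (the share of all relevant documents that appear in the returned results), and the document IDs are made up for illustration.

```python
def precision_recall(returned_ids, relevant_ids):
    """Return (precision, recall) as fractions between 0 and 1."""
    returned = set(returned_ids)
    relevant = set(relevant_ids)
    hits = returned & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 5 relevant in the index, 3 in common.
precision, recall = precision_recall(
    returned_ids=["d1", "d2", "d3", "d4"],
    relevant_ids=["d1", "d2", "d3", "d7", "d9"],
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Returning more documents tends to raise recall and lower precision, which is why the two are usually tuned together.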

Velocity search UI


Solr provides a user interface that makes it easy to understand the Solr search mechanism. Using the Velocity search UI, we can explore search features such as faceting, highlighting, autocomplete, and geospatial searching. We have already seen the techproducts example; let's browse its products through the Velocity UI. You can access the UI at http://localhost:8983/solr/techproducts/browse, as shown in the following screenshot:

Solr uses a response writer to generate an organized response. Here, the Velocity UI uses the Velocity response writer. We will explore response writers later in this chapter.

Query parsing and syntax


In this section, we will explore some query parsers, their features, and how to configure them with Solr. We will cover the following query parsers supported by Solr:

  • Standard query parser
  • DisMax query parser
  • Extended DisMax (eDisMax) query parser

Each parser has its own configuration parameters. However, some common parameters are shared by all of these parsers. Let's take a look at these common parameters first.
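
As a rough illustration of how a parser might be chosen per request, the sketch below builds /select URLs that set defType. The host, the techproducts core, the name and features fields, and the helper build_select_url are assumptions made for this example; qf is the DisMax parameter that lists the fields to query.

```python
from urllib.parse import urlencode

# Assumed local Solr instance and the techproducts example core.
SELECT_URL = "http://localhost:8983/solr/techproducts/select"


def build_select_url(query, parser="lucene", **extra):
    """Build a /select URL that picks the query parser via defType."""
    if parser not in ("lucene", "dismax", "edismax"):
        raise ValueError(f"unsupported parser: {parser}")
    params = {"q": query, "defType": parser}
    params.update(extra)  # parser-specific parameters such as qf
    return f"{SELECT_URL}?{urlencode(params)}"


print(build_select_url("name:ipod"))
print(build_select_url("ipod", parser="dismax", qf="name features"))
print(build_select_url("ipod", parser="edismax", qf="name^2 features"))
```

Only the URLs are built here; sending them requires a running Solr instance with the example core loaded.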

Common query parameters

The following are the common query parameters supported by standard query parser, DisMax query parser, and extended DisMax query parser:

Response writer


Users who run a search are mainly interested in the response. Rather than offering output in a single format, Solr lets users choose the response format they prefer and returns the response in that format. It does this through a range of response writers.

When a user runs a search, Solr returns the matching results as formatted, well-organized output that is easy for the end user to consume. Solr handles this through a response writer. Solr supports these response writers:

  • JSON (default)
  • Standard XML
  • XSLT
  • Binary
  • GeoJSON
  • Python
  • PHP
  • PHP serialized
  • Ruby
  • CSV
  • Velocity
  • Smile
  • XLSX

We can select the response writer by setting the wt parameter to the appropriate value, such as wt=json or wt=xml.

Common query parameters:

Parameter: defType
Behavior: Selects the query parser, for example defType=dismax.
Default value: Lucene (standard query parser)

Parameter: sort
Behavior: Sorts the search results in either ascending or descending order. The value can be specified as asc or ASC and desc or DESC. Sorting is supported by numerical or alphabetical content. Solr supports sorting by field clones.
Example:
  • salary asc: Sorts based...
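
A sort value such as salary asc can be assembled programmatically. The following is a minimal sketch: build_sort is a hypothetical helper, the name field is assumed for illustration, and several sort clauses are joined with commas.

```python
from urllib.parse import urlencode


def build_sort(*clauses):
    """Turn (field, direction) pairs into a Solr sort value such as 'salary asc'."""
    parts = []
    for field, direction in clauses:
        if direction.lower() not in ("asc", "desc"):
            raise ValueError(f"direction must be asc or desc, got {direction!r}")
        parts.append(f"{field} {direction.lower()}")
    return ", ".join(parts)


# Sort by salary ascending, then by the assumed name field descending.
sort_value = build_sort(("salary", "asc"), ("name", "DESC"))
query_string = urlencode({"q": "*:*", "sort": sort_value})
print(sort_value)
print(query_string)
```

Validating the direction up front keeps a typo from reaching Solr as a malformed sort clause.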

Faceting


Faceting is the mechanism Solr provides to arrange search results into meaningful categories based on indexed fields. With faceting, the end user sees categorized results along with a count of matching documents for each category. The user can then explore the search results, drill down into any category, and find exactly the results they are interested in.

Solr supports several types of faceting, including:

  • Range faceting
  • Pivot (decision tree) faceting
  • Interval faceting

We will explore these later in this chapter. But to configure any faceting in Solr, first we have to configure the related parameters. So let's understand faceting parameters first.
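
Before turning to the parameters, here is a rough sketch of how the three faceting types listed above might be requested. The field names (price, cat, inStock) are assumed from the techproducts example, the ranges are arbitrary, and interval faceting additionally expects docValues to be enabled on the field.

```python
from urllib.parse import urlencode

# Range faceting: count documents in fixed-width price buckets.
range_facet = {
    "q": "*:*",
    "facet": "true",
    "facet.range": "price",
    "facet.range.start": "0",
    "facet.range.end": "1000",
    "facet.range.gap": "250",
}

# Pivot (decision tree) faceting: counts for cat, broken down by inStock.
pivot_facet = {
    "q": "*:*",
    "facet": "true",
    "facet.pivot": "cat,inStock",
}

# Interval faceting: arbitrary intervals; the set parameter is repeated per interval.
interval_facet = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.interval", "price"),
    ("facet.interval.set", "[0,100)"),
    ("facet.interval.set", "[100,*]"),
]

for label, params in [
    ("range", range_facet),
    ("pivot", pivot_facet),
    ("interval", interval_facet),
]:
    print(label, urlencode(params))
```

A list of pairs is used for the interval request because a dictionary cannot hold the same parameter name twice.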

Common parameters

These are the common parameters for all types of faceting:

Response writer values for the wt parameter:

Response writer    wt parameter value
JSON               json
Standard XML       xml
XSLT               xslt
Binary             javabin
GeoJSON            geojson
Python             python
PHP                php
PHP serialized     phps
Ruby               ruby
CSV...
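
The wt values in the table above can be plugged straight into a request. The sketch below is illustrative only: it includes just the rows visible in the table, query_string is a hypothetical helper, and the JSON body is a hand-written stand-in for what Solr would return with wt=json.

```python
import json
from urllib.parse import urlencode

# wt values copied from the rows visible in the table above.
WT_VALUES = {
    "JSON": "json",
    "Standard XML": "xml",
    "XSLT": "xslt",
    "Binary": "javabin",
    "GeoJSON": "geojson",
    "Python": "python",
    "PHP": "php",
    "PHP serialized": "phps",
    "Ruby": "ruby",
}


def query_string(query, writer="JSON"):
    """Build the query string for a search using the chosen response writer."""
    return urlencode({"q": query, "wt": WT_VALUES[writer]})


print(query_string("ipod", writer="Standard XML"))

# Hand-written stand-in for a response body returned with wt=json.
sample_body = '{"response": {"numFound": 1, "start": 0, "docs": [{"id": "IW-02"}]}}'
docs = json.loads(sample_body)["response"]["docs"]
print([doc["id"] for doc in docs])
```

Choosing the writer per request lets different clients of the same core consume the format that suits them.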

Common faceting parameters:

Parameter: facet
Behavior: Enable or disable faceting.
Default value: false

Parameter: facet.query
Behavior: Specifies a faceting query, which overrides Solr's default faceting query and returns a faceting count.

Field-value...
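
To make facet and facet.query more concrete, the sketch below builds a request with two facet queries and reads counts out of a response. The price buckets, the cat field, and all counts are made up for illustration; the response is a hand-written stand-in shaped like the facet_counts section of a wt=json response.

```python
from urllib.parse import urlencode

# Assumed price buckets on the techproducts "price" field.
facet_queries = ["price:[* TO 100]", "price:[100 TO *]"]

# rows=0 skips the documents themselves so only the counts come back.
params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
params += [("facet.query", fq) for fq in facet_queries]
print(urlencode(params))

# Hand-written stand-in for the facet section of a wt=json response.
sample_response = {
    "facet_counts": {
        "facet_queries": {"price:[* TO 100]": 5, "price:[100 TO *]": 11},
        "facet_fields": {"cat": ["electronics", 12, "currency", 4]},
    }
}

# Facet query counts are keyed by the query string that produced them.
query_counts = sample_response["facet_counts"]["facet_queries"]

# Field facets arrive as a flat [value, count, value, count, ...] list by default.
flat = sample_response["facet_counts"]["facet_fields"]["cat"]
field_counts = dict(zip(flat[0::2], flat[1::2]))

print(query_counts)
print(field_counts)
```

Pairing up the flat list is a common first step before rendering facet links in a UI.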

Highlighting


Solr supports a feature called highlighting that helps end users scan the results of a query quickly. Showing matching terms in bold or highlighted format makes for a much more satisfying experience. With highlighting, users can quickly spot the terms they searched for, or decide that the results do not match their expectations and move on to the next query.

Solr comes with extensive configuration options for highlighting. There are many parameters for fragment sizing, formatting, ordering, backup/alternate behavior, and categorization. Fragments, or snippets, are the parts of the response that contain matching terms.

Highlighting parameters

Solr provides a large list of parameters for highlighting fragments. The following are the basic parameters required to start highlighting:

Summary


In this chapter, we learned the concept of relevance and its terms: precision and recall. Then we looked at the Velocity search UI. We saw the common parameters for various query parsers and explored each query parser (standard, DisMax, and eDisMax) in detail. After that, we looked at various response writers in detail: JSON, standard XML, CSV, and the Velocity response writer. We also explored Solr term modifiers, wildcard parameters, fuzzy search, proximity search, and range search.

We looked at all Boolean operators. Then we learned about various faceting parameters and faceting types such as range, pivot, and interval faceting. At the end, we saw Solr highlighting mechanisms, parameters, highlighters, and boundary scanners.

In the next chapter, or rather the second part of this chapter, we will learn more search functionalities such as spell checking, suggester, pagination, result grouping and clustering, and spatial search.


Highlighting parameters:

Parameter: hl
Behavior: A Boolean parameter to enable/disable highlighting. hl=true will enable highlighting.
Default value: false

Parameter: hl.method
Behavior: To specify a method to implement highlighting...
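
As a hedged sketch of how these parameters fit together, the example below enables highlighting on one field and extracts the snippets from a response. The features field, the query, the unified method choice, and the document IDs are assumptions; the response is a hand-written stand-in for the highlighting section of a wt=json response, and snippets_by_doc is a hypothetical helper.

```python
from urllib.parse import urlencode

params = {
    "q": "features:memory",
    "hl": "true",            # enable highlighting
    "hl.fl": "features",     # assumed field to highlight
    "hl.method": "unified",  # one of the available highlighter implementations
}
print(urlencode(params))

# Hand-written stand-in for a wt=json response with highlighting enabled.
sample_response = {
    "highlighting": {
        "doc-1": {"features": ["2GB of <em>memory</em> included"]},
        "doc-2": {},
    }
}


def snippets_by_doc(response, field):
    """Map each document id to its highlighted snippets for one field."""
    return {
        doc_id: fields.get(field, [])
        for doc_id, fields in response.get("highlighting", {}).items()
    }


print(snippets_by_doc(sample_response, "features"))
```

Documents with no snippet for the field map to an empty list, so callers can fall back to the stored field value.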