Administrating Solr

by Surendra Mohan | October 2013 | Open Source Web Development

In this article by Surendra Mohan, the author of Administrating Solr, we will learn how to nest a query within another query, explore stats.jsp, use the ping status, and see what business rules are, how and when they prove important, and how to write a custom rule using Drools.


Query nesting

You might come across situations wherein you need to nest a query within another query in order to search for a specific keyword or phrase. Let us imagine that you want to run a query using the standard request handler, but you need to embed a query that is parsed by the dismax query parser inside it. Isn't that interesting? We will show you how to do it.

Our example data looks like this:

<add>
<doc>
<field name="id">1</field>
<field name="title">Reviewed solrcook book</field>
</doc>
<doc>
<field name="id">2</field>
<field name="title">Some book reviewed</field>
</doc>
<doc>
<field name="id">3</field>
<field name="title">Another reviewed little book</field>
</doc>
</add>

Here, we are going to use the standard query parser to get support for the Lucene query syntax, but we would like to boost phrases using the dismax query parser. At first this seems impossible to achieve, but don't worry, we will handle it. Let us suppose that we want to find books that have the words reviewed and book in their title field, and we would like to boost the reviewed book phrase by 10. Here we go with the query:

http://localhost:8080/solr/select?q=reviewed+AND+book+AND+_query_:"{!dismax qf=title pf=title^10 v=$qq}"&qq=reviewed+book

The results of the preceding query should look like:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">2</int>
<lst name="params">
<str name="fl">*,score</str>
<str name="qq">book reviewed</str>
<str name="q">book AND reviewed AND _query_:"{!dismax qf=title
pf=title^10 v=$qq}"</str>
</lst>
</lst>
<result name="response" numFound="3" start="0" maxScore="0.77966106">
<doc>
<float name="score">0.77966106</float>
<str name="id">2</str>
<str name="title">Some book reviewed</str>
</doc>
<doc>
<float name="score">0.07087828</float>
<str name="id">1</str>
<str name="title">Reviewed solrcook book</str>
</doc>
<doc>
<float name="score">0.07087828</float>
<str name="id">3</str>
<str name="title">Another reviewed little book</str>
</doc>
</result>
</response>

Let us focus on the query. The q parameter is built of two parts connected with the AND operator. The first one, reviewed+AND+book, is just a usual query with the logical operator AND. The second part starts with a strange-looking expression, _query_. This expression tells Solr that another query should be made and that it will affect the results list. We then see the expression stating that Solr should use the dismax query parser (the !dismax part), along with the parameters that will be passed to the parser (qf and pf).

The v parameter is an abbreviation of value, and it is used to pass the query to be executed: v=$qq dereferences the qq request parameter, so in our case reviewed+book is passed to the dismax query parser.

And that's it! We get exactly the search results we expected.
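The query above can be assembled programmatically. The following Python sketch (a hypothetical helper, not part of Solr; the host and port are assumptions) builds the same nested-query URL from its parts, letting the standard library take care of URL encoding:

```python
from urllib.parse import urlencode

def nested_query_url(base_url, terms, field, boost, phrase):
    """Build a Solr select URL that nests a dismax query inside a
    standard (lucene) query via the _query_ pseudo-field."""
    # The standard parser handles the boolean part; the embedded
    # {!dismax ...} local-params query boosts the phrase via pf.
    q = " AND ".join(terms) + ' AND _query_:"{!dismax qf=%s pf=%s^%d v=$qq}"' % (field, field, boost)
    params = {"q": q, "qq": phrase, "fl": "*,score"}
    return base_url + "/select?" + urlencode(params)

url = nested_query_url("http://localhost:8080/solr",
                       ["reviewed", "book"], "title", 10, "reviewed book")
print(url)
```

Keeping the phrase in a separate qq parameter (rather than inlining it) means it only needs to be URL-encoded once, which is exactly why the v=$qq dereference is convenient.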

Stats.jsp

From the admin interface, when you click on the Statistics link, you receive a web page of information about the specific index; however, this information is actually served to the browser as XML linked to an embedded XSL stylesheet, which is then transformed into HTML in the browser. This means that if you perform a GET request on stats.jsp, you will get the XML back, as demonstrated here:

curl http://localhost:8080/solr/mbartists/admin/stats.jsp

If you open the downloaded file, you will see all the data as XML. The following code is an extract of the available statistics, covering the cache that stores individual documents (documentCache) and the standard request handler, with the metrics you might wish to monitor:

<entry>
<name>documentCache</name>
<class>org.apache.solr.search.LRUCache</class>
<version>1.0</version>
<description>LRU Cache(maxSize=512,
initialSize=512)</description>
<stats>
<stat name="lookups">3251</stat>
<stat name="hits">3101</stat>
<stat name="hitratio">0.95</stat>
<stat name="inserts">160</stat>
<stat name="evictions">0</stat>
<stat name="size">160</stat>
<stat name="warmupTime">0</stat>
<stat name="cumulative_lookups">3251</stat>
<stat name="cumulative_hits">3101</stat>
<stat name="cumulative_hitratio">0.95</stat>
<stat name="cumulative_inserts">150</stat>
<stat name="cumulative_evictions">0</stat>
</stats>
</entry>
<entry>
<name>standard</name>
<class>org.apache.solr.handler.component.SearchHandler</class>
<version>$Revision: 1052938 $</version>
<description>Search using components:
org.apache.solr.handler.component.QueryComponent,
org.apache.solr.handler.component.FacetComponent</description>
<stats>
<stat name="handlerStart">1298759020886</stat>
<stat name="requests">359</stat>
<stat name="errors">0</stat>
<stat name="timeouts">0</stat>
<stat name="totalTime">9122</stat>
<stat name="avgTimePerRequest">25.409472</stat>
<stat name="avgRequestsPerSecond">0.446995</stat>
</stats>
</entry>

The method of integrating with a monitoring system varies from system to system. As an example, you may explore ./examples/8/check_solr.rb, a simple Ruby script that queries the core and checks whether the average hit ratio and the average time per request are above a defined threshold:

./check_solr.rb -w 13 -c 20 -imtracks
CRITICAL - Average Time per request more than 20 milliseconds old:
39.5

In the preceding example, we defined 20 milliseconds as the critical threshold, and the average time to serve a request is 39.5 milliseconds, which is far greater than the threshold we set.
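The same kind of check is easy to reproduce in other languages. The following Python sketch (a hypothetical analogue of check_solr.rb, not the script itself) parses a stats.jsp extract and compares avgTimePerRequest against a critical threshold; the XML snippet is hard-coded here so the example is self-contained:

```python
import xml.etree.ElementTree as ET

# Hard-coded extract of stats.jsp output, mirroring the entry shown above.
STATS_SNIPPET = """
<entry>
  <name>standard</name>
  <stats>
    <stat name="avgTimePerRequest">39.5</stat>
  </stats>
</entry>
"""

def check_avg_time(stats_xml, critical_ms):
    """Return a Nagios-style status line for the avgTimePerRequest metric."""
    root = ET.fromstring(stats_xml)
    avg = float(root.find(".//stat[@name='avgTimePerRequest']").text)
    if avg > critical_ms:
        return "CRITICAL - Average Time per request more than %d milliseconds: %.1f" % (critical_ms, avg)
    return "OK - average time per request: %.1f ms" % avg

print(check_avg_time(STATS_SNIPPET, 20))
```

In a real deployment you would fetch the XML with an HTTP GET against stats.jsp (as shown with curl earlier) instead of using a hard-coded snippet.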

Ping status

Ping status is the outcome of the PingRequestHandler, which is primarily used for reporting SolrCore health to a load balancer; that is, this handler is designed to be the endpoint that an HTTP load balancer uses while checking the "health" or "up status" of a Solr server. In simpler terms, the ping status denotes the availability of your Solr server (uptime and downtime) over the defined duration.

Additionally, it should be configured with some defaults indicating a request that should be executed. If the request succeeds, the PingRequestHandler responds with a simple OK status. If the request fails, the PingRequestHandler responds with the corresponding HTTP error code. Clients (such as load balancers) can be configured to poll the PingRequestHandler, monitoring for these types of responses (or for a simple connection failure), to know whether there is a problem with the Solr server.

A PingRequestHandler can be configured to look something like the following:

<requestHandler name="/admin/ping"
class="solr.PingRequestHandler">
<lst name="invariants">
<str name="qt">/search</str><!-- handler to delegate to -->
<str name="q">some test query</str>
</lst>
</requestHandler>

You may also try a more advanced option, which is to configure the handler with a healthcheckFile that can be used to enable or disable the PingRequestHandler. It would look something like the following:

<requestHandler name="/admin/ping"
class="solr.PingRequestHandler">
<!-- relative paths are resolved against the data dir -->
<str name="healthcheckFile">server-enabled.txt</str>
<lst name="invariants">
<str name="qt">/search</str><!-- handler to delegate to -->
<str name="q">some test query</str>
</lst>
</requestHandler>

A couple of points that you should know while using the healthcheckFile option are:

  • If the health check file exists, the handler will execute the query and return the status as described previously.
  • If the health check file does not exist, the handler will throw an HTTP error, even though the server is working fine and the query would have succeeded.

This health check file feature can be used as a way to indicate to some load balancers that the server should be "removed from rotation" for maintenance, upgrades, or whatever reason you may wish.
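From the load balancer's side, the polling logic boils down to "HTTP 200 means keep the node in rotation; anything else means take it out". The following Python sketch (a hypothetical poller, not part of Solr or any load balancer product) shows that decision; note that an HTTP error from a missing healthcheckFile and a plain connection failure are deliberately treated the same way:

```python
from urllib.request import urlopen
from urllib.error import URLError, HTTPError

def solr_is_healthy(ping_url, timeout=2.0):
    """Return True only when /admin/ping answers HTTP 200.
    An HTTP error (for example, the healthcheck file was removed)
    and a connection failure both mean 'remove from rotation'."""
    try:
        with urlopen(ping_url, timeout=timeout) as resp:
            return resp.status == 200
    except (HTTPError, URLError, OSError):
        return False

# Example call (the host and port are assumptions):
# solr_is_healthy("http://localhost:8080/solr/admin/ping")
```

A real load balancer would run this check on a short interval and require a few consecutive failures before removing the node, to avoid flapping.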

Business rules

You might come across situations wherein your customer, who runs an e-store with different types of products such as jewelry, electronic gadgets, automotive products, and so on, defines a business need that is flexible enough to cope with changes in the search results based on the search keyword.

For instance, imagine a customer's requirement wherein you need to add facets such as Brand, Model, Lens, Zoom, Flash, Dimension, Display, Battery, Price, and so on whenever a user searches for the keyword "Camera". So far, the requirement is easy and can be achieved in a simple way. Now let us add some complexity to the requirement, wherein facets such as Year, Make, Model, VIN, Mileage, Price, and so on should get added automatically when the user searches for the keyword "Bike". Worried about how to handle such a complex requirement? This is where business rules come into play. There are any number of rule engines (both proprietary and open source) on the market, such as Drools, JRules, and so on, which can be plugged into your Solr.
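To make the requirement concrete, here is a Python sketch of the decision a rule engine would make: mapping a search keyword to the facet fields to request. The mapping is hard-coded and hypothetical; the point of a rule engine such as Drools is precisely to move this logic out of application code so the business can change it without a redeploy:

```python
# Hypothetical keyword-to-facets decision table. In production this
# logic would live in externalized business rules, not in code.
FACETS_BY_KEYWORD = {
    "camera": ["Brand", "Model", "Lens", "Zoom", "Flash", "Dimension",
               "Display", "Battery", "Price"],
    "bike": ["Year", "Make", "Model", "VIN", "Mileage", "Price"],
}

def facet_params(keyword):
    """Translate the decision into Solr facet request parameters."""
    fields = FACETS_BY_KEYWORD.get(keyword.lower(), [])
    return [("facet", "true")] + [("facet.field", f) for f in fields]

print(facet_params("Camera"))
```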

Drools

Now let us understand how Drools functions. It injects the rules into working memory, and then evaluates which custom rules should be triggered based on the conditions stated in the working memory. It is based on if-then clauses, which enable the rule coder to define what condition must be true (using the if or when clause) and what action or event should be triggered when the defined condition is met (using the then clause). Drools conditions are matched against plain Java objects that the application injects as input. A business rule is more or less of the following format:

rule "ruleName"
when
// CONDITION
then
// ACTION
end

We will now show you how to write an example rule in Drools:

rule "WelcomeLucidWorks"
no-loop
when
$respBuilder : ResponseBuilder();
then
$respBuilder.rsp.add("welcome", "lucidworks");
end

In the given code snippet, the rule checks for a ResponseBuilder object (one of the prime objects that help in processing search requests in a SearchComponent) in the working memory, and then adds a key-value pair (in our case, welcome and lucidworks) to that ResponseBuilder.
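Drools itself runs on the JVM, so the rule above cannot be executed here; but the when/then mechanics it relies on can be mirrored in a few lines. The following Python sketch (purely illustrative, not Drools) matches a condition against the objects in a working memory and fires an action on the matching object, just as the WelcomeLucidWorks rule does:

```python
# Minimal stand-in for Solr's ResponseBuilder: the rule's action
# adds a key-value pair to its response.
class ResponseBuilder:
    def __init__(self):
        self.rsp = {}

def welcome_rule(working_memory):
    """Mirror of the when/then structure of the Drools rule."""
    for obj in working_memory:                  # "when": match a condition
        if isinstance(obj, ResponseBuilder):
            obj.rsp["welcome"] = "lucidworks"   # "then": fire the action

rb = ResponseBuilder()
welcome_rule([rb, "some unrelated fact"])
print(rb.rsp)
```

The real engine does this matching far more efficiently (and handles rule priorities, no-loop, and retraction), but the shape of the computation is the same: facts in working memory, conditions matched, actions fired.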

Summary

In this article, we saw how to nest a query within another query, learned about stats.jsp and the ping status, and looked at what business rules are, how and when they prove important, and how to write a custom rule using Drools.


About the Author


Surendra Mohan

Surendra Mohan, who has served a few top-notch software organizations in varied roles, is currently a freelance software consultant. He has been working on various cutting-edge technologies such as Drupal and Moodle for more than nine years. He also delivers technical talks at various community events such as Drupal meet-ups and Drupal camps. To know more about him, his write-ups, technical blogs, and much more, log on to http://www.surendramohan.info/.

He has also authored the book Administrating Solr, Packt Publishing, and has reviewed other technical books such as Drupal 7 Multi Sites Configuration and Drupal Search Engine Optimization, Packt Publishing, and titles on Drupal commerce and ElasticSearch, Drupal-related video tutorials, a title on Opsview, and many more.
