You're reading from Apache Solr PHP Integration

Product type: Book
Published in: Nov 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781782164920
Edition: 1st Edition
Author (1)
Jayant Kumar

Jayant Kumar is an experienced software professional with a bachelor of engineering degree in computer science and more than 14 years of experience in architecting and developing large-scale web applications. Jayant is an expert on search technologies and PHP and has been working with Lucene and Solr for more than 11 years now. He is the key person responsible for introducing Lucene as a search engine on www.naukri.com, the most successful job portal in India. Jayant is also the author of the book Apache Solr PHP Integration, Packt Publishing, which has been very successful. Jayant has played many different important roles throughout his career, including software developer, team leader, project manager, and architect, but his primary focus has been on building scalable solutions on the Web. Currently, he is associated with the digital division of HT Media as the chief architect responsible for the job site www.shine.com. Jayant is an avid blogger and his blog can be visited at http://jayant7k.blogspot.in. His LinkedIn profile is available at http://www.linkedin.com/in/jayantkumar.

Chapter 6. Debug and Stats Component

Debug and stats are two components in Solarium that provide more information about index statistics and about how queries are executed and results are returned. In this chapter we will explore both components and look in depth at how to retrieve index statistics using the stats component. We will also look at how Solr calculates relevance scores and how we can use PHP to get and display the query explanation returned by Solr. We will explore:

  • How Solr does relevance ranking

  • Executing a debug query through PHP code

  • Running a debug query on the Solr interface

  • Displaying the output of a debug query

  • Displaying query result statistics using the stats component

You might ask: why should I go into the theory behind these components, and what will it help me achieve? The benefit of the debug component is that it lets you understand and analyze how a search result was ranked. Why did a certain document come out on top, and why did another document come at the end? Further, if you want to alter the ranking...

Solr relevance ranking


When a query is passed to Solr, it is converted to an appropriate query string that is then executed by Solr. For each document in the result, Solr calculates a relevance score, and the results are sorted by this score. By default, higher-scoring documents appear first in the result.

The Solr relevancy algorithm is known as the tf-idf model, where tf stands for term frequency and idf stands for inverse document frequency. The parameters used in the relevance calculation are explained below so that we can interpret the output of a debug query:

  • tf: The term frequency is the number of times a term appears in a document. A higher term frequency results in a higher document score.

  • idf: The inverse document frequency is the inverse of the number of documents in which the term appears. It indicates the rarity of the term across all documents in the index. Documents having a rare term are scored higher.

  • coord: It is the coordination factor that says how many...
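
The effect of the tf and idf factors can be illustrated with a minimal sketch. It assumes the formulas of Lucene's classic TF-IDF similarity (tf as the square root of the term count, idf as 1 + ln(numDocs / (docFreq + 1))); the function names and the index size are hypothetical, and the product shown is only these two factors, not the full score that Solr reports.

```php
<?php
// Hypothetical helpers for illustration; not part of Solr or Solarium.

// tf: square root of the number of times the term occurs in the field.
function classicTf(int $termFreq): float
{
    return sqrt($termFreq);
}

// idf: the fewer documents contain the term, the larger the value.
function classicIdf(int $docFreq, int $numDocs): float
{
    return 1.0 + log($numDocs / ($docFreq + 1));
}

// Made-up index of 1000 documents; each term occurs once in the document.
$numDocs = 1000;
$rare    = classicTf(1) * classicIdf(9, $numDocs);   // term found in 9 documents
$common  = classicTf(1) * classicIdf(499, $numDocs); // term found in 499 documents

// The rare term contributes more than the common one.
printf("rare: %.3f, common: %.3f\n", $rare, $common);
```

The real explanation returned by Solr combines these values with further factors, such as the coordination factor listed above, so the numbers here only show the direction of the effect.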

Executing debug through PHP code


To enable debugging of our Solr query using PHP, we need to get the debug component from our query.

In addition to getting debug information for the main query, we can call the setExplainOther() function to get the scores, with respect to the main query, of documents that match the query passed to setExplainOther(), as shown in the following code:

  $query->setQuery('cat:book OR author:martin^2');
  $debugq = $query->getDebug();
  $debugq->setExplainOther('author:king');

In the preceding piece of code, we are searching for all books and boosting books by the author martin by a factor of 2. In addition to this, we are getting the debug information for books by the author king.

After running the query, we need to get the debug component from the ResultSet. We then use it to get the query string, the parsed query string, the query parser, and information about the other query set through setExplainOther(), as shown in the following code:

  echo 'Querystring: ' . $dResultSet->getQueryString...

Running debug on Solr interface


The parameters appended to the Solr query URL in our example are debugQuery=true, explainOther=author:king, and debug.explain.structured=true. Let us check the Solr output for a debug query by visiting the URL http://localhost:8080/solr/collection1/select/?omitHeader=true&debugQuery=true&fl=id,name,author,series_t,score,price&start=0&q=cat:book+OR+author:martin^2&rows=5
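
As a rough sketch of how such a URL is put together, the following plain PHP builds the query string with http_build_query(). The base URL and parameter values are taken from the example above; adding explainOther and debug.explain.structured to the same request is an assumption based on the parameter list in the text, since the URL shown omits them.

```php
<?php
// Base URL and parameter values come from the example in the text.
$base = 'http://localhost:8080/solr/collection1/select/';

$params = [
    'omitHeader'               => 'true',
    'debugQuery'               => 'true',
    'explainOther'             => 'author:king',
    'debug.explain.structured' => 'true',
    'fl'                       => 'id,name,author,series_t,score,price',
    'start'                    => 0,
    'q'                        => 'cat:book OR author:martin^2',
    'rows'                     => 5,
];

// http_build_query() URL-encodes every value; the spaces in q become "+".
$url = $base . '?' . http_build_query($params);

echo $url, "\n";
```

Building the query string this way avoids hand-encoding characters such as the colon and the caret in the query.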

The following is a screenshot of the output of the previous query:

We can see the debug component after the results component in the Solr query results interface. It contains the raw query and the parsed query. The explain element in the debug component contains the score and the calculations that were done to arrive at the score.

Since debugging a Solr query is mainly needed to tune relevance, it makes more sense to use the Solr interface to see the debug output. The PHP interface to the debug component can be used to create an interactive user interface where field level boosts are taken...

The stats component


The stats component can be used to return simple statistics for indexed numeric fields in the document set returned by a Solr query. Let us get the statistics for prices of all books in our index. We will also facet on price and availability (inStock) and see the output.

Tip

It is advisable to use a templating engine instead of writing HTML code inside PHP.

Create the query to fetch all books and set the number of rows to 0. We are not interested in the results themselves, only in the statistics, which are fetched as a separate component, as shown in the following code:

  $query->setQuery('cat:book');
  $query->setRows(0);

Get the stats component, create statistics for the price field, and create facets on the price and inStock fields.

  $statsq = $query->getStats();
  $statsq->createField('price')->addFacet('price')->addFacet('inStock');

Execute the query and fetch the stats component from the result set, as shown in the following code:

  $resultset = $client->select...
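
To make the kind of figures involved concrete, here is a minimal sketch that computes a similar summary in plain PHP over a made-up list of prices. The summarize() helper is hypothetical and not part of Solarium, and the use of the sample standard deviation is an assumption about how Solr computes its stddev value.

```php
<?php
// Hypothetical helper: summarizes a list of numbers in the style of the
// statistics reported for a numeric field. Not part of Solarium.
function summarize(array $values): array
{
    $count = count($values);
    $sum = array_sum($values);
    $sumOfSquares = array_sum(array_map(fn($v) => $v * $v, $values));

    // Sample standard deviation (assumed); defined here as 0.0 for fewer than 2 values.
    $stddev = $count > 1
        ? sqrt(max(0.0, ($count * $sumOfSquares - $sum * $sum) / ($count * ($count - 1))))
        : 0.0;

    return [
        'min'    => $count > 0 ? min($values) : null,
        'max'    => $count > 0 ? max($values) : null,
        'sum'    => $sum,
        'count'  => $count,
        'mean'   => $count > 0 ? $sum / $count : null,
        'stddev' => $stddev,
    ];
}

// Made-up prices, not taken from the book's index.
$prices = [5.0, 7.0, 9.0];
$stats = summarize($prices);

foreach ($stats as $name => $value) {
    echo $name, ': ', $value, "\n";
}
```

With facets on inStock, the same summary would be computed once per facet value, for example separately for the books that are in stock and those that are not.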

Summary


This chapter gave us some insight into our index and into how results are ranked. We saw the parameters used to calculate the relevance score and how to extract the calculation from Solr using PHP. We discussed the use of the debug query. We saw how to extract statistics of numeric fields for a query from our index and how to display the information using PHP. The information retrieved from these components is used to analyze and improve the Solr search results. Statistics can also be used for reporting purposes.

In the next chapter we will explore how to build spell suggestions using Solr and PHP. We will also build an autocomplete feature to suggest query options during a search.
