You're reading from Apache Solr PHP Integration

Product type: Book
Published in: Nov 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781782164920
Edition: 1st Edition
Author (1)
Jayant Kumar

Jayant Kumar is an experienced software professional with a bachelor of engineering degree in computer science and more than 14 years of experience in architecting and developing large-scale web applications. Jayant is an expert on search technologies and PHP and has been working with Lucene and Solr for more than 11 years now. He is the key person responsible for introducing Lucene as a search engine on www.naukri.com, the most successful job portal in India. Jayant is also the author of the book Apache Solr PHP Integration, Packt Publishing, which has been very successful. Jayant has played many different important roles throughout his career, including software developer, team leader, project manager, and architect, but his primary focus has been on building scalable solutions on the Web. Currently, he is associated with the digital division of HT Media as the chief architect responsible for the job site www.shine.com. Jayant is an avid blogger and his blog can be visited at http://jayant7k.blogspot.in. His LinkedIn profile is available at http://www.linkedin.com/in/jayantkumar.

Chapter 3. Select Query on Solr and Query Modes (DisMax/eDisMax)

This chapter covers how to execute a basic select query on the Solr index using PHP and the Solarium library. We will specify query parameters such as the number of rows to fetch, the fields to return, and the sort order in the Solarium query. We will discuss what query modes (query parsers) are in Solr, go through the query modes available, and see how they are used. We will also look at features that improve the results of a query or make them more specific. The topics that will be covered are as follows:

  • Creating a basic select query with sorting and return fields

  • Running queries using select configuration

  • Re-using queries

  • DisMax and eDisMax query modes

  • Component-based architecture of Solarium

  • Executing queries using DisMax and eDisMax

  • Date boosting in eDisMax

  • Advanced tuning parameters

Creating a basic select query with sorting and return fields


Using the following query, let us look for all the books in our index and return the top five results in JSON format:

http://localhost:8080/solr/collection1/select/?q=cat:book&rows=5&wt=json

As seen earlier, we can form a query URL and use cURL to fire the query via PHP. We can then decode the JSON response and use it as the result.
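The cURL approach can be sketched in plain PHP as follows. This is only a minimal illustration: the helper name fetchSolrJson() and the sample response body are assumptions made for this example, and the helper needs the cURL extension and a running Solr instance before it can be called.

```php
<?php
// Build the select URL shown above from an array of parameters.
$params = array(
    'q'    => 'cat:book',
    'rows' => 5,
    'wt'   => 'json',
);
$url = 'http://localhost:8080/solr/collection1/select/?'
     . http_build_query($params, '', '&');

// Hypothetical helper: fetch the raw response body for a URL using cURL.
// It is defined here but not called, because it needs a live Solr server.
function fetchSolrJson($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return curl_exec($ch);
}

// Assumed, heavily trimmed sample of a Solr JSON response body.
$sampleBody = '{"response":{"numFound":2,"start":0,"docs":[{"id":"b1","name":"Sample Book"}]}}';

// Decode the JSON into a PHP array and read the results out of it.
$result   = json_decode($sampleBody, true);
$numFound = $result['response']['numFound'];
$docs     = $result['response']['docs'];
```

Against a live server, the sample body would be replaced by the return value of fetchSolrJson($url).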

Let us look at the Solarium code to execute select queries on Solr. Create a select query from the Solarium client as follows:

$query = $client->createSelect();

Create a query to search for all books:

$query->setQuery('cat:book');

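Suppose we show three results per page. On the second page, we then start from the fourth result and display the next three results. The start parameter is a zero-based offset, so the fourth result corresponds to a start value of 3.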

$query->setStart(3)->setRows(3);
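The relation between a page number and the start offset can be sketched as a small helper. The function name pageToStart() is hypothetical and is not part of Solarium; it only shows the arithmetic behind the values passed to setStart() and setRows().

```php
<?php
// Hypothetical helper: convert a 1-based page number into the
// zero-based start offset expected by Solr.
function pageToStart($page, $rowsPerPage)
{
    return ($page - 1) * $rowsPerPage;
}

$rows       = 3;
$startPage1 = pageToStart(1, $rows); // first page starts at offset 0
$startPage2 = pageToStart(2, $rows); // second page starts at offset 3
```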

Set which fields should be returned using the following code:

$query->setFields(array('id','name','price','author'));

Tip

PHP 5.4 users can use square brackets to construct an array instead of the earlier array(...) construct...

Running a query using select configuration


In addition to building the select query through functions, it is also possible to build a select query using an array of key-value pairs. Here is a select configuration array ($selectConfig) with parameters for the preceding query:

$selectConfig = array(
  'query' => 'cat:book AND author:Martin',
  'start' => 3,
  'rows' => 3,
  'fields' => array('id','name','price','author'),
  'sort' => array('price' => 'asc')
);
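To make the meaning of each key clearer, the following sketch maps such a configuration array to the raw Solr request parameters it roughly corresponds to. The function configToSolrParams() is a hypothetical helper written for illustration only; it is not how Solarium is implemented. In Solarium itself, the array would be passed in when the select query is created, for example with createSelect($selectConfig).

```php
<?php
// Hypothetical helper (not part of Solarium): translate a select
// configuration array into the equivalent Solr request parameters.
function configToSolrParams(array $config)
{
    // Turn array('price' => 'asc') into the string "price asc".
    $sortParts = array();
    foreach ($config['sort'] as $field => $direction) {
        $sortParts[] = $field . ' ' . $direction;
    }

    return array(
        'q'     => $config['query'],
        'start' => $config['start'],
        'rows'  => $config['rows'],
        'fl'    => implode(',', $config['fields']),
        'sort'  => implode(',', $sortParts),
    );
}

$selectConfig = array(
    'query'  => 'cat:book AND author:Martin',
    'start'  => 3,
    'rows'   => 3,
    'fields' => array('id', 'name', 'price', 'author'),
    'sort'   => array('price' => 'asc'),
);

$params = configToSolrParams($selectConfig);
```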

We can also add multiple sorting fields as an array using the addSorts(array $sorts) function. To sort by price and then by score, we can use the following parameters in the addSorts() function:

$query->addSorts(array('price'=>'asc','score'=>'desc'));

We can use the getQuery() function to get the query parameter and the getSorts() function to get the sorting parameters from our select query. We can also use the removeField($fieldStr) and removeSort($sortStr) functions to remove parameters from the fields list and sort list of our query...

Re-using queries


In most cases, the queries that you build as a part of the application can be reused. It would make more sense to re-use the queries instead of creating them again. The functions provided by the Solarium interface help in modifying the Solarium query for re-use. Let us see an example for re-using queries.

Suppose we form a complex query based on input parameters. For pagination purposes, we would like to use the same query but change the start and rows parameters to fetch the next or previous page. Another case where a query could be reused is sorting. Suppose we would like to sort by price in ascending order and later by descending order.
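The idea can be sketched without Solarium as follows. The ReusableQuery class is a hypothetical stand-in used only to show the pattern: the query string is built once, while the paging and sorting parameters are changed for each request.

```php
<?php
// Hypothetical stand-in for a reusable query object. The query string is
// set once; paging and sorting can be changed without rebuilding it.
class ReusableQuery
{
    public $query;
    public $start = 0;
    public $rows = 10;
    public $sorts = array();

    public function __construct($query)
    {
        $this->query = $query;
    }

    // Move to a 1-based page without touching the query string.
    public function setPage($page, $rows)
    {
        $this->rows  = $rows;
        $this->start = ($page - 1) * $rows;
        return $this;
    }

    // Replace the current sort order with a single field and direction.
    public function setSort($field, $direction)
    {
        $this->sorts = array($field => $direction);
        return $this;
    }
}

// Build the query once, sorted by price in ascending order.
$q = new ReusableQuery('cat:book AND author:Martin');
$q->setPage(1, 3)->setSort('price', 'asc');
$firstPageStart = $q->start;

// Re-use the same object for the second page, now sorted in descending order.
$q->setPage(2, 3)->setSort('price', 'desc');
```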

Let us first define and create an alias for Solarium namespaces we will be using in our code.

use Solarium\Client;
use Solarium\QueryType\Select\Query\Query as Select;

Next, create a class that extends the Solarium select query class:

class myQuery extends Select
{

Inside the class we will create the init() function, which will override the same...

DisMax and eDisMax query modes


DisMax (Disjunction Max) and eDisMax (Extended Disjunction Max) are query modes in Solr. They define how Solr parses user input to query different fields with different relevance weights. eDisMax is an improvement over the DisMax query mode. DisMax and eDisMax are enabled by default in our Solr configuration. To switch the query type, we need to specify defType=dismax or defType=edismax in our Solr query.
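As a rough illustration of the defType parameter, the following sketch adds it to a set of request parameters and builds the query URL. The helper withQueryMode() and the search term martin are assumptions made for this example; the host and path are the ones used earlier in this chapter.

```php
<?php
// Hypothetical helper: add the defType parameter that selects the
// query mode (query parser) for a request.
function withQueryMode(array $params, $mode)
{
    if ($mode !== 'dismax' && $mode !== 'edismax') {
        throw new InvalidArgumentException('Unknown query mode: ' . $mode);
    }
    $params['defType'] = $mode;
    return $params;
}

$params = withQueryMode(array('q' => 'martin', 'wt' => 'json'), 'edismax');
$url = 'http://localhost:8080/solr/collection1/select/?'
     . http_build_query($params, '', '&');
```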

Let us add some more books to our index. Execute the following command in our <solr dir>/example/exampledocs folder (books.csv is available in code downloads):

java -Durl=http://localhost:8080/solr/update -Dtype=application/csv -jar post.jar books.csv

DisMax handles most queries, but there are still some cases where it is unable to provide results. It is advisable to use eDisMax in those cases. For example, the DisMax query parser does not support the default Lucene query syntax, whereas eDisMax does. Let us check it out.

To search...

Executing queries using DisMax and eDisMax


Let us explore how to execute DisMax and eDisMax queries using the Solarium library. First, get a DisMax component from our select query using the following code:

$dismax = $query->getDisMax();

Boosting is used in Solr to alter the score of some documents in a result set, so that certain documents are ranked higher than others based on their content. A boost query is a raw query string that is inserted along with the user's query to boost certain documents in the result. For example, we can set a boost on author:martin. The following query boosts results where the author field contains martin by a factor of 2.

$dismax->setBoostQuery('author:martin^2');

Query fields specify the fields to query, along with their boosts. The query string passed to the setQuery() function is matched against the text in these fields. When a field is boosted, a match for the query text in that field is given more importance, so that document is ranked higher. In the following function, matches in the author field are...

Date boosting in an eDisMax query


Let us use eDisMax to boost the results of a search based on date so that the most recent book appears on top. We will use the setBoostFunctionsMult() function to specify the boost on last_modified, which in our case stores the date when the record was last added or updated.

$query = $client->createSelect();
$query->setQuery('cat:book -author:martin');
$edismax = $query->getEDisMax();
$edismax->setBoostFunctionsMult('recip(ms(NOW,last_modified),1,1,1)');
$resultSet = $client->select($query);

Here we are searching for all books where the author is not named Martin (martin). The - (minus sign) in front of a clause negates it, making it a NOT query. We have also added a multiplicative boost on the reciprocal of the time elapsed, in milliseconds, between now and the last modified date. The recip function provided by Solr is defined as follows:

recip(x,m,a,b) = a/(m*x+b) which in our case becomes 1/(1*ms(NOW,last_modified)+1)

Here m, a, and b are constants, x can be any numeric value or complex function...
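To see why this boost favors recent documents, the formula can be evaluated in plain PHP. This is only a sketch of the arithmetic: the recip() function below reimplements the definition given above, and the two ages are assumed values standing in for ms(NOW,last_modified).

```php
<?php
// recip(x,m,a,b) = a / (m*x + b), following the definition above.
function recip($x, $m, $a, $b)
{
    return $a / ($m * $x + $b);
}

// Assumed document ages in milliseconds, standing in for ms(NOW,last_modified).
$oneHourMs = 60 * 60 * 1000;
$oneDayMs  = 24 * $oneHourMs;

// With m = a = b = 1, a smaller age gives a larger multiplier.
$boostHourOld = recip($oneHourMs, 1, 1, 1);
$boostDayOld  = recip($oneDayMs, 1, 1, 1);
```

The document modified an hour ago gets a larger multiplier than the one modified a day ago, so it is ranked higher when all else is equal.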

Advanced query parameters


Alternative queries are used when the query parameter is either blank or not specified. Solarium by default sets the query parameter as *:*. Alternative queries can be used to get all documents from an index for faceting purposes.

$dismax->setQueryAlternative('*:*');

For selecting all documents in DisMax/eDisMax, the normal query syntax *:* does not work, but the alternative query syntax supports it. To select all documents, first set the query value in the Solarium query to an empty string; this is required because the default query in Solarium is *:*. Then set the alternative query to *:*.
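At the request level, these settings correspond roughly to an empty q parameter together with a q.alt parameter, which is the Solr parameter for the alternative query. The following sketch builds such a query string in plain PHP; the wt=json parameter is an assumption carried over from the earlier examples.

```php
<?php
// Request parameters for selecting all documents through DisMax:
// the main query is left empty and q.alt carries the match-all query.
$params = array(
    'defType' => 'dismax',
    'q'       => '',
    'q.alt'   => '*:*',
    'wt'      => 'json',
);

// http_build_query() URL-encodes the values, so *:* becomes %2A%3A%2A.
$queryString = http_build_query($params, '', '&');
```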

Summary


We were able to execute select queries on Solr using the Solarium library. We explored the basic parameters for the select query. We saw how to use a configuration array to create a Solarium query. We were able to iterate through the results after executing a query. We extended the query class to re-use queries. We were able to do pagination on our existing query and to change the sorting parameters without recreating the complete query. We saw the DisMax and eDisMax query modes in Solr. We also got an idea of the component-based structure of the Solarium library. We explored the query parameters for DisMax and eDisMax queries. We also saw how to use an eDisMax query to do "recent first" date boosting on Solr. Finally, we saw some advanced query parameters for DisMax and eDisMax in Solarium.

In the next chapter, we will go deeper into advanced queries based on different criteria from our query result.
