You're reading from  Apache Solr Search Patterns

Product type: Book
Published in: Apr 2015
Reading level: Intermediate
ISBN-13: 9781783981847
Edition: 1st Edition
Author: Jayant Kumar

Jayant Kumar is an experienced software professional with a bachelor of engineering degree in computer science and more than 14 years of experience in architecting and developing large-scale web applications. Jayant is an expert on search technologies and PHP and has been working with Lucene and Solr for more than 11 years now. He is the key person responsible for introducing Lucene as a search engine on www.naukri.com, the most successful job portal in India. Jayant is also the author of the book Apache Solr PHP Integration, Packt Publishing, which has been very successful. Jayant has played many different important roles throughout his career, including software developer, team leader, project manager, and architect, but his primary focus has been on building scalable solutions on the Web. Currently, he is associated with the digital division of HT Media as the chief architect responsible for the job site www.shine.com. Jayant is an avid blogger and his blog can be visited at http://jayant7k.blogspot.in. His LinkedIn profile is available at http://www.linkedin.com/in/jayantkumar.

Lucene 4 spatial module


Solr 4 provides three field types for spatial search: LatLonType (or its non-geodetic twin, PointType), SpatialRecursivePrefixTreeFieldType (RPT for short), and BBoxField (available from Solr 4.10 onward). LatLonType has been available since Solr 3. RPT offers more features than LatLonType and filters quickly, whereas LatLonType is better suited to efficient distance sorting and boosting. With Solr, we can use both field types together: LatLonType for sorting or boosting and RPT for filtering. BBoxField indexes bounding boxes and supports querying by a box with search predicates such as Intersects, Within, Contains, Disjoint, or Equals, as well as relevancy sorting or boosting on properties such as overlapRatio.

We have already seen the LatLonType field, which we used to define the location of our store in the earlier examples. Let us explore RPT and have a look at BBoxField.
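To make the idea of using both field types together concrete, here is a minimal Python sketch that only builds the request parameters; it does not contact Solr. It assumes a hypothetical RPT field named store_rpt that holds the same coordinates as the LatLonType field store from the earlier examples. The geofilt filter reads sfield from its local parameters, while geodist() picks up the request-level sfield and pt parameters for sorting.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/collection1/select"


def build_filter_and_sort_params(lat, lon, radius_km,
                                 filter_field="store_rpt",  # hypothetical RPT field
                                 sort_field="store"):       # LatLonType field
    """Filter on the RPT field and sort by distance on the LatLonType field."""
    return {
        "q": "*:*",
        # The local sfield applies to this filter only; pt comes from the request.
        "fq": "{!geofilt sfield=%s d=%s}" % (filter_field, radius_km),
        # Request-level parameters that geodist() uses for sorting.
        "sfield": sort_field,
        "pt": "%s,%s" % (lat, lon),
        "sort": "geodist() asc",
    }


params = build_filter_and_sort_params(28.643059, 77.368885, 10)
print(BASE_URL + "?" + urlencode(params))
```

Whether a second field is worth the extra index size depends on the data volume and on how often results are sorted by distance.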

SpatialRecursivePrefixTreeFieldType

RPT available in Solr 4 is used to implement...

Searching and filtering on a spatial index


Spatial fields in a Solr index can be searched and filtered using the {!geofilt} and {!bbox} query filters. These filters are available from Solr 4.2 onward. We saw a working example of geofilt earlier in this chapter. Let us go through some other queries that can be executed on a spatial index in Solr.

The bbox query

The bbox filter works like geofilt, except that it matches everything inside the bounding box of the calculated circle rather than only the circle itself. The query stays the same, apart from using the {!bbox} filter in place of the {!geofilt} filter. Converting the earlier geofilt query to bbox gives the following query:

http://localhost:8983/solr/collection1/select/?q=*:*&fq={!bbox pt=28.643059,77.368885 sfield=store d=10}

The output in our case would remain the same as that in the figure – Stores within 10 km from our location point – shown earlier in this chapter, but the search now includes the grayed-out area, as shown in the...
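To see why a bbox filter can match points that a geofilt filter rejects, the following Python sketch computes the bounding box of a circle and measures how far one of its corners lies from the center. It is an illustration only and not Solr's implementation; it assumes a spherical Earth with a mean radius of 6371 km and a circle that is far from the poles and the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def bounding_box(lat, lon, d_km):
    """Return (min_lat, min_lon, max_lat, max_lon) enclosing a circle of radius d_km."""
    angular = d_km / EARTH_RADIUS_KM  # radius expressed as an angle in radians
    lat_delta = math.degrees(angular)
    # Meridians converge away from the equator, so the longitude span is wider.
    lon_delta = math.degrees(math.asin(math.sin(angular) / math.cos(math.radians(lat))))
    return (lat - lat_delta, lon - lon_delta, lat + lat_delta, lon + lon_delta)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


center = (28.643059, 77.368885)
box = bounding_box(center[0], center[1], 10)
# The north-east corner is inside the box but outside the 10 km circle.
corner_distance = haversine_km(center[0], center[1], box[2], box[3])
print(box)
print(corner_distance)
```

Any store located near a corner of the box would therefore be returned by bbox but not by geofilt for the same pt and d values.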

Distance sort and relevancy boost


During a spatial search, we may need to sort the results by their distance from a specific geographical location (a lat-lon coordinate). With Solr 4.0, the spatial queries seen earlier can return a distance-based score for sorting and boosting.

Let us see an example wherein spatial filtering and sorting are applied and the distance is returned as the score simultaneously. Our query will be:

http://localhost:8983/solr/collection1/select/?fl=*,score&sort=score asc&q={!geofilt score=distance sfield=store pt=28.642815,77.368413 d=20}

The query output from Solr shows four results along with their scores. Our results are sorted in ascending order on score, which represents the distance as per our query. Hence, the results that are closest to our location appear on top.

The execution of the previous query yields the following output:

In order to add user keywords to the previous Solr query, we will have to add an additional...
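The effect of filtering by distance and then sorting on that distance can be imitated outside Solr. The following Python sketch does so for a handful of made-up stores; the store names and coordinates are hypothetical, and the haversine formula with a 6371 km Earth radius is an assumption of this sketch rather than a statement about how Solr computes its score.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def filter_and_sort(stores, lat, lon, d_km):
    """Keep stores within d_km of (lat, lon) and sort them nearest first."""
    scored = [(name, haversine_km(lat, lon, s_lat, s_lon)) for name, s_lat, s_lon in stores]
    within = [(name, dist) for name, dist in scored if dist <= d_km]
    return sorted(within, key=lambda item: item[1])  # ascending, like sort=score asc


# Hypothetical sample data: (name, latitude, longitude)
stores = [
    ("Store C", 28.70, 77.40),
    ("Store D", 29.50, 78.50),
    ("Store A", 28.642815, 77.368413),
    ("Store B", 28.65, 77.37),
]

results = filter_and_sort(stores, 28.642815, 77.368413, 20)
for name, dist in results:
    print(name, round(dist, 2))
```

Store D lies well beyond the 20 km radius, so it is dropped by the filter before sorting, just as geofilt removes non-matching documents before the score is used for ordering.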

Advanced concepts


As discussed earlier, RPT is based on a model where the world is divided into grid squares or cells. This is done recursively to get almost any amount of accuracy required. Each cell is indexed in Lucene with a byte string that has the parent cell's byte string as a prefix. Therefore, it is named PrefixTree. The PrefixTreeStrategy class for indexing and search uses a SpatialPrefixTree abstraction that decides how to divide the world into grid squares and what the byte encoding looks like to represent each grid square. It has two implementations, namely geohash and quadtrees. Let us look at both implementations in detail.
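The prefix property is easiest to see with a geohash. The following Python sketch encodes a point using the standard geohash scheme (interleaved longitude and latitude bits, five bits per base-32 character) and shows that a coarser cell's code is a prefix of every finer cell inside it. It is a standalone illustration, not Lucene's encoder, and the function name is our own.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a lat-lon point as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the cell
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the cell
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(28.643059, 77.368885, 4)
fine = geohash_encode(28.643059, 77.368885, 8)
print(coarse, fine, fine.startswith(coarse))
```

Because each extra character only narrows the current cell, a search for all points in a coarse cell reduces to a prefix match on the indexed codes.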

Quadtree

A quadtree is a simple and effective spatial indexing technique wherein each node represents a bounding box covering a part of the indexed space. Each node is either a leaf node, which contains one or more indexed points and has no children, or an internal node with four children, one for each quadrant. The index structure of a quadtree...
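The node structure described above can be sketched in a few lines of Python. This is a generic point quadtree written for illustration, not the quadtree implementation used by Lucene; the class name, the capacity and max_depth parameters, and the sample points are all assumptions of the sketch. Points are stored as (x, y) with x as longitude and y as latitude.

```python
class QuadTree:
    """A node is a leaf holding points, or an internal node with four children."""

    def __init__(self, min_x, min_y, max_x, max_y, capacity=2, depth=0, max_depth=12):
        self.bounds = (min_x, min_y, max_x, max_y)
        self.capacity = capacity    # points a leaf may hold before it splits
        self.depth = depth
        self.max_depth = max_depth  # stops endless splitting on duplicate points
        self.points = []
        self.children = None        # None while this node is a leaf

    def _contains(self, x, y):
        min_x, min_y, max_x, max_y = self.bounds
        return min_x <= x <= max_x and min_y <= y <= max_y

    def _split(self):
        min_x, min_y, max_x, max_y = self.bounds
        mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(min_x, mid_y, mid_x, max_y, *args),  # north-west
            QuadTree(mid_x, mid_y, max_x, max_y, *args),  # north-east
            QuadTree(min_x, min_y, mid_x, mid_y, *args),  # south-west
            QuadTree(mid_x, min_y, max_x, mid_y, *args),  # south-east
        ]
        old_points, self.points = self.points, []
        for x, y, payload in old_points:  # push existing points down one level
            any(child.insert(x, y, payload) for child in self.children)

    def insert(self, x, y, payload=None):
        if not self._contains(x, y):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y, payload))
                return True
            self._split()
        # any() stops at the first child that accepts the point.
        return any(child.insert(x, y, payload) for child in self.children)

    def query(self, qmin_x, qmin_y, qmax_x, qmax_y):
        """Return all points inside the query box, skipping non-overlapping nodes."""
        min_x, min_y, max_x, max_y = self.bounds
        if qmax_x < min_x or qmin_x > max_x or qmax_y < min_y or qmin_y > max_y:
            return []
        found = [p for p in self.points
                 if qmin_x <= p[0] <= qmax_x and qmin_y <= p[1] <= qmax_y]
        for child in self.children or []:
            found.extend(child.query(qmin_x, qmin_y, qmax_x, qmax_y))
        return found


tree = QuadTree(-180.0, -90.0, 180.0, 90.0)
tree.insert(77.368885, 28.643059, "store-1")  # hypothetical sample points
tree.insert(77.40, 28.70, "store-2")
tree.insert(72.87, 19.07, "store-3")
tree.insert(-0.12, 51.50, "store-4")
hits = tree.query(77.0, 28.0, 78.0, 29.0)
print(sorted(p[2] for p in hits))
```

The query visits only the quadrants that overlap the search box, which is what makes the structure efficient for spatial filtering.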

Summary


In this chapter, we learnt how Solr can be used for geospatial search. We understood the features provided by the Solr 4 spatial module and saw how indexing and search can be executed with Solr. We discussed different types of geofilters available with Solr and performed sorting and boosting using distance as the relevancy score. We also saw some advanced concepts of geospatial operations such as quadtrees and geohash.

In the next chapter, we will learn about the problems that we normally face during the implementation of Solr in an advertising system. We will also discuss how to debug such a problem along with tweaks to further optimize the instance(s).
