You're reading from  Apache Solr Search Patterns

Product type: Book
Published in: Apr 2015
Reading level: Intermediate
ISBN-13: 9781783981847
Edition: 1st Edition
Author: Jayant Kumar

Jayant Kumar is an experienced software professional with a bachelor of engineering degree in computer science and more than 14 years of experience in architecting and developing large-scale web applications. Jayant is an expert on search technologies and PHP and has been working with Lucene and Solr for more than 11 years now. He is the key person responsible for introducing Lucene as a search engine on www.naukri.com, the most successful job portal in India. Jayant is also the author of the book Apache Solr PHP Integration, Packt Publishing, which has been very successful. Jayant has played many different important roles throughout his career, including software developer, team leader, project manager, and architect, but his primary focus has been on building scalable solutions on the Web. Currently, he is associated with the digital division of HT Media as the chief architect responsible for the job site www.shine.com. Jayant is an avid blogger and his blog can be visited at http://jayant7k.blogspot.in. His LinkedIn profile is available at http://www.linkedin.com/in/jayantkumar.

Chapter 5. Solr in E-commerce

In this chapter, we will discuss in depth the problems faced during the implementation of Solr for search on an e-commerce website. We will look at the related problems and solutions and areas where optimizations may be necessary. We will also look at semantic search and how it can be implemented in an e-commerce scenario. The topics that will be covered in this chapter are listed as follows:

  • Designing an e-commerce search

  • Handling unclean data

  • Handling variations (such as size and color) in the product

  • Sorting

  • Problems and solutions of flash sale searches

  • Faceting with the option of multi-select

  • Faceting with hierarchical taxonomy

  • Faceting with size

  • Implementing semantic search

  • Optimizations that we can look into

Handling unclean data


What do we mean by unclean data? In the last section, we discussed a customer searching for pink sweater, where pink is the color and sweater is the type of clothing. However, the search engine cannot interpret the input in this fashion. Therefore, in our earlier e-commerce schema design, we created a query that searched across all fields available in the index. We then added a separate copyField to handle searches across fields, such as clothes_color, that are not covered by the default query.

Now, will our query give good results? What if there is a brand named pink? First of all, we cannot be sure whether pink is intended as the color or the brand. Suppose the customer means the color. Because we are also searching across brands, products with pink as the brand name will match as well, so the results will be a mix of matches on clothes_color and on brand. In our query, we are boosting brand, so what happens is...

Handling variations in the product


Now that we have somewhat better search results for our e-commerce site, let us look at handling variations. What do we mean by variations? Let us take our earlier example of tommy hilfiger green sweater. For the sake of simplicity, let's say that it comes in three sizes: small, medium, and large. Do we intend to show all three sizes in our search results as individual products? That would be a waste of the display area. On a mobile screen, even if our top result is exactly the green sweater the customer is looking for, it would take up three slots on the first screen, one for each size. That space could instead have shown other results of interest to the customer.

Let us push in the sample data for clothes with the schema given in this chapter. Replace the schema.xml file in the default Solr installation with that shared in this chapter and run the following command to push the data_clothes.csv file into the Solr index:

java ...

Sorting


In addition to search, we also need to provide sorting options on an e-commerce website. By default, the search results are ordered by the relevancy score, which is computed on the basis of the boosting we have provided. However, there will still be requirements to sort the search results by other factors such as price, discount, or latest products. Sorting by fields that already exist in the index is simple. All we need to do is append the sort parameter to our Solr search query:

sort=price asc

Alternatively, to sort by price in descending order:

sort=price desc

The intelligence that needs to be built into the system here is to sort by relevance after price, so that products with the same price are ordered by their score. This takes care of scenarios where the ordered results may contain irrelevant results at the top when, say, sorted by price in ascending order. Therefore, we modify our query to include the score while sorting:

sort=price asc,score desc

Now, the top results would be better. Another piece of intelligence that needs to be taken care of...
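
As a rough illustration of the sort criteria shown in this section, the following Python sketch joins (field, direction) pairs into a sort value and URL-encodes it. The helper name build_sort is hypothetical and is not part of Solr or of this chapter's code; only the sort syntax itself comes from the text.

```python
from urllib.parse import urlencode


def build_sort(criteria):
    """Join (field, direction) pairs into a Solr sort value.

    Hypothetical helper for illustration. Criteria are applied in order,
    so a later entry only breaks ties left by the earlier ones.
    """
    for _, direction in criteria:
        if direction not in ("asc", "desc"):
            raise ValueError(f"unknown sort direction: {direction}")
    return ",".join(f"{field} {direction}" for field, direction in criteria)


# Price first, relevance score as the tie-breaker.
sort_value = build_sort([("price", "asc"), ("score", "desc")])
print(sort_value)

# URL-encoded form, ready to append to the query string.
print(urlencode({"sort": sort_value}))
```

Because score is listed second, it only reorders products that share the same price.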

Problems and solutions of flash sale searches


The major problem that flash sale sites face is sudden, heavy traffic. Generally, people are notified in advance about the time of the sale, so at that exact moment, a large number of customers hit the site to purchase the items on sale. Therefore, traffic spikes sharply when a sale opens and stays low when no flash sale is happening.

Another problem is that, as soon as a product is sold out, it should be moved to the bottom of the search results. We have already seen how this situation can be handled in the previous section. However, this requires very frequent updates to the Solr index. Ideally, as soon as a sale happens, the inventory status should be updated in the index. This is a general problem, but it becomes more acute on flash sale sites because there is a rush for particular products the moment the sale opens. Moreover, the site can lose customers if inventory is not properly tracked and reported...

Faceting with the option of multi-select


Facets are extracted from the search result. Once a customer selects an option from a facet, we create a filter query that typically uses AND logic. For example, on searching for a particular item, say tommy hilfiger, we would get results that have facets for size and color. So far, we have assumed that the customer selects a single option from each facet. Say the selections are medium for size and green for color. The filter query would be:

fq=clothes_size:medium&fq=clothes_color:green

This will be appended to our search query:

q=tommy%20hilfiger&qf=text%20cat^2%20name^2%20brand^2%20clothes_type^2%20clothes_color^2%20clothes_occassion^2&pf=text%20cat^3%20name^3%20brand^3%20clothes_type^3%20clothes_color^3%20clothes_occassion^3&fl=name,clothes_size,clothes_color,score&defType=edismax&facet=true&facet.mincount=1&facet.field=clothes_gender&facet.field=clothes_type&facet.field=clothes_size...
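
The filter queries above can be assembled from the customer's facet selections. The Python sketch below is a minimal illustration under stated assumptions: the function name build_filter_queries is hypothetical, values are assumed to be single tokens that need no escaping, and multiple values within one facet are combined with OR. The optional {!tag=...} prefix uses Solr's filter tagging local parameter, which pairs with {!ex=...} on facet.field for multi-select faceting. The visible text does not confirm that the chapter uses exactly this form.

```python
from urllib.parse import urlencode


def build_filter_queries(selections, tag=False):
    """Turn {facet_field: [selected values]} into a list of fq values.

    Hypothetical helper. Separate fq parameters are ANDed by Solr;
    several values inside one facet are ORed here.
    """
    filters = []
    for field, values in selections.items():
        if not values:
            continue  # nothing selected in this facet
        if len(values) == 1:
            clause = f"{field}:{values[0]}"
        else:
            clause = f"{field}:(" + " OR ".join(values) + ")"
        if tag:
            # Tag the filter so a facet can exclude it with {!ex=<field>}.
            clause = f"{{!tag={field}}}{clause}"
        filters.append(clause)
    return filters


# Single-select case from the text: medium size and green color.
single = build_filter_queries(
    {"clothes_size": ["medium"], "clothes_color": ["green"]}
)
print(urlencode([("fq", f) for f in single], safe=":"))

# Multi-select within one facet, tagged for exclusion.
print(build_filter_queries({"clothes_size": ["medium", "large"]}, tag=True))
```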

Faceting with hierarchical taxonomy


You will have come across e-commerce sites that show facets in a hierarchy. Let's take a look at www.amazon.com and check how hierarchy is handled there. A search for "shoes" provides the following hierarchy:

Department Shoes -> Men -> Outdoor -> Hiking & Trekking -> Hiking Boots

Hierarchical facets on www.amazon.com

How is this hierarchy built into Solr and how do searches happen on it?

In earlier versions of Solr, this used to be handled by a tokenizer known as solr.PathHierarchyTokenizerFactory. Each document would contain the complete path or hierarchy leading to the document, and searches would show multiple facets for a single document.

For example, the shoes hierarchy we saw earlier can be indexed as:

doc #1 : /dept_shoes/men/outdoor/hiking_trekking/hiking_boots
doc #2 : /dept_shoes/men/work/formals/

The PathHierarchyTokenizerFactory class will break this field into tokens such as the following:

doc #1 : /dept_shoes, /dept_shoes/men, /dept_shoes...
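
To make the tokenization concrete, the following Python sketch emulates the prefix tokens for a path that starts with the delimiter. It is only an approximation written for illustration, not the Lucene implementation. The function name path_prefixes is hypothetical, and empty segments (such as one produced by a trailing slash) are simply dropped here, which may differ from what the real tokenizer does.

```python
def path_prefixes(path, delimiter="/"):
    """Return every ancestor prefix of a delimiter-led path, shortest first.

    Hypothetical emulation for illustration only; it assumes the path
    starts with the delimiter and ignores empty segments.
    """
    parts = [part for part in path.split(delimiter) if part]
    prefixes = []
    current = ""
    for part in parts:
        current += delimiter + part
        prefixes.append(current)
    return prefixes


for token in path_prefixes("/dept_shoes/men/outdoor/hiking_trekking/hiking_boots"):
    print(token)
```

Faceting on a field tokenized this way yields one count per level of the hierarchy, which is what allows a document to appear under several facets at once.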

Faceting with size


The problem with faceting on size is that the natural ordering of sizes cannot be derived from the size labels themselves. Let us take the following sizes:

XS, S, M, L, XL, XXL

When the facet is sorted by index (facet.sort=index), these sizes would be listed in alphabetical order as follows:

L, M, S, XL, XS, XXL

To handle such ordering scenarios in size for different apparel, we could prefix each size facet label with a zero-padded sort tag. The size labels would then look somewhat as follows:

[00002]XS
[00003]S
[00004]M
[00005]L
[00006]XL
[00007]XXL

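This ensures that the facets we get from Solr are ordered the way we want, provided the facet is sorted by index (facet.sort=index) rather than by count. The bracketed tag can be stripped from the label before it is displayed.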
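
The encoding can be sketched in a few lines of Python. The rank values are copied from the listing above. The names SIZE_RANK, encode_size, and decode_size are hypothetical and serve only to show that a plain lexicographic sort of the encoded labels reproduces the intended size order.

```python
import re

# Ranks copied from the listing above; the mapping itself is an assumption
# about how an application might store them.
SIZE_RANK = {"XS": 2, "S": 3, "M": 4, "L": 5, "XL": 6, "XXL": 7}


def encode_size(size):
    """Prefix a size with its zero-padded rank, for example [00002]XS."""
    return f"[{SIZE_RANK[size]:05d}]{size}"


def decode_size(label):
    """Strip the bracketed rank so the label can be shown to the customer."""
    return re.sub(r"^\[\d+\]", "", label)


sizes = ["XS", "S", "M", "L", "XL", "XXL"]

# Plain alphabetical sort loses the size order.
print(sorted(sizes))

# Sorting the encoded labels keeps it, and decoding restores the display text.
encoded = sorted(encode_size(size) for size in reversed(sizes))
print(encoded)
print([decode_size(label) for label in encoded])
```

Zero-padding matters here: without it, a rank of 10 would sort before a rank of 2.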

Optimizations


Performing two searches in Solr for every search on the website would not be optimal. However, we need to identify the fields before performing the search. An easier way to do this is to incorporate the dictionary in the product catalog index itself.

For this, we will have to create fields in our index matching the dictionary key fields. Then during indexing, we need to populate the key fields with words that match the product. Let us take an example to understand this. In our case, let us say that we are dealing with three fields in our dictionary, clothes_type, clothes_color, and brand. We would create three new fields in our product index, key_clothes_type, key_clothes_color, and key_brand. These fields would contain product-specific information that matches with our dictionary.

For the product, wrangler jeans, the information in these fields would be:

key_clothes_type : jeans
key_clothes_color : blue
key_brand : wrangler

For the next product, skinny fit black jeans, the...
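
The indexing step described in this section might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the DICTIONARY contents, the build_key_fields name, and the rule that a key field is filled when the product's existing attribute value matches a dictionary term exactly, ignoring case.

```python
# Hypothetical dictionary: field name -> known terms for that field.
DICTIONARY = {
    "clothes_type": {"jeans", "sweater"},
    "clothes_color": {"blue", "black", "green", "pink"},
    "brand": {"wrangler", "tommy hilfiger"},
}


def build_key_fields(product, dictionary=DICTIONARY):
    """Return the key_* fields to add to a product document at index time.

    Assumed rule: copy an attribute into key_<field> only when its
    lowercased value is a known dictionary term for that field.
    """
    key_fields = {}
    for field, terms in dictionary.items():
        value = str(product.get(field, "")).strip().lower()
        if value in terms:
            key_fields[f"key_{field}"] = value
    return key_fields


product = {
    "name": "wrangler jeans",
    "clothes_type": "Jeans",
    "clothes_color": "Blue",
    "brand": "Wrangler",
}
print(build_key_fields(product))
```

The resulting fields would be merged into the product document before it is sent to Solr, so the dictionary lookup and the product search can share a single index.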

Summary


In this chapter, we studied in depth the implementation of Solr in an e-commerce scenario. We saw how to design an e-commerce index in Solr. We discussed the problems faced in e-commerce while implementing a search. We saw different ways of sorting and faceting the products, including multi-select and hierarchical faceting. We also looked at the concept of semantic search and some optimizations that could be used while implementing it.

In the next chapter, we will look at best practices and ideas for using Solr for a spatial or geospatial search.
