
You're reading from Apache Solr Search Patterns

Product type: Book
Published in: Apr 2015
Reading level: Intermediate
ISBN-13: 9781783981847
Edition: 1st Edition
Languages: Java
Tools: Solr
Concepts: Enterprise Search
Author: Jayant Kumar

Chapter 5. Solr in E-commerce

In this chapter, we will discuss in depth the problems faced during the implementation of Solr for search on an e-commerce website. We will look at the related problems and solutions and areas where optimizations may be necessary. We will also look at semantic search and how it can be implemented in an e-commerce scenario. The topics that will be covered in this chapter are listed as follows:

Designing an e-commerce search
Handling unclean data
Handling variations (such as size and color) in the product
Sorting
Problems and solutions of flash sale searches
Faceting with the option of multi-select
Faceting with hierarchical taxonomy
Faceting with size
Implementing semantic search
Optimizations that we can look into

Designing an e-commerce search

E-commerce search is special. For us, a Lucene search combines the Boolean model of information retrieval with the vector space model. However, for an end user, or a customer, any search on an e-commerce website is supposed to be simple. A customer will not make a field-specific search but will focus on what he or she wants from the search.

Suppose a customer is looking for a pink sweater. The search typed into the e-commerce website will be pink sweater rather than +color:pink +type:sweater in the Solr query syntax. It is our search that has to figure out how to return what the customer is actually looking for. The problem with e-commerce website searches is that most searches are carried out on the assumption that the results are retrieved from a bag of words or plain text documents. However, it is important for us to categorize and rank the results so that whatever is being searched for...

Handling unclean data

What do we mean by unclean data? In the last section, we discussed a customer searching for pink sweater, where pink is the color and sweater is the type of clothing. However, the search engine cannot interpret the input in this fashion. Therefore, in our e-commerce schema design earlier, we created a query that searched across all fields available in the index. We then created a separate copyField directive to handle search across fields, such as clothes_color, that are not searched in the default query.

Now, will our query give good results? What if there is a brand named pink? Then what would the results be like? First of all, we would not be sure whether pink is intended to be the color or the brand. Suppose pink is intended to be the color. Because we are also searching across brands, products whose brand name is pink will match as well. The results will be a mix of matches on both clothes_color and brand. In our query, we are boosting brand, so what happens is...

Handling variations in the product

Now that we have somewhat better search results for our e-commerce site, let us look at handling variations. What do we mean by variations? Let us take our earlier example of tommy hilfiger green sweater. For the sake of simplicity, let's say that it comes in three sizes: small, medium, and large. Do we intend to show all three sizes in our search results as individual products? That would be a waste of the display area. Take the example of a mobile screen: even if our top result is exactly the green sweater the customer is looking for, the first screen would be filled by three entries for the same product. Instead, we could have shown some other results that may have been of interest to our customer.
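One way to avoid listing every size as a separate result is Solr's result grouping (field collapsing), which returns one group per distinct value of a field. The following is a minimal Python sketch of the request parameters only. The field name product_group_id is hypothetical (the chapter's schema may use a different field to tie variations together), and the parameters would be added to whatever query parameters are already in use.

```python
from urllib.parse import urlencode


def grouped_query_params(user_query, group_field="product_group_id", per_group=1):
    """Build Solr parameters that collapse product variations into one result.

    group_field is a hypothetical field shared by all variations of a product.
    """
    return [
        ("q", user_query),
        ("group", "true"),                # switch on result grouping
        ("group.field", group_field),     # one group per distinct value
        ("group.limit", str(per_group)),  # documents returned per group
    ]


params = grouped_query_params("tommy hilfiger green sweater")
query_string = urlencode(params)
print(query_string)
```

With group.limit set to 1, each product would appear once in the result list regardless of how many sizes are indexed for it.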

Let us push in the sample data for clothes with the schema given in this chapter. Replace the schema.xml file in the default Solr installation with that shared in this chapter and run the following command to push the data_clothes.csv file into the Solr index:

java ...

Sorting

In addition to search, we also need to provide sorting options on an e-commerce website. By default, the search results are ordered by the relevancy score that has been computed on the basis of the boosting we have provided. However, there will still be requirements to sort the search results by other factors such as price, discount, or latest products. Sorting by already known fields is simple. All we need to do is append the sort parameter to our Solr search query. To sort by price in ascending order, use:

sort=price asc

To sort by price in descending order instead, use:

sort=price desc

The intelligence that needs to be built into the system here is sorting by relevance after price. This takes care of scenarios where the ordered results may contain irrelevant results at the top when, say, sorted by price in ascending order. Therefore, we would modify our query to include the score while sorting:

sort=price asc,score desc

Now, the top results would be better. Another intelligence that needs to be taken care of...
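The combined sort can be sketched in Python as follows. The helper name build_sort and the sample results are purely illustrative. The in-memory sort simply mimics how a secondary sort criterion behaves: it is consulted only when the primary criterion ties.

```python
def build_sort(criteria):
    """Turn [(field, direction), ...] into a Solr sort parameter value."""
    parts = []
    for field, direction in criteria:
        if direction not in ("asc", "desc"):
            raise ValueError(f"unknown sort direction: {direction}")
        parts.append(f"{field} {direction}")
    return ",".join(parts)


sort_value = build_sort([("price", "asc"), ("score", "desc")])

# Hypothetical results, used only to show how the tie-break behaves.
results = [
    {"name": "A", "price": 10, "score": 0.2},
    {"name": "B", "price": 10, "score": 0.9},
    {"name": "C", "price": 5, "score": 0.1},
]

# Price ascending first, then score descending among equal prices.
ordered = sorted(results, key=lambda r: (r["price"], -r["score"]))
ordered_names = [r["name"] for r in ordered]
print(sort_value, ordered_names)
```

Note that score desc only decides the order among products with the same price, so a cheap but weakly matching product (C in this sketch) still comes first.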

Problems and solutions of flash sale searches

The major problem that flash sale sites face is the sudden and large amount of traffic. Generally, people are notified in advance about the time of the sale, so at that exact moment, a large number of customers hit the site to purchase the objects on sale. Therefore, we see a sudden spike in traffic and low traffic when there is no flash sale happening.

Another problem is that, as soon as a product is sold out, it should be moved to the bottom of the search results. We have already seen how this situation can be handled in the previous section. However, this requires very frequent updates to the Solr index. Ideally, as soon as a sale happens, the inventory status should be updated in the index. This is a general problem, but with flash sale sites, the problem becomes more acute. This is because at the time when the sale opens, there is a rush for a certain product. Moreover, the site can lose customers if inventory is not properly tracked and reported...
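As an illustration of what a frequent inventory update could look like, the sketch below builds a JSON payload that uses Solr's atomic update syntax (available in Solr 4 and later) to overwrite a single field without resending the whole document. This is one possible approach, not necessarily the one the chapter goes on to describe. The field names id and inventory are assumptions, and whether atomic updates are usable at all depends on how the schema and update log are configured.

```python
import json


def inventory_update(doc_id, quantity, field="inventory"):
    """Build a Solr atomic-update document that overwrites one field.

    The field name "inventory" is a hypothetical example.
    """
    return {"id": doc_id, field: {"set": quantity}}


def update_payload(remaining):
    """remaining: mapping of document id -> quantity left in stock."""
    docs = [inventory_update(doc_id, qty) for doc_id, qty in sorted(remaining.items())]
    return json.dumps(docs)


# Hypothetical stock levels after a burst of sales.
payload = update_payload({"sku-2": 3, "sku-1": 0})
print(payload)
```

The resulting JSON array would be posted to the core's update handler, and batching several sales into one request keeps the number of index updates manageable during a spike.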

Faceting with the option of multi-select

Facets are extracted from the search result. Once a customer selects an option from a facet, we create a filter query that typically uses AND logic. For example, on searching for a particular item, say tommy hilfiger, we would get results that have facets for size and color. So far, we have assumed that the customer selects a single option from each facet. Say the selections are medium for size and green for color. The filter query would be:

fq=clothes_size:medium&fq=clothes_color:green

This will be appended to our search query:

q=tommy%20hilfiger&qf=text%20cat^2%20name^2%20brand^2%20clothes_type^2%20clothes_color^2%20clothes_occassion^2&pf=text%20cat^3%20name^3%20brand^3%20clothes_type^3%20clothes_color^3%20clothes_occassion^3&fl=name,clothes_size,clothes_color,score&defType=edismax&facet=true&facet.mincount=1&facet.field=clothes_gender&facet.field=clothes_type&facet.field=clothes_size...
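A common way to support multi-select in Solr is to tag each filter query and exclude that tag when computing the facet for the same field, so that the other options of the facet keep their counts. The Python sketch below builds such parameters with the {!tag=...} and {!ex=...} local parameters. The helper name is hypothetical, and values containing spaces or special characters would need quoting or escaping, which is left out here.

```python
def multi_select_params(selections, facet_fields):
    """Build fq and facet.field parameters for multi-select faceting.

    selections: {field: [selected values]}; values within a field are ORed,
    while the filters for different fields are still ANDed by Solr.
    """
    params = []
    for field, values in selections.items():
        if values:
            ored = " OR ".join(values)
            # Tag the filter so that it can be excluded from its own facet.
            params.append(("fq", f"{{!tag={field}}}{field}:({ored})"))
    params.append(("facet", "true"))
    for field in facet_fields:
        # Exclude a field's own filter when counting its facet values.
        prefix = f"{{!ex={field}}}" if selections.get(field) else ""
        params.append(("facet.field", prefix + field))
    return params


params = multi_select_params(
    {"clothes_size": ["medium", "large"], "clothes_color": ["green"]},
    ["clothes_size", "clothes_color", "clothes_type"],
)
for name, value in params:
    print(f"{name}={value}")
```

Here the size facet is counted as if no size had been selected, so a customer who has chosen medium and large can still see how many small items match.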

Faceting with hierarchical taxonomy

You will have come across e-commerce sites that show facets in a hierarchy. Let's take a look at www.amazon.com and check how hierarchy is handled there. A search for "shoes" provides the following hierarchy:

Department Shoes -> Men -> Outdoor -> Hiking & Trekking -> Hiking Boots

Hierarchical facets on www.amazon.com

How is this hierarchy built into Solr and how do searches happen on it?

In earlier versions of Solr, this used to be handled by a tokenizer known as solr.PathHierarchyTokenizerFactory. Each document would contain the complete path or hierarchy leading to the document, and searches would show multiple facets for a single document.

For example, the shoes hierarchy we saw earlier can be indexed as:

doc #1 : /dept_shoes/men/outdoor/hiking_trekking/hiking_boots
doc #2 : /dept_shoes/men/work/formals/

The PathHierarchyTokenizerFactory class will break this field into tokens such as the following:

doc #1 : /dept_shoes, /dept_shoes/men, /dept_shoes...
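The tokenization can be approximated in a few lines of Python. This is only an emulation for illustration, not the tokenizer itself, and its handling of edge cases such as a trailing delimiter is an assumption. Counting the tokens across documents shows why every level of the hierarchy ends up with its own facet count.

```python
from collections import Counter


def path_hierarchy_tokens(path, delimiter="/"):
    """Emit one token per level of the path, each including its ancestors."""
    tokens = []
    current = ""
    for part in path.strip(delimiter).split(delimiter):
        current += delimiter + part
        tokens.append(current)
    return tokens


docs = [
    "/dept_shoes/men/outdoor/hiking_trekking/hiking_boots",
    "/dept_shoes/men/work/formals/",
]

# Facet counts per hierarchy level, as a facet on the tokenized field would give.
facet_counts = Counter(token for doc in docs for token in path_hierarchy_tokens(doc))
print(path_hierarchy_tokens(docs[0])[:2])
print(facet_counts["/dept_shoes/men"])
```

Both sample documents contribute to /dept_shoes and /dept_shoes/men, while deeper levels such as /dept_shoes/men/outdoor are counted only for the documents beneath them.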

Faceting with size

The problem with faceting on size is that the natural ordering of sizes is not known to Solr. Let us take the following sizes:

XS, S, M, L, XL, XXL

These sizes would be listed in alphabetical order as follows:

L, M, S, XL, XS, XXL

To handle such ordering scenarios in size for different apparel, we could encode a size tag into the size facet label. Therefore, the size ordering would be somewhat as follows:

[00002]XS
[00003]S
[00004]M
[00005]L
[00006]XL
[00007]XXL

This will ensure that the facets we get from Solr are ordered in the way we want them to be ordered.
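A small Python sketch of this encoding follows. The helper names are hypothetical, and the starting number 2 simply mirrors the listing above. The tags only help when facet values are returned in index (lexicographic) order, for example with facet.sort=index, and the front end has to strip the tag before display.

```python
import re


def tag_sizes(ordered_sizes, start=2, width=5):
    """Prefix each size with a zero-padded rank so that text order equals size order."""
    return [f"[{rank:0{width}d}]{size}" for rank, size in enumerate(ordered_sizes, start)]


def display_label(tagged):
    """Strip the ordering tag before showing the facet label to the customer."""
    return re.sub(r"^\[\d+\]", "", tagged)


sizes = ["XS", "S", "M", "L", "XL", "XXL"]
tagged = tag_sizes(sizes)

plain_order = sorted(sizes)  # what plain alphabetical ordering gives
facet_order = [display_label(t) for t in sorted(tagged)]  # order with the tags
print(plain_order)
print(facet_order)
```

Sorting the tagged labels as plain strings reproduces the intended size order, whereas sorting the raw labels does not.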

Implementing semantic search

Semantic search is when the search engine understands what the customer is searching for and provides results that are based on this understanding. Therefore, a search for the term shoes should display only items that are of type shoes, and not items whose description merely says "goes well with black shoes". We could argue that since we are boosting on the fields category, type, brand, color, and size, our results should match what the customer is searching for. However, this might not be the case. Let us take a more appropriate example to understand this situation.

Suppose a customer is searching for blue jeans where blue is intended to be the color and jeans is the type of apparel. What if there is a brand of products called blue jeans? The results coming from the search would not be as expected by the customer. As all the fields are being boosted by the same boost factor, the results will be a mix of the intended blue colored jeans and the products from...
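To make the idea of understanding a query concrete, here is a minimal Python sketch that looks up each query word in a small dictionary of known field values and turns the query into field-specific clauses. The dictionary contents and helper names are hypothetical, and an in-memory dictionary stands in for whatever lookup mechanism is actually used. Note that the sketch does not resolve the ambiguity described above: it only shows the mapping step.

```python
# Hypothetical dictionary of known values and the field each one belongs to.
DICTIONARY = {
    "blue": "clothes_color",
    "pink": "clothes_color",
    "jeans": "clothes_type",
    "sweater": "clothes_type",
    "wrangler": "brand",
}


def interpret(query, dictionary=DICTIONARY):
    """Split a free-text query into recognized field values and leftover words."""
    fields, unknown = {}, []
    for word in query.lower().split():
        field = dictionary.get(word)
        if field:
            fields.setdefault(field, []).append(word)
        else:
            unknown.append(word)
    return fields, unknown


def to_solr_query(fields, unknown):
    """Required clauses for recognized words, plain terms for the rest."""
    clauses = [f"+{field}:{word}" for field, words in fields.items() for word in words]
    return " ".join(clauses + unknown)


fields, unknown = interpret("blue jeans")
solr_query = to_solr_query(fields, unknown)
print(solr_query)
```

Words that are not in the dictionary are passed through unchanged, so they can still be matched against the default search fields.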

Optimizations

Performing two searches in Solr for every search on the website would not be optimal. However, we need to identify the fields before performing the search. An easier way to do this is to incorporate the dictionary in the product catalog index itself.

For this, we will have to create fields in our index matching the dictionary key fields. Then during indexing, we need to populate the key fields with words that match the product. Let us take an example to understand this. In our case, let us say that we are dealing with three fields in our dictionary, clothes_type, clothes_color, and brand. We would create three new fields in our product index, key_clothes_type, key_clothes_color, and key_brand. These fields would contain product-specific information that matches with our dictionary.

For the product, wrangler jeans, the information in these fields would be:

key_clothes_type : jeans
key_clothes_color : blue
key_brand : wrangler

For the next product, skinny fit black jeans, the...
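Populating the key fields at indexing time could look roughly like the following Python sketch. The vocabulary, the product record, and the helper name are hypothetical. The key fields are filled by matching the words of the product's own field values against the dictionary.

```python
# Hypothetical dictionary: for each dictionary field, the words it recognizes.
KEY_VOCABULARY = {
    "clothes_type": {"jeans", "sweater"},
    "clothes_color": {"blue", "black", "green"},
    "brand": {"wrangler"},
}


def add_key_fields(product, vocabulary=KEY_VOCABULARY):
    """Return a copy of the product with key_<field> values added."""
    doc = dict(product)
    words = set()
    for value in product.values():
        words.update(str(value).lower().split())
    for field, known_words in vocabulary.items():
        matches = sorted(words & known_words)
        if matches:
            # Only create the key field when the product matches the dictionary.
            doc[f"key_{field}"] = matches
    return doc


# Hypothetical source record for the product named "wrangler jeans".
doc = add_key_fields({"name": "wrangler jeans", "clothes_color": "blue"})
print(doc)
```

Because the key fields live in the same index as the products, a single query can both identify the fields a word belongs to and retrieve the matching products.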

Summary

In this chapter, we studied in depth the implementation of Solr in an e-commerce scenario. We saw how to design an e-commerce index in Solr. We discussed the problems faced in e-commerce while implementing a search. We saw different ways of sorting and faceting the products. We also saw multi-select and hierarchical faceting. We had a look at the concept of semantic search and some optimizations that could be used while implementing the same.

In the next chapter, we will look at best practices and ideas for using Solr for a spatial or geospatial search.


Author (1)

Jayant Kumar

Jayant Kumar is an experienced software professional with a bachelor of engineering degree in computer science and more than 14 years of experience in architecting and developing large-scale web applications. Jayant is an expert on search technologies and PHP and has been working with Lucene and Solr for more than 11 years now. He is the key person responsible for introducing Lucene as a search engine on www.naukri.com, the most successful job portal in India. Jayant is also the author of the book Apache Solr PHP Integration, Packt Publishing, which has been very successful. Jayant has played many different important roles throughout his career, including software developer, team leader, project manager, and architect, but his primary focus has been on building scalable solutions on the Web. Currently, he is associated with the digital division of HT Media as the chief architect responsible for the job site www.shine.com. Jayant is an avid blogger and his blog can be visited at http://jayant7k.blogspot.in. His LinkedIn profile is available at http://www.linkedin.com/in/jayantkumar.