You're reading from Apache Solr Search Patterns

Product type: Book
Published in: Apr 2015
Reading level: Intermediate
ISBN-13: 9781783981847
Edition: 1st Edition
Author: Jayant Kumar

Jayant Kumar is an experienced software professional with a bachelor of engineering degree in computer science and more than 14 years of experience in architecting and developing large-scale web applications. Jayant is an expert on search technologies and PHP and has been working with Lucene and Solr for more than 11 years now. He is the key person responsible for introducing Lucene as a search engine on www.naukri.com, the most successful job portal in India. Jayant is also the author of the book Apache Solr PHP Integration, Packt Publishing, which has been very successful. Jayant has played many different important roles throughout his career, including software developer, team leader, project manager, and architect, but his primary focus has been on building scalable solutions on the Web. Currently, he is associated with the digital division of HT Media as the chief architect responsible for the job site www.shine.com. Jayant is an avid blogger and his blog can be visited at http://jayant7k.blogspot.in. His LinkedIn profile is available at http://www.linkedin.com/in/jayantkumar.

Chapter 7. Using Solr in an Advertising System

In this chapter, we will discuss in depth the problems faced when implementing Solr in an advertising system. An advertising system generates ads related to the content a user is currently viewing in the browser. These contextual ads must be displayed quickly and must be relevant enough that the user is prompted to click on them. We will look at Solr as a platform for solving these problems. We will delve into performance optimizations and then proceed to making Solr work with Redis. The topics covered in this chapter are:

  • Ad system functionalities

  • Ad distribution system architecture

  • Ad distribution system requirements

  • Performance improvements

  • Merging Solr with Redis

Ad system functionalities


An ad system provides contextual ads (documents, in Solr terms) that are related to either the searched keyword or the document being viewed. Ads can also be selected on the basis of the user's profile information and browsing history. The positioning and placement of an ad must match the space available on the web page. On the basis of these functionalities, advertisements can be broadly divided into the following categories:

  • Ads based on the keywords searched, referred to as listing ads

  • Ads based on the placement and positioning available on the web page

  • Ads based on the user's browsing history and profile, also known as user-targeted ads

To understand how an ad system works, we need to understand the functionalities it provides. On the back end or the admin side, the ad system should provide the following functionalities:

  • The definition of ad placement or the position where the ad would be able to fit on a web...

Architecture of an ad distribution system


Now that we have a brief overview of the functionalities provided by an advertising system, we can look at the architecture of the advertising system and understand where Solr fits in the picture.

The system receives parameters such as the placement of the ad, keywords related to the ad, and the type of ad to be displayed, and uses them to identify the ad to display. Most of the data required for ad display is stored in a browser cookie on the end user's system. The cookie can contain tracking and targeting information, which is sent to the ad distribution network and used both to identify the ad to be displayed and to gather tracking and behavioral information.

The ad system generally works on JSON, HTML, and JavaScript frameworks on the frontend. JavaScript is used on the client side and is placed on the web page on which the ad is to be displayed. JavaScript handles all...

Requirements of an ad distribution system


Now that we have studied the system architecture of an ad distribution network and its various components, let us look at the requirements of an ad distribution system from the viewpoint of performance, which is of primary importance. We saw that there are multiple ways in which an ad publisher generates revenue from an ad network. CTR is the preferred measure of the performance of an ad and hence of the ad network.

Note

CTR stands for click-through rate. It is defined as the number of clicks on an advertisement divided by the total number of times the advertisement was served (impressions).
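
As a quick illustration of this definition, CTR can be computed as in the following minimal sketch. The function name and the sample numbers are made up for the example, and treating an ad with zero impressions as having a CTR of 0.0 is an assumption rather than something the text prescribes.

```python
def click_through_rate(clicks, impressions):
    """Return CTR as a fraction: clicks divided by impressions."""
    if clicks < 0 or impressions < 0:
        raise ValueError("clicks and impressions must be non-negative")
    if impressions == 0:
        # Assumption for this sketch: an ad that was never served has CTR 0.0.
        return 0.0
    return clicks / impressions


# Hypothetical numbers: 25 clicks over 1,000 impressions.
ctr = click_through_rate(25, 1000)
print(f"CTR = {ctr:.2%}")  # CTR = 2.50%
```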

In order to deliver a good CTR, the ad being displayed needs to be close to the context of the page currently being viewed by the user. In order to derive the context, we need to run a search with the title and metadata on the page and identify the ads related to that page. Let us create a sample Solr schema for an...
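
Independently of any particular schema, the contextual search described above (running a query built from the page's title and metadata) might look roughly like the following sketch. It only builds the request URL and does not contact Solr. The host, the core name `ads`, and the field names `title` and `keywords` are hypothetical placeholders, not taken from the book; `q`, `defType`, `qf`, `rows`, and `wt` are standard Solr request parameters.

```python
from typing import Sequence
from urllib.parse import urlencode


def build_ad_query_url(page_title: str,
                       page_keywords: Sequence[str],
                       base_url: str = "http://localhost:8983/solr/ads/select",
                       rows: int = 3) -> str:
    """Build a Solr select URL that searches ads using a page's context."""
    # Combine the page title and its metadata keywords into one free-text query.
    query_text = " ".join([page_title, *page_keywords]).strip()
    params = {
        "q": query_text,
        "defType": "edismax",      # query parser suited to free-text input
        "qf": "title^2 keywords",  # hypothetical ad fields, title boosted
        "rows": rows,              # number of ads to return
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


url = build_ad_query_url("Solr search tips", ["solr", "lucene"])
print(url)
```

Boosting the title field over the keywords field is a design choice made only for this example; the right weights depend on the actual ad data.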

Performance improvements


We learnt in the previous section that the ad distribution system needs to be very fast and capable of handling far more requests than a typical website. In addition, the system should always be available, with the least possible downtime (none if possible). The ads have to be relevant so that merchants obtain the desired response. Let us look at a few parameters that improve Solr's performance by making optimal use of its inbuilt caching mechanism.

An index searcher, which is used to process and serve search queries, is always associated with a Solr cache. As long as an index searcher is valid, the associated cache also remains valid. When a new index searcher is opened after a commit, the old index searcher keeps on serving requests until the new index searcher is warmed up. Once the new index searcher is ready, it will start serving all the new search requests. The old index searcher will be closed after it has served all the remaining search requests...
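
The searcher lifecycle described above can be pictured with a small toy model. This is not Solr's code: the class names, the `autowarm_count` argument, and the rule of re-running the most recently added cached queries are simplifications chosen for illustration, and the model is single-threaded, so the old searcher "keeps serving" only in the sense that the swap happens after warming is complete.

```python
class Searcher:
    """Toy stand-in for an index searcher that owns its own cache."""

    def __init__(self, index_version):
        self.index_version = index_version
        self.cache = {}  # query -> cached result, valid only for this searcher

    def search(self, query):
        # Compute the result once per searcher, then serve it from the cache.
        if query not in self.cache:
            self.cache[query] = f"{query} @ v{self.index_version}"
        return self.cache[query]


class SearcherManager:
    """Swaps in a new searcher only after it has been warmed."""

    def __init__(self):
        self.current = Searcher(index_version=1)

    def search(self, query):
        return self.current.search(query)

    def commit(self, autowarm_count):
        new_searcher = Searcher(self.current.index_version + 1)
        # Warm-up: re-run the most recently added cached queries of the old
        # searcher against the new one. Until the swap below, self.current
        # (the old searcher) is still the one answering requests.
        old_queries = list(self.current.cache)
        recent = old_queries[-autowarm_count:] if autowarm_count > 0 else []
        for query in recent:
            new_searcher.search(query)
        # The new searcher is ready: it now serves all new requests.
        self.current = new_searcher


manager = SearcherManager()
for q in ["shoes", "phones", "laptops"]:
    manager.search(q)

manager.commit(autowarm_count=2)
print(sorted(manager.current.cache))  # ['laptops', 'phones']
print(manager.search("phones"))       # phones @ v2
```

The point of the model is that the new searcher starts with an empty cache of its own, so warming it before the swap avoids a burst of slow, uncached queries right after a commit.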

Merging Solr with Redis


Indexing in Solr is costly, so handling real-time data is expensive: every time a new piece of information comes into the system, it has to be indexed before it is available for search. An alternative is to split the Solr index into two parts, stable and unstable. The stable part of the index is kept inside Solr, while the unstable part is handled by a plugin that extracts information from Redis. The unstable part, which now lives in Redis, can handle real-time additions and deletions made by an external script, and these changes are reflected in the search results.
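
The split between stable and unstable data can be sketched as follows. This example applies the idea on the client side in plain Python, whereas the chapter does it inside Solr with a plugin. The document shape, the status values, and the default of treating unknown ads as active are assumptions, and a plain dictionary stands in for the data that would be held in Redis.

```python
from typing import Dict, List


def filter_live_ads(solr_hits: List[dict], ad_status: Dict[str, str]) -> List[dict]:
    """Keep only ads whose real-time status is 'active'.

    solr_hits: documents from the stable Solr index (hypothetical shape).
    ad_status: stand-in for the frequently changing data held in Redis.
    """
    live = []
    for doc in solr_hits:
        # Assumption: ads with no status entry are treated as active.
        if ad_status.get(doc["id"], "active") == "active":
            live.append(doc)
    return live


# Stable part: results as they might come back from Solr.
solr_hits = [
    {"id": "ad1", "title": "Running shoes"},
    {"id": "ad2", "title": "Trail shoes"},
    {"id": "ad3", "title": "Shoe polish"},
]

# Unstable part: updated in real time by an external script, no reindexing.
ad_status = {"ad2": "paused"}

print([doc["id"] for doc in filter_live_ads(solr_hits, ad_status)])  # ['ad1', 'ad3']
```

Pausing or resuming an ad then only requires changing one entry in the fast store, while the Solr index itself stays untouched.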

Redis is an advanced key value store that can be used to store documents containing keys and values in the memory. It offers advantages over Memcache, as it syncs the data onto disk and provides replication and clustering facilities. In addition to the storage of normal key values, it provides facilities to store data structures such as strings, hashes, lists, sets, and...

Summary


In this chapter, we learned how an advertising network works and went through the implementation of Solr for a large-scale ad distribution network. We saw the problems that arise in such an implementation and the solutions to them, along with the architecture of a large-scale Solr system and the cache optimization options in Solr. We also built a plugin that interacts with Redis to support real-time updates of ad status.

In the next chapter, we will explore a framework known as AJAX Solr, which can be used to execute queries on Solr directly from the client browser without the need for any application.
