You're reading from Apache Solr High Performance

Product type: Book
Published in: Mar 2014
Reading level: Intermediate
Publisher: Packt Publishing
ISBN-13: 9781782164821
Edition: 1st Edition
Author: Surendra Mohan

Surendra Mohan, who has served a few top-notch software organizations in varied roles, is currently a freelance software consultant. For the past 10 years, he has been working with cutting-edge technologies such as Drupal, Moodle, Apache Solr, ElasticSearch, Node.js, and SoapUI. He also delivers technical talks at community events such as Drupal Meetups and Drupal Camps. To find out more about him, his write-ups, technical blogs, and much more, go to http://www.surendramohan.info/. He has also written the books Administrating Solr and Apache Solr High Performance, published by Packt Publishing, and has reviewed other technical books such as Drupal 7 Multi Site Configuration and Drupal Search Engine Optimization, as well as titles on Drupal Commerce, ElasticSearch, Drupal-related video tutorials, OpsView, and many more. Additionally, he writes technical blogs and articles for SitePoint.com. His published blogs and articles can be found at http://www.sitepoint.com/author/smohan/.

Chapter 4. Additional Performance Optimization Techniques

In the previous chapter, we learned different ways to optimize Solr's performance, starting with the factors that affect performance and moving on to advanced topics such as index replication using the master-slave architecture, Solr caching, SolrCloud, and scaling a Solr instance horizontally. Along the way, we worked with multiple Solr servers, sharding, distributed search, and more.

In this chapter, we will learn how to optimize performance for a few less commonly used activities: searching for documents that are similar to the ones returned in a result set, sorting results by a function value (for example, by distance in a geospatial search), searching for words that sound alike (that is, searching for homophones), and preventing a word or a list of predefined words (for example, offensive words) from being displayed to the end user in the search results...

Documents similar to those returned in the search result


Imagine that you need to find documents similar to those Solr returned for a keyword search. We will continue with the music composition e-commerce portal that we have been using for demonstration purposes. In this section, we will learn how to return similar documents (in our case, music compositions) alongside the result set for the keyword the user searched for.

Let us start by adding the following index structure to the fields section of our schema.xml file:

<field name="wm_id" type="string" indexed="true" stored="true" required="true" />
<field name="wm_name" type="text" indexed="true" stored="true" termVectors="true" />

We will use the following example data to work with:

<add>
  <doc>
    <field name="wm_id">wm1</field>
    <field name="wm_name">Sonata solo flute<...
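Solr provides this capability through its MoreLikeThis component, which finds documents that share the most significant terms of each result. The following is a minimal sketch for solrconfig.xml, not the chapter's exact configuration: the handler name /similar is hypothetical, and the mlt.mintf, mlt.mindf, and mlt.count values are assumptions suited to a small example index like ours. Because wm_name is indexed with termVectors="true", MoreLikeThis can read that field's terms straight from the index instead of re-analyzing its stored text.

```xml
<!-- solrconfig.xml: hypothetical /similar handler; parameter values are assumptions -->
<requestHandler name="/similar" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Enable the MoreLikeThis component, part of SearchHandler's default component list -->
    <str name="mlt">true</str>
    <!-- Field whose terms are compared to find similar documents -->
    <str name="mlt.fl">wm_name</str>
    <!-- Keep terms that appear at least once in the source document -->
    <str name="mlt.mintf">1</str>
    <!-- Keep terms that appear in at least one document in the index -->
    <str name="mlt.mindf">1</str>
    <!-- Return up to three similar documents for each result -->
    <str name="mlt.count">3</str>
  </lst>
</requestHandler>
```

With this in place, a request such as /similar?q=wm_name:flute returns the matching compositions together with a moreLikeThis section that lists similar documents for each of them.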

Sorting results by function values


Consider an application that stores a list of publishing houses in the index and lets users search it. In addition, you are mostly interested in the publishers located near where you live or where you are currently searching from. In this case, you need to sort the search results by their distance from a geographical point. Can Solr help you achieve this? The answer is yes, and we will demonstrate how in this section.

This section uses geospatial search. If you are not familiar with it, we recommend that you refer to the Geospatial Search section in Chapter 1, Searching Data, of Administrating Solr, Packt Publishing.

Let us now start by adding the following index structure to the fields section of our schema.xml file:

<field name="p_id" type="string" indexed="true" stored="true...
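Distance sorting of this kind is usually built on a location field and Solr's geodist() function. The sketch below illustrates the idea under stated assumptions rather than reproducing the chapter's setup: the p_location field and the /nearby handler are hypothetical names, and the tdouble type is assumed to be defined in the schema, as it is in the stock Solr 4 example schema.

```xml
<!-- schema.xml, types section: LatLonType stores a point as two numeric sub-fields -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

<!-- schema.xml, fields section: hypothetical field holding each publisher's "lat,lon" position -->
<field name="p_location" type="location" indexed="true" stored="true"/>
<!-- Sub-fields generated by LatLonType; assumes a tdouble type exists in the schema -->
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

<!-- solrconfig.xml: hypothetical handler that always sorts by distance from the pt parameter -->
<requestHandler name="/nearby" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Field that geodist() measures from -->
    <str name="sfield">p_location</str>
    <!-- Nearest publishers first -->
    <str name="sort">geodist() asc</str>
  </lst>
</requestHandler>
```

A client then sends only its own position, for example /nearby?q=*:*&pt=52.23,21.01, and the publishers come back nearest first. Because geodist() needs a point to measure from, clients should always include pt.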

Searching for homophones


You might encounter end users whose English is not strong, so they type search keywords the way they sound or are pronounced. For instance, break and brake, meat and meet, tale and tail, and phone and fone sound the same when pronounced. An end user might intend to search for phone but type fone instead. By default, Solr searches for fone (the word the user actually typed) instead of phone (what the user meant), so relevant documents can be missed in the result set. To avoid this, Solr needs to return results for keywords that sound similar to the typed ones. Can Solr handle such scenarios? The answer is yes, and we will learn how to do it in this section...
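A common way to achieve this is to add a phonetic filter to the field's analyzer, so that both the indexed text and the query terms are reduced to phonetic codes before they are compared. The following minimal sketch assumes Solr's DoubleMetaphoneFilterFactory; the text_phonetic type and the ph_name field are hypothetical names, not the ones used in the chapter. Under the Double Metaphone encoding, phone and fone both reduce to the code FN, so a search for fone also matches documents that contain phone.

```xml
<!-- schema.xml, types section: hypothetical type that indexes and queries by sound -->
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer applies at both index time and query time -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- inject="false" replaces each token with its phonetic code instead of keeping both -->
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>

<!-- schema.xml, fields section: hypothetical field using the phonetic type -->
<field name="ph_name" type="text_phonetic" indexed="true" stored="true"/>
```

Setting inject="true" instead keeps the original tokens alongside their codes, so exact spellings can still match directly.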

Preventing defined words from being searched


Imagine a situation where you wish to filter offensive words out of the indexed data. Such words need to be ignored and shouldn't be searchable. Can we provide such a capability in Solr? Yes, and we will see how in this section.

To avoid using actual offensive words in this demonstration, we will use the term offensive as a stand-in for any offensive word we would like to prevent from being searched.

To start, we will define the following index structure in the fields section of our schema.xml file:

<field name="o_id" type="string" indexed="true" stored="true" required="true" />
<field name="o_name" type="text_offensive" indexed="true" stored="true" />

Now, let us define the text_offensive field type in the types section of our schema.xml file as follows:

<fieldType name="text_offensive" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class...
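Making listed words unsearchable is commonly done with Solr's StopFilterFactory and a word list file. The following is a minimal sketch of that approach, not necessarily how the chapter defines text_offensive: the type name text_filtered and the file offensive.txt (placed in the core's conf directory, one word per line) are hypothetical.

```xml
<!-- schema.xml, types section: hypothetical type that drops listed words -->
<fieldType name="text_filtered" class="solr.TextField" positionIncrementGap="100">
  <!-- The same analyzer runs at index time and query time -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Remove every token listed in the hypothetical offensive.txt, ignoring case -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="offensive.txt"/>
  </analyzer>
</fieldType>
```

Because the listed words never reach the index for this field, a query for offensive on it matches nothing. The stored value is not analyzed, however, so a document found through other terms still displays its original text.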

Summary


In this chapter, we covered rarely used but important techniques for optimizing the performance of our Solr instance. We learned how to get documents similar to those in a returned result set, what geospatial search is (searching for documents with respect to a specific geographical point), how to search for words based on how they sound, and how to prevent predefined words from being searched.

In the next chapter, we will learn how to troubleshoot common problems, including but not limited to corrupted and locked indexes, reducing the index size, and tackling issues caused by expensive garbage collection, out-of-memory errors, and infinite loops when working with shards.
