You're reading from Apache Solr High Performance

Product type: Book

Published in: Mar 2014

Reading level: Intermediate

ISBN-13: 9781782164821

Edition: 1st Edition

Languages: Java

Tools: Solr

Concepts: High Performance Programming

Author: Surendra Mohan

Chapter 4. Additional Performance Optimization Techniques

In the previous chapter, we learned different ways to optimize Solr's performance, starting with the factors that affect performance and moving on to advanced concepts such as index replication using the master-slave architecture, Solr caching, SolrCloud, and scaling a Solr instance horizontally. Along the way, we worked with multiple Solr servers, sharding, distributed search, and more.

In this chapter, we will learn how to optimize performance for a few less frequently used activities: searching for documents that are similar to the ones returned in a search's result set, sorting results by a function value (geospatial search), searching for words that sound alike (that is, homophones), and preventing a word or a list of predefined words (for example, offensive words) from being displayed to the end user in the search results...

Documents similar to those returned in the search result

Imagine a situation where you have searched for some keywords and now want to find documents that are similar to the ones Solr returned for that search. We will continue with the music composition e-commerce portal that we have been using for demonstration purposes. In this section, we will see how to get similar documents (in our case, music compositions) returned alongside the result set rendered for the user who searched for a keyword.

Let us start by adding the following index structure to the fields section of our schema.xml file:

<field name="wm_id" type="string" indexed="true" stored="true" required="true" />
<field name="wm_name" type="text" indexed="true" stored="true" termVectors="true" />

We will use the following example data to work with:

<add>
  <doc>
    <field name="wm_id">wm1</field>
    <field name="wm_name">Sonata solo flute<...
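
As a rough illustration of where this setup leads, the sketch below assembles a query URL that asks Solr for similar documents through its MoreLikeThis request parameters (mlt, mlt.fl, mlt.mintf, mlt.mindf, and mlt.count). Only the wm_name field comes from the schema above; the host, port, handler path, query text, and the function name build_mlt_url are assumptions made for this example. Storing term vectors on wm_name, as the schema does, generally lets MoreLikeThis read terms directly instead of re-analyzing stored text.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, query, similarity_field, count=5):
    """Return a Solr query URL with MoreLikeThis enabled.

    base_url and query are hypothetical values supplied by the caller;
    similarity_field should be a field indexed with termVectors="true".
    """
    params = {
        "q": query,
        "mlt": "true",               # switch the MoreLikeThis component on
        "mlt.fl": similarity_field,  # field(s) used to judge similarity
        "mlt.mintf": 1,              # minimum term frequency in the source doc
        "mlt.mindf": 1,              # minimum number of docs containing a term
        "mlt.count": count,          # similar docs returned per result
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical local Solr instance and query, for illustration only.
url = build_mlt_url("http://localhost:8983/solr/select", "wm_name:sonata", "wm_name")
print(url)
```

Lowering mlt.mintf and mlt.mindf to 1 is a choice suited to a tiny sample index; on a real index, higher thresholds usually filter out noisy terms.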

Sorting results by function values

Consider a situation where you have an application that stores a list of publishing houses in the index and allows users to search it. In addition, you are most interested in the publishers located near the place where you are when you run the search. In this case, you need a way to sort your search results by their distance from a geographical point. Can Solr help you achieve this? The answer is yes, and we will demonstrate how in this section.

This section uses geospatial search. If you are not familiar with geospatial search, we recommend that you refer to the Geospatial Search section in Chapter 1, Searching Data, of Administrating Solr, Packt Publishing.

Let us now start with the actual activity by adding the following index structure in the fields section of our schema.xml file:

<field name="p_id" type="string" indexed="true" stored="true...
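
To make the idea of sorting by distance concrete, here is a small sketch under stated assumptions. It computes great-circle distances with the haversine formula and orders a few made-up publishers by their distance from a reference point, which is the ordering a distance sort is meant to produce. It also assembles the request parameters Solr uses for its geodist() function (sfield, pt, and sort). The field name p_location, the publisher names, and the coordinates are hypothetical, because the schema above is truncated.

```python
import math
from urllib.parse import urlencode

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sort_by_distance(publishers, lat, lon):
    """Order (name, lat, lon) tuples from nearest to farthest."""
    return sorted(publishers, key=lambda p: haversine_km(lat, lon, p[1], p[2]))


def build_geodist_params(location_field, lat, lon):
    """Query-string parameters asking Solr to sort by geodist()."""
    return urlencode({
        "q": "*:*",
        "sfield": location_field,  # hypothetical spatial field name
        "pt": f"{lat},{lon}",      # reference point as "lat,lon"
        "sort": "geodist() asc",   # nearest documents first
    })


# Made-up publishers; the reference point is roughly central Berlin.
publishers = [
    ("Publisher A (Paris)", 48.8566, 2.3522),
    ("Publisher B (Potsdam)", 52.3906, 13.0645),
    ("Publisher C (Warsaw)", 52.2297, 21.0122),
]
for name, _, _ in sort_by_distance(publishers, 52.52, 13.405):
    print(name)
print(build_geodist_params("p_location", 52.52, 13.405))
```

The local sort is only there to show the expected ordering; in practice the sorting would be left to Solr so that it happens inside the index rather than in client code.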

Searching for homophones

You might encounter end users who are not fluent in English and who type search keywords the way they sound or are pronounced. For instance, word pairs such as break and brake, meat and meet, tale and tail, and phone and fone sound the same when pronounced. An end user might intend to search for phone but type fone instead. In such a scenario, by default, Solr searches for fone (the word actually typed) rather than phone (what the user meant), so relevant documents are likely to be missing from the result set. To avoid this, Solr needs to return results for keywords that sound similar to the ones typed. Can Solr handle such scenarios? The answer is yes, and we will learn how to do it in this section...
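
The principle behind sound-alike matching can be sketched in a few lines: reduce every word to a phonetic key at index time and at query time, then match on the key instead of the spelling. The encoder below is a deliberately simplified toy written for this illustration; it is not the algorithm used by Solr's phonetic filters (such as solr.PhoneticFilterFactory), which rely on established encoders. All function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def toy_phonetic_key(word):
    """Toy phonetic key: map 'ph' to 'f', drop vowels after the first
    letter, and collapse repeated letters. For illustration only."""
    w = re.sub(r"[^a-z]", "", word.lower()).replace("ph", "f")
    if not w:
        return ""
    key = w[0] + re.sub(r"[aeiouy]", "", w[1:])
    return re.sub(r"(.)\1+", r"\1", key)


def build_index(docs):
    """Map each phonetic key to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_phonetic_key(token)].add(doc_id)
    return index


def search(index, term):
    """Look the query term up by its phonetic key, not its spelling."""
    return sorted(index.get(toy_phonetic_key(term), set()))


# Hypothetical documents.
docs = {"d1": "phone repair", "d2": "flute sonata"}
index = build_index(docs)
print(search(index, "fone"))
```

Because both phone and fone reduce to the same key, the misspelled query still finds the document. The trade-off is reduced precision, since unrelated words can share a key.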

Ignore the defined words from being searched

Imagine a situation where you wish to filter out offensive words from the indexed data. Such words need to be ignored and shouldn't be searchable. Can we provide such a capability to Solr? Yes, of course; we can do that and we will understand how to do it in this section.

To avoid using actual offensive words in the demonstration, we will use the term offensive as a stand-in for any word we would like to exclude from search.

To start, we will define the following index structure in the fields section of our schema.xml file:

<field name="o_id" type="string" indexed="true" stored="true" required="true" />
<field name="o_name" type="text_offensive" indexed="true" stored="true" />

Now, let us define the text_offensive field type in the types section of our schema.xml file as follows:

<fieldType name="text_offensive" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class...
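
The field type definition above is truncated, so the following is only a sketch of the general idea rather than the chapter's configuration. One common approach is to drop listed words during analysis, which is what Solr's solr.StopFilterFactory does with a word list. The example imitates that behavior in plain Python: blocked words are removed before indexing and before querying, so they can never match. The function names, the word list, and the sample documents are hypothetical.

```python
from collections import defaultdict


def analyze(text, blocked):
    """Lowercase, split on whitespace, and drop blocked words."""
    blocked_lower = {w.lower() for w in blocked}
    return [t for t in text.lower().split() if t not in blocked_lower]


def build_index(docs, blocked):
    """Inverted index built only from tokens that survive analysis."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text, blocked):
            index[token].add(doc_id)
    return index


def search(index, term, blocked):
    """Return ids of documents matching the analyzed query term."""
    tokens = analyze(term, blocked)
    if not tokens:  # the query consisted only of blocked words
        return []
    return sorted(index.get(tokens[0], set()))


# Hypothetical word list and documents.
blocked = ["offensive"]
docs = {"o1": "This is an offensive comment", "o2": "A polite comment"}
index = build_index(docs, blocked)
print(search(index, "offensive", blocked))
print(search(index, "comment", blocked))
```

Note that filtering at analysis time only keeps the word from being searchable. If the field is stored, the original text is still returned with the document.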

Summary

In this chapter, we covered rarely used but important techniques to optimize the performance of our Solr instance. We learned how to get similar documents based on the rendered result set, what geospatial search is (searching for documents with respect to a specific geographical point), how to search for words based on how they sound, and how to prevent predefined words from being searched.

In the next chapter, we will learn how to troubleshoot common problems, including (but not limited to) dealing with corrupted and locked indexes, reducing the index size, and tackling issues caused by expensive garbage collection, out-of-memory errors, and infinite loop execution while working with shards.


Author (1)

Surendra Mohan

Surendra Mohan, who has served a few top-notch software organizations in varied roles, is currently a freelance software consultant. He has been working on various cutting-edge technologies like Drupal, Moodle, Apache Solr, ElasticSearch, Node.js, SoapUI, and so on for the past 10 years. He also delivers technical talks at various community events like Drupal Meetups and Drupal Camps. To find out more about him, his write-ups, technical blogs, and much more, go to http://www.surendramohan.info/. He has also written the books Administrating Solr and Apache Solr High Performance published by Packt Publishing and has reviewed other technical books such as Drupal 7 Multi Site Configuration and Drupal Search Engine Optimization, as well as titles on Drupal commerce, ElasticSearch, Drupal related video tutorials, titles on OpsView, and many more. Additionally, he writes technical blogs and articles with SitePoint.com. His published blogs and articles can be found at http://www.sitepoint.com/author/smohan/.