Chapter 5. Troubleshooting

You have probably faced a number of problems while working with a Solr deployment, irrespective of whether the deployment is simple or complex, or whether you are working on a single Solr instance or on multiple servers or shards.

In this chapter, we will learn how to troubleshoot the most common problems you are likely to face while working with Solr, and will cover the following topics:

  • Dealing with the corrupt index

  • Reducing the file count in the index

  • Dealing with the locked index

  • Truncating the index size

  • Dealing with a huge count of open files

  • Dealing with out-of-memory issues

  • Dealing with an infinite loop exception in shards

  • Dealing with expensive garbage collection

  • Bulk updating a single field without full indexation

So, let us get started.

Dealing with the corrupt index


Assume that you are maintaining a Solr instance and, suddenly, probably late at night, you are informed that the index is corrupted and you need to investigate and fix the issue as soon as possible. Imagine how frustrating it is to address such a high-priority issue at that hour! You might be wondering whether there is an alternative to a full re-indexation or to restoring a working index from backup. Yes, there are alternatives that consume far less time than either of those options, and we will learn how to use them in this section.

Assuming that we have a corrupt index that we need to investigate and fix, we first switch the working directory to the one holding the Lucene libraries in order to use the CheckIndex tool. After switching to the appropriate directory, run the following command:

java -cp JAR_PATH_LUCENE -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex PATH_INDEX -fix

In the preceding...
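To make the invocation concrete, here is a sketch of what it might look like on a typical installation; the Lucene core JAR name and location and the index path are assumptions that will differ on your system:

# report-only run first (no changes are made to the index)
java -cp /usr/share/solr/lucene/lucene-core-4.6.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/solr/data/index

# repair run; drops any corrupt segments and the documents they contain
java -cp /usr/share/solr/lucene/lucene-core-4.6.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/solr/data/index -fix

It is usually worth running the tool without the -fix switch first: in that mode CheckIndex only reports the damaged segments, whereas -fix removes them, along with the documents they contain, from the index.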

Reducing the file count in the index


Consider a situation where you have a Solr instance running for a long time and the index has been split into many files (which is quite natural and expected). Have you considered how time-consuming it is for Solr to keep opening all the files of an index to fetch the desired result set, resulting in a performance drop? Don't worry; this can be fixed, and we will learn how to overcome the issue in this section.

Since the root cause of this performance drop is the huge number of segment files that make up the index, the obvious solution is to merge these segment files into one. To do so, we run the optimize command as follows:

curl 'http://localhost:8983/solr/update' --data-binary '<optimize/>' -H 'Content-type:text/xml; charset=utf-8'

After a couple of minutes, or possibly hours (this primarily depends on the index size), you will get the following response:

<?xml version="1.0" encoding...
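If merging everything down to a single segment is too heavy an operation for your index, a partial optimize is a possible middle ground. The following sketch asks Solr to merge down to at most two segments; the segment count of 2 is an arbitrary example value:

curl 'http://localhost:8983/solr/update' --data-binary '<optimize maxSegments="2"/>' -H 'Content-type:text/xml; charset=utf-8'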

Dealing with the locked index


Imagine a situation where, while the indexing process was active, something went wrong: perhaps your machine crashed or a problem occurred in your virtual machine, leaving the index locked. Remember that while indexing is in progress, Solr holds a lock file in the index directory. When the process aborts abruptly, that lock file is left behind, preventing any further modification of the index. Our goal is to sort out this locking issue, and we will learn how to do it in this section.

Let us assume that during a commit operation our Java Virtual Machine crashed, or an intern killed our Solr master instance while indexing was in progress. The Jetty servlet container will normally throw an exception that looks as follows:

SEVERE: Exception during commit/optimize:java.io.IOException: Lock obtain timed out: SimpleFSLock@/usr/share/solr/data/index/luceneff1fe872c2cbfeb44091b36c21a97c14-write.lock

If you...
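As a rough sketch of the usual recovery procedure (the lock file path is taken from the exception above and will differ in your deployment), you stop the servlet container, remove the stale lock file, restart Solr, and then re-send any documents that were being indexed when the crash occurred:

# stop Jetty/Solr first, then remove the stale lock file left behind by the crash
rm /usr/share/solr/data/index/luceneff1fe872c2cbfeb44091b36c21a97c14-write.lock
# restart Solr and re-index the documents that were in flight during the crash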

Truncating the index size


You might come across situations where you need to truncate the index size to such an extent that it fits into your system's RAM. We will learn how to truncate the index size to a desirable level in this section.

Let us consider our music composition eStore for the demonstration purposes. Assuming that we have four fields that describe the document, we will add the following index structure to the fields section of our schema.xml file:

<field name="wm_id" type="string" indexed="true" stored="true" required="true" />
<field name="wm_name" type="text" indexed="true" stored="true" />
<field name="wm_details" type="text" indexed="true" stored="true" />
<field name="wm_price" type="string" indexed="true" stored="true" />

We will also assume the following points:

  • Search is to be carried out in the wm_name and wm_details fields

  • We show the wm_id and wm_price fields

  • We do not use the spellchecker or highlighting

We indexed 2,000,000 example documents...
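Based on the assumptions above, one way to shrink the index is to stop storing what we never display and stop indexing what we never search. The following sketch of the fields section reflects that idea; dropping stored="true" from wm_name and wm_details is only safe because we assumed highlighting is not used:

<field name="wm_id" type="string" indexed="true" stored="true" required="true" />
<field name="wm_name" type="text" indexed="true" stored="false" />
<field name="wm_details" type="text" indexed="true" stored="false" />
<field name="wm_price" type="string" indexed="false" stored="true" />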

Dealing with a huge count of open files


In this section, we will learn how to get rid of exceptions thrown due to a huge number of open files. Before you get into this section, it is recommended that you refer to the preceding section, Reducing the file count in the index.

  1. For the purpose of demonstration, let us assume that Solr (running in a Unix environment) throws an exception whose header looks as follows:

    java.io.FileNotFoundException: /usr/share/solr/data/index/_8.tii (Too many open files)
    

    This shows that there are too many open files.

  2. We will increase the open files limit from 1000 (the value previously set in my case; yours may differ) to 3000. To do so, we will use the ulimit command-line utility as follows:

    ulimit -n 3000
    
  3. Stopping at this stage would just be a workaround. The primary cause of this exception is the huge number of segment files that constitute the index. So, the immediate step after using the ulimit utility should be to optimize the index...
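Note that ulimit only affects the current shell session. As a sketch, on most Linux distributions you could make the higher limit survive a reboot by adding entries to /etc/security/limits.conf for the user that runs Solr (the user name solr is an assumption), and then optimize the index as described earlier:

# /etc/security/limits.conf (solr is the assumed user running the servlet container)
solr soft nofile 3000
solr hard nofile 3000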

Dealing with out-of-memory issues


You might be aware that applications written in Java are well known for out-of-memory problems. Before we learn how to deal with them, let us define out-of-memory in Java terms and briefly understand why such problems occur. It is the state of a Java Virtual Machine in which no additional memory can be allocated to a running process. As a result, data that the process needs can no longer be brought into memory, and the process eventually stops. We recommend that you refer to the out-of-memory Wikipedia page at http://en.wikipedia.org/wiki/Out_of_memory if you want to know more about it.

As far as Solr is concerned, these problems are usually associated with a low heap size. We will learn how to avoid and resolve such problems in this section.

You might come across an exception that looks similar to the following one:

SEVERE: java.lang.OutOfMemoryError: Java heap space...
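Since the usual culprit is a heap that is too small for the index and caches, the first thing to try is raising the maximum heap when starting Solr. A minimal sketch, assuming the standard Jetty start.jar launcher and an example 2 GB ceiling, looks as follows:

java -Xmx2048M -Xms512M -jar start.jar

If the error persists even with a generous heap, it is usually worth reviewing cache sizes and the fields you store rather than raising -Xmx indefinitely.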

Dealing with an infinite loop exception in shards


As you might be aware, while working with shards we need to add the addresses of the shards to every query we send. To avoid including them in every query, you might think of writing them into solrconfig.xml and leaving the task of adding the shards' addresses to Solr. So, you add them to the default request handler in your solrconfig.xml file, execute an example query, and end up with an infinite loop exception. You might be wondering how to prevent such exceptions while still keeping the shard addresses in the handler. In this section, we will learn how to overcome the infinite loop exception in shards.

We define the following request handler in our solrconfig.xml file, assuming that the IP address of the Solr server we are going to query is 192.168.0.100:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">...
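The loop typically arises because the handler that receives the distributed query is the same handler that serves the sub-requests sent to each shard, so a shards default makes every sub-request fan out again. One common way around it, sketched below and not necessarily the exact configuration the author goes on to show, is to keep the shards parameter out of the default handler and put it in a dedicated handler instead; the handler name /distributed and the second shard address 192.168.0.101 are example values:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- no shards default here; this handler also serves the sub-requests coming from other shards -->
</requestHandler>

<requestHandler name="/distributed" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- example shard addresses; query this handler for distributed searches -->
    <str name="shards">192.168.0.100:8983/solr,192.168.0.101:8983/solr</str>
  </lst>
</requestHandler>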

Dealing with expensive garbage collection


You might encounter situations where you have a number of applications running in the Java Virtual Machine and the garbage collection process takes too long to run. When this happens, you might not even be aware of what exactly is going on. In this section, we will learn how to deal with garbage collections that take too long to execute.

We start by running the following command:

java -Xmx2048M -Xms512m -jar start.jar

After some time, we notice that Solr starts to hang frequently for short periods and does not respond at all during those spans, and the same is true of Jetty. This alternating pattern of responding and not responding is an indication that garbage collection is taking too long to execute. How are we going to overcome this issue?

Let us modify our Solr start command and see what happens. Now our command looks as follows:

java -Xmx2048M -Xms512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads...
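Before (or alongside) switching collectors, it often helps to confirm the diagnosis by logging what the collector is doing. A sketch using the standard HotSpot GC logging flags of that era is shown below; gc.log is an arbitrary file name:

java -Xmx2048M -Xms512m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log -jar start.jar

Long stop-the-world full collections in that log correspond to the periods during which neither Solr nor Jetty responds.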

Bulk updating a single field without full indexation


You might be aware that if you wish to update a single field of a document that is already written to the index, Solr's standard approach won't allow you to do so. Instead, you need to remove the complete document from the index and add a new version of it. For a smaller index, the standard approach is fine. But think of a situation where you have a huge index and you need to update a field that tracks the number of visitors hitting each product.

With the standard approach, this amounts to a full re-indexation of all the documents (probably millions of documents on a daily basis). Do you think full indexation in such a scenario is an optimal approach? Of course not: it consumes considerable resources and is better avoided. So, how does one handle such a situation? Don't worry! In this section, we will learn how to update a single field in a document without the need for an expensive, complete indexation.

We will refer to our music...
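Since Solr 4.0, the feature that makes this possible is atomic updates: you send only the unique key and the fields to change, each with an update attribute, and Solr rebuilds the document internally from its stored fields. A minimal sketch follows; the visitor-count field name wm_visits and the document ID 12345 are assumptions, and the feature requires the update log (<updateLog/>) to be enabled and all fields to be stored:

curl 'http://localhost:8983/solr/update?commit=true' --data-binary '<add><doc><field name="wm_id">12345</field><field name="wm_visits" update="inc">1</field></doc></add>' -H 'Content-type:text/xml; charset=utf-8'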

Summary


In this chapter, we learned how to troubleshoot common problems: dealing with corrupted and locked indexes, reducing the number of files in the index, and truncating the index size. We also learned how to tackle issues caused by expensive garbage collection, out-of-memory errors, too many open files, and infinite loop exceptions while working with shards, and how to update a single field in all documents without a full indexation.

In the next chapter, we will learn how to use ZooKeeper for performance-optimization purposes and will cover how to set up, configure, and deploy ZooKeeper. We will also look at the different applications of ZooKeeper that can help us optimize Solr's performance.
