Chapter 5. Troubleshooting

You have probably faced a number of problems while working with a Solr deployment, irrespective of whether the deployment is simple or complex, or whether you are working on a single Solr instance or on multiple servers or shards.

In this chapter, we will learn how to troubleshoot the most common problems you are likely to face while working with Solr, and will cover the following topics:

  • Dealing with the corrupt index

  • Reducing the file count in the index

  • Dealing with the locked index

  • Truncating the index size

  • Dealing with a huge count of open files

  • Dealing with out-of-memory issues

  • Dealing with an infinite loop exception in shards

  • Dealing with expensive garbage collection

  • Bulk updating a single field without full indexation

So, let us get started.

Dealing with the corrupt index


Assume that you are maintaining a Solr instance and, suddenly, probably late at night, you are informed that the index is corrupted and you need to investigate and fix the issue as soon as possible. Imagine how frustrating it is to address such a high-priority issue at that hour! You might be wondering whether there is an alternative to a full re-indexation or to restoring a working index from backup. Yes, there are alternatives that consume far less time than either of those options, and we will learn how to use them in this section.

Assuming that we have a corrupt index that we need to investigate and fix, we first switch the working directory to the one holding the Lucene libraries in order to use the CheckIndex tool. After switching to the appropriate directory, run the following command:

java -cp JAR_PATH_LUCENE -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex PATH_INDEX -fix

In the preceding...
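To make the invocation concrete, here is a sketch of what it might look like on a typical installation; the Lucene core JAR name and location and the index path are assumptions that will differ on your system:

# report-only run first (no changes are made to the index)
java -cp /usr/share/solr/lucene/lucene-core-4.6.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/solr/data/index

# repair run; drops any corrupt segments and the documents they contain
java -cp /usr/share/solr/lucene/lucene-core-4.6.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/solr/data/index -fix

It is usually worth running the tool without the -fix switch first: in that mode CheckIndex only reports the damaged segments, whereas -fix removes them, along with the documents they contain, from the index.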

Reducing the file count in the index


Consider a situation where you have a Solr instance running for a long time and the index has been split into many files (which is quite natural and expected). Have you considered how time-consuming it is for Solr to keep opening all the files of an index to fetch the desired result set, resulting in a performance drop? Don't worry; this can be fixed, and we will learn how to overcome the issue in this section.

Since the root cause of this performance drop is the huge number of segment files that make up the index, the obvious solution is to merge these segment files into one. To do so, we run the optimize command as follows:

curl 'http://localhost:8983/solr/update' --data-binary '<optimize/>' -H 'Content-type:text/xml; charset=utf-8'

After a couple of minutes, or possibly hours (this primarily depends on the index size), you will get the following response:

<?xml version="1.0" encoding...
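If merging everything down to a single segment is too heavy an operation for your index, a partial optimize is a possible middle ground. The following sketch asks Solr to merge down to at most two segments; the segment count of 2 is an arbitrary example value:

curl 'http://localhost:8983/solr/update' --data-binary '<optimize maxSegments="2"/>' -H 'Content-type:text/xml; charset=utf-8'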

Dealing with the locked index


Imagine a situation where, while the indexing process was active, something went wrong: perhaps your machine crashed or a problem occurred in your virtual machine, leaving the index locked. Remember that while indexing is in progress, Solr holds a lock file in the index directory. When the process aborts abruptly, that lock file is left behind, preventing any further modification of the index. Our goal is to sort out this locking issue, and we will learn how to do it in this section.

Let us assume that during a commit operation our Java Virtual Machine crashed, or an intern killed our Solr master instance while indexing was in progress. The Jetty servlet container will normally throw an exception that looks as follows:

SEVERE: Exception during commit/optimize:java.io.IOException: Lock obtain timed out: SimpleFSLock@/usr/share/solr/data/index/luceneff1fe872c2cbfeb44091b36c21a97c14-write.lock

If you...
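As a rough sketch of the usual recovery procedure (the lock file path is taken from the exception above and will differ in your deployment), you stop the servlet container, remove the stale lock file, restart Solr, and then re-send any documents that were being indexed when the crash occurred:

# stop Jetty/Solr first, then remove the stale lock file left behind by the crash
rm /usr/share/solr/data/index/luceneff1fe872c2cbfeb44091b36c21a97c14-write.lock
# restart Solr and re-index the documents that were in flight during the crash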

Truncating the index size


You might come across situations where you need to truncate the index size to such an extent that it fits into your system's RAM. We will learn how to truncate the index size to a desirable level in this section.

Let us consider our music composition eStore for the demonstration purposes. Assuming that we have four fields that describe the document, we will add the following index structure to the fields section of our schema.xml file:

<field name="wm_id" type="string" indexed="true" stored="true" required="true" />
<field name="wm_name" type="text" indexed="true" stored="true" />
<field name="wm_details" type="text" indexed="true" stored="true" />
<field name="wm_price" type="string" indexed="true" stored="true" />

We will also assume the following points:

  • Search is to be carried out in the wm_name and wm_details fields

  • We show the wm_id and wm_price fields

  • We do not use the spellchecker or highlighting

We indexed 2,000,000 example documents...
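Based on the assumptions above, one way to shrink the index is to stop storing what we never display and stop indexing what we never search. The following sketch of the fields section reflects that idea; dropping stored="true" from wm_name and wm_details is only safe because we assumed highlighting is not used:

<field name="wm_id" type="string" indexed="true" stored="true" required="true" />
<field name="wm_name" type="text" indexed="true" stored="false" />
<field name="wm_details" type="text" indexed="true" stored="false" />
<field name="wm_price" type="string" indexed="false" stored="true" />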

Dealing with a huge count of open files


In this section, we will learn how to get rid of exceptions thrown due to a huge number of open files. Before you get into this section, it is recommended that you refer to the preceding section, Reducing the file count in the index.

  1. For the purpose of demonstration, let us assume that Solr (running in a Unix environment) throws an exception whose header looks as follows:

    java.io.FileNotFoundException: /usr/share/solr/data/index/_8.tii (Too many open files)
    

    This shows that there are too many open files.

  2. We will increase the open files limit from 1000 (the value previously set in my case; yours may differ) to 3000. To do so, we will use the ulimit command-line utility as follows:

    ulimit -n 3000
    
  3. Stopping at this stage would just be a workaround. The primary cause of this exception is the huge number of segment files that constitute the index. So, the immediate step after using the ulimit utility should be to optimize the index...
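Note that ulimit only affects the current shell session. As a sketch, on most Linux distributions you could make the higher limit survive a reboot by adding entries to /etc/security/limits.conf for the user that runs Solr (the user name solr is an assumption), and then optimize the index as described earlier:

# /etc/security/limits.conf (solr is the assumed user running the servlet container)
solr soft nofile 3000
solr hard nofile 3000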

Dealing with out-of-memory issues


You might be aware that applications written in Java are well known for out-of-memory problems. Before we learn how to deal with them, let us define out-of-memory in Java terms and briefly understand why such problems occur. It is the state of a Java Virtual Machine in which no additional memory can be allocated to a running process. As a result, data that the process needs can no longer be brought into memory, and the process eventually stops. We recommend that you refer to the out-of-memory Wikipedia page at http://en.wikipedia.org/wiki/Out_of_memory if you want to know more about it.

As far as Solr is concerned, these problems are usually associated with a low heap size. We will learn how to avoid and resolve such problems in this section.

You might come across an exception that looks similar to the following one:

SEVERE: java.lang.OutOfMemoryError: Java heap space...
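Since the usual culprit is a heap that is too small for the index and caches, the first thing to try is raising the maximum heap when starting Solr. A minimal sketch, assuming the standard Jetty start.jar launcher and an example 2 GB ceiling, looks as follows:

java -Xmx2048M -Xms512M -jar start.jar

If the error persists even with a generous heap, it is usually worth reviewing cache sizes and the fields you store rather than raising -Xmx indefinitely.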

Dealing with an infinite loop exception in shards


As you might be aware, while working with shards we need to add the addresses of the shards to every query we send. To avoid including them in every query, you might think of writing them into solrconfig.xml and leaving the task of adding the shards' addresses to Solr. So, you add them to the default request handler in your solrconfig.xml file, execute an example query, and end up with an infinite loop exception. You might be wondering how to prevent such exceptions while still keeping the shard addresses in the handler. In this section, we will learn how to overcome the infinite loop exception in shards.

We define the following request handler in our solrconfig.xml file, assuming that the IP address of the Solr server we are going to query is 192.168.0.100:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">...
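The loop typically arises because the handler that receives the distributed query is the same handler that serves the sub-requests sent to each shard, so a shards default makes every sub-request fan out again. One common way around it, sketched below and not necessarily the exact configuration the author goes on to show, is to keep the shards parameter out of the default handler and put it in a dedicated handler instead; the handler name /distributed and the second shard address 192.168.0.101 are example values:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- no shards default here; this handler also serves the sub-requests coming from other shards -->
</requestHandler>

<requestHandler name="/distributed" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- example shard addresses; query this handler for distributed searches -->
    <str name="shards">192.168.0.100:8983/solr,192.168.0.101:8983/solr</str>
  </lst>
</requestHandler>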

Dealing with expensive garbage collection


You might encounter situations where you have a number of applications running in the Java Virtual Machine and the garbage collection process takes too long to run. When this happens, you might not even be aware of what exactly is going on. In this section, we will learn how to deal with garbage collections that take too long to execute.

We start by running the following command:

java -Xmx2048M -Xms512m -jar start.jar

After some time, we notice that Solr starts to hang frequently for short periods and does not respond at all during those spans, and the same is true of Jetty. This alternating pattern of responding and not responding is an indication that garbage collection is taking too long to execute. How are we going to overcome this issue?

Let us modify our Solr start command and see what happens. Now our command looks as follows:

java -Xmx2048M -Xms512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads...
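Before (or alongside) switching collectors, it often helps to confirm the diagnosis by logging what the collector is doing. A sketch using the standard HotSpot GC logging flags of that era is shown below; gc.log is an arbitrary file name:

java -Xmx2048M -Xms512m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log -jar start.jar

Long stop-the-world full collections in that log correspond to the periods during which neither Solr nor Jetty responds.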

Bulk updating a single field without full indexation


You might be aware that if you wish to update a single field of a document that is already written to the index, Solr's standard approach won't allow you to do so. Instead, you need to remove the complete document from the index and add a new version of it. For a smaller index, the standard approach is fine. But think of a situation where you have a huge index and you need to update a field that tracks the number of visitors hitting each product.

With the standard approach, this amounts to a full re-indexation of all the documents (probably millions of documents on a daily basis). Do you think full indexation in such a scenario is an optimal approach? Of course not: it consumes considerable resources and is better avoided. So, how does one handle such a situation? Don't worry! In this section, we will learn how to update a single field in a document without the need for an expensive, complete indexation.

We will refer to our music...
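Since Solr 4.0, the feature that makes this possible is atomic updates: you send only the unique key and the fields to change, each with an update attribute, and Solr rebuilds the document internally from its stored fields. A minimal sketch follows; the visitor-count field name wm_visits and the document ID 12345 are assumptions, and the feature requires the update log (<updateLog/>) to be enabled and all fields to be stored:

curl 'http://localhost:8983/solr/update?commit=true' --data-binary '<add><doc><field name="wm_id">12345</field><field name="wm_visits" update="inc">1</field></doc></add>' -H 'Content-type:text/xml; charset=utf-8'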

Summary


In this chapter, we learned how to troubleshoot common problems: dealing with corrupted and locked indexes, reducing the number of files in the index, and truncating the index size. We also learned how to tackle issues caused by expensive garbage collection, out-of-memory errors, too many open files, and infinite loop exceptions while working with shards, and how to update a single field in all documents without a full indexation.

In the next chapter, we will learn how to use ZooKeeper for performance-optimization purposes and will cover how to set up, configure, and deploy ZooKeeper. We will also look at the different applications of ZooKeeper that can help us optimize Solr's performance.
