Tuning Solr JVM and Container

Scaling Apache Solr


July 2014

$10.00

Optimize your searches using high-performance enterprise search repositories with Apache Solr

(For more resources related to this topic, see here.)

Some of these JVMs are commercially optimized for production usage; you may find comparison studies at http://dior.ics.muni.cz/~makub/java/speed.html. Some of the JVM implementations provide server versions, which would be more appropriate than normal ones.

Since Solr runs in JVM, all the standard optimizations for applications are applicable to it. It starts with choosing the right heap size for your JVM. The heap size depends upon the following aspects:

  • Use of facets and sorting options
  • Size of the Solr index
  • Update frequencies on Solr
  • Solr cache

Heap size for JVM can be controlled by the following parameters:

Parameter

Description

-Xms

This is the minimum heap size required during JVM initialization, that is, container

-Xmx

This is the maximum heap size up to which the JVM or J2EE container can consume

Deciding heap size

Heap in JVM contributes as a major factor while optimizing the performance of any system. JVM uses heap to store its objects, as well as its own content. Poor allocation of JVM heap results in Java heap space OutOfMemoryError thrown at runtime crashing the application. When the heap is allocated with less memory, the application takes a longer time to initialize, as well as slowing the execution speed of the Java process during runtime. Similarly, higher heap size may underutilize expensive memory, which otherwise could have been used by the other application.

JVM starts with initial heap size, and as the demand grows, it tries to resize the heap to accommodate new space requirements. If a demand for memory crosses the maximum limit, JVM throws an Out of Memory exception. The objects that expire or are unused, unnecessarily consume memory in JVM. This memory can be taken back by releasing these objects by a process called garbage collection. Although it's tricky to find out whether you should increase or reduce the heap size, there are simple ways that can help you out. In a memory graph, typically, when you start the Solr server and run your first query, the memory usage increases, and based on subsequent queries and memory size, the memory graph may increase or remain constant. When garbage collection is run automatically by the JVM container, it sharply brings down its usage. If it's difficult to trace GC execution from the memory graph, you can run Solr with the following additional parameters:

-Xloggc:<some file> -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails

If you are monitoring the heap usage continuously, you will find a graph that increases and decreases (sawtooth); the increase is due to the querying that is going on consistently demanding more memory by your Solr cache, and decrease is due to GC execution. In a running environment, the average heap size should not grow over time or the number of GC runs should be less than the number of queries executed on Solr. If that's not the case, you will need more memory.

Features such as Solr faceting and sorting requires more memory on top of traditional search. If memory is unavailable, the operating system needs to perform hot swapping with the storage media, thereby increasing the response time; thus, users find huge latency while searching on large indexes. Many of the operating systems allow users to control swapping of programs.

How can we optimize JVM?

Whenever a facet query is run in Solr, memory is used to store each unique element in the index for each field. So, for example, a search over a small set of facet value (an year from 1980 to 2014) will consume less memory than a search with larger set of facet value, such as people's names (can vary from person to person). To reduce the memory usage, you may set the term index divisor to 2 (default is 4) by setting the following in solrconfig.xml:

<indexReaderFactory name="IndexReaderFactory" class="solr.StandardIndexReaderFactory"> <int name="setTermIndexDivisor">2</int> </indexReaderFactory >

From Solr 4.x onwards, the ability to set the min, max (term index divisor) block size ability is not available. This will reduce the memory usage for storing all the terms to half; however, it will double the seek time for terms and will impact a little on your search runtime.

One of the causes of large heap is the size of index, so one solution is to introduce SolrCloud and the distributed large index into multiple shards. This will not reduce your memory requirement, but will spread it across the cluster.

You can look at some of the optimized GC parameters described at http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning page. Similarly, Oracle provides a GC tuning guide for advanced development stages, and it can be seen at http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html. Additionally, you can look at the Solr performance problems at http://wiki.apache.org/solr/SolrPerformanceProblems.

Optimizing JVM container

JVM containers allow users to have their requests served in threads. This in turn enables JVM to support concurrent sessions created for different users connecting at the same time. The concurrency can, however, be controlled to reduce the load on the search server.

If you are using Apache Tomcat, you can modify the following entries in server.xml for changing the number of concurrent connections:

Similarly, in Jetty, you can control the number of connections held by modifying jetty.xml:

Similarly, for other containers, these files can change appropriately. Many containers provide a cache on top of the application to avoid server hits. This cache can be utilized for static pages such as the search page. Containers such as Weblogic provide a development versus production mode. Typically, a development mode runs with 15 threads and a limited JDBC pool size by default, whereas, for a production mode, this can be increased. For tuning containers, besides standard optimization, specific performance-tuning guidelines should be followed, as shown in the following table:

Container

Performance tuning guide

Jetty

http://wiki.eclipse.org/Jetty/Howto/High_Load

Tomcat

http://www.mulesoft.com/tcat/tomcat-performance and http://javamaster.wordpress.com/2013/03/13/apache-tomcat-tuning-guide/

JBoss

https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Application_Platform/5/pdf/Performance_Tuning_Guide/JBoss_Enterprise_Application_Platform-5-Performance_Tuning_Guide-en-US.pdf

Weblogic

http://docs.oracle.com/cd/E13222_01/wls/docs92/perform/WLSTuning.html

Websphere

http://www.ibm.com/developerworks/websphere/techjournal/0909_blythe/0909_blythe.html

Apache Solr works better with the default container it ships with, Jetty, since it offers a small footprint compared to other containers such as JBoss and Tomcat for which the memory required is a little higher.

Summary

In this article, we have learned about about Apache Solr which runs on the underlying JVM in the J2EE container and tuning containers.

Resources for Article:


Further resources on this subject:


Books to Consider

comments powered by Disqus