Okay, so you have your brand new car up and running and have been using it day in and day out, but you never maintain it properly. What will happen? Of course, its performance is going to deteriorate over time. Or perhaps your car supports automatic parking, but you never found out how to override the default setting. In such a case, the manual comes in handy for learning and tweaking all the features that your car provides. Similarly, you need to fine-tune and manage Solr to get the most out of it. This is exactly what we are going to see in this chapter.
You're reading from Mastering Apache Solr 7.x
One of the things that you need to take particular care of when you are working on any Java-based application is configuring the JVM optimally, and Solr is no exception.
Anyone who has worked with Java-based applications will surely have come across setting the heap space. We do it using -Xms and -Xmx. Suppose we set the following command-line options:
-Xms256m -Xmx2048m
Here, -Xms specifies the initial memory allocation pool, whereas -Xmx specifies the maximum memory allocation pool for the JVM. In the case we just saw, our JVM will start with 256 MB of memory and will be able to use up to 2 GB of memory.
If we require more heap space, then we can increase -Xmx. We can also decide not to set an initial heap size at all and let the JVM grow the heap as needed, but this may increase our startup time. Similarly, failing to set the maximum heap size properly can result in an OutOfMemoryError. Proper garbage collection JVM parameters should be set...
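To make the arithmetic behind these two flags concrete, here is a minimal Python sketch that reads -Xms and -Xmx out of an option string and converts them to bytes. The helper names parse_heap_flags and check_heap_flags are hypothetical and are not part of Solr or the JVM; the sketch only assumes the usual k, m, and g size suffixes.

```python
import re

# Multipliers for the size suffixes used on -Xms/-Xmx (k, m, g).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_heap_flags(options: str) -> dict:
    """Return {'Xms': bytes, 'Xmx': bytes} for the flags present in options."""
    sizes = {}
    for name, number, unit in re.findall(r"-(Xms|Xmx)(\d+)([kKmMgG]?)", options):
        sizes[name] = int(number) * _UNITS[unit.lower()]
    return sizes


def check_heap_flags(options: str) -> dict:
    """Parse the flags and reject an initial heap larger than the maximum."""
    sizes = parse_heap_flags(options)
    if "Xms" in sizes and "Xmx" in sizes and sizes["Xms"] > sizes["Xmx"]:
        raise ValueError("initial heap (-Xms) exceeds maximum heap (-Xmx)")
    return sizes


heap = check_heap_flags("-Xms256m -Xmx2048m")
print(heap["Xms"] // 1024 ** 2, "MB initial")  # 256 MB initial
print(heap["Xmx"] // 1024 ** 3, "GB maximum")  # 2 GB maximum
```

A check like this can be handy in a deployment script, where a mistyped suffix is easy to miss.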
As we already know, solrconfig.xml is at the heart of Solr's configuration.
There are two ways in which this file is modified:
- By making direct changes in solrconfig.xml
- By using the Config API to create configoverlay.json, which holds configuration overlays that modify the default values specified in solrconfig.xml
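As a rough illustration of the second approach, the following Python sketch builds (but does not send) a Config API request that sets a single property. It assumes a Solr instance at http://localhost:8983/solr and a collection called mycollection, both of which are placeholders, and it uses the set-property command with updateHandler.autoCommit.maxTime as the example property. The function name build_set_property_request is hypothetical.

```python
import json
import urllib.request


def build_set_property_request(base_url: str, collection: str,
                               name: str, value) -> urllib.request.Request:
    """Build a POST to <base_url>/<collection>/config with a set-property body."""
    body = json.dumps({"set-property": {name: value}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{collection}/config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and collection name; adjust for your own setup.
request = build_set_property_request(
    "http://localhost:8983/solr", "mycollection",
    "updateHandler.autoCommit.maxTime", 15000,
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

The request is only constructed here so that the sketch runs without a live Solr server; a change made this way should end up in configoverlay.json rather than in solrconfig.xml itself.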
The solrconfig.xml file is used to configure the admin web interface. It can be used to change parameters for replication and duplication. We can change the request dispatcher too using solrconfig.xml. Various listeners and request handlers can also be configured using solrconfig.xml.
Go to any of the conf directories for a collection and you will find solrconfig.xml inside. Navigate to SOLR_HOME/server/solr/configsets and you will see various configurations that follow best practices for configuring Solr.
Solr allows you to specify a variable for the property value, which can be replaced at runtime with the following syntax:
${propertyname[:default...
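The idea of a property with a fallback value can be sketched in a few lines of Python. This is only an imitation of the behavior for the simple ${name:default} form, not Solr's own implementation; the function name substitute is hypothetical, and the lockType line is used purely as a sample input.

```python
import re

# Matches ${name} and ${name:default}; the default part is optional.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def substitute(text: str, properties: dict) -> str:
    """Replace each placeholder with its property value or its default."""
    def resolve(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for property {name!r}")

    return _PLACEHOLDER.sub(resolve, text)


line = "<lockType>${solr.lock.type:native}</lockType>"
print(substitute(line, {}))                            # falls back to the default
print(substitute(line, {"solr.lock.type": "simple"}))  # runtime value wins
```

When the property is not supplied, the default after the colon is used; when it is supplied at runtime, it overrides the default.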
Going into production, we obviously need a proper backup and restore plan. The last thing we would want is for our hard disk to crash and all our index data to disappear or get corrupted.
Solr provides two ways to back up based on how you are running it:
- Collections API in SolrCloud mode
- Replication handler in standalone mode
As mentioned earlier, we can use the Collections API to take backups in SolrCloud. Doing so ensures that the backups are generated across multiple shards; at restore time, we then use the same number of shards and replicas as the original collection. The commands are listed here:
| Command name | Description |
|  | Used to back up Solr indexes and configuration |
|  | Used to restore Solr indexes and configuration |
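As a sketch of what such requests can look like, the following Python snippet assembles Collections API URLs, assuming the BACKUP and RESTORE actions with their name, collection, and location parameters. The host, backup name, collection names, and backup location are all placeholders, and collections_api_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def collections_api_url(base_url: str, action: str, **params: str) -> str:
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


BASE = "http://localhost:8983/solr"  # placeholder Solr address

# Back up a collection to a location that the Solr nodes can reach.
backup_url = collections_api_url(
    BASE, "BACKUP",
    name="nightly", collection="mycollection", location="/backups/solr",
)

# Restore that backup into a collection with a different name.
restore_url = collections_api_url(
    BASE, "RESTORE",
    name="nightly", collection="mycollection_restored", location="/backups/solr",
)

print(backup_url)
print(restore_url)
```

The URLs are only printed here; they could be opened with any HTTP client once the backup location exists and is reachable from the Solr nodes.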
Java Management Extensions (JMX) is a technology that was introduced in J2SE 5.0; it provides tools for managing and monitoring resources dynamically at runtime. It is used in enterprise applications to make systems configurable and to get the state of an application at any point in time. The resources are represented by managed beans (MBeans).
Solr can be controlled via the JMX interface; we can make use of VisualVM or JConsole to connect with Solr.
If an MBean server is running in Solr's JVM, or if you start Solr with the -Dcom.sun.management.jmxremote system property, Solr will automatically identify the MBean server's location on startup.
Alternatively, you can configure JMX explicitly by defining a metrics reporter.
On a remote Solr server, if you need to do JMX-enabled Java profiling, then you have to enable remote JMX access when starting the Solr server.
Open solr.in.cmd or solr.in.sh in the SOLR_HOME/bin directory and set the ENABLE_REMOTE_JMX_OPTS property to true...
Setting up logs is a key part of any enterprise application and Solr is no exception. Luckily, Solr provides many different ways to tweak the default logging configuration.
Using Solr's admin web interface, we can set various log levels. Go to the admin interface by typing the following URL:
http://localhost:8983/solr/
You should see the following admin screen:
You will see that on the left-hand side, there is a Logging option. Click on it and there will be a submenu item called Level, which will open up the following screen:
Here, we can set the logging level for many different log categories in a hierarchical order. For example, let's say I want to set org.apache.http.conn.ssl and all the subcategories under it to the debug level; I will click on the edit icon next to ssl, as shown here:
This will open up a small popup with various log levels that we can set.
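The hierarchical behavior described above can be sketched in plain Python: a category without its own level inherits the level of its nearest configured ancestor. This is a simplified model for illustration, not Solr's or Log4j's actual code; the function name effective_level, the INFO root default, and the sample category names are assumptions.

```python
def effective_level(category: str, explicit_levels: dict,
                    root_level: str = "INFO") -> str:
    """Walk up the dotted category name until an explicit level is found."""
    parts = category.split(".")
    while parts:
        name = ".".join(parts)
        if name in explicit_levels:
            return explicit_levels[name]
        parts.pop()  # drop the last segment and try the parent category
    return root_level  # nothing configured anywhere up the chain


# Only the ssl category has been set explicitly.
levels = {"org.apache.http.conn.ssl": "DEBUG"}

# A subcategory under ssl inherits DEBUG.
print(effective_level("org.apache.http.conn.ssl.SSLSocketFactory", levels))
# The parent category is unaffected and falls back to the root level.
print(effective_level("org.apache.http.conn", levels))
```

Setting a level on one category therefore affects everything beneath it, while sibling and parent categories keep their own effective levels.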
One of the must-haves when going to production is clustering for fault tolerance and high availability. Solr's answer to this is SolrCloud, which provides distributed indexing and search capabilities with central configuration for the entire cluster, and load balancing with failover support.
As mentioned earlier, Solr provides distributed searching. Behind the scenes, Solr makes use of ZooKeeper to manage nodes.
In SolrCloud, data is distributed across multiple shards, which can be hosted on multiple boxes and have replicas; this provides redundancy, fault tolerance, and scalability. ZooKeeper coordinates the shards and replicas and helps decide which server will handle a specific request.
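To get a feel for how documents can be spread across shards, here is a deliberately simplified Python sketch that hashes a document ID and maps it to one of several equal hash ranges. It is a stand-in, not Solr's router: it uses CRC32 from the standard library rather than the hash function Solr uses, and shard_for is a hypothetical function.

```python
import zlib

HASH_SPACE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit value


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index by splitting the hash space evenly."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    hashed = zlib.crc32(doc_id.encode("utf-8"))
    range_size = HASH_SPACE // num_shards
    # The last shard absorbs any remainder of the hash space.
    return min(hashed // range_size, num_shards - 1)


for doc_id in ("book-1", "book-2", "book-3"):
    print(doc_id, "-> shard", shard_for(doc_id, 2))
```

Because the mapping depends only on the ID and the number of shards, the same document always lands on the same shard, which is what allows both indexing and lookups to be routed consistently.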
In this example, we will see a basic SSL setup using a self-signed certificate. Enabling SSL ensures that communication between the client and Solr server is encrypted.
Before generating a self-signed certificate, ensure that you have OpenSSL installed on your machine. To check whether OpenSSL is already installed, type the following command in the Command Prompt:
openssl version
It should print out the current version of OpenSSL running on your system. If it does not do so, kindly download the latest version of OpenSSL for your operating system and then install it.
We will also make use of JDK's keytool for generating self-signed certificates.
In order to measure performance, Solr provides statistics and metrics; they can be read either using the Metrics API or by enabling JMX.
Both search and update request handlers provide various statistics.
The API request path for search is http://localhost:8983/solr/admin/metrics?group=mycore&prefix=QUERY./select.
Similarly, the API request path for update is http://localhost:8983/solr/admin/metrics?group=mycore&prefix=UPDATE./update.
There are various attributes that can be added at the end of both of these URLs to get various statistics, as listed here:
- 5minRate: Used to find out the requests per second that we have received in the last 5 minutes.
- 15minRate: Same as 5minRate, but here we check for requests per second in the last 15 minutes.
- p75_ms/p95_ms/p99_ms/p999_ms: Each of the four attributes represents the processing time at the xth percentile of requests, where x is the number in the attribute name (p999 stands for the 99.9th percentile).
- count: Number of requests made...
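To clarify what the percentile attributes mean, the following Python sketch computes them from a small list of request timings using the simple nearest-rank method. The timings are made-up sample data, and this is not how Solr calculates its values internally (Solr relies on the Dropwizard Metrics library for that); the sketch only illustrates how to read the numbers.

```python
import math


def percentile(samples_ms, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample that covers pct% of samples."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical processing times, in milliseconds, for ten requests.
timings_ms = [12, 15, 11, 240, 14, 13, 18, 16, 95, 17]

stats = {
    "count": len(timings_ms),
    "p75_ms": percentile(timings_ms, 75),
    "p95_ms": percentile(timings_ms, 95),
    "p99_ms": percentile(timings_ms, 99),
    "p999_ms": percentile(timings_ms, 99.9),
}
print(stats)
```

In this sample, three quarters of the requests finished within 18 ms, while the higher percentiles are dominated by the single 240 ms outlier. This is why the upper percentiles are useful for spotting slow requests that an average would hide.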
In this chapter, we saw the various tuning parameters needed to take Solr to production. We started off with JVM parameters, and then saw how to manage solrconfig.xml. We got an understanding of taking backups, setting up JMX, and configuring logs. Finally, we had an overview of SolrCloud.
In the next chapter, we will see various Client APIs made available by Solr.