You're reading from  Apache Solr High Performance

Product type: Book
Published in: Mar 2014
Reading level: Intermediate
ISBN-13: 9781782164821
Edition: 1st Edition
Author: Surendra Mohan

Surendra Mohan, who has served a few top-notch software organizations in varied roles, is currently a freelance software consultant. He has been working on various cutting-edge technologies such as Drupal, Moodle, Apache Solr, ElasticSearch, Node.js, SoapUI, and so on for the past 10 years. He also delivers technical talks at various community events such as Drupal Meetups and Drupal Camps. To find out more about him, his write-ups, technical blogs, and much more, go to http://www.surendramohan.info/. He has also written the books Administrating Solr and Apache Solr High Performance, published by Packt Publishing, and has reviewed other technical books such as Drupal 7 Multi Site Configuration and Drupal Search Engine Optimization, as well as titles on Drupal Commerce, ElasticSearch, Drupal-related video tutorials, titles on OpsView, and many more. Additionally, he writes technical blogs and articles for SitePoint.com. His published blogs and articles can be found at http://www.sitepoint.com/author/smohan/.

Chapter 3. Performance Optimization

In this chapter, we will learn different ways to optimize Solr's performance, starting with the factors that affect performance and moving on to advanced concepts such as index replication using the master-slave architecture. We will also learn more about working with multiple Solr servers, sharding, distributed search, and much more. We will cover the following topics:

  • Solr performance factors

  • Solr caching

  • Using SolrCloud

  • Near real-time search

So, let us get started.

Solr performance factors


In this section, we will look at the factors and metrics that affect Solr's performance. The following are the metrics and parameters that you should monitor in order to see the impact of the changes you make:

  • Transactions Per Second (TPS): This denotes the number of search queries and document updates you are able to perform in a second. For a better understanding, navigate to the statistics page and look at the avgTimePerRequest and avgRequestsPerSecond parameters of your request handler.

  • Memory usage: While tuning components to manage memory usage, you need to ensure that the memory used by Solr doesn't keep growing day by day, though a slight increase may be acceptable. If usage keeps increasing without any constraint, you are likely to run into out-of-memory errors. In such a situation, TPS tends to drop significantly, and extra care needs to be taken to debug and stabilize memory usage. You need to...
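
As a rough illustration of the TPS metric described above, the following Python sketch derives requests per second and average time per request from two readings of a handler's cumulative counters, taken at two points in time. The HandlerSample type and its field names are hypothetical and are not part of Solr's API; how you obtain the readings (for example, from the statistics page) is left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HandlerSample:
    """One reading of a request handler's cumulative counters (hypothetical shape)."""
    timestamp_s: float    # when the reading was taken, in seconds
    requests: int         # total requests handled so far
    total_time_ms: float  # total time spent handling requests so far, in ms


def interval_stats(before: HandlerSample, after: HandlerSample) -> Tuple[float, float]:
    """Return (requests per second, average ms per request) between two readings."""
    elapsed_s = after.timestamp_s - before.timestamp_s
    if elapsed_s <= 0:
        raise ValueError("readings must be taken at increasing times")
    handled = after.requests - before.requests
    tps = handled / elapsed_s
    # Avoid dividing by zero when no request arrived during the interval.
    avg_ms = (after.total_time_ms - before.total_time_ms) / handled if handled else 0.0
    return tps, avg_ms
```

Comparing these two numbers before and after a configuration change gives a simple view of whether the change helped.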

Solr caching


In this section, we will learn about the different caching techniques and how to configure them appropriately to get better performance from your Solr instance.

Document caching

The document cache, one of the available cache types, stores Lucene's internal documents that have been fetched from disk. For document caching to work at its optimal level, you need to configure it appropriately so that it minimizes I/O calls, which in turn improves the performance of your deployment.

Let us assume that we are dealing with a Solr deployment that holds approximately 100,000 documents. Additionally, our single Solr instance receives a maximum of 10 concurrent queries, and each query can fetch a maximum of 220 documents.

Based on the preceding parameters, our documentCache tag should look similar to the following code snippet (add it to your solrconfig.xml file):

<documentCache
  class="solr.LRUCache"
  size="2200"
  initialSize="2200"/>...
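
The size of 2200 in this snippet appears to follow from the assumptions above: 10 concurrent queries multiplied by 220 documents per query gives 2200 cached documents in the worst case. The Python helper below is a hypothetical illustration of that arithmetic (the function name and its parameters are not part of Solr); it only reproduces the calculation and prints a matching element.

```python
def document_cache_element(max_concurrent_queries: int, max_docs_per_query: int) -> str:
    """Build a documentCache element sized for the worst case described above."""
    # Worst case: every concurrent query fetches its maximum number of documents.
    size = max_concurrent_queries * max_docs_per_query
    return (
        "<documentCache\n"
        '  class="solr.LRUCache"\n'
        f'  size="{size}"\n'
        f'  initialSize="{size}"/>'
    )


print(document_cache_element(10, 220))
```

If either the expected concurrency or the maximum result size changes, rerunning the calculation gives the new value to put in solrconfig.xml.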

Using SolrCloud


As you might be aware, a new feature named SolrCloud was introduced in Apache Solr 4.0, and it enables us to perform distributed indexing and searching at full scale. Prior to SolrCloud, manual sharding was heavily used to manage a distributed Solr cluster. However, managing it was a challenge, and SolrCloud came into play to make this activity easier and more robust. Let us go through the challenges of sharding that led to SolrCloud. They are as follows:

  • Maintenance of the index view: With sharding, updates and deletions must be forwarded to the appropriate shard to ensure that there is only one version of each document.

  • Auto-failure recovery: If a shard goes down, that portion of the index goes offline, and you need to bring it back up manually from a backup.

  • Cluster configuration: Using sharding in a distributed environment and managing schema.xml and solrconfig.xml can be quite...
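
To make the first challenge concrete, here is a minimal Python sketch of the routing that a manually sharded setup has to do in client code: every update or deletion for a given document ID must be sent to the same shard. The shard URLs and the shard_for function are hypothetical, and the MD5-based hash is only an illustration, not the routing that Solr or SolrCloud uses.

```python
import hashlib

# Hypothetical shard endpoints; replace with your own hosts.
SHARDS = [
    "http://solr1.example.com:8983/solr",
    "http://solr2.example.com:8983/solr",
    "http://solr3.example.com:8983/solr",
]


def shard_for(doc_id: str, shards: list = SHARDS) -> str:
    """Pick a shard deterministically from the document ID."""
    # A stable hash guarantees the same ID always maps to the same shard,
    # so an update or delete reaches the shard that holds the document.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(shards)
    return shards[index]
```

Because the mapping depends on the number of shards, adding or removing a shard changes where existing IDs land, which is one reason manual sharding is hard to maintain.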

Summary


In this chapter, we mainly concentrated on different techniques to optimize Solr's performance. We started with the various factors that affect Solr's performance and covered vital concepts such as how to replicate an index in a master-slave architecture. We then learned more about implementing different Solr caching techniques such as document caching, query result caching, filter caching, and finally how to cache the whole result page. We also looked at SolrCloud and how to perform various performance-related activities, such as creating a SolrCloud cluster, having more than one collection in a SolrCloud cluster, and managing a SolrCloud cluster that you have created or that already exists. Additionally, we learned how to work with distributed indexing and searching, which are automated activities carried out on the documents, and how to stop automatic document distribution in certain scenarios. By the end of the...
