Packt+ | Advance your knowledge in tech

You're reading from Solr Cookbook - Third Edition

Product typeBook

Published inJan 2015

Reading LevelIntermediate

Publisher

ISBN-139781783553150

Edition1st Edition

Languages

Java

Tools

Solr

Concepts

Enterprise Search

Author (1)

Rafal Kuc

Configuring SolrCloud for high-indexing use cases

Solr is designed to work under high load, both when it comes to querying and indexing. However, the default configuration provided with the example Solr deployment is not sufficient when it comes to these use cases. This recipe will show you how to prepare your SolrCloud collection configuration for use cases when the indexing rate is very high.

Getting ready

Before continuing reading the recipe, read the Running Solr on a standalone Jetty and Configuring SolrCloud for NRT use cases recipes in this chapter.

How to do it...

In very high indexing use cases, there are chances that you'll use bulk indexing to index your data. In addition to this, because we are talking about SolrCloud, we'll use autocommit so that we can leave the data durability and visibility management to Solr. Let's discuss how to prepare configuration for a use case where indexing is high, but the querying is quite low; for example, when using Solr for log centralization solutions.

Let's assume that we are indexing more than 1,000 documents per second and that we have four nodes, each of 12 cores and 64 GB of RAM. Note that this specification is not something we need to index the number of documents, but they are here for reference.

First, we'll start with the autocommit configuration, which will look as follows (we add this to the solrconfig.xml file):

<updateHandler class="solr.DirectUpdateHandler2">
 <updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
 </updateLog>

 <autoSoftCommit>
  <maxTime>600000</maxTime>
 </autoSoftCommit>

 <autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
 </autoCommit>
</updateHandler>

The second step is to adjust the number of indexing threads. To do this, we add the following information to the indexConfig section of solrconfig.xml:
```
<maxIndexingThreads>10</maxIndexingThreads>
```
The third step is to adjust the memory buffer size for each indexing thread. To do this, we add the following information to the indexConfig section of solrconfig.xml:
```
<ramBufferSizeMB>128</ramBufferSizeMB>
```

Now, let's discuss what each of these changes mean.

How it works...

We started with tuning the autocommit setting, which you should be aware of after reading this recipe. Since we are not worried about documents being visible as soon as they are indexed, we set the soft autocommit's maxTime property to 600000. This means that we will reopen the searcher every 10 minutes, so our documents will be visible maximum 10 minutes after they are sent to indexation.

The one thing to look at is the short time for hard commit, which is every 15 seconds (the maxTime property of the autoCommit section set to 15000). We did this because we don't want transaction logs to contain a high number of entries because this can cause problems during the recovery process.

We also increased the default number of threads an index writer can use from the default 8 to 10 by setting the maxIndexingThreads property. Since we have 12 cores on each machine, and we are not querying much, we can allow more threads using the index writer. If the index writer uses the number of threads that's equal to the maxIndexingThreads property, the next thread will wait for one of the currently running to end. Remember that the maxIndexingThreads property sets the maximum allowed indexing threads, which doesn't mean they will be used every time.

We also increased the default RAM buffer size from 100 to 128 using the ramBufferSizeMB property. We did this to allow Lucene to buffer as many documents as needed in memory. If the size of the documents in the buffer is larger than the given value of the ramBufferSizeMB property, Lucene will flush the data to the directory, which will decide what else to do. We have to remember though that we are also using autocommit, so the data will be flushed every 15 seconds because of hard autocommit settings.

Note

Remember that we didn't take into consideration the size of the cluster because we had the maximum number of nodes. You should remember that if I/O is the bottleneck when indexing, spreading the collection among more nodes should help with the indexing load.

In addition to this, you might want to look at the merging policy and segment merge processes as this can become a major bottleneck. If you are interested, refer to the Tuning segment merging recipe in Chapter 9, Dealing with Problems.

You have been reading a chapter from

Solr Cookbook - Third Edition

Published in: Jan 2015Publisher: ISBN-13: 9781783553150

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Author (1)

Rafal Kuc

Rafał Kuć is a software engineer, trainer, speaker and consultant. He is working as a consultant and software engineer at Sematext Group Inc. where he concentrates on open source technologies such as Apache Lucene, Solr, and Elasticsearch. He has more than 14 years of experience in various software domains—from banking software to e–commerce products. He is mainly focused on Java; however, he is open to every tool and programming language that might help him to achieve his goals easily and quickly. Rafał is also one of the founders of the solr.pl site, where he tries to share his knowledge and help people solve their Solr and Lucene problems. He is also a speaker at various conferences around the world such as Lucene Eurocon, Berlin Buzzwords, ApacheCon, Lucene/Solr Revolution, Velocity, and DevOps Days. Rafał began his journey with Lucene in 2002; however, it wasn't love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then Solr came and that was it. He started working with Elasticsearch in the middle of 2010. At present, Lucene, Solr, Elasticsearch, and information retrieval are his main areas of interest. Rafał is also the author of the Solr Cookbook series, ElasticSearch Server and its second edition, and the first and second editions of Mastering ElasticSearch, all published by Packt Publishing.
Read more about Rafal Kuc

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages