Solr Cookbook - Third Edition

Solve real-time problems related to Apache Solr 4.x and 5.0 effectively with the help of over 100 easy-to-follow recipes

Rafał Kuć

eBook: $29.99
Print + eBook: $49.99

Book Details

ISBN-13: 9781783553150
Paperback: 356 pages

Book Description

Starting with vital information on setting up Solr, you will quickly progress to analyzing your text data through querying and performance improvement.

With the help of intermediate and advanced recipes, you will learn how to index data and query Solr. You will then dive deep into faceting and learn how to improve Solr's performance. You will also work with SolrCloud clusters and get to grips with Solr's advanced functionality. Finally, you will explore real-life situations where Solr can simplify daily collection handling. By the end of this book, you will be able to produce enhanced, optimized, and powerful results by implementing pro-level practices and techniques.

Table of Contents

Chapter 1: Apache Solr Configuration
Introduction
Running Solr on a standalone Jetty
Installing ZooKeeper for SolrCloud
Migrating configuration from master-slave to SolrCloud
Choosing the proper directory configuration
Configuring the Solr spellchecker
Using Solr in a schemaless mode
Limiting I/O usage
Using core discovery
Configuring SolrCloud for NRT use cases
Configuring SolrCloud for high-indexing use cases
Configuring SolrCloud for high-querying use cases
Configuring the Solr heartbeat mechanism
Changing similarity
Chapter 2: Indexing Your Data
Introduction
Indexing PDF files
Counting the number of fields
Using parsing update processors to parse data
Using scripting update processors to modify documents
Indexing data from a database using Data Import Handler
Incremental imports with DIH
Transforming data when using DIH
Indexing multiple geographical points
Updating document fields
Detecting the document language during indexation
Optimizing the primary key indexation
Handling multiple currencies
Chapter 3: Analyzing Your Text Data
Introduction
Using the enumeration type
Removing HTML tags during indexing
Storing data outside of Solr index
Using synonyms
Stemming different languages
Using nonaggressive stemmers
Using the n-gram approach to do performant trailing wildcard searches
Using position increment to divide sentences
Using patterns to replace tokens
Chapter 4: Querying Solr
Introduction
Understanding and using the Lucene query language
Using position aware queries
Using boosting with autocomplete
Phrase queries with shingles
Handling user queries without errors
Handling hierarchies with nested documents
Sorting data on the basis of a function value
Controlling the number of terms needed to match
Affecting document score using function queries
Using simple nested queries
Using the Solr document query join functionality
Handling typos with n-grams
Rescoring query results
Chapter 5: Faceting
Introduction
Getting the number of documents with the same field value
Getting the number of documents with the same value range
Getting the number of documents matching the query and subquery
Removing filters from faceting results
Using decision tree faceting
Calculating faceting for relevant documents in groups
Improving faceting performance for low cardinality fields
Chapter 6: Improving Solr Performance
Introduction
Handling deep paging efficiently
Configuring the document cache
Configuring the query result cache
Configuring the filter cache
Improving Solr query performance after the start and commit operations
Lowering the memory consumption of faceting and sorting
Speeding up indexing with Solr segment merge tuning
Avoiding caching of rare filters to improve the performance
Controlling the filter execution to improve expensive filter performance
Configuring numerical fields for high-performance sorting and range queries
Chapter 7: In the Cloud
Introduction
Creating a new SolrCloud cluster
Setting up multiple collections on a single cluster
Splitting shards
Having more than a single shard from a collection on a node
Creating a collection on defined nodes
Adding replicas after collection creation
Removing replicas
Moving shards between nodes
Using aliasing
Using routing
Chapter 8: Using Additional Functionalities
Introduction
Finding similar documents
Highlighting fragments found in documents
Efficient highlighting
Using versioning
Retrieving information about the index structure
Altering the index structure on a live collection
Grouping documents by the field value
Grouping documents by the query value
Grouping documents by the function value
Efficient documents grouping using the post filter
Chapter 9: Dealing with Problems
Introduction
Dealing with the too many opened files exception
Diagnosing and dealing with memory problems
Configuring sorting for non-English languages
Migrating data to another collection
SolrCloud read-side fault tolerance
Using the check index functionality
Adjusting the Jetty configuration to avoid deadlocks
Tuning segment merging
Avoiding swapping
Chapter 10: Real-life Situations
Introduction
Implementing the autocomplete functionality for products
Implementing the autocomplete functionality for categories
Handling time-sliced data using aliases
Boosting words closer to each other
Using the Solr spellchecking functionality
Using the Solr administration panel for monitoring
Automatically expiring Solr documents
Exporting whole query results

What You Will Learn

  • Acquire the skills needed to index your data in different formats, forms, and sources
  • Overcome common problems while analyzing your data
  • Use the faceting mechanism to get aggregated information about your data
  • Improve your Solr instance and Solr cluster performance
  • Get to know how to configure and use SolrCloud
  • Make use of the highlighting and document grouping functionalities
  • Diagnose and resolve problems with Solr instances and clusters
  • Implement different autocomplete functionalities
