Apache Solr 4 Cookbook

Rafał Kuć

Apache Solr 4 can transform the effectiveness of your search engines and this book will show you how. Jump straight into the hands-on recipes and get a fast understanding of the latest and greatest in open source search.
RRP $26.99
RRP $44.99
Print + eBook

Book Details

ISBN 13: 9781782161325
Paperback: 328 pages

About This Book

  • Learn how to make Apache Solr search faster, more complete, and comprehensively scalable
  • Solve performance, setup, configuration, analysis, and query problems in no time
  • Get to grips with, and master, the new exciting features of Apache Solr 4

Who This Book Is For

This book is for developers who want to master Apache Solr 4. It will particularly appeal to developers who want to get to grips quickly with the changes and new features in Apache Solr 4, and it also serves as a practical guide to solving common problems when using Apache Solr.

Table of Contents

Chapter 1: Apache Solr Configuration
Running Solr on Jetty
Running Solr on Apache Tomcat
Installing a standalone ZooKeeper
Clustering your data
Choosing the right directory implementation
Configuring spellchecker to not use its own index
Solr cache configuration
How to fetch and index web pages
How to set up the extracting request handler
Changing the default similarity implementation
Chapter 2: Indexing Your Data
Indexing PDF files
Generating unique fields automatically
Extracting metadata from binary files
How to properly configure Data Import Handler with JDBC
Indexing data from a database using Data Import Handler
How to import data using Data Import Handler and delta query
How to use Data Import Handler with the URL data source
How to modify data while importing with Data Import Handler
Updating a single field of your document
Handling multiple currencies
Detecting the document's language
Optimizing your primary key field indexing
Chapter 3: Analyzing Your Text Data
Storing additional information using payloads
Eliminating XML and HTML tags from text
Copying the contents of one field to another
Changing words to other words
Splitting text by CamelCase
Splitting text by whitespace only
Making plural words singular without stemming
Lowercasing the whole string
Storing geographical points in the index
Stemming your data
Preparing text to perform an efficient trailing wildcard search
Splitting text by numbers and non-whitespace characters
Using Hunspell as a stemmer
Using your own stemming dictionary
Protecting words from being stemmed
Chapter 4: Querying Solr
Asking for a particular field value
Sorting results by a field value
How to search for a phrase, not a single word
Boosting phrases over words
Positioning some documents over others on a query
Positioning documents with words closer to each other first
Sorting results by a distance from a point
Getting documents with only a partial match
Affecting scoring with functions
Nesting queries
Modifying returned documents
Using parent-child relationships
Ignoring typos in terms of performance
Detecting and omitting duplicate documents
Using field aliases
Returning a value of a function in the results
Chapter 5: Using the Faceting Mechanism
Getting the number of documents with the same field value
Getting the number of documents with the same value range
Getting the number of documents matching the query and subquery
Removing filters from faceting results
Sorting faceting results in alphabetical order
Implementing the autosuggest feature using faceting
Getting the number of documents that don't have a value in the field
Having two different facet limits for two different fields in the same query
Using decision tree faceting
Calculating faceting for relevant documents in groups
Chapter 6: Improving Solr Performance
Paging your results quickly
Configuring the document cache
Configuring the query result cache
Configuring the filter cache
Improving Solr performance right after the startup or commit operation
Caching whole result pages
Improving faceting performance for low cardinality fields
What to do when Solr slows down during indexing
Analyzing query performance
Avoiding filter caching
Controlling the order of execution of filter queries
Improving the performance of numerical range queries
Chapter 7: In the Cloud
Creating a new SolrCloud cluster
Setting up two collections inside a single cluster
Managing your SolrCloud cluster
Understanding the SolrCloud cluster administration GUI
Distributed indexing and searching
Increasing the number of replicas on an already live cluster
Stopping automatic document distribution among shards
Chapter 8: Using Additional Solr Functionalities
Getting more documents similar to those returned in the results list
Highlighting matched words
How to highlight long text fields and get good performance
Sorting results by a function value
Searching words by how they sound
Ignoring defined words
Computing statistics for the search results
Checking the user's spelling mistakes
Using field values to group results
Using queries to group results
Using function queries to group results
Chapter 9: Dealing with Problems
How to deal with too many opened files
How to deal with out-of-memory problems
How to sort non-English languages properly
How to make your index smaller
Diagnosing Solr problems
How to avoid swapping

What You Will Learn

  • Set up Apache Solr 4 in an efficient, configurable way
  • Index your data in different formats and forms, and from different sources
  • Implement different kinds of autocomplete functionality
  • Achieve near real time search with Apache Solr 4
  • Improve and benchmark Apache Solr for increased performance
  • Master SolrCloud functionality
  • Diagnose and resolve your problems with Apache Solr 4
  • Improve the relevance of your queries
  • Overcome common problems when analyzing your data

In Detail

Apache Solr is a blazing-fast, scalable, open source enterprise search server built upon Apache Lucene. Solr is wildly popular because it supports complex search criteria, faceting, result highlighting, query completion, query spell-checking, and relevancy tuning, among numerous other features.

"Apache Solr 4 Cookbook" will show you how to get the most out of your search engine. Full of practical recipes and examples, this book will show you how to set up Apache Solr, tune and benchmark its performance, and index and analyze your data to provide better, more precise, and more useful search results.

"Apache Solr 4 Cookbook" will make your search better, more accurate, and faster with practical recipes on essential topics such as SolrCloud, querying data, search faceting, text and data analysis, and cache configuration.

With numerous practical chapters centered on important Solr techniques and methods, "Apache Solr 4 Cookbook" is an essential resource for developers who wish to take their knowledge and skills further. Thoroughly updated and improved, this cookbook also covers the changes in Apache Solr 4, including the awesome capabilities of SolrCloud.

