Apache Solr 3.1 Cookbook

Over 100 recipes to discover new ways to work with Apache’s Enterprise Search Server

Rafał Kuć

Book Details

ISBN 13: 9781849512183
Paperback: 300 pages

About This Book

  • Improve the way in which you work with Apache Solr to make your search engine quicker and more effective
  • Deal with performance, setup, and configuration problems in no time
  • Discover little-known Solr functionalities and create your own modules to customize Solr to your company's needs
  • Part of Packt's Cookbook series; each chapter covers a different aspect of working with Solr

Who This Book Is For

Developers who are working with Apache Solr and would like to know how to combat common problems will find this book of great use. Knowledge of Apache Lucene would be a bonus but is not required.

Table of Contents

Chapter 1: Apache Solr Configuration
Introduction
Running Solr on Jetty
Running Solr on Apache Tomcat
Using the Suggester component
Handling multiple languages in a single index
Indexing fields in a dynamic way
Making multilingual data searchable with multicore deployment
Solr cache configuration
How to fetch and index web pages
Getting the most relevant results with early query termination
How to set up Extracting Request Handler
Chapter 2: Indexing your Data
Introduction
Indexing data in CSV format
Indexing data in XML format
Indexing data in JSON format
Indexing PDF files
Indexing Microsoft Office files
Extracting metadata from binary files
How to properly configure Data Import Handler with JDBC
Indexing data from a database using Data Import Handler
How to import data using Data Import Handler and delta query
How to use Data Import Handler with URL Data Source
How to modify data while importing with Data Import Handler
Chapter 3: Analyzing your Text Data
Introduction
Storing additional information using payloads
Eliminating XML and HTML tags from the text
Copying the contents of one field to another
Changing words to other words
Splitting text by camel case
Splitting text by whitespace only
Making plural words singular, but without stemming
Lowercasing the whole string
Storing geographical points in the index
Stemming your data
Preparing text to do efficient trailing wildcard search
Splitting text by numbers and non-white space characters
Chapter 4: Solr Administration
Introduction
Monitoring Solr via JMX
How to check the cache status
How to check how the data type or field behave
How to check Solr query handler usage
How to check Solr update handler usage
How to change Solr instance logging configuration
How to check the Java based replication status
How to check the script based replication status
Setting up a Java based index replication
Setting up script based replication
How to manage Java based replication status using HTTP commands
How to analyze your index structure
Chapter 5: Querying Solr
Introduction
Asking for a particular field value
Sorting results by a field value
Choosing a different query parser
How to search for a phrase, not a single word
Boosting phrases over words
Positioning some documents over others on a query
Positioning documents with words closer to each other first
Sorting results by a distance from a point
Getting documents with only a partial match
Affecting scoring with function
Nesting queries
Chapter 6: Using Faceting Mechanism
Introduction
Getting the number of documents with the same field value
Getting the number of documents with the same date range
Getting the number of documents with the same value range
Getting the number of documents matching the query and sub query
How to remove filters from faceting results
How to name different faceting results
How to sort faceting results in an alphabetical order
How to implement the autosuggest feature using faceting
How to get the number of documents that don't have a value in the field
How to get all the faceting results, not just the first hundred ones
How to have two different facet limits for two different fields in the same query
Chapter 7: Improving Solr Performance
Introduction
Paging your results quickly
Configuring the document cache
Configuring the query result cache
Configuring the filter cache
Improving Solr performance right after the startup or commit operation
Setting up a sharded deployment
Caching whole result pages
Improving faceting performance
What to do when Solr slows down during indexing when using Data Import Handler
Getting the first top documents fast when having millions of them
Chapter 8: Creating Applications that use Solr and Developing your Own Solr Modules
Introduction
Choosing a different response format than the default one
Using Solr with PHP
Using Solr with Ruby
Using SolrJ to query Solr
Developing your own request handler
Developing your own filter
Developing your own search component
Developing your own field type
Chapter 9: Using Additional Solr Functionalities
Introduction
Getting more documents similar to those returned in the results list
Presenting search results in a fast and easy way
Highlighting matched words
How to highlight long text fields and get good performance
Sorting results by a function value
Searching words by how they sound
Ignoring defined words
Computing statistics for the search results
Checking user's spelling mistakes
Using "group by" like functionalities in Solr
Chapter 10: Dealing with Problems
Introduction
How to deal with a corrupted index
How to reduce the number of files the index is made of
How to deal with a locked index
How to deal with too many opened files
How to deal with out of memory problems
How to sort non-English languages properly
How to deal with the infinite loop exception when using shards
How to deal with garbage collection running too long
How to update only one field in all documents without the need of full indexation
How to make your index smaller

What You Will Learn

  • Index data in different formats and forms
  • Use the Solr administration panel to discover the most commonly searched for information
  • Learn how to use different data grouping techniques
  • Improve your Solr deployment performance
  • Create and use your own Apache Solr modules
  • Configure your cache to cater for changes in your data
  • Import data using the Data Import Handler and delta query
  • Query Solr to search for phrases, sort results by different fields, and search geographical points
  • Create new applications that use Solr
  • Reduce the size of your index for faster searching

In Detail

Apache Solr is a fast, scalable, modern, open source, and easy-to-use search engine. It allows you to develop a professional search engine for your ecommerce site, web application, or back office software. Setting up Solr is easy, but configuring it to get the most out of your site is the difficult bit.

The Solr 3.1 Cookbook will make your everyday work easier by using real-life examples that show you how to deal with the most common problems that can arise while using the Apache Solr search engine. Why waste your time searching the Internet for solutions when you can have all the answers in one place?

This cookbook will show you how to get the most out of your search engine. Each chapter covers a different aspect of working with Solr, from analyzing your text data through querying and performance improvement to developing your own modules. The practical recipes will help you quickly solve common problems with data analysis, and show you how to use faceting to collect data and how to speed up Solr. You will learn about functionalities that most newbies are unaware of, such as sorting results by a function value, highlighting matched words, and computing statistics, to make your work with Solr easy and stress-free.
