Apache Solr 4 Cookbook
- Learn how to make Apache Solr search faster, more complete, and comprehensively scalable
- Solve performance, setup, configuration, analysis, and query problems in no time
- Get to grips with, and master, the new exciting features of Apache Solr 4
Book Details
Language : English
Paperback : 328 pages [ 235mm x 191mm ]
Release Date : January 2013
ISBN : 1782161325
ISBN 13 : 9781782161325
Author(s) : Rafał Kuć
Topics and Technologies : All Books, Cookbooks, Open Source
Table of Contents
Preface
Chapter 1: Apache Solr Configuration
Chapter 2: Indexing Your Data
Chapter 3: Analyzing Your Text Data
Chapter 4: Querying Solr
Chapter 5: Using the Faceting Mechanism
Chapter 6: Improving Solr Performance
Chapter 7: In the Cloud
Chapter 8: Using Additional Solr Functionalities
Chapter 9: Dealing with Problems
Appendix: Real-life Situations
Index
- Chapter 1: Apache Solr Configuration
- Introduction
- Running Solr on Jetty
- Running Solr on Apache Tomcat
- Installing a standalone ZooKeeper
- Clustering your data
- Choosing the right directory implementation
- Configuring spellchecker to not use its own index
- Solr cache configuration
- How to fetch and index web pages
- How to set up the extracting request handler
- Changing the default similarity implementation
- Chapter 2: Indexing Your Data
- Introduction
- Indexing PDF files
- Generating unique fields automatically
- Extracting metadata from binary files
- How to properly configure Data Import Handler with JDBC
- Indexing data from a database using Data Import Handler
- How to import data using Data Import Handler and delta query
- How to use Data Import Handler with the URL data source
- How to modify data while importing with Data Import Handler
- Updating a single field of your document
- Handling multiple currencies
- Detecting the document's language
- Optimizing your primary key field indexing
- Chapter 3: Analyzing Your Text Data
- Introduction
- Storing additional information using payloads
- Eliminating XML and HTML tags from text
- Copying the contents of one field to another
- Changing words to other words
- Splitting text by CamelCase
- Splitting text by whitespace only
- Making plural words singular without stemming
- Lowercasing the whole string
- Storing geographical points in the index
- Stemming your data
- Preparing text to perform an efficient trailing wildcard search
- Splitting text by numbers and non-whitespace characters
- Using Hunspell as a stemmer
- Using your own stemming dictionary
- Protecting words from being stemmed
- Chapter 4: Querying Solr
- Introduction
- Asking for a particular field value
- Sorting results by a field value
- How to search for a phrase, not a single word
- Boosting phrases over words
- Positioning some documents over others in a query
- Positioning documents with words closer to each other first
- Sorting results by the distance from a point
- Getting documents with only a partial match
- Affecting scoring with functions
- Nesting queries
- Modifying returned documents
- Using parent-child relationships
- Ignoring typos in terms of performance
- Detecting and omitting duplicate documents
- Using field aliases
- Returning a value of a function in the results
- Chapter 5: Using the Faceting Mechanism
- Introduction
- Getting the number of documents with the same field value
- Getting the number of documents with the same value range
- Getting the number of documents matching the query and subquery
- Removing filters from faceting results
- Sorting faceting results in alphabetical order
- Implementing the autosuggest feature using faceting
- Getting the number of documents that don't have a value in the field
- Having two different facet limits for two different fields in the same query
- Using decision tree faceting
- Calculating faceting for relevant documents in groups
- Chapter 6: Improving Solr Performance
- Introduction
- Paging your results quickly
- Configuring the document cache
- Configuring the query result cache
- Configuring the filter cache
- Improving Solr performance right after the startup or commit operation
- Caching whole result pages
- Improving faceting performance for low cardinality fields
- What to do when Solr slows down during indexing
- Analyzing query performance
- Avoiding filter caching
- Controlling the order of execution of filter queries
- Improving the performance of numerical range queries
- Chapter 7: In the Cloud
- Introduction
- Creating a new SolrCloud cluster
- Setting up two collections inside a single cluster
- Managing your SolrCloud cluster
- Understanding the SolrCloud cluster administration GUI
- Distributed indexing and searching
- Increasing the number of replicas on an already live cluster
- Stopping automatic document distribution among shards
- Chapter 8: Using Additional Solr Functionalities
- Introduction
- Getting more documents similar to those returned in the results list
- Highlighting matched words
- How to highlight long text fields and get good performance
- Sorting results by a function value
- Searching words by how they sound
- Ignoring defined words
- Computing statistics for the search results
- Checking the user's spelling mistakes
- Using field values to group results
- Using queries to group results
- Using function queries to group results
- Chapter 9: Dealing with Problems
- Introduction
- How to deal with too many opened files
- How to deal with out-of-memory problems
- How to sort non-English languages properly
- How to make your index smaller
- Diagnosing Solr problems
- How to avoid swapping
- Appendix: Real-life Situations
- Introduction
- How to implement a product's autocomplete functionality
- How to implement a category's autocomplete functionality
- How to use different query parsers in a single query
- How to get documents right after they were sent for indexation
- How to search your data in a near real-time manner
- How to get the documents with all the query words to the top of the results set
- How to boost documents based on their publishing date
Errata
- 2 submitted: last submission 20 May 2013
Errata Type: Code; Page No. 214, 215, and 219
In step 3 of the recipe Setting up two collections inside a single cluster, the space between the -confname parameter and its value is missing. That is:
cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /usr/share/config/books/conf -confnamebookscollection
and
cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /usr/share/config/users/conf -confnameuserscollection
should be
cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /usr/share/config/books/conf -confname bookscollection
and
cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /usr/share/config/users/conf -confname userscollection
In step 4 of the recipe Managing your SolrCloud cluster, the space between the -confname parameter and its value is also missing. That is:
cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /usr/share/config/books/conf -confnamebookscollection
should be
cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /usr/share/config/books/conf -confname bookscollection
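One way to avoid this kind of missing-space mistake when scripting the upload is to build the command as a list of separate arguments instead of a single string. The following is only a sketch: the helper name build_upconfig_command is hypothetical, the host, paths, and configuration names are taken from the erratum above, and it assumes the usual zkcli.sh form in which -cmd and upconfig are separate arguments.

```python
import shlex


def build_upconfig_command(zk_host, conf_dir, conf_name,
                           script="cloud-scripts/zkcli.sh"):
    """Return the argument list for uploading a config directory to ZooKeeper.

    Hypothetical helper: every flag and every value is its own list element,
    so a flag can never be fused with its value by a missing space.
    """
    return [
        script,
        "-cmd", "upconfig",
        "-zkhost", zk_host,
        "-confdir", conf_dir,
        "-confname", conf_name,
    ]


books_command = build_upconfig_command(
    "localhost:2181", "/usr/share/config/books/conf", "bookscollection")
users_command = build_upconfig_command(
    "localhost:2181", "/usr/share/config/users/conf", "userscollection")

# shlex.join (Python 3.8+) renders the list as a shell-safe command line.
print(shlex.join(books_command))
print(shlex.join(users_command))
```

A list built this way can be passed directly to subprocess.run on a machine where the script exists, which sidesteps shell quoting altogether.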
Errata Type: Code; Page No. 10
In Chapter 1, in the Running Solr on Jetty recipe, the example showing how to increase the header buffer size is based on Jetty 6. If you are using a newer version of Jetty, such as Jetty 8, use the requestHeaderSize property instead of headerBufferSize. The example will then look like this:
<Set name="requestHeaderSize">32768</Set>
Errata Type: Code; Page No. 28
Chapter 1, Recipe: How to fetch and index web pages
The schema.xml example should match what the description states, so it should look like this:
<schema name="nutch" version="1.5">
Errata Type: Code; Page No. 43
Chapter 2, Recipe: How to properly configure Data Import Handler with JDBC
In the db-data-config.xml example there is the following code snippet:
<field column="description" name="description" />
It should be:
<field column="desc" name="description" />
Errata Type: Code; Page No. 73
Chapter 3, Recipe: Eliminating XML and HTML tags from text
The value of the html field in the example document should be wrapped in a CDATA section, just as it is in the code you can download.
The example document should look like this:
<add>
  <doc>
    <field name="id">1</field>
    <field name="html"><![CDATA[<html><head><title>My page</title></head><body><p>This is a <b>my</b> <i>sample</i> page</p></body></html>]]></field>
  </doc>
</add>
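To see why the CDATA section matters, the sketch below parses two variants of the update message with Python's standard XML parser. It is an illustration only, not part of the book's code: the helper name html_field_text is hypothetical, and the sample markup is adapted from the erratum with the paragraph closed inside the body so that the variant without CDATA is still well-formed XML.

```python
import xml.etree.ElementTree as ET

# Sample markup adapted from the erratum (assumption: </p> closed inside <body>).
HTML = ("<html><head><title>My page</title></head><body>"
        "<p>This is a <b>my</b> <i>sample</i> page</p></body></html>")


def html_field_text(update_xml):
    """Return the text content of the field named "html" in an <add> message."""
    root = ET.fromstring(update_xml)
    field = root.find("./doc/field[@name='html']")
    return field.text


with_cdata = ('<add><doc><field name="id">1</field>'
              '<field name="html"><![CDATA[' + HTML + ']]></field></doc></add>')
without_cdata = ('<add><doc><field name="id">1</field>'
                 '<field name="html">' + HTML + '</field></doc></add>')

# With CDATA the parser hands over the markup as one plain string.
print(html_field_text(with_cdata) == HTML)   # True
# Without CDATA the tags become nested XML elements, so the field has no text.
print(html_field_text(without_cdata))        # None
```

In the first variant the field value reaches the consumer as literal text, tags included, which is what a recipe about stripping HTML tags during analysis needs as its input.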
Errata Type: Content; Page No. 90
Chapter 3, Recipe: Storing geographical points in the index
A sentence is missing before the last example. It currently reads "(…) can add data to index:" and it should read "(…) can add data to index. Now let’s look again at the query".
- Efficient and configurable Apache Solr 4 setup
- Index your data in different formats, forms, and sources
- Implement different autocomplete functionality
- Achieve near real time search with Apache Solr 4
- Improve and benchmark Apache Solr for increased performance
- Master SolrCloud functionality
- Diagnose and resolve your problems with Apache Solr 4
- Improve the relevance of your queries
- Overcome common problems when analyzing your data
Apache Solr is a blazing fast, scalable, open source enterprise search server built upon Apache Lucene. Solr is wildly popular because it supports complex search criteria, faceting, result highlighting, query completion, query spell-checking, and relevancy tuning, among numerous other features.
"Apache Solr 4 Cookbook" will show you how to get the most out of your search engine. Full of practical recipes and examples, this book will show you how to set up Apache Solr, tune and benchmark its performance, and index and analyze your data to provide better, more precise, and more useful search results.
"Apache Solr 4 Cookbook" will make your search better, more accurate and faster with practical recipes on essential topics such as SolrCloud, querying data, search faceting, text and data analysis, and cache configuration.
With numerous practical chapters centered on important Solr techniques and methods, Apache Solr 4 Cookbook is an essential resource for developers who wish to take their knowledge and skills further. Thoroughly updated and improved, this Cookbook also covers the changes in Apache Solr 4 including the awesome capabilities of SolrCloud.
"Apache Solr 4 Cookbook" is written in a helpful, practical style with numerous hands-on recipes to help you master Apache Solr to get more precise search results and analysis, higher performance, and reliability.
This book is for developers who wish to master Apache Solr 4. It will especially appeal to developers who want to quickly get to grips with the changes and new features of Apache Solr 4. It is also handy as a practical guide to solving common problems when using Apache Solr.

