Solr 1.4 Enterprise Search Server


eBook: $22.94 (list price $26.99, save 15%)
Formats: PDF, PacktLib, ePub, and Mobi
Print + free eBook + free PacktLib access to the book: $44.99 (list price $71.98, save 37%)
Free shipping to the UK, US, Europe, and selected countries in Asia.
  • Deploy, embed, and integrate Solr with a host of programming languages
  • Implement faceting in e-commerce and other sites to summarize and navigate the results of a text search
  • Enhance your search by highlighting search results, offering spell-corrections and auto-suggest, finding “similar” records, boosting records and fields for scoring, and phonetic matching
  • Informative and practical approach to development with fully working examples of integrating a variety of technologies
  • Written and tested for Solr 1.4 pre-release 2009.08

Book Details

Language : English
Paperback : 336 pages [ 235mm x 191mm ]
Release Date : August 2009
ISBN : 1847195881
ISBN 13 : 9781847195883
Author(s) : David Smiley, Eric Pugh
Topics and Technologies : All Books, Big Data and Business Intelligence, Open Source

Table of Contents

Preface
Chapter 1: Quick Starting Solr
Chapter 2: Schema and Text Analysis
Chapter 3: Indexing Data
Chapter 4: Basic Searching
Chapter 5: Enhanced Searching
Chapter 6: Search Components
Chapter 7: Deployment
Chapter 8: Integrating Solr
Chapter 9: Scaling Solr
Index
  • Chapter 1: Quick Starting Solr
    • An introduction to Solr
      • Lucene, the underlying engine
      • Solr, the Server-ization of Lucene
    • Comparison to database technology
    • Getting started
      • The last official release or fresh code from source control
      • Testing and building Solr
      • Solr's installation directory structure
      • Solr's home directory
      • How Solr finds its home
      • Deploying and running Solr
    • A quick tour of Solr!
      • Loading sample data
      • A simple query
      • Some statistics
    • The schema and configuration files
    • Solr resources outside this book
    • Summary
  • Chapter 2: Schema and Text Analysis
    • MusicBrainz.org
    • One combined index or multiple indices
      • Problems with using a single combined index
    • Schema design
      • Step 1: Determine which searches are going to be powered by Solr
      • Step 2: Determine the entities returned from each search
      • Step 3: Denormalize related data
        • Denormalizing—"one-to-one" associated data
        • Denormalizing—"one-to-many" associated data
      • Step 4: (Optional) Omit the inclusion of fields only used in search results
    • The schema.xml file
      • Field types
      • Field options
      • Field definitions
        • Sorting
        • Dynamic fields
        • Using copyField
        • Remaining schema.xml settings
    • Text analysis
      • Configuration
      • Experimenting with text analysis
      • Tokenization
      • WordDelimiterFilterFactory
      • Stemming
      • Synonyms
        • Index-time versus Query-time, and to expand or not
      • Stop words
      • Phonetic sounds-like analysis
      • Partial/Substring indexing
        • N-gramming costs
      • Miscellaneous analyzers
    • Summary
  • Chapter 3: Indexing Data
    • Communicating with Solr
      • Direct HTTP or a convenient client API
      • Data streamed remotely or from Solr's filesystem
      • Data formats
    • Using curl to interact with Solr
    • Remote streaming
    • Sending XML to Solr
      • Deleting documents
      • Commit, optimize, and rollback
    • Sending CSV to Solr
      • Configuration options
    • Direct database and XML import
      • Getting started with DIH
        • The DIH development console
        • DIH documents, entities
        • DIH fields and transformers
      • Importing with DIH
    • Indexing documents with Solr Cell
      • Extracting binary content
      • Configuring Solr
      • Extracting karaoke lyrics
      • Indexing richer documents
    • Summary
  • Chapter 4: Basic Searching
    • Your first search, a walk-through
    • Solr's generic XML structured data representation
    • Solr's XML response format
      • Parsing the URL
    • Query parameters
      • Parameters affecting the query
      • Result paging
      • Output related parameters
      • Diagnostic query parameters
    • Query syntax
      • Matching all the documents
      • Mandatory, prohibited, and optional clauses
        • Boolean operators
      • Sub-expressions (aka sub-queries)
        • Limitations of prohibited clauses in sub-expressions
      • Field qualifier
      • Phrase queries and term proximity
      • Wildcard queries
        • Fuzzy queries
      • Range queries
        • Date math
      • Score boosting
      • Existence (and non-existence) queries
      • Escaping special characters
    • Filtering
    • Sorting
    • Request handlers
    • Scoring
      • Query-time and index-time boosting
      • Troubleshooting scoring
    • Summary
  • Chapter 5: Enhanced Searching
    • Function queries
      • An example: Scores influenced by a lookupcount
      • Field references
      • Function reference
        • Mathematical primitives
        • Miscellaneous math
        • ord and rord
      • An example with scale() and lookupcount
        • Using logarithms
        • Using inverse reciprocals
        • Using reciprocals and rord with dates
      • Function query tips
    • Dismax Solr request handler
      • Lucene's DisjunctionMaxQuery
        • Configuring queried fields and boosts
      • Limited query syntax
      • Boosting: Automatic phrase boosting
        • Configuring automatic phrase boosting
        • Phrase slop configuration
      • Boosting: Boost queries
      • Boosting: Boost functions
      • Min-should-match
        • Basic rules
        • Multiple rules
        • What to choose
      • A default search
    • Faceting
      • A quick example: Faceting release types
        • MusicBrainz schema changes
      • Field requirements
      • Types of faceting
      • Faceting text
      • Alphabetic range bucketing (A-C, D-F, and so on)
      • Faceting dates
        • Date facet parameters
      • Faceting on arbitrary queries
      • Excluding filters
        • The solution: Local Params
      • Facet prefixing (term suggest)
    • Summary
  • Chapter 6: Search Components
    • About components
    • The highlighting component
      • A highlighting example
      • Highlighting configuration
    • Query elevation
      • Configuration
    • Spell checking
      • Schema configuration
      • Configuration in solrconfig.xml
        • Configuring spellcheckers (dictionaries)
        • Processing of the q parameter
        • Processing of the spellcheck.q parameter
      • Building the dictionary from its source
      • Issuing spellcheck requests
      • Example usage for a misspelled query
        • An alternative approach
    • The more-like-this search component
      • Configuration parameters
        • Parameters specific to the MLT search component
        • Parameters specific to the MLT request handler
        • Common MLT parameters
      • MLT results example
    • Stats component
      • Configuring the stats component
      • Statistics on track durations
    • Field collapsing
      • Configuring field collapsing
    • Other components
      • Terms component
      • termVector component
      • LocalSolr component
    • Summary
  • Chapter 7: Deployment
    • Implementation methodology
      • Questions to ask
    • Installing into a Servlet container
      • Differences between Servlet containers
        • Defining solr.home property
    • Logging
      • HTTP server request access logs
      • Solr application logging
        • Configuring logging output
        • Logging to Log4j
        • Jetty startup integration
        • Managing log levels at runtime
    • A SearchHandler per search interface
    • Solr cores
      • Configuring solr.xml
      • Managing cores
      • Why use multicore
    • JMX
      • Starting Solr with JMX
        • Take a walk on the wild side! Use JRuby to extract JMX information
    • Securing Solr
      • Limiting server access
        • Controlling JMX access
      • Securing index data
        • Controlling document access
        • Other things to look at
    • Summary
  • Chapter 8: Integrating Solr
    • Structure of included examples
      • Inventory of examples
    • SolrJ: Simple Java interface
      • Using Heritrix to download artist pages
      • Indexing HTML in Solr
      • SolrJ client API
        • Indexing POJOs
      • When should I use Embedded Solr
        • In-Process streaming
        • Rich clients
        • Upgrading from legacy Lucene
    • Using JavaScript to integrate Solr
      • Wait, what about security?
      • Building a Solr powered artists autocomplete widget with jQuery and JSONP
      • SolrJS: JavaScript interface to Solr
    • Accessing Solr from PHP applications
      • solr-php-client
      • Drupal options
        • Apache Solr Search integration module
        • Hosted Solr by Acquia
    • Ruby on Rails integrations
      • acts_as_solr
        • Setting up MyFaves project
        • Populating MyFaves relational database from Solr
        • Build Solr indexes from relational database
        • Complete MyFaves web site
      • Blacklight OPAC
        • Indexing MusicBrainz data
      • Customizing display
      • solr-ruby versus rsolr
    • Summary
  • Chapter 9: Scaling Solr
    • Tuning complex systems
      • Using Amazon EC2 to practice tuning
        • Firing up Solr on Amazon EC2
    • Optimizing a single Solr server (Scale High)
      • JVM configuration
      • HTTP caching
      • Solr caching
        • Tuning caches
      • Schema design considerations
      • Indexing strategies
        • Disable unique document checking
        • Commit/optimize factors
      • Enhancing faceting performance
      • Using term vectors
      • Improving phrase search performance
        • The solution: Shingling
    • Moving to multiple Solr servers (Scale Wide)
      • Script versus Java replication
      • Starting multiple Solr servers
        • Configuring replication
      • Distributing searches across slaves
        • Indexing into the master server
        • Configuring slaves
      • Distributing search queries across slaves
      • Sharding indexes
        • Assigning documents to shards
        • Searching across shards
    • Combining replication and sharding (Scale Deep)
    • Summary
David Smiley

Born to code, David Smiley is a software engineer who’s passionate about search, Lucene, spatial, and open source. He has a great deal of expertise with Lucene and Solr, which started in 2008 at MITRE. In 2009, as the lead author, he wrote Solr 1.4 Enterprise Search Server, the first book on Solr, published by PACKT. It was updated in 2011, and again for this third edition. After the first book, he developed one- and two-day Solr training courses, which he delivered a half dozen times within MITRE, and he delivered LucidWorks’ training once too. Most of his excitement and energy relating to Lucene is centered on Lucene’s spatial module, including Spatial4j, which he is largely responsible for. He has presented his progress on this at Lucene Revolution and other conferences several times. Finally, he currently holds committer / Project Management Committee (PMC) status with the Lucene/Solr open-source project. During all this time, David has staked his career on search, working exclusively on such projects, formerly for MITRE, and now as an independent consultant for various clients. You can reach him at dsmiley@apache.org.

Eric Pugh

Fascinated by the “craft” of software development, Eric Pugh has been involved in the open source world as a developer, committer, and user for the past decade. He is an emeritus member of the Apache Software Foundation. In biotech, financial services, and defense IT, he has helped European and American companies develop coherent strategies for embracing open source software. As a speaker, he has advocated the advantages of Agile practices in search/discovery/analytics projects. Eric became involved in Solr when he submitted the patch SOLR-284 for parsing rich document types such as PDF and MS Office formats, which became the single most popular patch as measured by votes! The patch was subsequently cleaned up and enhanced by three other individuals, demonstrating the power of the Free/Open Source Model to build great code collaboratively. SOLR-284 was eventually refactored into Solr Cell. He blogs at http://www.opensourceconnections.com/blog/.

Errata

- 3 submitted: last submission 14 Feb 2014

Errata type: Code | Page number: 25 | Errata date: 13 May 11

take a peak at the request handlers
should be:
take a peek at the request handlers

 

Errata type: Code | Page number: 151 |

Note that after URL encoding, + becomes %3B
should be:
Note that after URL encoding, + becomes %2B
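
The corrected value can be checked with any standard percent-encoding routine. The following is a minimal sketch using Python's standard library; it is not taken from the book and only illustrates the difference between the two codes.

```python
from urllib.parse import quote, unquote

# Percent-encode a literal plus sign; safe="" means no characters are exempt.
encoded_plus = quote("+", safe="")
print(encoded_plus)

# Decode the value originally printed in the book to see which character it stands for.
decoded_3b = unquote("%3B")
print(decoded_3b)
```

The first call yields the code for a plus sign, while decoding %3B yields a semicolon, which is why the original text was wrong.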

 

Errata type: Typo | Page number: 53 |

The headline must be WordDelimiterFilterFactory instead of WorkDeliminiterFilterFactory

Sample chapters

You can view our sample chapters and prefaces of this title on PacktLib or download sample chapters in PDF format.

What you will learn from this book

  • Blend structured data with real search features
  • Import data from CSV, XML, common document formats, and databases
  • Deploy Solr and refer to its query syntax, from the basics to range queries
  • Enhance search results with spell-checking, query auto-completion, highlighting, and more
  • Secure Solr
  • Integrate a host of technologies with Solr from the server side to client-side JavaScript, to frameworks like Drupal
  • Scale Solr using replication, distributed searches, and tuning

In Detail

If you are a developer building a high-traffic web site, you need to have a terrific search engine. Sites like Netflix.com and Zappos.com employ Solr, an open source enterprise search server, which uses and extends the Lucene search library. This is the first book on the market on Solr, and it will show you how to equip a high-traffic web site with full-text search capabilities and loads of customization options, so that your users get a terrific search experience.

This book is a comprehensive reference guide for every feature Solr has to offer. It serves the reader right from initiation to development to deployment. It also comes with complete running examples to demonstrate its use and show how to integrate it with other languages and frameworks.

This book first gives you a quick overview of Solr, and then gradually takes you from basic to advanced features that enhance your search. It starts off by discussing Solr and helping you understand how it fits into your architecture: where databases and document/web crawlers fall short, and where Solr shines. The main part of the book is a thorough exploration of nearly every feature that Solr offers. To keep this interesting and realistic, we use a large open source set of metadata about artists, releases, and tracks courtesy of the MusicBrainz.org project. Using this data as a testing ground for Solr, you will learn how to import it in various ways, from CSV to XML to database access. You will then learn how to search this data in a myriad of ways, including using Solr's rich query syntax, "boosting" match scores based on record data and other means, searching across multiple fields with different boosts, getting facets on the results, auto-completing user queries, spell-correcting searches, highlighting queried text in search results, and so on.
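
To give a flavor of what such a search looks like over HTTP, here is a minimal sketch (not taken from the book) that builds a query URL combining a fielded query, faceting, and highlighting. The parameter names `q`, `rows`, `facet`, `facet.field`, `hl`, `hl.fl`, and `wt` are standard Solr request parameters; the base URL and the field names `a_name` and `r_type` are assumptions chosen for illustration and would need to match your own deployment and schema.

```python
from urllib.parse import urlencode

# Assumed address of a local Solr instance; adjust to your deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(user_query, facet_field, highlight_field, rows=10):
    """Build a Solr search URL with faceting and highlighting enabled."""
    params = [
        ("q", user_query),             # the query, in Solr's query syntax
        ("rows", rows),                # number of results to return
        ("facet", "true"),             # turn faceting on
        ("facet.field", facet_field),  # field whose values are counted
        ("hl", "true"),                # turn highlighting on
        ("hl.fl", highlight_field),    # field to highlight matches in
        ("wt", "json"),                # response format
    ]
    # urlencode percent-encodes reserved characters such as ":" in the query.
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Hypothetical field names: a_name (artist name), r_type (release type).
url = build_search_url("a_name:smashing", "r_type", "a_name")
print(url)
```

The function only assembles the URL; sending the request and parsing the response are left out so the sketch runs without a Solr server.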

After this thorough tour, we'll demonstrate working examples of integrating a variety of technologies with Solr such as Java, JavaScript, Drupal, Ruby, XSLT, PHP, and Python.

Finally, we'll cover various deployment considerations, including indexing strategies and performance-oriented configuration, that will enable you to scale Solr to meet the needs of a high-volume site.

A practical reference guide to all features in Solr with complete guidance on how to use this incredibly powerful tool effectively

Approach

The book takes a tutorial approach with fully working examples. It will show you how to implement a Solr-based search engine on your intranet or web site.

Who this book is for

This book is for developers who would like to use Solr for their applications. You only need to have basic programming skills to use Solr. Knowledge of Lucene is certainly a bonus.
