Apache Solr 3.1 Cookbook


  • Improve the way in which you work with Apache Solr to make your search engine quicker and more effective
  • Deal with performance, setup, and configuration problems in no time
  • Discover little-known Solr functionalities and create your own modules to customize Solr to your company's needs
  • Part of Packt's Cookbook series; each chapter covers a different aspect of working with Solr

Book Details

Language : English
Paperback : 300 pages [ 235mm x 191mm ]
Release Date : July 2011
ISBN : 1849512183
ISBN 13 : 9781849512183
Author(s) : Rafał Kuć
Topics and Technologies : All Books, Big Data and Business Intelligence, Cookbooks, Open Source, Web Development

Table of Contents

Preface
Chapter 1: Apache Solr Configuration
Chapter 2: Indexing your Data
Chapter 3: Analyzing your Text Data
Chapter 4: Solr Administration
Chapter 5: Querying Solr
Chapter 6: Using Faceting Mechanism
Chapter 7: Improving Solr Performance
Chapter 8: Creating Applications that use Solr and Developing your Own Solr Modules
Chapter 9: Using Additional Solr Functionalities
Chapter 10: Dealing with Problems
Index
  • Chapter 1: Apache Solr Configuration
    • Introduction
    • Running Solr on Jetty
    • Running Solr on Apache Tomcat
    • Using the Suggester component
    • Handling multiple languages in a single index
    • Indexing fields in a dynamic way
    • Making multilingual data searchable with multicore deployment
    • Solr cache configuration
    • How to fetch and index web pages
    • Getting the most relevant results with early query termination
    • How to set up Extracting Request Handler
  • Chapter 2: Indexing your Data
    • Introduction
    • Indexing data in CSV format
    • Indexing data in XML format
    • Indexing data in JSON format
    • Indexing PDF files
    • Indexing Microsoft Office files
    • Extracting metadata from binary files
    • How to properly configure Data Import Handler with JDBC
    • Indexing data from a database using Data Import Handler
    • How to import data using Data Import Handler and delta query
    • How to use Data Import Handler with URL Data Source
    • How to modify data while importing with Data Import Handler
  • Chapter 3: Analyzing your Text Data
    • Introduction
    • Storing additional information using payloads
    • Eliminating XML and HTML tags from the text
    • Copying the contents of one field to another
    • Changing words to other words
    • Splitting text by camel case
    • Splitting text by whitespace only
    • Making plural words singular, but without stemming
    • Lowercasing the whole string
    • Storing geographical points in the index
    • Stemming your data
    • Preparing text to do efficient trailing wildcard search
    • Splitting text by numbers and non-white space characters
  • Chapter 4: Solr Administration
    • Introduction
    • Monitoring Solr via JMX
    • How to check the cache status
    • How to check how the data type or field behave
    • How to check Solr query handler usage
    • How to check Solr update handler usage
    • How to change Solr instance logging configuration
    • How to check the Java based replication status
    • How to check the script based replication status
    • Setting up a Java based index replication
    • Setting up script based replication
    • How to manage Java based replication status using HTTP commands
    • How to analyze your index structure
  • Chapter 5: Querying Solr
    • Introduction
    • Asking for a particular field value
    • Sorting results by a field value
    • Choosing a different query parser
    • How to search for a phrase, not a single word
    • Boosting phrases over words
    • Positioning some documents over others on a query
    • Positioning documents with words closer to each other first
    • Sorting results by a distance from a point
    • Getting documents with only a partial match
    • Affecting scoring with function
    • Nesting queries
  • Chapter 6: Using Faceting Mechanism
    • Introduction
    • Getting the number of documents with the same field value
    • Getting the number of documents with the same date range
    • Getting the number of documents with the same value range
    • Getting the number of documents matching the query and sub query
    • How to remove filters from faceting results
    • How to name different faceting results
    • How to sort faceting results in an alphabetical order
    • How to implement the autosuggest feature using faceting
    • How to get the number of documents that don't have a value in the field
    • How to get all the faceting results, not just the first hundred ones
    • How to have two different facet limits for two different fields in the same query
  • Chapter 7: Improving Solr Performance
    • Introduction
    • Paging your results quickly
    • Configuring the document cache
    • Configuring the query result cache
    • Configuring the filter cache
    • Improving Solr performance right after the startup or commit operation
    • Setting up a sharded deployment
    • Caching whole result pages
    • Improving faceting performance
    • What to do when Solr slows down during indexing when using Data Import Handler
    • Getting the first top documents fast when having millions of them
  • Chapter 9: Using Additional Solr Functionalities
    • Introduction
    • Getting more documents similar to those returned in the results list
    • Presenting search results in a fast and easy way
    • Highlighting matched words
    • How to highlight long text fields and get good performance
    • Sorting results by a function value
    • Searching words by how they sound
    • Ignoring defined words
    • Computing statistics for the search results
    • Checking user's spelling mistakes
    • Using "group by" like functionalities in Solr
  • Chapter 10: Dealing with Problems
    • Introduction
    • How to deal with a corrupted index
    • How to reduce the number of files the index is made of
    • How to deal with a locked index
    • How to deal with too many opened files
    • How to deal with out of memory problems
    • How to sort non-English languages properly
    • How to deal with the infinite loop exception when using shards
    • How to deal with garbage collection running too long
    • How to update only one field in all documents without the need of full indexation
    • How to make your index smaller

                      Rafał Kuć

                      Rafał Kuć is a born team leader and software developer. He currently works as a consultant and software engineer at Sematext Group, Inc., where he concentrates on open source technologies such as Apache Lucene and Solr, Elasticsearch, and the Hadoop stack. He has more than 12 years of experience in various branches of software, from banking software to e-commerce products. He focuses mainly on Java but is open to any tool or programming language that helps him reach his goals more easily and quickly. Rafał is also one of the founders of the solr.pl site, where he shares his knowledge and helps people with the problems they face with Solr and Lucene. He has also spoken at various conferences around the world, such as Lucene Eurocon, Berlin Buzzwords, ApacheCon, and Lucene Revolution.

                      Rafał began his journey with Lucene in 2002, and it wasn't love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then, Solr came along and this was it. He started working with Elasticsearch in the middle of 2010. Currently, Lucene, Solr, Elasticsearch, and information retrieval are his main points of interest.

                      Rafał is also the author of Apache Solr 3.1 Cookbook, and the update to it, Apache Solr 4 Cookbook. Also, he is the author of the previous edition of this book and Mastering ElasticSearch. All these books have been published by Packt Publishing.

                      Errata

                      - 2 submitted: last submission 09 Nov 2012

                      Errata type: Code | Page numbers: 28

                      Now add the file named nutch to the directory $$NUTCH_HOME/crawl/nutch/site.
                      should be:
                      Now add the file named site to the directory $NUTCH_HOME/crawl/nutch/.

                       

                      Errata type: Code | Page numbers: 29

                      The file site we created in the directory $$NUTCH_HOME/crawl/nutch should contain information about the sites from which we want information to be fetched.
                      should be:
                      The file site we created in the directory $NUTCH_HOME/crawl/nutch/ should contain information about the sites from which we want information to be fetched.

                       

                      What you will learn from this book

                      • Index data in different formats and forms
                      • Use the Solr administration panel to discover the most commonly searched-for information
                      • Learn how to use different data grouping techniques
                      • Improve your Solr deployment performance
                      • Create and use your own Apache Solr modules
                      • Configure your cache to cater for changes in your data
                      • Import data using the Data Import Handler and delta query
                      • Query Solr to search for phrases, sort results by different fields, and search geographical points
                      • Create new applications that use Solr
                      • Reduce the size of your index for faster searching

                      In Detail

                      Apache Solr is a fast, scalable, modern, open source, and easy-to-use search engine. It allows you to develop a professional search engine for your ecommerce site, web application, or back office software. Setting up Solr is easy, but configuring it to get the most out of your site is the difficult bit.

                      The Solr 3.1 Cookbook will make your everyday work easier by using real-life examples that show you how to deal with the most common problems that can arise while using the Apache Solr search engine. Why waste your time searching the Internet for solutions when you can have all the answers in one place?

                      This cookbook will show you how to get the most out of your search engine. Each chapter covers a different aspect of working with Solr, from analyzing your text data through querying and performance improvement to developing your own modules. The practical recipes will help you quickly solve common data analysis problems and show you how to use faceting to collect data and how to speed up Solr. You will learn about functionalities that many newcomers are unaware of, such as sorting results by a function value, highlighting matched words, and computing statistics, to make your work with Solr easy and stress free.

                      This practical guide helps you get the most out of Apache Solr 3.1, with recipes for improving your search engine's performance, analyzing data quickly and efficiently, and customizing the search server with your own modules.

                      Approach

                      This book is part of Packt's Cookbook series; each chapter looks at a different aspect of working with Apache Solr. The recipes deal with common problems of working with Solr by using easy-to-understand, real-life examples. The book is not in any way a complete Apache Solr reference and you should see it as a helping hand when things get rough on your journey with Apache Solr.

                      Who this book is for

                      Developers who are working with Apache Solr and would like to know how to combat common problems will find this book of great use. Knowledge of Apache Lucene would be a bonus but is not required.
