Scaling Apache Solr


  • Get an introduction to the basics of Apache Solr in a step-by-step manner with lots of examples
  • Develop and understand the workings of enterprise search solutions using various techniques and real-life use cases
  • Gain practical insight into advanced ways of optimizing an enterprise search solution and making it cloud-ready

Book Details

Language : English
Paperback : 298 pages [ 235mm x 191mm ]
Release Date : July 2014
ISBN : 1783981741
ISBN 13 : 9781783981748
Author(s) : Hrishikesh Vijay Karambelkar
Topics and Technologies : All Books, Big Data and Business Intelligence, Open Source


Table of Contents

Preface
Chapter 1: Understanding Apache Solr
Chapter 2: Getting Started with Apache Solr
Chapter 3: Analyzing Data with Apache Solr
Chapter 4: Designing Enterprise Search
Chapter 5: Integrating Apache Solr
Chapter 6: Distributed Search Using Apache Solr
Chapter 7: Scaling Solr through Sharding, Fault Tolerance, and Integration
Chapter 8: Scaling Solr through High Performance
Chapter 9: Solr and Cloud Computing
Chapter 10: Scaling Solr Capabilities with Big Data
Appendix: Sample Configuration for Apache Solr
Index
  • Chapter 1: Understanding Apache Solr
    • Challenges in enterprise search
    • Apache Solr – An overview
    • Features of Apache Solr
      • Solr for end users
        • Powerful full text search
        • Search through rich information
        • Results ranking, pagination, and sorting
        • Facets for better browsing experience
        • Advanced search capabilities
      • Administration
    • Apache Solr architecture
      • Storage
      • Solr application
      • Integration
        • Client APIs and SolrJ client
        • Other interfaces
    • Practical use cases for Apache Solr
      • Enterprise search for a job search agency
        • Problem statement
        • Approach
      • Enterprise search for energy industry
        • Problem statement
        • Approach
    • Summary
  • Chapter 2: Getting Started with Apache Solr
    • Setting up Apache Solr
      • Prerequisites
      • Running Solr on Jetty
      • Running Solr on Tomcat
      • Solr administration
      • What's next?
      • Common problems and solution
    • Understanding the Solr structure
      • The Solr home directory structure
      • Solr navigation
    • Configuring the Apache Solr for enterprise
      • Defining a Solr schema
        • Solr fields
        • Dynamic Fields in Solr
        • Copying the fields
        • Field types
        • Other important elements in the Solr schema
      • Configuring Solr parameters
        • solr.xml and Solr core
        • solrconfig.xml
        • The Solr plugin
      • Other configurations
    • Understanding SolrJ
    • Summary
  • Chapter 3: Analyzing Data with Apache Solr
    • Understanding enterprise data
      • Categorizing by characteristics
      • Categorizing by access pattern
      • Categorizing by data formats
    • Loading data using native handlers
      • Quick and simple data loading – post tool
      • Working with JSON, XML, and CSV
        • Handling JSON data
        • Working with CSV data
        • Working with XML data
    • Working with rich documents
      • Understanding Apache Tika
      • Using Solr Cell (ExtractingRequestHandler)
      • Adding metadata to your rich documents
    • Importing structured data from the database
      • Configuring the data source
      • Importing data in Solr
        • Full import
        • Delta import
      • Loading RDBMS tables in Solr
    • Advanced topics with Solr
      • Deduplication
      • Extracting information from scanned documents
      • Searching through images using LIRE
    • Summary
  • Chapter 4: Designing Enterprise Search
    • Designing aspects for enterprise search
      • Identifying requirements
      • Matching user expectations through relevance
      • Access to searched entities and user interface
      • Improving search performance and ensuring instance scalability
      • Working with applications through federated search
      • Other differentiators – mobiles, linguistic search, and security
    • Enterprise search data-processing patterns
      • Standalone search engine server
      • Distributed enterprise search pattern
      • The replicated enterprise search pattern
      • Distributed and replicated
    • Data integrating pattern for search
      • Data import by enterprise search
      • Applications pushing data
      • Middleware-based integration
    • Case study – designing an enterprise knowledge repository search for software IT services
      • Gathering requirements
      • Designing the solution
        • Designing the schema
        • Integrating subsystems with Apache Solr
        • Working on end user interface
    • Summary
  • Chapter 5: Integrating Apache Solr
    • Empowering the Java Enterprise application with Solr search
      • Embedding Apache Solr as a module (web application) in an enterprise application
        • How to do it?
      • Apache Solr in your web application
        • How to do it?
    • Integration with client technologies
      • Integrating Apache Solr with PHP for web portals
        • Interacting directly with Solr
        • Using the Solr PHP client
        • Advanced integration with Solarium
      • Integrating Apache Solr with JavaScript
        • Using simple XMLHTTPRequest
        • Integrating Apache Solr using AJAX Solr
      • Parsing Solr XML with the help of XSLT
    • Case study – Apache Solr and Drupal
      • How to do it?
    • Summary
  • Chapter 6: Distributed Search Using Apache Solr
    • Need for distributed search
      • Distributed search architecture
      • Apache Solr and distributed search
    • Understanding SolrCloud
      • Why Zookeeper?
      • SolrCloud architecture
    • Building enterprise distributed search using SolrCloud
      • Setting up a SolrCloud for development
      • Setting up a SolrCloud for production
      • Adding a document to SolrCloud
      • Creating shards, collections, and replicas in SolrCloud
    • Common problems and resolutions
    • Case study – distributed enterprise search server for the software industry
    • Summary
  • Chapter 7: Scaling Solr through Sharding, Fault Tolerance, and Integration
    • Enabling search result clustering with Carrot2
      • Why Carrot2?
      • Enabling Carrot2-based document clustering
      • Understanding Carrot2 result clustering
      • Viewing Solr results in the Carrot2 workbench
      • FAQs and problems
    • Sharding and fault tolerance
      • Document routing and sharding
      • Shard splitting
      • Load balancing and fault tolerance in SolrCloud
    • Searching Solr documents in near real time
      • Strategies for near real-time search in Apache Solr
        • Explicit call to commit from a client
        • solrconfig.xml – autocommit
        • CommitWithin – delegating the responsibility to Solr
        • Real-time search in Apache Solr
    • Solr with MongoDB
      • Understanding MongoDB
      • Installing MongoDB
      • Creating Solr indexes from MongoDB
    • Scaling Solr through Storm
      • Getting along with Apache Storm
      • Solr and Apache Storm
    • Summary
  • Chapter 8: Scaling Solr through High Performance
    • Monitoring performance of Apache Solr
      • What should be monitored?
        • Hardware and operating system
        • Java virtual machine
        • Apache Solr search runtime
        • Apache Solr indexing time
        • SolrCloud
      • Tools for monitoring Solr performance
        • Solr administration user interface
        • JConsole
        • SolrMeter
    • Tuning Solr JVM and container
      • Deciding heap size
      • How can we optimize JVM?
      • Optimizing JVM container
    • Optimizing Solr schema and indexing
      • Stored fields
      • Indexed fields and field lengths
      • Copy fields and dynamic fields
      • Fields for range queries
      • Index field updates
      • Synonyms, stemming, and stopwords
      • Tuning DataImportHandler
      • Speeding up index generation
      • Committing the change
        • Limiting indexing buffer size
      • SolrJ implementation classes
    • Speeding Solr through Solr caching
      • The filter cache
      • The query result cache
      • The document cache
      • The field value cache
      • The warming up cache
    • Improving runtime search for Solr
      • Pagination
      • Reducing Solr response footprint
      • Using filter queries
      • Search query and the parsers
      • Lazy field loading
    • Optimizing SolrCloud
    • Summary
  • Chapter 9: Solr and Cloud Computing
    • Enterprise search on Cloud
      • Models of engagement
      • Enterprise search Cloud deployment models
    • Solr on Cloud strategies
      • Scaling Solr with a dedicated application
        • Advantages
        • Disadvantages
      • Scaling Solr horizontal as multiple applications
        • Advantages
        • Disadvantages
      • Scaling horizontally through the Solr multicore
        • Scaling horizontally with replication
        • Scaling horizontally with Zookeeper
    • Running Solr on Cloud (IaaS and PaaS)
      • Running Solr with Amazon Cloud
      • Running Solr on Windows Azure
    • Running Solr on Cloud (SaaS) and enterprise search as a service
      • Running Solr with OpenSolr Cloud
      • Running Solr with SolrHQ Cloud
      • Running Solr with Bitnami
      • Working with Amazon CloudSearch
      • Drupal-Solr SaaS with Acquia
    • Summary
  • Chapter 10: Scaling Solr Capabilities with Big Data
    • Apache Solr and HDFS
    • Big Data search on Katta
      • How Katta works?
      • Setting up Katta cluster
      • Creating Katta indexes
    • Using the Solr 1045 patch – map-side indexing
    • Using the Solr 1301 patch – reduce-side indexing
    • Apache Solr and Cassandra
      • Working with Cassandra and Solr
        • Single node configuration
        • Integrating with multinode Cassandra
    • Advanced analytics with Solr
      • Integrating Solr and R
    • Summary

Hrishikesh Vijay Karambelkar

Hrishikesh Vijay Karambelkar is an enterprise architect with more than 13 years of combined technical and entrepreneurial experience. His core expertise spans J2EE, Big Data, Solr, Link Analysis, and Analytics. He enjoys architecting solutions for the next generation of product development for IT organizations, and he now spends most of his working time solving challenging problems faced by the software industry.

In the past, he worked in the domain of graph databases, and some of his work has been published at international conferences such as VLDB and ICDE. He recently published the book Scaling Big Data with Hadoop and Solr (Packt Publishing). He enjoys spending his leisure time traveling, trekking, and photographing birds in the dense forests of India. He can be reached at http://hrishikesh.karambelkar.co.in/.

What you will learn from this book

  • Gain a complete understanding of Apache Solr and its ecosystem
  • Develop scalable, high-performance search applications using Apache Solr
  • Customize Apache-Solr-based search for different requirements
  • Discover different techniques to build high-speed enterprise searches
  • Design enterprise-ready search engines and implement a scalable enterprise search functionality
  • Integrate an Apache-Solr-based search with different subsystems and legacy systems
  • Scale Apache Solr through sharding, replication, and fault tolerance
  • Learn about performance tuning for your Solr-based application while scaling your data
  • Make your enterprise search cloud-ready to be able to work with multiple clients

In Detail

This book is for individuals who want to build high-performance, scalable, enterprise-ready search engines for their customers or organizations. The book starts with the basics of Apache Solr, covering different ways to analyze enterprise information and design enterprise-ready search engines using Solr. It also discusses scaling Solr-based enterprise search to the next level.

Each chapter takes you through more advanced levels of Apache Solr with real-world practical details such as installing, setting up, and configuring instances. This book contains detailed explanations of the basic and advanced features of Apache Solr.

By sequentially working through the steps in each chapter and with the help of real-life industry examples, you will quickly master the features of Apache Solr to build search solutions for enterprises.

Approach

This book is a step-by-step guide for readers who would like to learn how to build complete enterprise search solutions, with ample real-world examples and case studies.

Who this book is for

If you are a developer, designer, or architect who would like to build enterprise search solutions for your customers or organization, but have no prior knowledge of Apache Solr/Lucene technologies, this is the book for you.
