Scaling Big Data with Hadoop and Solr - Second Edition

Understand, design, build, and optimize your big data search engine with Hadoop and Apache Solr

Hrishikesh Vijay Karambelkar

Book Details

ISBN 13: 9781783553396
Paperback: 166 pages

Book Description

Together, Apache Hadoop and Apache Solr help organizations resolve the problem of information extraction from big data by providing excellent distributed faceted search capabilities.

This book will help you learn everything you need to know to build a distributed enterprise search platform and to optimize it so that it makes the most of the available resources. Starting with the basics of Apache Hadoop and Solr, the book covers advanced topics in search optimization, with real-world use cases and sample Java code.

This is a step-by-step guide that will teach you how to build a high-performance enterprise search platform while scaling data with Hadoop and Solr.

Table of Contents

Chapter 1: Processing Big Data Using Hadoop and MapReduce
Apache Hadoop's ecosystem
Configuring Apache Hadoop
Running Hadoop
Setting up a Hadoop cluster
Common problems and their solutions
Summary
Chapter 2: Understanding Apache Solr
Setting up Apache Solr
The Apache Solr architecture
Configuring Solr
Loading data in Apache Solr
Querying for information in Solr
Summary
Chapter 3: Enabling Distributed Search using Apache Solr
Understanding a distributed search
Working with SolrCloud
Sharding algorithm and fault tolerance
Apache Solr and Big Data – integration with MongoDB
Summary
Chapter 4: Big Data Search Using Hadoop and Its Ecosystem
Understanding NoSQL
Working with the Solr HDFS connector
Big data search using Katta
Using Solr 1045 Patch – map-side indexing
Using Solr 1301 Patch – reduce-side indexing
Distributed search using Apache Blur
Apache Solr and Cassandra
Scaling Solr through Storm
Advanced analytics with Solr
Summary
Chapter 5: Scaling Search Performance
Understanding the limits
Optimizing search schema
Index optimization
Optimizing search runtime
Monitoring Solr instance
Summary

What You Will Learn

  • Understand Apache Hadoop, its ecosystem, and Apache Solr
  • Explore industry-based architectures for big data enterprise search, along with their applicability and benefits
  • Integrate Apache Solr with big data technologies such as Cassandra to enable better scalability and high availability for big data
  • Optimize the performance of your big data search platform as your data scales
  • Write MapReduce tasks to index your data
  • Configure your Hadoop instance to handle real-world big data problems
  • Work with Hadoop and Solr using real-world examples to benefit from their practical usage
  • Use Apache Solr as a NoSQL database
