You're reading from Scaling Big Data with Hadoop and Solr, Second Edition

Product type: Book

Published in: Apr 2015

ISBN-13: 9781783553396

Pages: 166

Edition: 1st Edition

Concepts: Big Data

Author (1):

Hrishikesh Vijay Karambelkar

Table of Contents (13) Chapters

Scaling Big Data with Hadoop and Solr Second Edition

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

1. Processing Big Data Using Hadoop and MapReduce

2. Understanding Apache Solr

3. Enabling Distributed Search using Apache Solr

4. Big Data Search Using Hadoop and Its Ecosystem

5. Scaling Search Performance

Use Cases for Big Data Search

Index

A

advanced analytics, with Solr
- about / Advanced analytics with Solr
Analyzer
- about / The Apache Solr architecture
ant build scripting
- URL / Single node configuration
Apache Ambari
- about / Understanding Hadoop's ecosystem
Apache Blur
- about / Distributed search using Apache Blur, High-level design
- working, with Hadoop / Distributed search using Apache Blur
- setting up, with Hadoop / Setting up Apache Blur with Hadoop
- URL / Setting up Apache Blur with Hadoop
Apache Cassandra
- URL / Integrating with multinode Cassandra
Apache Chukwa
- about / Understanding Hadoop's ecosystem
Apache Flume
- about / Understanding Hadoop's ecosystem
Apache Hadoop
- about / Apache Hadoop's ecosystem, Understanding Hadoop's ecosystem
- ecosystem / Apache Hadoop's ecosystem, Understanding Hadoop's ecosystem
- HDFS / Apache Hadoop's ecosystem
- MapReduce / Apache Hadoop's ecosystem
- core components / Core components
- configuring / Configuring Apache Hadoop, Configuring Hadoop
- single node setup / Configuring Apache Hadoop
- pseudo distributed setup / Configuring Apache Hadoop
- fully distributed setup / Configuring Apache Hadoop
- prerequisites / Prerequisites
- download link / Prerequisites
- ssh, setting up without passphrase / Setting up ssh without passphrase
- running / Running Hadoop
- common problems and solutions / Common problems and their solutions
- URL / Working with the Solr HDFS connector
Apache HBase
- about / Understanding Hadoop's ecosystem
Apache HCatalog
- about / Understanding Hadoop's ecosystem
Apache Hive
- about / Understanding Hadoop's ecosystem
Apache Ivy
- URL / Single node configuration
Apache JIRA site
- URL / Using Solr 1045 Patch – map-side indexing
Apache Kafka
- about / High-level design
- URL / High-level design
Apache Lucene
- about / Understanding the limits
Apache Lucene core
- about / The Apache Solr architecture
Apache Mahout
- about / Understanding Hadoop's ecosystem
Apache Oozie
- about / Understanding Hadoop's ecosystem
Apache Pig
- about / Understanding Hadoop's ecosystem
Apache Solr
- setting up / Setting up Apache Solr
- prerequisites / Prerequisites for setting up Apache Solr
- running, on jetty / Running Apache Solr on jetty
- running, on J2EE containers / Running Solr on other J2EE containers
- Hello World / Hello World with Apache Solr!
- common problems and solutions / Common problems and solutions
- architecture / The Apache Solr architecture
- configuring / Configuring Solr
- data, loading / Loading data in Apache Solr
- information, querying for / Querying for information in Solr
- distributed search, enabling with / Apache Solr and distributed search
- index partitioning / The SolrCloud architecture
- download link / Setting up SolrCloud for development
- working with, Cassandra / Apache Solr and Cassandra
Apache Solr and MongoDB integration
- about / Apache Solr and Big Data – integration with MongoDB
- NoSQL / What is NoSQL and how is it related to Big Data?
- MongoDB / MongoDB at glance
- MongoDB, installing / Installing MongoDB
- Solr indexes, creating from MongoDB / Creating Solr indexes from MongoDB
Apache Sqoop
- about / Understanding Hadoop's ecosystem
Apache Storm
- about / Scaling Solr through Storm
- Solr, scaling through / Scaling Solr through Storm
- URL / Scaling Solr through Storm
- master node / Scaling Solr through Storm
- worker node / Scaling Solr through Storm
- slave node / Scaling Solr through Storm
- installing / Getting along with Apache Storm
- download link / Getting along with Apache Storm
Apache Tika
- about / The Apache Solr architecture, Working with rich documents (Apache Tika)
Apache Zookeeper
- about / Understanding Hadoop's ecosystem
- URL / Setting up SolrCloud for production
Application Master (AM)
- about / Core components
architecture, Solr
- about / The Apache Solr architecture
- index replicator / The Apache Solr architecture
architecture, SolrCloud
- about / The SolrCloud architecture
availability
- about / Understanding NoSQL

B

big data search
- Katta used / Big data search using Katta

C

Cache Autowarming / Optimizing the Solr cache
CAP theorem
- URL / Apache Solr and Big Data – integration with MongoDB, Understanding NoSQL
- about / Understanding NoSQL
Cassandra
- about / Apache Solr and Cassandra
Cassandra and Solr integration
- about / Working with Cassandra and Solr
- single node configuration / Single node configuration
- multinode Cassandra, integrating / Integrating with multinode Cassandra
chain or pipeline of Analyzers
- about / The Apache Solr architecture
collection / Understanding Solr administration
collection, SolrCloud / The SolrCloud architecture
commit
- about / When to commit changes?
- autocommit / When to commit changes?
- soft commit / When to commit changes?
configuration files, Apache Hadoop
- about / Configuring Hadoop
configuration files, Solr
- about / Configuration files of Apache Solr
- solr.xml, working with / Working with solr.xml and Solr core
- Solr core, working with / Working with solr.xml and Solr core
- instance configuration, with solrconfig.xml / Instance configuration with solrconfig.xml
- Solr plugin / Understanding the Solr plugin
- other configuration / Other configuration
consistency
- about / Understanding NoSQL
consoles, SolrMeter
- query console / Using SolrMeter
- update console / Using SolrMeter
- commit console / Using SolrMeter
- optimize console / Using SolrMeter
core components, Hadoop
- about / Core components
- Resource Manager (RM) / Core components
- Application Master (AM) / Core components
- Node Manager (NM) / Core components
- NameNode / Core components
- SecondaryNameNode / Core components
- DataNodes / Core components
cran mirrors
- URL / Integrating Solr and R
curl/wget utilities
- about / Extracting request handler – Solr Cell

D

Data Import Handler (DIH)
- about / The Apache Solr architecture
data import handlers, Solr
- about / Understanding data import handlers
data loading, in Solr
- about / Loading data in Apache Solr
- request handler/Solr Cell, extracting / Extracting request handler – Solr Cell
- data import handlers / Understanding data import handlers
- interacting through SolrJ / Interacting with Solr through SolrJ
- rich documents, working with / Working with rich documents (Apache Tika)
DataNodes
- about / Core components
DDL (Data Definition Language)
- about / Understanding Hadoop's ecosystem
Distributed Deadlock
- about / Understanding the limits
distributed search
- about / Understanding a distributed search
- patterns / Distributed search patterns
- enabling, Apache Solr used / Apache Solr and distributed search
distributed search, with Apache Blur
- about / Distributed search using Apache Blur
- Apache Blur, setting up with Hadoop / Setting up Apache Blur with Hadoop
DNS (Domain Name System)
- about / Setting up a Hadoop cluster
document / Solr fields
document cache
- about / The document cache
document routing, SolrCloud
- about / Document Routing and Sharding
DocValue / Solr fields
DSE
- URL / Working with Cassandra and Solr

E

e-commerce websites
- about / E-Commerce websites
- usage / E-Commerce websites
Elastic Load Balancing
- URL / Apache Solr and distributed search
elements, Solr schema
- uniqueKey / Other important elements of the Solr schema
- defaultSearchField / Other important elements of the Solr schema
- similarity / Other important elements of the Solr schema
ensemble, Zookeeper
- about / Why ZooKeeper?
enterprise distributed search, with SolrCloud
- building / Building an enterprise distributed search using SolrCloud
- SolrCloud, setting up for development / Setting up SolrCloud for development
- SolrCloud, setting up for production / Setting up SolrCloud for production
- document, adding to SolrCloud / Adding a document to SolrCloud
- shards, creating / Creating shards, collections, and replicas in SolrCloud
- collections, creating / Creating shards, collections, and replicas in SolrCloud
- replicas, creating / Creating shards, collections, and replicas in SolrCloud
enterprise distributed search implementation scenarios
- master/slave / Distributed search patterns
- multi-nodes / Distributed search patterns
- multi-tenant / Distributed search patterns
ETL (Extract-Transform-Load)
- about / Understanding Hadoop's ecosystem
eventual consistency / Why ZooKeeper?

F

fault tolerance, SolrCloud
- about / Load balancing and fault tolerance in SolrCloud
Fields, Apache Solr / Solr fields
field value cache
- about / The field value cache
filter cache
- about / The filter cache
Filters
- about / The Apache Solr architecture

G

garbage collection / Optimizing Java virtual memory

H

Hadoop cluster
- setting up / Setting up a Hadoop cluster
Hadoop optimization
- parameters / Optimizing Hadoop
Hadoop tarball
- URL / Working with the Solr HDFS connector
HDFS
- about / Apache Hadoop's ecosystem
Hello World, with Apache Solr
- about / Hello World with Apache Solr!
- Solr administration / Understanding Solr administration
- Solr navigation / Solr navigation
HiveQL
- about / Understanding Hadoop's ecosystem
Hortonworks
- reference link, for data search / High-level design

I

Index Handler
- about / The Apache Solr architecture
index optimization
- about / Index optimization
- performing / Index optimization
- indexing buffer size, limiting / Limiting indexing buffer size
- commit / When to commit changes?
- index merge, optimizing / Optimizing index merge
- optimize option, for index merging / Optimize option for index merging
- container, optimizing / Optimizing the container
- concurrent clients, optimizing / Optimizing concurrent clients
- Java virtual memory, optimizing / Optimizing Java virtual memory
index partitioning
- about / The SolrCloud architecture
Index Reader
- about / The Apache Solr architecture
index replicator / The Apache Solr architecture
Index Searcher
- about / The Apache Solr architecture
Index Writer
- about / The Apache Solr architecture
information, Solr
- querying / Querying for information in Solr

J

J2EE containers
- Solr, running on / Running Solr on other J2EE containers
Java 1.6
- URL / Prerequisites
JDK
- URL / Prerequisites for setting up Apache Solr
jetty
- Solr, running on / Running Apache Solr on jetty
JIRA for integrating Katta, in Solr
- reference / Creating Katta indexes
JVM
- URL / Common problems and solutions
JVMs
- reference / Optimizing the container

K

K-means clustering
- URL / Integrating Solr and R
Katta
- about / Big data search using Katta
- used, for big data search / Big data search using Katta
- URL / Big data search using Katta
- working / How Katta works?
- architecture / How Katta works?
Katta cluster
- about / How Katta works?
- setting up / Setting up the Katta cluster
- download link, for distribution / Setting up the Katta cluster
- URL, for sample creator script / Creating Katta indexes
Katta indexes
- creating / Creating Katta indexes
Katta Master / How Katta works?

L

laggard problem
- about / Understanding the limits
lazy field loading
- about / The lazy field loading
legacy distributed search
- reference link / Apache Solr and distributed search
limits
- understanding / Understanding the limits
load balancing, SolrCloud
- about / Load balancing and fault tolerance in SolrCloud
log management, for banking
- about / Log management for banking
- problem / The problem
- resolution / How can it be tackled?
- high-level design / High-level design

M

MapReduce
- about / Apache Hadoop's ecosystem
- using / Apache Hadoop's ecosystem
MapReduce approach / Apache Hadoop's ecosystem
map side indexing
- about / Using Solr 1045 Patch – map-side indexing
Map Task / Apache Hadoop's ecosystem
MongoDB
- about / MongoDB at glance
- URL / MongoDB at glance
- data / MongoDB at glance
- installing / Installing MongoDB
- download link / Installing MongoDB
- URL, for project repository / Creating Solr indexes from MongoDB
- Solr indexes, creating from / Creating Solr indexes from MongoDB

N

NameNode
- about / Core components
near-real-time search / When to commit changes?
Node Manager (NM)
- about / Core components
NoSQL
- about / What is NoSQL and how is it related to Big Data?, Understanding NoSQL
- relating, to Big Data / What is NoSQL and how is it related to Big Data?
NoSQL database
- about / Understanding Hadoop's ecosystem

P

parallel-ssh
- URL / Prerequisites
partition tolerance
- about / Understanding NoSQL
Planet Cassandra
- URL / Apache Solr and Cassandra
Portable Document Format (PDF) / Working with rich documents (Apache Tika)
post.jar
- about / Hello World with Apache Solr!
python
- download link / Getting along with Apache Storm

Q

Query Parser
- about / The Apache Solr architecture
query result cache
- about / The query result cache

R

R
- about / Advanced analytics with Solr
- URL / Advanced analytics with Solr
- open source packages / Advanced analytics with Solr
- Solr, integrating with / Integrating Solr and R
reduce side indexing
- about / Using Solr 1301 Patch – reduce-side indexing
Reduce Tasks / Apache Hadoop's ecosystem
request handler
- about / Other configuration
- URL / Other configuration
request handler/Solr Cell
- extracting / Extracting request handler – Solr Cell
Resource Manager (RM)
- about / Core components
Response Writer
- about / The Apache Solr architecture
Rich Text Format (RTF) / Working with rich documents (Apache Tika)
Round Robin algorithm
- reference link / Setting up SolrCloud for development

S

search performance
- limits / Understanding the limits
search runtime optimization
- about / Optimizing search runtime
- optimizing, through search query / Optimizing through search query
- filter queries / Filter queries
- Solr cache, optimizing / Optimizing the Solr cache
- Hadoop, optimizing / Optimizing Hadoop
search schema optimization
- about / Optimizing search schema
- default search field, specifying / Specifying default search field
- search schema fields, configuring / Configuring search schema fields
- stop words / Stop words
- stemming / Stemming
SecondaryNameNode
- about / Core components
Secure shell (ssh)
- about / Prerequisites
sequential updates / Why ZooKeeper?
shard index or slice, SolrCloud / The SolrCloud architecture
sharding algorithm, SolrCloud
- about / Sharding algorithm and fault tolerance
- document routing / Document Routing and Sharding
- shard splitting / Shard splitting
- load balancing / Load balancing and fault tolerance in SolrCloud
- fault tolerance / Load balancing and fault tolerance in SolrCloud
Shard Leader, SolrCloud / The SolrCloud architecture
shard replica, SolrCloud / The SolrCloud architecture
shards
- about / Apache Solr and distributed search
shard splitting, SolrCloud
- about / Shard splitting
Solandra
- URL / Single node configuration
Solr
- scaling, through Storm / Scaling Solr through Storm
- advanced analytics / Advanced analytics with Solr
- about / Advanced analytics with Solr
- integrating, with R / Integrating Solr and R
Solr 5.0
- URL / Running Apache Solr on jetty
Solr 1045 Patch
- about / Using Solr 1045 Patch – map-side indexing
- using / Using Solr 1045 Patch – map-side indexing
Solr 1301 Patch
- about / Using Solr 1301 Patch – reduce-side indexing
- using / Using Solr 1301 Patch – reduce-side indexing
- running / Using Solr 1301 Patch – reduce-side indexing
Solr cache optimization
- about / Optimizing the Solr cache
- common parameters / Optimizing the Solr cache
- filter cache / The filter cache
- query result cache / The query result cache
- document cache / The document cache
- field value cache / The field value cache
- lazy field loading / The lazy field loading
Solr Cell
- about / Extracting request handler – Solr Cell
SolrCloud
- working with / Working with SolrCloud
- Zookeeper, using / Why ZooKeeper?
- architecture / The SolrCloud architecture
- used, for building enterprise distributed search / Building an enterprise distributed search using SolrCloud
- parameters, for development process / Setting up SolrCloud for development
- common problems and resolutions / Common problems and resolutions
solrconfig.xml file
- declarations / Instance configuration with solrconfig.xml
Solr configuration
- about / Configuring Solr
- structure / Understanding the Solr structure
- conf/ folder / Understanding the Solr structure
- data/ folder / Understanding the Solr structure
- lib/ folder / Understanding the Solr structure
- Solr schema, defining / Defining the Solr schema
- configuration files / Configuration files of Apache Solr
Solr core
- about / Understanding Solr administration, The SolrCloud architecture
Solr folder
- contrib/ / Running Apache Solr on jetty
- dist/ / Running Apache Solr on jetty
- docs/ / Running Apache Solr on jetty
- example/ / Running Apache Solr on jetty
- licenses/ / Running Apache Solr on jetty
Solr HDFS connector
- working with / Working with the Solr HDFS connector
Solr instance
- monitoring / Monitoring Solr instance
- monitoring, SolrMeter used / Using SolrMeter
SolrJ
- about / Interacting with Solr through SolrJ
- interacting, through / Interacting with Solr through SolrJ
SolrMeter
- about / Using SolrMeter
- used, for monitoring Solr instance / Using SolrMeter
- URL / Using SolrMeter
- consoles / Using SolrMeter
Solr plugin
- about / Understanding the Solr plugin
- search components / Understanding the Solr plugin
- request handlers / Understanding the Solr plugin
- filters / Understanding the Solr plugin
Solr schema
- defining / Defining the Solr schema
- Solr fields / Solr fields
- dynamic fields / Dynamic fields in Solr
- fields, copying / Copying the fields
- field types, dealing with / Dealing with field types
- metadata configuration / Additional metadata configuration
- elements / Other important elements of the Solr schema
Solr Transactional Log
- about / Adding a document to SolrCloud
STDIN (standard input stream)
- about / Extracting request handler – Solr Cell
stemming
- about / Stemming
- algorithms / Stemming
stop words
- about / Stop words
sunspot
- about / Interacting with Solr through SolrJ

T

technologies, Solr
- JavaScript / Interacting with Solr through SolrJ
- Ruby / Interacting with Solr through SolrJ
- PHP / Interacting with Solr through SolrJ
- Java / Interacting with Solr through SolrJ
- Python / Interacting with Solr through SolrJ
- Perl / Interacting with Solr through SolrJ
- .NET / Interacting with Solr through SolrJ
Tokenizer
- about / The Apache Solr architecture

Y

YARN (Yet Another Resource Negotiator) / Apache Hadoop's ecosystem

Z

Znode
- about / Why ZooKeeper?
Zookeeper
- about / Why ZooKeeper?
- features / Why ZooKeeper?
- download link / Getting along with Apache Storm
