Scaling Big Data with Hadoop and Solr, Second Edition

Product type Book
Published in Apr 2015
ISBN-13 9781783553396
Pages 166 pages
Edition 1st Edition
Author (1):
Hrishikesh Vijay Karambelkar

Index

A

  • advanced analytics, with Solr
    • about / Advanced analytics with Solr
  • Analyzer
    • about / The Apache Solr architecture
  • ant build scripting
    • URL / Single node configuration
  • Apache Ambari
    • about / Understanding Hadoop's ecosystem
  • Apache Blur
    • about / Distributed search using Apache Blur, High-level design
    • working, with Hadoop / Distributed search using Apache Blur
    • setting up, with Hadoop / Setting up Apache Blur with Hadoop
    • URL / Setting up Apache Blur with Hadoop
  • Apache Cassandra
    • URL / Integrating with multinode Cassandra
  • Apache Chukwa
    • about / Understanding Hadoop's ecosystem
  • Apache Flume
    • about / Understanding Hadoop's ecosystem
  • Apache Hadoop
    • about / Apache Hadoop's ecosystem, Understanding Hadoop's ecosystem
    • ecosystem / Apache Hadoop's ecosystem, Understanding Hadoop's ecosystem
    • HDFS / Apache Hadoop's ecosystem
    • MapReduce / Apache Hadoop's ecosystem
    • core components / Core components
    • configuring / Configuring Apache Hadoop, Configuring Hadoop
    • single node setup / Configuring Apache Hadoop
    • pseudo distributed setup / Configuring Apache Hadoop
    • fully distributed setup / Configuring Apache Hadoop
    • prerequisites / Prerequisites
    • download link / Prerequisites
    • ssh, setting up without passphrase / Setting up ssh without passphrase
    • running / Running Hadoop
    • common problems and solutions / Common problems and their solutions
    • URL / Working with the Solr HDFS connector
  • Apache HBase
    • about / Understanding Hadoop's ecosystem
  • Apache HCatalog
    • about / Understanding Hadoop's ecosystem
  • Apache Hive
    • about / Understanding Hadoop's ecosystem
  • Apache Ivy
    • URL / Single node configuration
  • Apache JIRA site
    • URL / Using Solr 1045 Patch – map-side indexing
  • Apache Kafka
    • about / High-level design
    • URL / High-level design
  • Apache Lucene
    • about / Understanding the limits
  • Apache Lucene core
    • about / The Apache Solr architecture
  • Apache Mahout
    • about / Understanding Hadoop's ecosystem
  • Apache Oozie
    • about / Understanding Hadoop's ecosystem
  • Apache Pig
    • about / Understanding Hadoop's ecosystem
  • Apache Solr
    • setting up / Setting up Apache Solr
    • prerequisites / Prerequisites for setting up Apache Solr
    • running, on jetty / Running Apache Solr on jetty
    • running, on J2EE containers / Running Solr on other J2EE containers
    • Hello World / Hello World with Apache Solr!
    • common problems and solutions / Common problems and solutions
    • architecture / The Apache Solr architecture
    • configuring / Configuring Solr
    • data, loading / Loading data in Apache Solr
    • information, querying for / Querying for information in Solr
    • distributed search, enabling with / Apache Solr and distributed search
    • index partitioning / The SolrCloud architecture
    • download link / Setting up SolrCloud for development
    • working with, Cassandra / Apache Solr and Cassandra
  • Apache Solr and MongoDB integration
    • about / Apache Solr and Big Data – integration with MongoDB
    • NoSQL / What is NoSQL and how is it related to Big Data?
    • MongoDB / MongoDB at glance
    • MongoDB, installing / Installing MongoDB
    • Solr indexes, creating from MongoDB / Creating Solr indexes from MongoDB
  • Apache Sqoop
    • about / Understanding Hadoop's ecosystem
  • Apache Storm
    • about / Scaling Solr through Storm
    • Solr, scaling through / Scaling Solr through Storm
    • URL / Scaling Solr through Storm
    • master node / Scaling Solr through Storm
    • worker node / Scaling Solr through Storm
    • slave node / Scaling Solr through Storm
    • installing / Getting along with Apache Storm
    • download link / Getting along with Apache Storm
  • Apache Tika
    • about / The Apache Solr architecture, Working with rich documents (Apache Tika)
  • Apache ZooKeeper
    • about / Understanding Hadoop's ecosystem
    • URL / Setting up SolrCloud for production
  • Application Master (AM)
    • about / Core components
  • architecture, Solr
    • about / The Apache Solr architecture
    • index replicator / The Apache Solr architecture
  • architecture, SolrCloud
    • about / The SolrCloud architecture
  • availability
    • about / Understanding NoSQL

B

  • big data search
    • Katta used / Big data search using Katta

C

  • Cache Autowarming / Optimizing the Solr cache
  • CAP theorem
    • URL / Apache Solr and Big Data – integration with MongoDB, Understanding NoSQL
    • about / Understanding NoSQL
  • Cassandra
    • about / Apache Solr and Cassandra
  • Cassandra and Solr integration
    • about / Working with Cassandra and Solr
    • single node configuration / Single node configuration
    • multinode Cassandra, integrating / Integrating with multinode Cassandra
  • chain or pipeline of Analyzers
    • about / The Apache Solr architecture
  • collection / Understanding Solr administration
  • collection, SolrCloud / The SolrCloud architecture
  • commit
    • about / When to commit changes?
    • autocommit / When to commit changes?
    • soft commit / When to commit changes?
  • configuration files, Apache Hadoop
    • about / Configuring Hadoop
  • configuration files, Solr
    • about / Configuration files of Apache Solr
    • solr.xml, working with / Working with solr.xml and Solr core
    • Solr core, working with / Working with solr.xml and Solr core
    • instance configuration, with solrconfig.xml / Instance configuration with solrconfig.xml
    • Solr plugin / Understanding the Solr plugin
    • other configuration / Other configuration
  • consistency
    • about / Understanding NoSQL
  • consoles, SolrMeter
    • query console / Using SolrMeter
    • update console / Using SolrMeter
    • commit console / Using SolrMeter
    • optimize console / Using SolrMeter
  • core components, Hadoop
    • about / Core components
    • Resource Manager (RM) / Core components
    • Application Master (AM) / Core components
    • Node Manager (NM) / Core components
    • NameNode / Core components
    • SecondaryNameNode / Core components
    • DataNodes / Core components
  • cran mirrors
    • URL / Integrating Solr and R
  • curl/wget utilities
    • about / Extracting request handler – Solr Cell

D

  • Data Import Handler (DIH)
    • about / The Apache Solr architecture
  • data import handlers, Solr
    • about / Understanding data import handlers
  • data loading, in Solr
    • about / Loading data in Apache Solr
    • request handler/Solr Cell, extracting / Extracting request handler – Solr Cell
    • data import handlers / Understanding data import handlers
    • interacting through SolrJ / Interacting with Solr through SolrJ
    • rich documents, working with / Working with rich documents (Apache Tika)
  • DataNodes
    • about / Core components
  • DDL (Data Definition Language)
    • about / Understanding Hadoop's ecosystem
  • Distributed Deadlock
    • about / Understanding the limits
  • distributed search
    • about / Understanding a distributed search
    • patterns / Distributed search patterns
    • enabling, Apache Solr used / Apache Solr and distributed search
  • distributed search, with Apache Blur
    • about / Distributed search using Apache Blur
    • Apache Blur, setting up with Hadoop / Setting up Apache Blur with Hadoop
  • DNS (Domain Name System)
    • about / Setting up a Hadoop cluster
  • document / Solr fields
  • document cache
    • about / The document cache
  • document routing, SolrCloud
    • about / Document Routing and Sharding
  • DocValue / Solr fields
  • DSE
    • URL / Working with Cassandra and Solr

E

  • e-commerce websites
    • about / E-Commerce websites
    • usage / E-Commerce websites
  • Elastic Load Balancing
    • URL / Apache Solr and distributed search
  • elements, Solr schema
    • uniqueKey / Other important elements of the Solr schema
    • defaultSearchField / Other important elements of the Solr schema
    • similarity / Other important elements of the Solr schema
  • ensemble, ZooKeeper
    • about / Why ZooKeeper?
  • enterprise distributed search, with SolrCloud
    • building / Building an enterprise distributed search using SolrCloud
    • SolrCloud, setting up for development / Setting up SolrCloud for development
    • SolrCloud, setting up for production / Setting up SolrCloud for production
    • document, adding to SolrCloud / Adding a document to SolrCloud
    • shards, creating / Creating shards, collections, and replicas in SolrCloud
    • collections, creating / Creating shards, collections, and replicas in SolrCloud
    • replicas, creating / Creating shards, collections, and replicas in SolrCloud
  • enterprise distributed search implementation scenarios
    • master/slave / Distributed search patterns
    • multi-nodes / Distributed search patterns
    • multi-tenant / Distributed search patterns
  • ETL (Extract-Transform-Load)
    • about / Understanding Hadoop's ecosystem
  • eventual consistency / Why ZooKeeper?

F

  • fault tolerance, SolrCloud
    • about / Load balancing and fault tolerance in SolrCloud
  • Fields, Apache Solr / Solr fields
  • field value cache
    • about / The field value cache
  • filter cache
    • about / The filter cache
  • Filters
    • about / The Apache Solr architecture

G

  • garbage collection / Optimizing Java virtual memory

H

  • Hadoop cluster
    • setting up / Setting up a Hadoop cluster
  • Hadoop optimization
    • parameters / Optimizing Hadoop
  • Hadoop tarball
    • URL / Working with the Solr HDFS connector
  • HDFS
    • about / Apache Hadoop's ecosystem
  • Hello World, with Apache Solr
    • about / Hello World with Apache Solr!
    • Solr administration / Understanding Solr administration
    • Solr navigation / Solr navigation
  • HiveQL
    • about / Understanding Hadoop's ecosystem
  • Hortonworks
    • reference link, for data search / High-level design

I

  • Index Handler
    • about / The Apache Solr architecture
  • index optimization
    • about / Index optimization
    • performing / Index optimization
    • indexing buffer size, limiting / Limiting indexing buffer size
    • commit / When to commit changes?
    • index merge, optimizing / Optimizing index merge
    • optimize option, for index merging / Optimize option for index merging
    • container, optimizing / Optimizing the container
    • concurrent clients, optimizing / Optimizing concurrent clients
    • Java virtual memory, optimizing / Optimizing Java virtual memory
  • index partitioning
    • about / The SolrCloud architecture
  • Index Reader
    • about / The Apache Solr architecture
  • index replicator / The Apache Solr architecture
  • Index Searcher
    • about / The Apache Solr architecture
  • Index Writer
    • about / The Apache Solr architecture
  • information, Solr
    • querying / Querying for information in Solr

J

  • J2EE containers
    • Solr, running on / Running Solr on other J2EE containers
  • Java 1.6
    • URL / Prerequisites
  • JDK
    • URL / Prerequisites for setting up Apache Solr
  • jetty
    • Solr, running on / Running Apache Solr on jetty
  • JIRA for integrating Katta, in Solr
    • reference / Creating Katta indexes
  • JVM
    • URL / Common problems and solutions
  • JVMs
    • reference / Optimizing the container

K

  • K-means clustering
    • URL / Integrating Solr and R
  • Katta
    • about / Big data search using Katta
    • used, for big data search / Big data search using Katta
    • URL / Big data search using Katta
    • working / How Katta works?
    • architecture / How Katta works?
  • Katta cluster
    • about / How Katta works?
    • setting up / Setting up the Katta cluster
    • download link, for distribution / Setting up the Katta cluster
    • URL, for sample creator script / Creating Katta indexes
  • Katta indexes
    • creating / Creating Katta indexes
  • Katta Master / How Katta works?

L

  • laggard problem
    • about / Understanding the limits
  • lazy field loading
    • about / The lazy field loading
  • legacy distributed search
    • reference link / Apache Solr and distributed search
  • limits
    • understanding / Understanding the limits
  • load balancing, SolrCloud
    • about / Load balancing and fault tolerance in SolrCloud
  • log management, for banking
    • about / Log management for banking
    • problem / The problem
    • resolution / How can it be tackled?
    • high-level design / High-level design

M

  • MapReduce
    • about / Apache Hadoop's ecosystem
    • using / Apache Hadoop's ecosystem
  • MapReduce approach / Apache Hadoop's ecosystem
  • map side indexing
    • about / Using Solr 1045 Patch – map-side indexing
  • Map Task / Apache Hadoop's ecosystem
  • MongoDB
    • about / MongoDB at glance
    • URL / MongoDB at glance
    • data / MongoDB at glance
    • installing / Installing MongoDB
    • download link / Installing MongoDB
    • URL, for project repository / Creating Solr indexes from MongoDB
    • Solr indexes, creating from / Creating Solr indexes from MongoDB

N

  • NameNode
    • about / Core components
  • near-real-time search / When to commit changes?
  • Node Manager (NM)
    • about / Core components
  • NoSQL
    • about / What is NoSQL and how is it related to Big Data?, Understanding NoSQL
    • relating, to Big Data / What is NoSQL and how is it related to Big Data?
  • NoSQL database
    • about / Understanding Hadoop's ecosystem

P

  • parallel-ssh
    • URL / Prerequisites
  • partition tolerance
    • about / Understanding NoSQL
  • Planet Cassandra
    • URL / Apache Solr and Cassandra
  • Portable Document Format (PDF) / Working with rich documents (Apache Tika)
  • post.jar
    • about / Hello World with Apache Solr!
  • python
    • download link / Getting along with Apache Storm

Q

  • Query Parser
    • about / The Apache Solr architecture
  • query result cache
    • about / The query result cache

R

  • R
    • about / Advanced analytics with Solr
    • URL / Advanced analytics with Solr
    • open source packages / Advanced analytics with Solr
    • Solr, integrating with / Integrating Solr and R
  • reduce side indexing
    • about / Using Solr 1301 Patch – reduce-side indexing
  • Reduce Tasks / Apache Hadoop's ecosystem
  • request handler
    • about / Other configuration
    • URL / Other configuration
  • request handler/Solr Cell
    • extracting / Extracting request handler – Solr Cell
  • Resource Manager (RM)
    • about / Core components
  • Response Writer
    • about / The Apache Solr architecture
  • Rich Text Format (RTF) / Working with rich documents (Apache Tika)
  • Round Robin algorithm
    • reference link / Setting up SolrCloud for development

S

  • search performance
    • limits / Understanding the limits
  • search runtime optimization
    • about / Optimizing search runtime
    • optimizing, through search query / Optimizing through search query
    • filter queries / Filter queries
    • Solr cache, optimizing / Optimizing the Solr cache
    • Hadoop, optimizing / Optimizing Hadoop
  • search schema optimization
    • about / Optimizing search schema
    • default search field, specifying / Specifying default search field
    • search schema fields, configuring / Configuring search schema fields
    • stop words / Stop words
    • stemming / Stemming
  • SecondaryNameNode
    • about / Core components
  • Secure shell (ssh)
    • about / Prerequisites
  • sequential updates / Why ZooKeeper?
  • shard index or slice, SolrCloud / The SolrCloud architecture
  • sharding algorithm, SolrCloud
    • about / Sharding algorithm and fault tolerance
    • document routing / Document Routing and Sharding
    • shard splitting / Shard splitting
    • load balancing / Load balancing and fault tolerance in SolrCloud
    • fault tolerance / Load balancing and fault tolerance in SolrCloud
  • Shard Leader, SolrCloud / The SolrCloud architecture
  • shard replica, SolrCloud / The SolrCloud architecture
  • shards
    • about / Apache Solr and distributed search
  • shard splitting, SolrCloud
    • about / Shard splitting
  • Solandra
    • URL / Single node configuration
  • Solr
    • scaling, through Storm / Scaling Solr through Storm
    • advanced analytics / Advanced analytics with Solr
    • about / Advanced analytics with Solr
    • integrating, with R / Integrating Solr and R
  • Solr 5.0
    • URL / Running Apache Solr on jetty
  • Solr 1045 Patch
    • about / Using Solr 1045 Patch – map-side indexing
    • using / Using Solr 1045 Patch – map-side indexing
  • Solr 1301 Patch
    • about / Using Solr 1301 Patch – reduce-side indexing
    • using / Using Solr 1301 Patch – reduce-side indexing
    • running / Using Solr 1301 Patch – reduce-side indexing
  • Solr cache optimization
    • about / Optimizing the Solr cache
    • common parameters / Optimizing the Solr cache
    • filter cache / The filter cache
    • query result cache / The query result cache
    • document cache / The document cache
    • field value cache / The field value cache
    • lazy field loading / The lazy field loading
  • Solr Cell
    • about / Extracting request handler – Solr Cell
  • SolrCloud
    • working with / Working with SolrCloud
    • ZooKeeper, using / Why ZooKeeper?
    • architecture / The SolrCloud architecture
    • used, for building enterprise distributed search / Building an enterprise distributed search using SolrCloud
    • parameters, for development process / Setting up SolrCloud for development
    • common problems and resolutions / Common problems and resolutions
  • solrconfig.xml file
    • declarations / Instance configuration with solrconfig.xml
  • Solr configuration
    • about / Configuring Solr
    • structure / Understanding the Solr structure
    • conf/ folder / Understanding the Solr structure
    • data/ folder / Understanding the Solr structure
    • lib/ folder / Understanding the Solr structure
    • Solr schema, defining / Defining the Solr schema
    • configuration files / Configuration files of Apache Solr
  • Solr Core
    • about / Understanding Solr administration
  • Solr core / The SolrCloud architecture
  • Solr folder
    • contrib/ / Running Apache Solr on jetty
    • dist/ / Running Apache Solr on jetty
    • docs/ / Running Apache Solr on jetty
    • example/ / Running Apache Solr on jetty
    • licenses/ / Running Apache Solr on jetty
  • Solr HDFS connector
    • working with / Working with the Solr HDFS connector
  • Solr instance
    • monitoring / Monitoring Solr instance
    • monitoring, SolrMeter used / Using SolrMeter
  • SolrJ
    • about / Interacting with Solr through SolrJ
    • interacting, through / Interacting with Solr through SolrJ
  • SolrMeter
    • about / Using SolrMeter
    • used, for monitoring Solr instance / Using SolrMeter
    • URL / Using SolrMeter
    • consoles / Using SolrMeter
  • Solr plugin
    • about / Understanding the Solr plugin
    • search components / Understanding the Solr plugin
    • request handlers / Understanding the Solr plugin
    • filters / Understanding the Solr plugin
  • Solr schema
    • defining / Defining the Solr schema
    • Solr fields / Solr fields
    • dynamic fields / Dynamic fields in Solr
    • fields, copying / Copying the fields
    • field types, dealing with / Dealing with field types
    • metadata configuration / Additional metadata configuration
    • elements / Other important elements of the Solr schema
  • Solr Transactional Log
    • about / Adding a document to SolrCloud
  • STDIN (standard input stream)
    • about / Extracting request handler – Solr Cell
  • stemming
    • about / Stemming
    • algorithms / Stemming
  • stop words
    • about / Stop words
  • sunspot
    • about / Interacting with Solr through SolrJ

T

  • technologies, Solr
    • JavaScript / Interacting with Solr through SolrJ
    • Ruby / Interacting with Solr through SolrJ
    • PHP / Interacting with Solr through SolrJ
    • Java / Interacting with Solr through SolrJ
    • Python / Interacting with Solr through SolrJ
    • Perl / Interacting with Solr through SolrJ
    • .NET / Interacting with Solr through SolrJ
  • Tokenizer
    • about / The Apache Solr architecture

Y

  • YARN (Yet Another Resource Negotiator) / Apache Hadoop's ecosystem

Z

  • Znode
    • about / Why ZooKeeper?
  • ZooKeeper
    • about / Why ZooKeeper?
    • features / Why ZooKeeper?
    • download link / Getting along with Apache Storm