Real-time Analytics with Storm and Cassandra

Product type: Book
Published: Mar 2015
ISBN-13: 9781784395490
Pages: 220
Edition: 1st Edition
Author: Shilpi Saxena

Table of Contents (19) Chapters

Real-time Analytics with Storm and Cassandra
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1. Let's Understand Storm
2. Getting Started with Your First Topology
3. Understanding Storm Internals by Examples
4. Storm in a Clustered Mode
5. Storm High Availability and Failover
6. Adding NoSQL Persistence to Storm
7. Cassandra Partitioning, High Availability, and Consistency
8. Cassandra Management and Maintenance
9. Storm Management and Maintenance
10. Advance Concepts in Storm
11. Distributed Cache and CEP with Storm
Quiz Answers
Index

Index

A

  • ack() method
    • about / Spouts
  • acking
    • about / Anchoring and acking
  • Advanced Message Queuing Protocol (AMQP)
    • about / An overview of RabbitMQ
  • Aggregator function
    • about / Aggregator
  • Aircraft Communications Addressing and Reporting (ACAR) system
    • about / Aircraft Communications Addressing and Reporting system
  • Alabama Occupational Therapy Association (ALOTA)
    • about / Licensed proprietary solutions
  • alert bolt
    • about / Bolts
  • All grouping
    • about / All grouping
  • AMQP spout
    • topology, writing / Wiring the topology for the AMQP spout
  • anchoring
    • about / Anchoring and acking
    • unreliable topology / The unreliable topology
  • Ankush
    • about / Storm monitoring tools
    • URL / Storm monitoring tools
  • Astyanax
    • about / Using different client APIs to access Cassandra

B

  • best practices, Storm Cassandra applications / The best practices for Storm/Cassandra applications
  • bolt
    • about / Bolts
    • parse event bolt / Bolts
    • location bolt / Bolts
    • verify bolt / Bolts
    • alert bolt / Bolts
    • IRichBolt / Bolts
    • IBasicBolt / Bolts
    • prepare() method / Bolts
    • execute() method / Bolts
  • bootstrapping
    • about / Bootstrapping

C

  • cache
    • topology, building / Building a topology with a cache
  • Call Detail Record (CDR) / Storm topology wired to the Cassandra store
  • Cassandra
    • advantages / The advantages of Cassandra
    • about / Columnar database fundamentals
    • installing / Installing Cassandra
    • client APIs, used for accessing / Using different client APIs to access Cassandra
    • gossip protocol / Cassandra – gossip protocol
    • scaling / Cassandra cluster scaling – adding a new node
    • dead node, replacing / Cassandra cluster – replacing a dead node
    • fault tolerance / Cassandra fault tolerance
    • monitoring systems / Cassandra monitoring systems
  • Cassandra cluster
    • setting up / Setting up the Cassandra cluster
  • Cassandra consistency
    • about / Cassandra consistency
    • write consistency / Write consistency
    • read consistency / Read consistency
    • maintenance features / Consistency maintenance features
  • Cassandra data centres
    • installing / Installing Cassandra data centers
  • CEP
    • about / Introduction to the complex event processing engine
    • use cases / Introduction to the complex event processing engine
    • Esper / Esper
  • cleanup command
    • about / The nodetool commands
  • clean up script, Zookeeper
    • about / Cleaning up Zookeeper
    • numBackUps / Cleaning up Zookeeper
    • dataDir / Cleaning up Zookeeper
    • logDir / Cleaning up Zookeeper
    • org.apache.zookeeper.server.PurgeTxnLog / Cleaning up Zookeeper
  • CLI
    • about / Introduction to CLI
    • used, for rebalancing topology / Rebalancing using the CLI
  • client APIs
    • used, for accessing Cassandra / Using different client APIs to access Cassandra
    • Thrift protocol / Using different client APIs to access Cassandra
    • Hector / Using different client APIs to access Cassandra
    • Datastax Java driver / Using different client APIs to access Cassandra
    • Astyanax / Using different client APIs to access Cassandra
  • columnar database, fundamentals
    • about / Columnar database fundamentals
    • column families, types / Types of column families
    • columns, types / Types of columns
  • column families, types
    • static column family / Types of column families
    • dynamic column family / Types of column families
  • columns, types
    • composite column / Types of columns
    • expiring columns / Types of columns
    • counter columns / Types of columns
  • CombinerAggregator function
    • about / CombinerAggregator
  • Command Prompt
    • Storm topology, executing / Executing the topology from Command Prompt
  • compaction command
    • about / The nodetool commands
  • component level section, Storm UI
    • capacity / Section 4
    • Execute Latency / Section 4
    • Process Latency / Section 4
  • components, Storm topology
    • spout / Spouts
    • bolt / Bolts
    • stream / Streams
    • tuple / Tuples – the data model in Storm
  • composite column / Types of columns
  • consistency maintenance features
    • about / Consistency maintenance features
    • read repair / Consistency maintenance features
    • Anti-Entropy repair service / Consistency maintenance features
    • hinted handoff / Consistency maintenance features
  • consistent hashing
    • about / Consistent hashing
    • practical example / Consistent hashing
    • node(s) goes down scenario / One or more node goes down
    • node(s) comes back up scenario / One or more node comes back up
  • counter columns / Types of columns
  • CQLSH
    • about / Introduction to CQLSH
  • custom grouping
    • about / Custom grouping
  • custom solution
    • for complex distributed use cases / A custom solution
  • Cygwin
    • URL / Prerequisites for setting up Storm

D

  • data aggregations, operations
    • aggregate / Data aggregations over the streams
    • persistentAggregate / Data aggregations over the streams
  • data centres, Cassandra
    • installing / Installing Cassandra data centers
  • Datastax Java driver
    • about / Using different client APIs to access Cassandra
  • Datastax OpsCenter
    • about / Datastax OpsCenter
    • features / Datastax OpsCenter
    • installing / Datastax OpsCenter
  • dead node, Cassandra
    • replacing / Cassandra cluster – replacing a dead node
  • decommission command
    • about / The nodetool commands
  • Directed Acyclic Graph (DAG)
    • about / Spouts, The Storm UI
  • direct grouping
    • about / Direct grouping
  • distributed caching
    • need for / The need for distributed caching in Storm
  • distributed computing problems
    • about / Distributed computing problems
    • credit or debit card fraud detection / Real-time business solution for credit or debit card fraud detection
    • Aircraft Communications Addressing and Reporting (ACAR) system / Aircraft Communications Addressing and Reporting system
    • healthcare / Healthcare
    • manufacturing / Other applications
    • transportation industry / Other applications
    • network optimization / Other applications
  • dynamic column family / Types of column families

E

  • Eclipse
    • URL / Prerequisites for setting up Storm
  • ensemble
    • about / Set up Zookeeper (V 3.3.5) for Storm
  • ESP
    • about / Introduction to the complex event processing engine
  • Esper
    • about / Esper
    • URL / Esper
    • URL, for downloading / Getting started with Esper
    • example / Getting started with Esper
    • integrating, with Storm / Integrating Esper with Storm
  • Esper, features
    • throughput / Esper
    • latency / Esper
    • computations / Esper
  • event
    • about / Components of a Storm topology
  • event processing language
    • about / Introduction to the complex event processing engine
  • execute() method
    • about / Bolts
  • expiring columns / Types of columns

F

  • fail() method
    • about / Spouts
  • failure scenario handling
    • failure detection / Failure scenario handling – detection and recovery
    • failure recovery / Failure scenario handling – detection and recovery
  • fault tolerance, Cassandra
    • about / Cassandra fault tolerance
  • fields grouping
    • about / Fields grouping
  • FileSpout
    • creating / Creating FileSpout
    • WordCountTopology, tweaking / Tweaking WordCount topology to use FileSpout
    • SocketSpout class, creating / The SocketSpout class
  • filters
    • about / Filters
  • functions
    • about / Functions
  • functions, Storm configurations
    • storm.zookeeper.servers / Storm configurations
    • storm.zookeeper.port / Storm configurations
    • storm.local.dir / Storm configurations
    • nimbus.host / Storm configurations
    • topology.message.timeout.secs / Storm configurations
    • topology.debug / Storm configurations
    • supervisor.slots.ports / Storm configurations

G

  • Ganglia / Storm monitoring tools
  • GigaSpaces
    • about / Licensed proprietary solutions
  • Git
    • URL / Prerequisites for setting up Storm
  • global grouping
    • about / Global grouping
  • gossip protocol
    • about / Cassandra – gossip protocol
    • bootstrapping / Bootstrapping
    • failure scenario handling / Failure scenario handling – detection and recovery
  • GUI
    • used, for rebalancing topology / Rebalancing using the GUI

H

  • Hadoop solution
    • for complex distributed use cases / The Hadoop solution
  • high availability
    • building / Building high availability of components
    • building, of Storm / High availability of the Storm cluster
    • testing / High availability of the Storm cluster
    • Storm cluster, processing / Guaranteed processing of the Storm cluster

I

  • IBasicBolt
    • about / Bolts
  • IBM
    • about / Licensed proprietary solutions
  • info command
    • about / The nodetool commands
  • installation, Datastax OpsCenter
    • about / Datastax OpsCenter
  • installation, RabbitMQ
    • about / Installing the RabbitMQ cluster
  • IRichBolt
    • about / Bolts
  • IRichSpout interface
    • about / Spouts

J

  • Java
    • URL / Prerequisites for setting up Storm
  • JMX monitoring
    • about / JMX monitoring
  • join
    • about / Merge and join
  • join command
    • about / The nodetool commands

K

  • Key Performance Indicators (KPIs)
    • about / Datastax OpsCenter

L

  • Least Recently Used (LRU)
    • about / Introduction to memcached
  • licensed proprietary solutions
    • for complex distributed use cases / Licensed proprietary solutions
  • listeners
    • about / Introduction to the complex event processing engine
  • local or shuffle grouping
    • about / Local or shuffle grouping
  • local partition manipulation operation
    • about / Local partition manipulation operation
    • functions / Functions
    • filters / Filters
    • partitionAggregate function / partitionAggregate
  • location bolt
    • about / Bolts

M

  • MapReduce
    • about / The Hadoop solution
  • memcached
    • about / Introduction to memcached
    • components / Introduction to memcached
    • memcache, setting up / Setting up memcache
    • topology, building with cache / Building a topology with a cache
  • memcached, scenarios
    • cache hit / Introduction to memcached
    • cache miss / Introduction to memcached
  • merge
    • about / Merge and join
  • messages processed section, Storm UI
    • window / Section 3
    • emitted / Section 3
    • transferred / Section 3
    • complete latency(ms) / Section 3
    • acked / Section 3
    • failed / Section 3
  • mirror queues
    • creating / Creating mirror queues for high availability
  • MongoDB
    • about / Columnar database fundamentals
  • monitoring systems, Cassandra
    • JMX monitoring / JMX monitoring
    • Datastax OpsCenter / Datastax OpsCenter
  • multiple data centers
    • about / Multiple data centers
    • prerequisites, for setting up / Prerequisites for setting up multiple data centers

N

  • Nagios
    • about / Storm monitoring tools
  • Neo4J
    • about / Columnar database fundamentals
  • nextTuple() method
    • about / Spouts
  • Nimbus
    • about / A high-level view of various components of Storm, Launching Storm daemons
  • node(s) comes back up scenario, consistent hashing / One or more node comes back up
  • node(s) goes down scenario, consistent hashing / One or more node goes down
  • nodes, Storm
    • about / A high-level view of various components of Storm
    • Nimbus / A high-level view of various components of Storm, Launching Storm daemons
    • Zookeeper / A high-level view of various components of Storm
    • Supervisor / A high-level view of various components of Storm, Launching Storm daemons
    • UI / Launching Storm daemons
  • nodetool commands
    • about / The nodetool commands
    • ring / The nodetool commands
    • join / The nodetool commands
    • info / The nodetool commands
    • cleanup / The nodetool commands
    • compaction / The nodetool commands
    • decommission / The nodetool commands
    • removenode / The nodetool commands
    • repair / The nodetool commands
  • nodetool repair command
    • about / Failure scenario handling – detection and recovery

O

  • online transaction processing (OLTP)
    • about / Setting up the Cassandra cluster
  • open source technology
    • for complex distributed use cases / Other real-time processing tools
  • options, read consistency
    • ONE / Read consistency
    • TWO / Read consistency
    • THREE / Read consistency
    • QUORUM / Read consistency
  • options, write consistency
    • ANY / Write consistency
    • ONE / Write consistency
    • TWO / Write consistency
    • QUORUM / Write consistency
    • ALL / Write consistency
  • Oracle
    • about / Licensed proprietary solutions

P

  • parallel processing
    • about / Setting up workers and parallelism to enhance processing
    • scenario 1 / Scenario 1
    • scenario 2 / Scenario 2
    • scenario 3 / Scenario 3
  • parse event bolt
    • about / Bolts
  • partitionAggregate function
    • about / partitionAggregate
    • sum aggregator function / Sum aggregate
    • CombinerAggregator function / CombinerAggregator
    • ReducerAggregator function / ReducerAggregator
    • Aggregator function / Aggregator
  • Plain Old Java Object (POJO) / Storm topology wired to the Cassandra store
  • Point Of Sales (POS)
    • about / Introduction to the complex event processing engine
  • prepare() method
    • about / Bolts

Q

  • queue-worker solution
    • limitations / A custom solution

R

  • RabbitMQ
    • overview / An overview of RabbitMQ
    • exchange / An overview of RabbitMQ
    • Queue / An overview of RabbitMQ
    • installing / Installing the RabbitMQ cluster
    • prerequisites, for installation / Prerequisites for the setup of RabbitMQ
    • integrating, with Storm / Integrating Storm with RabbitMQ
    • feeder component, creating / Creating a RabbitMQ feeder component
    • topology, writing for AMQP spout / Wiring the topology for the AMQP spout
    • high availability, building / Building high availability of components
  • RabbitMQ server
    • setting up, on Ubuntu / Setting up a RabbitMQ server
    • testing / Testing the RabbitMQ server
    • RabbitMQ cluster, creating / Creating a RabbitMQ cluster
    • RabbitMQ UI, enabling / Enabling the RabbitMQ UI
    • mirror queues, creating / Creating mirror queues for high availability
  • read consistency / Read consistency
  • Real Time Decisions (RTD)
    • about / Licensed proprietary solutions
  • ReducerAggregator function
    • about / ReducerAggregator
  • Remote Procedure Call (RPC)
    • about / Using different client APIs to access Cassandra
  • Remote Procedure Calls (RPC)
    • about / Building a Trident topology
  • removenode command
    • about / The nodetool commands
  • repair command
    • about / The nodetool commands
  • repartitioning function
    • Shuffle / Operations related to stream repartitioning
    • Broadcast / Operations related to stream repartitioning
    • partitionBy / Operations related to stream repartitioning
    • global / Operations related to stream repartitioning
    • batchGlobal / Operations related to stream repartitioning
  • replication, Cassandra
    • about / Replication in Cassandra and strategies
  • replication factor
    • about / The replication factor
  • ring command
    • about / The nodetool commands

S

  • satellite communication (SATCOM)
    • about / Aircraft Communications Addressing and Reporting system
  • scaling, Cassandra
    • new node, adding / Cassandra cluster scaling – adding a new node
  • Service Level Agreement (SLA) / Storm monitoring tools
  • snitch
    • about / Replication in Cassandra and strategies
  • solutions, for complex distributed use cases
    • Hadoop solution / The Hadoop solution
    • custom solution / A custom solution
    • licensed proprietary solutions / Licensed proprietary solutions
    • open source technology / Other real-time processing tools
  • Spark
    • about / Other real-time processing tools
  • spout
    • about / Spouts
    • IRichSpout interface / Spouts
    • nextTuple() method / Spouts
    • ack() method / Spouts
    • fail() method / Spouts
  • standard column / Types of columns
  • static column family / Types of column families
  • Storm
    • nodes / A high-level view of various components of Storm
    • topology execution / Delving into the internals of Storm
    • prerequisites / Prerequisites for setting up Storm
    • RabbitMQ, integrating / Integrating Storm with RabbitMQ
    • high availability, building / High availability of the Storm cluster
    • Esper, integrating / Integrating Esper with Storm
  • Storm Cassandra applications
    • best practices / The best practices for Storm/Cassandra applications
  • Storm cluster
    • setting up / The Storm cluster setup
  • Storm configurations
    • about / Storm configurations
  • Storm isolation scheduler
    • about / The Storm isolation scheduler
  • Storm logging configurations
    • about / Storm logging configurations
    • <file> / Storm logging configurations
    • <filenamepattern> / Storm logging configurations
    • <minIndex> / Storm logging configurations
    • <maxIndex> / Storm logging configurations
    • maxFileSize / Storm logging configurations
    • root level / Storm logging configurations
  • Storm logs
    • troubleshooting / Storm logs
  • Storm monitoring tools
    • about / Storm monitoring tools
    • Nagios / Storm monitoring tools
    • Ganglia / Storm monitoring tools
    • SupervisorD / Storm monitoring tools
    • Ankush / Storm monitoring tools
  • Storm spouts
    • customizing / Customizing Storm spouts
    • FileSpout, creating / Creating FileSpout
  • Storm starter project
    • WordCountTopology, executing / WordCount topology from the Storm-starter project
  • Storm topology
    • components / Components of a Storm topology
    • executing, in distributed mode / Executing the topology in the distributed mode
    • Zookeeper (v 3.3.5), setting up / Set up Zookeeper (V 3.3.5) for Storm
    • Storm, setting up in distributed mode / Setting up Storm in the distributed mode
    • Storm nodes, launching / Launching Storm daemons
    • executing, from Command Prompt / Executing the topology from Command Prompt
    • wordCount Topology, tweaking / Tweaking the WordCount topology to customize it
  • Storm topology, Cassandra store
    • use case / Storm topology wired to the Cassandra store
  • Storm UI
    • about / The Storm UI
  • strategies, Cassandra
    • about / Replication in Cassandra and strategies
    • simple / Replication in Cassandra and strategies
    • network / Replication in Cassandra and strategies
  • stream
    • about / Streams
  • stream groupings
    • about / Stream groupings
    • local or shuffle grouping / Local or shuffle grouping
    • fields grouping / Fields grouping
    • All grouping / All grouping
    • global grouping / Global grouping
    • custom grouping / Custom grouping
    • direct grouping / Direct grouping
  • sum aggregator function
    • about / Sum aggregate
  • Supervisor
    • about / A high-level view of various components of Storm, Launching Storm daemons
    • adding / Scaling the Storm cluster – adding new supervisor nodes
  • SupervisorD / Storm monitoring tools

T

  • Thrift protocol / Using different client APIs to access Cassandra
  • Time to live (TTL) / Types of columns
  • topologies section, Storm UI
    • Topology Name / Section 1
    • ID / Section 1
    • Status / Section 1
    • Uptime / Section 1
    • Num workers / Section 1
    • Num Executors / Section 1
    • Num Tasks / Section 1
  • topology
    • rebalancing / Scaling the Storm cluster and rebalancing the topology
    • rebalancing, GUI used / Rebalancing using the GUI
    • rebalancing, CLI used / Rebalancing using the CLI
    • building, with cache / Building a topology with a cache
  • topology actions section, Storm UI
    • activate / Section 2
    • deactivate / Section 2
    • rebalance / Section 2
    • kill / Section 2
  • transactions per second (tps)
    • about / Building a Trident topology
  • Trident API
    • about / Understanding the Trident API
    • local partition manipulation operation / Local partition manipulation operation
    • operations, related to stream repartitioning / Operations related to stream repartitioning
    • data aggregations, over streams / Data aggregations over the streams
    • field, grouping over / Grouping over a field in a stream
    • join / Merge and join
    • merge / Merge and join
  • Trident topology
    • building / Building a Trident topology
    • examples / Examples and illustrations
    • illustrations / Examples and illustrations
  • troubleshooting
    • about / Storm troubleshooting
    • UI / The Storm UI
    • Storm logs / Storm logs
  • tuple
    • about / Components of a Storm topology, Tuples – the data model in Storm

U

  • Ubuntu
    • RabbitMQ server, setting up / Setting up a RabbitMQ server
  • UI
    • about / Launching Storm daemons
    • troubleshooting / The Storm UI

V

  • verify bolt
    • about / Bolts
  • visualization section, Storm UI / The visualization section

W

  • wordCount Topology
    • tweaking / Tweaking the WordCount topology to customize it
  • WordCountTopology
    • executing, from Storm starter project / WordCount topology from the Storm-starter project
  • workers
    • setting up / Setting up workers and parallelism to enhance processing
  • write consistency / Write consistency

X

  • XAP
    • about / Licensed proprietary solutions

Z

  • zoo.cfg configuration file, properties
    • tickTime / Set up Zookeeper (V 3.3.5) for Storm
    • initLimit / Set up Zookeeper (V 3.3.5) for Storm
    • syncLimit / Set up Zookeeper (V 3.3.5) for Storm
    • dataDir / Set up Zookeeper (V 3.3.5) for Storm
    • clientPort / Set up Zookeeper (V 3.3.5) for Storm
    • server.id=host:port:port / Set up Zookeeper (V 3.3.5) for Storm
  • Zookeeper
    • about / A high-level view of various components of Storm
    • clean up script / Cleaning up Zookeeper
  • Zookeeper (v 3.3.5)
    • setting up / Set up Zookeeper (V 3.3.5) for Storm
  • Zookeeper configurations
    • about / Zookeeper configurations
    • dataDir=/usr/local/zookeeper/tmp / Zookeeper configurations
    • clientPort=2182 / Zookeeper configurations
    • maxClientCnxns=30l / Zookeeper configurations