Storm Real-time Processing Cookbook

Java developers can expand into real-time data processing with this fantastic guide to Storm. Using a cookbook approach with lots of practical recipes, it’s the user-friendly way to learn how to process unlimited data streams.

Quinton Anderson

Book Details

ISBN 13: 9781782164425
Paperback: 254 pages

Book Description

Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
Storm Real-time Processing Cookbook offers recipes for real-time computation with Storm, ranging from basic to advanced.

The book begins by setting up the development environment and then covers log stream processing. It goes on to a real-time payments workflow, distributed RPC, integration with other software such as Hadoop and Apache Camel, and more.

Table of Contents

Chapter 1: Setting Up Your Development Environment
  • Introduction
  • Setting up your development environment
  • Distributed version control
  • Creating a "Hello World" topology
  • Creating a Storm cluster – provisioning the machines
  • Creating a Storm cluster – provisioning Storm
  • Deriving basic click statistics
  • Unit testing a bolt
  • Implementing an integration test
  • Deploying to the cluster

Chapter 2: Log Stream Processing
  • Introduction
  • Creating a log agent
  • Creating the log spout
  • Rule-based analysis of the log stream
  • Indexing and persisting the log data
  • Counting and persisting log statistics
  • Creating an integration test for the log stream cluster
  • Creating a log analytics dashboard

Chapter 3: Calculating Term Importance with Trident
  • Introduction
  • Creating a URL stream using a Twitter filter
  • Deriving a clean stream of terms from the documents
  • Calculating the relative importance of each term

Chapter 4: Distributed Remote Procedure Calls
  • Introduction
  • Using DRPC to complete the required processing
  • Integration testing of a Trident topology
  • Implementing a rolling window topology
  • Simulating time in integration testing

Chapter 5: Polyglot Topology
  • Introduction
  • Implementing the multilang protocol in Qt
  • Implementing the SplitSentence bolt in Qt
  • Implementing the count bolt in Ruby
  • Defining the word count topology in Clojure

Chapter 6: Integrating Storm and Hadoop
  • Introduction
  • Implementing TF-IDF in Hadoop
  • Persisting documents from Storm
  • Integrating the batch and real-time views

Chapter 7: Real-time Machine Learning
  • Introduction
  • Implementing a transactional topology
  • Creating a Random Forest classification model using R
  • Operational classification of transactional streams using Random Forest
  • Creating an association rules model in R
  • Creating a recommendation engine
  • Real-time online machine learning

Chapter 8: Continuous Delivery
  • Introduction
  • Setting up a CI server
  • Setting up system environments
  • Defining a delivery pipeline
  • Implementing automated acceptance testing

Chapter 9: Storm on AWS
  • Introduction
  • Deploying Storm on AWS using Pallet
  • Setting up a Virtual Private Cloud
  • Deploying Storm into Virtual Private Cloud using Vagrant

What You Will Learn

  • Create a log spout
  • Consume messages from a JMS queue
  • Implement unidirectional synchronization based on a data stream
  • Execute disaster recovery on a separate AWS region
