Storm Real-time Processing Cookbook

Quinton Anderson

Java developers can expand into real-time data processing with this fantastic guide to Storm. Using a cookbook approach with lots of practical recipes, it’s the user-friendly way to learn how to process unlimited data streams.
eBook: $29.99 (RRP $29.99)
Print + eBook: $49.99 (RRP $49.99)

Book Details

ISBN 13: 9781782164425
Paperback: 254 pages

About This Book

  • Learn the key concepts of processing data in real time with Storm
  • Covers concepts ranging from log stream processing to data management with Storm
  • Written in a cookbook style, with plenty of practical recipes backed by well-explained code examples and relevant screenshots and diagrams

Who This Book Is For

If you are a Java developer with basic knowledge of real-time processing and would like to learn Storm to process unbounded streams of data in real time, then this book is for you.

Table of Contents

Chapter 1: Setting Up Your Development Environment
  • Introduction
  • Setting up your development environment
  • Distributed version control
  • Creating a "Hello World" topology
  • Creating a Storm cluster – provisioning the machines
  • Creating a Storm cluster – provisioning Storm
  • Deriving basic click statistics
  • Unit testing a bolt
  • Implementing an integration test
  • Deploying to the cluster

Chapter 2: Log Stream Processing
  • Introduction
  • Creating a log agent
  • Creating the log spout
  • Rule-based analysis of the log stream
  • Indexing and persisting the log data
  • Counting and persisting log statistics
  • Creating an integration test for the log stream cluster
  • Creating a log analytics dashboard

Chapter 3: Calculating Term Importance with Trident
  • Introduction
  • Creating a URL stream using a Twitter filter
  • Deriving a clean stream of terms from the documents
  • Calculating the relative importance of each term

Chapter 4: Distributed Remote Procedure Calls
  • Introduction
  • Using DRPC to complete the required processing
  • Integration testing of a Trident topology
  • Implementing a rolling window topology
  • Simulating time in integration testing

Chapter 5: Polyglot Topology
  • Introduction
  • Implementing the multilang protocol in Qt
  • Implementing the SplitSentence bolt in Qt
  • Implementing the count bolt in Ruby
  • Defining the word count topology in Clojure

Chapter 6: Integrating Storm and Hadoop
  • Introduction
  • Implementing TF-IDF in Hadoop
  • Persisting documents from Storm
  • Integrating the batch and real-time views

Chapter 7: Real-time Machine Learning
  • Introduction
  • Implementing a transactional topology
  • Creating a Random Forest classification model using R
  • Operational classification of transactional streams using Random Forest
  • Creating an association rules model in R
  • Creating a recommendation engine
  • Real-time online machine learning

Chapter 8: Continuous Delivery
  • Introduction
  • Setting up a CI server
  • Setting up system environments
  • Defining a delivery pipeline
  • Implementing automated acceptance testing

Chapter 9: Storm on AWS
  • Introduction
  • Deploying Storm on AWS using Pallet
  • Setting up a Virtual Private Cloud
  • Deploying Storm into Virtual Private Cloud using Vagrant

What You Will Learn

  • Create a log spout
  • Consume messages from a JMS queue
  • Implement unidirectional synchronization based on a data stream
  • Execute disaster recovery in a separate AWS region

In Detail

Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

Storm Real-time Processing Cookbook offers basic to advanced recipes for real-time computation with Storm.

The book begins with setting up the development environment and then teaches log stream processing. It goes on to cover a real-time payments workflow, distributed RPC, integration with other software such as Hadoop and Apache Camel, and more.
