
Learning Storm

Ankit Jain, Anand Nalya

Create real-time stream processing applications with Apache Storm
eBook: RRP $23.99
Print + eBook: RRP $39.99


Book Details

ISBN 13: 9781783981328
Paperback: 252 pages

About This Book

  • Integrate Storm with other Big Data technologies like Hadoop, HBase, and Apache Kafka
  • Explore log processing and machine learning using Storm
  • A step-by-step, easy-to-follow guide to creating applications with Storm

Who This Book Is For

If you are a Java developer who wants to enter the world of real-time stream processing with Apache Storm, this book is for you. No previous experience with Storm is required, as this book starts from the basics. After finishing this book, you will be able to develop simple to moderately complex Storm applications.

Table of Contents

Chapter 1: Setting Up Storm on a Single Machine
Features of Storm
Storm components
The Storm data model
Chapter 2: Setting Up a Storm Cluster
Setting up a distributed Storm cluster
Deploying a topology on a remote Storm cluster
Configuring the parallelism of a topology
Rebalancing the parallelism of a topology
Stream grouping
Guaranteed message processing
Chapter 3: Monitoring the Storm Cluster
Starting to use the Storm UI
Monitoring a topology using the Storm UI
Cluster statistics using the Nimbus thrift client
Chapter 4: Storm and Kafka Integration
The Kafka architecture
Setting up Kafka
A sample Kafka producer
Integrating Kafka with Storm
Chapter 5: Exploring High-level Abstraction in Storm with Trident
Introducing Trident
Understanding Trident's data model
Writing Trident functions, filters, and projections
Trident repartitioning operations
Trident aggregators
Utilizing the groupBy operation
A non-transactional topology
A sample Trident topology
Maintaining the topology state with Trident
A transactional topology
The opaque transactional topology
Distributed RPC
When to use Trident
Chapter 6: Integration of Storm with Batch Processing Tools
Exploring Apache Hadoop
Installing Apache Hadoop
Integration of Storm with Hadoop
Deploying Storm-Starter topologies on Storm-YARN
Chapter 7: Integrating Storm with JMX, Ganglia, HBase, and Redis
Monitoring the Storm cluster using JMX
Monitoring the Storm cluster using Ganglia
Integrating Storm with HBase
Integrating Storm with Redis
Chapter 8: Log Processing with Storm
Server log-processing elements
Producing the Apache log in Kafka
Splitting the server log line
Identifying the country, the operating system type, and the browser type from the logfile
Extracting the searched keyword
Persisting the process data
Defining a topology and the Kafka spout
Deploying a topology
MySQL queries
Chapter 9: Machine Learning
Exploring machine learning
Using Trident-ML
The use case – clustering synthetic control data
Producing a training dataset into Kafka
Building a Trident topology to build the clustering model

What You Will Learn

  • Learn the core concepts of Apache Storm and real-time processing
  • Deploy Storm in the local and clustered modes
  • Design and develop Storm topologies to solve real-world problems
  • Read data from external sources such as Apache Kafka for processing in Storm and store the output into HBase and Redis
  • Create Trident topologies to support various message-processing semantics
  • Monitor the health of a Storm cluster

In Detail

Starting with the very basics of Storm, you will learn how to set up Storm on a single machine and move on to deploying Storm on your cluster. You will understand how Kafka can be integrated with Storm using the Kafka spout.
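The Kafka-spout integration mentioned above is typically wired up as in the following sketch, using the storm-kafka module with Storm 0.9.x package names (later releases moved these under org.apache.storm). The ZooKeeper address, topic name, and consumer IDs are illustrative placeholders, not values from the book:

```java
// Sketch: attaching a KafkaSpout to a Storm topology via the storm-kafka
// module. Broker address, topic, and IDs below are assumptions for
// illustration only.
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class KafkaSpoutSketch {
    public static TopologyBuilder build() {
        // Kafka brokers are discovered through ZooKeeper
        BrokerHosts hosts = new ZkHosts("localhost:2181");

        // topic to consume, ZooKeeper root for storing offsets, consumer id
        SpoutConfig spoutConfig =
                new SpoutConfig(hosts, "apache-logs", "/kafka-offsets", "log-consumer");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme()); // emit messages as strings

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 2); // 2 spout executors
        // downstream bolts would be attached here with builder.setBolt(...)
        return builder;
    }
}
```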

You will then proceed to explore the Trident abstraction on top of Storm to perform stateful stream processing with exactly-once message-processing guarantees. You will move ahead to learn how to integrate Hadoop with Storm. Next, you will learn how to integrate Storm with other well-known Big Data technologies, such as HBase, Redis, and Kafka, to realize its full potential.
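To give a flavor of the Trident API mentioned above, here is the canonical word-count topology from the Storm documentation (Storm 0.9.x package names; the fixed-batch spout and field names are illustrative):

```java
// Sketch: a Trident topology that splits sentences into words, groups by
// word, and maintains running counts in exactly-once fashion.
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.operation.builtin.Count;
import storm.trident.testing.FixedBatchSpout;
import storm.trident.testing.MemoryMapState;
import storm.trident.tuple.TridentTuple;

public class TridentWordCount {
    // Splits each incoming sentence tuple into one tuple per word
    public static class Split extends BaseFunction {
        @Override
        public void execute(TridentTuple tuple, TridentCollector collector) {
            for (String word : tuple.getString(0).split(" ")) {
                collector.emit(new Values(word));
            }
        }
    }

    public static TridentState buildState() {
        // In-memory test spout that cycles over two fixed sentences
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
                new Values("the cow jumped over the moon"),
                new Values("four score and seven years ago"));
        spout.setCycle(true);

        TridentTopology topology = new TridentTopology();
        return topology.newStream("sentences", spout)
                .each(new Fields("sentence"), new Split(), new Fields("word"))
                .groupBy(new Fields("word"))
                // persistentAggregate performs the transactional state updates
                // that give Trident its exactly-once semantics
                .persistentAggregate(new MemoryMapState.Factory(),
                        new Count(), new Fields("count"));
    }
}
```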

Finally, you will perform in-depth case studies on Apache log processing and machine learning with a focus on Storm, and through these case studies, you will discover Storm's realm of possibilities.
