Storm Blueprints: Patterns for Distributed Real-time Computation

Blueprints
P. Taylor Goetz, Brian O'Neill

One of the best ways of getting to grips with the world’s most popular framework for real-time processing is to study real-world projects. This book lets you do just that, resulting in a sound understanding of the fundamentals.
eBook: $29.99
Print + eBook: $49.99

Book Details

ISBN 13: 9781782168294
Paperback: 336 pages

About This Book

  • Process high-volume log files in real time while learning the fundamentals of Storm topologies and system deployment.
  • Deploy Storm on Hadoop (YARN) and understand how the systems complement each other for online advertising and trade processing.
  • Follow along as each chapter presents a new problem and the architectural pattern, design, and implementation of a solution.

Who This Book Is For

Although the book focuses primarily on Java development with Storm, the patterns are more broadly applicable, and the tips, techniques, and approaches it describes are relevant to architects, developers, and operations teams.

Additionally, the book should provoke and inspire applications of distributed computing to other industries and domains. Hadoop enthusiasts will also find this book a good introduction to Storm, providing a potential migration path from batch processing to the world of real-time analytics.

Table of Contents

Chapter 1: Distributed Word Count
Introducing elements of a Storm topology – streams, spouts, and bolts
Introducing the word count topology data flow
Implementing the word count topology
Introducing parallelism in Storm
Understanding stream groupings
Guaranteed processing
Summary
Chapter 2: Configuring Storm Clusters
Introducing the anatomy of a Storm cluster
Introducing the Storm technology stack
Installing Storm on Linux
Submitting topologies to a Storm cluster
Automating the cluster configuration
A rapid introduction to Puppet
Summary
Chapter 3: Trident Topologies and Sensor Data
Examining our use case
Introducing Trident topologies
Introducing Trident spouts
Introducing Trident operations – filters and functions
Introducing Trident aggregators – Combiners and Reducers
Introducing the Trident state
Executing the topology
Summary
Chapter 4: Real-time Trend Analysis
Use case
Architecture
Installing the required software
Introducing the sample application
Introducing the log analysis topology
The final topology
Running the log analysis topology
Summary
Chapter 5: Real-time Graph Analysis
Use case
Architecture
A brief introduction to graph databases
Software installation
Setting up Titan to use the Cassandra storage backend
Graph data model
Connecting to the Twitter stream
Twitter graph topology
Implementing GraphState
Implementing GraphFactory
Implementing GraphTupleProcessor
Putting it all together – the TwitterGraphTopology class
Querying the graph with Gremlin
Summary
Chapter 6: Artificial Intelligence
Designing for our use case
Establishing the architecture
Implementing the architecture
Summary
Chapter 7: Integrating Druid for Financial Analytics
Use case
Integrating a non-transactional system
The topology
Implementing the architecture
Executing the implementation
Examining the analytics
Summary
Chapter 8: Natural Language Processing
Motivating a Lambda architecture
Examining our use case
Realizing a Lambda architecture
Designing the topology for our use case
Implementing the design
Examining the analytics
Batch processing / historical analysis
Hadoop
Summary
Chapter 9: Deploying Storm on Hadoop for Advertising Analysis
Examining the use case
Establishing the architecture
Configuring the infrastructure
Deploying the analytics
Performing the analytics
Deploying the topology
Executing the topology
Summary
Chapter 10: Storm in the Cloud
Introducing Amazon Elastic Compute Cloud (EC2)
Introducing Apache Whirr
Configuring a Storm cluster with Whirr
Introducing Whirr Storm
Introducing Vagrant
Creating Storm-provisioning scripts
Summary

What You Will Learn

  • Learn the fundamentals of Storm
  • Install and configure Storm in pseudo-distributed and fully distributed modes
  • Familiarize yourself with the fundamentals of Trident and distributed state
  • Design patterns for data flows in a distributed system
  • Create integration patterns for persistence mechanisms such as Titan
  • Deploy and run Storm clusters by leveraging YARN
  • Achieve continuous availability and fault tolerance through distributed storage
  • Recognize centralized logging mechanisms and processing
  • Implement polyglot persistence and distributed transactions
  • Calculate the effectiveness of a campaign using click-through analysis

In Detail

Storm is the most popular framework for real-time stream processing. Storm provides the fundamental primitives and guarantees required for fault-tolerant distributed computing in high-volume, mission-critical applications. It is both an integration technology and a data flow and control mechanism, making it the core of many big data platforms. Storm is essential if you want to develop, deploy, and operate data processing flows capable of processing billions of transactions.

"Storm Blueprints: Patterns for Distributed Real-time Computation" covers a broad range of distributed computing topics, including not only design and integration patterns, but also the domains and applications in which the technology is immediately useful and commonly applied. This book introduces you to Storm using real-world examples, beginning with simple Storm topologies. The examples increase in complexity, introducing advanced Storm concepts as well as more sophisticated approaches to deployment and operational concerns.

This book covers the domains of real-time log processing, sensor data analysis, collective and artificial intelligence, financial market analysis, Natural Language Processing (NLP), graph analysis, polyglot persistence, and online advertising. While exploring distributed computing applications in each of those domains, the book covers advanced Storm topics such as Trident and distributed state, as well as integration patterns for Druid and Titan. The book also describes the deployment of Storm to YARN and the Amazon infrastructure, as well as other key operational concerns such as centralized logging.

By the end of the book, you will have gained an understanding of the fundamentals of Storm and Trident and be able to identify and apply those fundamentals to any suitable problem.
