Storm Blueprints: Patterns for Distributed Real-time Computation

More Information
  • Learn the fundamentals of Storm
  • Install and configure Storm in pseudo-distributed and fully distributed modes
  • Familiarize yourself with the fundamentals of Trident and distributed state
  • Design patterns for data flows in a distributed system
  • Create integration patterns for persistence mechanisms such as Titan
  • Deploy and run Storm clusters by leveraging YARN
  • Achieve continuous availability and fault tolerance through distributed storage
  • Understand centralized logging mechanisms and log processing
  • Implement polyglot persistence and distributed transactions
  • Calculate the effectiveness of a campaign using click-through analysis

Storm is one of the most popular frameworks for real-time stream processing. It provides the fundamental primitives and guarantees required for fault-tolerant distributed computing in high-volume, mission-critical applications. It is both an integration technology and a data flow and control mechanism, making it the core of many big data platforms. Storm is essential if you want to deploy, operate, and develop data processing flows capable of handling billions of transactions.

"Storm Blueprints: Patterns for Distributed Real-time Computation" covers a broad range of distributed computing topics, including not only design and integration patterns but also the domains and applications in which the technology is immediately useful and commonly applied. This book introduces you to Storm using real-world examples, beginning with simple Storm topologies. The examples increase in complexity, introducing advanced Storm concepts as well as more sophisticated approaches to deployment and operational concerns.

This book covers the domains of real-time log processing, sensor data analysis, collective and artificial intelligence, financial market analysis, Natural Language Processing (NLP), graph analysis, polyglot persistence, and online advertising. While exploring distributed computing applications in each of those domains, the book introduces advanced Storm topics such as Trident and distributed state, as well as integration patterns for Druid and Titan. The book also describes the deployment of Storm to YARN and to Amazon's cloud infrastructure, as well as other key operational concerns such as centralized logging.

By the end of the book, you will have gained an understanding of the fundamentals of Storm and Trident and be able to identify and apply those fundamentals to any suitable problem.

  • Process high-volume log files in real time while learning the fundamentals of Storm topologies and system deployment.
  • Deploy Storm on Hadoop (YARN) and understand how the systems complement each other for online advertising and trade processing.
  • Follow along as each chapter presents a new problem and the architectural pattern, design, and implementation of a solution.

Page count: 336
Course length: 10 hours 4 minutes
ISBN: 9781782168294
Date of publication: 26 Mar 2014


P. Taylor Goetz

P. Taylor Goetz is an Apache Storm committer and release manager and has been involved in the use and development of Storm since it was first released as open source in October 2011. An active contributor to the Storm user community, Taylor leads a number of open-source projects that enable enterprises to integrate Storm into heterogeneous infrastructure.

Presently, he works at Hortonworks, where he leads the integration of Storm into the Hortonworks Data Platform (HDP). Before joining Hortonworks, he worked at Health Market Science (HMS), where he led the integration of Storm into HMS's next-generation Master Data Management platform, using technologies including Cassandra, Kafka, Elasticsearch, and the Titan graph database.

Brian O'Neill

Brian O'Neill is a husband, hacker, hiker, and kayaker. He is a fisherman and father as well as a big data believer, innovator, and distributed computing dreamer.

He has been a technology leader for over 15 years and is recognized as an authority on big data. He has experience as an architect in a wide variety of settings, from start-ups to Fortune 500 companies. He believes in open source and contributes to numerous projects. He leads projects that extend Cassandra and integrate the database with indexing engines, distributed processing frameworks, and analytics engines. He won InfoWorld's Technology Leadership Award in 2013. He authored the DZone reference card on Cassandra and was selected as a DataStax Cassandra MVP in 2012 and 2013.

In the past, he has contributed to expert groups within the Java Community Process (JCP) and has patents in artificial intelligence and context-based discovery. He is proud to hold a B.S. in Computer Science from Brown University.

Presently, Brian is Chief Technology Officer for Health Market Science (HMS), where he heads the development of their big data platform focused on data management and analysis for the healthcare space. The platform is powered by Storm and Cassandra and delivers real-time data management and analytics as a service.