Chapter 3. Understanding Storm Internals by Examples

This chapter is dedicated to helping you understand the internals of Storm and how it works through practical examples. The intent is to get you accustomed to writing your own spouts, walk you through reliable and non-reliable topologies, and acquaint you with the various groupings provided by Storm.

The topics that will be covered in the chapter are as follows:

  • Storm spouts and custom spouts

  • Anchoring and acking

  • Different stream groupings

By the end of this chapter, you should understand the various groupings and the concept of reliability achieved through anchoring, and you will be able to create your own spouts.

Customizing Storm spouts


You explored and understood the WordCount topology provided by the Storm-starter project in the previous chapters. Now it's time to move on to the next step, the do-it-yourself journey with Storm; so let's take the next leap and do some exciting stuff with our own spouts that read from various sources.

Creating FileSpout

Here we will create our own spout to read events or tuples from a file source and emit them into the topology; this spout will take the place of the RandomSentenceSpout we used in the WordCount topology in the previous chapter.

To start, copy the project we created in Chapter 2, Getting Started with Your First Topology, into a new project and modify RandomSentenceSpout to create a new class called FileSpout within the Storm-starter project.

Now we will change FileSpout so that it reads sentences from a file, as shown in the following code:

public class FileSpout extends BaseRichSpout {
  //declaration section
  SpoutOutputCollector...
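
As a minimal sketch of the idea, assuming a hypothetical /tmp/sentences.txt input file, an assumed output field name of sentence, and the backtype.storm package names used by Storm 0.9.x (newer releases use org.apache.storm), a FileSpout along these lines could look like the following:

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Map;

public class FileSpout extends BaseRichSpout {
  private SpoutOutputCollector collector;
  private BufferedReader reader;

  @Override
  public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
    this.collector = collector;
    try {
      // Hypothetical input file; adjust the path for your environment.
      this.reader = new BufferedReader(new FileReader("/tmp/sentences.txt"));
    } catch (Exception e) {
      throw new RuntimeException("Unable to open input file", e);
    }
  }

  @Override
  public void nextTuple() {
    try {
      String line = reader.readLine();
      if (line != null) {
        // Emit each line of the file as a single-field tuple.
        collector.emit(new Values(line));
      }
    } catch (Exception e) {
      throw new RuntimeException("Error reading a line from the file", e);
    }
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // The field name "sentence" is an assumption; use whatever the downstream bolt expects.
    declarer.declare(new Fields("sentence"));
  }
}

In nextTuple(), one line of the file is emitted per call, mirroring the way RandomSentenceSpout emits one sentence at a time.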

Anchoring and acking


We have talked about the DAG that is created for the execution of a Storm topology. Now, when you design your topologies for reliability, there are two items that need to be conveyed to Storm:

  • Whenever a new link, that is, a new stream, is added to the DAG, it is called anchoring

  • When a tuple is processed in its entirety, it is called acking

Once Storm knows these two facts, it can track tuples during processing and accordingly fail or acknowledge them depending on whether they have been completely processed or not.

Let's take a look at the following WordCount topology bolts to better understand anchoring and acking in the Storm API:

  • SplitSentenceBolt: The purpose of this bolt is to split each sentence into individual words and emit them. Now let's examine the output declarer and the execute method of this bolt in detail (especially the highlighted sections), as shown in the following code, with a fuller sketch after the snippet:

      public void execute(Tuple tuple) {
          String sentence...
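
As a rough sketch of how anchoring and acking fit together (the class name AnchoredSplitSentenceBolt is hypothetical, and the backtype.storm package names of Storm 0.9.x are assumed), a split-sentence style bolt could anchor each emitted word to its input tuple and then ack the input:

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

import java.util.Map;

public class AnchoredSplitSentenceBolt extends BaseRichBolt {
  private OutputCollector collector;

  @Override
  public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;
  }

  @Override
  public void execute(Tuple tuple) {
    String sentence = tuple.getString(0);
    for (String word : sentence.split(" ")) {
      // Anchoring: passing the input tuple as the first argument links each
      // emitted word to its parent in the topology's DAG, so a downstream
      // failure causes the original tuple to be replayed from the spout.
      collector.emit(tuple, new Values(word));
    }
    // Acking: tell Storm that this bolt has fully processed the input tuple.
    collector.ack(tuple);
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word"));
  }
}

Because each word is anchored to the incoming sentence tuple, a failure anywhere downstream makes Storm replay the original sentence from the spout, while the final ack tells Storm that this bolt is done with it.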

Stream groupings


Next, we need to get acquainted with the various stream groupings provided by Storm (a stream grouping is the mechanism that defines how Storm partitions and distributes a stream of tuples amongst the tasks of a bolt). Streams are the basic wiring component of a Storm topology, and understanding them gives the developer a lot of flexibility to handle various problems efficiently.

Local or shuffle grouping

Local or shuffle grouping is the most common grouping; it randomly distributes the tuples emitted by the source while ensuring an even spread, that is, each instance of the bolt processes roughly the same number of events. Load balancing is taken care of automatically by this grouping.

Due to the random nature of its distribution, this grouping is useful only for atomic operations, and it is specified with a single parameter: the source of the stream. The following snippet is from the WordCount topology (which we created earlier) and demonstrates the usage of shuffle grouping...
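
As a rough sketch of the wiring (not the book's exact snippet; the FileSpout and AnchoredSplitSentenceBolt classes from the earlier sketches and the Storm 0.9.x package names are assumed), shuffle grouping is declared on the bolt by naming the source component:

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;

public class ShuffleGroupingExample {
  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();

    builder.setSpout("spout", new FileSpout(), 1);

    // shuffleGrouping("spout") takes a single parameter, the id of the source
    // component, and spreads its tuples evenly at random across the bolt's tasks.
    builder.setBolt("split", new AnchoredSplitSentenceBolt(), 4)
           .shuffleGrouping("spout");

    Config conf = new Config();
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("word-count", conf, builder.createTopology());
  }
}

Here LocalCluster runs the topology in-process for testing; on a real cluster you would submit the same topology via StormSubmitter instead.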

Quiz time


Q.1 State whether the following statements are true or false:

  1. All components of reliable topologies use anchoring.

  2. In the event of a failure, all the tuples are played back again.

  3. Shuffle grouping does load balancing.

  4. Global grouping is like a broadcaster.

Q.2 Fill in the blanks:

  1. _______________ is the method to tell the framework that the tuple has been successfully processed.

  2. The _______________ method specifies the name of the stream.

  3. The ___________ method is used to push the tuple downstream in the DAG.

Make changes to the WordCount topology of the Storm-starter project to create a custom grouping so that all words starting with a particular letter always go to the same instance of the WordCount bolt.

Summary


In this chapter, we have understood the intricacies of Storm spouts. We created a custom file spout and integrated it with the WordCount topology, and we introduced the concepts of reliability, acking, and anchoring. Knowledge of the various groupings provided by the current version of Storm further enhances your ability to explore and experiment.

In the next chapter, we shall get you acquainted with the clustered setup of Storm and give you insight into the various monitoring tools available for clustered mode.
