Chapter 3. Understanding Storm Internals by Examples

This chapter is dedicated to helping you understand the internals of Storm and how it works through practical examples. The intent is to get you accustomed to writing your own spouts, walk you through reliable and non-reliable topologies, and acquaint you with the various groupings provided by Storm.

The topics that will be covered in the chapter are as follows:

  • Storm spouts and custom spouts

  • Anchoring and acking

  • Different stream groupings

By the end of this chapter, you should understand the various groupings and the concept of reliability achieved through anchoring, and you will be able to create your own spouts.

Customizing Storm spouts


You explored and understood the WordCount topology provided by the Storm-starter project in the previous chapters. Now it's time to move on to the next step, the do-it-yourself journey with Storm; so let's take the next leap and do some exciting stuff with our own spouts that read from various sources.

Creating FileSpout

Here we will create our own spout to read events or tuples from a file source and emit them into the topology; this spout will take the place of the RandomSentenceSpout we used in the WordCount topology in the previous chapter.

To start, copy the project we created in Chapter 2, Getting Started with Your First Topology, into a new project and modify RandomSentenceSpout to create a new class called FileSpout within the Storm-starter project.

Now we will change FileSpout so that it reads sentences from a file, as shown in the following code:

public class FileSpout extends BaseRichSpout {
  //declaration section
  SpoutOutputCollector...
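
As a minimal sketch of the idea, assuming a hypothetical /tmp/sentences.txt input file, an assumed output field name of sentence, and the backtype.storm package names used by Storm 0.9.x (newer releases use org.apache.storm), a FileSpout along these lines could look like the following:

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Map;

public class FileSpout extends BaseRichSpout {
  private SpoutOutputCollector collector;
  private BufferedReader reader;

  @Override
  public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
    this.collector = collector;
    try {
      // Hypothetical input file; adjust the path for your environment.
      this.reader = new BufferedReader(new FileReader("/tmp/sentences.txt"));
    } catch (Exception e) {
      throw new RuntimeException("Unable to open input file", e);
    }
  }

  @Override
  public void nextTuple() {
    try {
      String line = reader.readLine();
      if (line != null) {
        // Emit each line of the file as a single-field tuple.
        collector.emit(new Values(line));
      }
    } catch (Exception e) {
      throw new RuntimeException("Error reading a line from the file", e);
    }
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // The field name "sentence" is an assumption; use whatever the downstream bolt expects.
    declarer.declare(new Fields("sentence"));
  }
}

In nextTuple(), one line of the file is emitted per call, mirroring the way RandomSentenceSpout emits one sentence at a time.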

Anchoring and acking


We have talked about the DAG that is created for the execution of a Storm topology. Now, when you design your topologies for reliability, there are two items that need to be conveyed to Storm:

  • Whenever a new link, that is, a new stream, is added to the DAG, it is called anchoring

  • When a tuple is processed in its entirety, it is called acking

Once Storm knows these two facts, it can track tuples during processing and accordingly fail or acknowledge them depending on whether they have been completely processed or not.

Let's take a look at the following WordCount topology bolts to better understand anchoring and acking in the Storm API:

  • SplitSentenceBolt: The purpose of this bolt is to split each sentence into individual words and emit them. Now let's examine the output declarer and the execute method of this bolt in detail (especially the highlighted sections), as shown in the following code, with a fuller sketch after the snippet:

      public void execute(Tuple tuple) {
          String sentence...
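
As a rough sketch of how anchoring and acking fit together (the class name AnchoredSplitSentenceBolt is hypothetical, and the backtype.storm package names of Storm 0.9.x are assumed), a split-sentence style bolt could anchor each emitted word to its input tuple and then ack the input:

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

import java.util.Map;

public class AnchoredSplitSentenceBolt extends BaseRichBolt {
  private OutputCollector collector;

  @Override
  public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;
  }

  @Override
  public void execute(Tuple tuple) {
    String sentence = tuple.getString(0);
    for (String word : sentence.split(" ")) {
      // Anchoring: passing the input tuple as the first argument links each
      // emitted word to its parent in the topology's DAG, so a downstream
      // failure causes the original tuple to be replayed from the spout.
      collector.emit(tuple, new Values(word));
    }
    // Acking: tell Storm that this bolt has fully processed the input tuple.
    collector.ack(tuple);
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word"));
  }
}

Because each word is anchored to the incoming sentence tuple, a failure anywhere downstream makes Storm replay the original sentence from the spout, while the final ack tells Storm that this bolt is done with it.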

Stream groupings


Next, we need to get acquainted with the various stream groupings provided by Storm (a stream grouping is the mechanism that defines how Storm partitions and distributes a stream of tuples amongst the tasks of a bolt). Streams are the basic wiring component of a Storm topology, and understanding them gives the developer a lot of flexibility to handle various problems efficiently.

Local or shuffle grouping

Local or shuffle grouping is the most common grouping; it randomly distributes the tuples emitted by the source while ensuring an even spread, that is, each instance of the bolt processes roughly the same number of events. Load balancing is taken care of automatically by this grouping.

Due to the random nature of its distribution, this grouping is useful only for atomic operations, and it is specified with a single parameter: the source of the stream. The following snippet is from the WordCount topology (which we created earlier) and demonstrates the usage of shuffle grouping...
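
As a rough sketch of the wiring (not the book's exact snippet; the FileSpout and AnchoredSplitSentenceBolt classes from the earlier sketches and the Storm 0.9.x package names are assumed), shuffle grouping is declared on the bolt by naming the source component:

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;

public class ShuffleGroupingExample {
  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();

    builder.setSpout("spout", new FileSpout(), 1);

    // shuffleGrouping("spout") takes a single parameter, the id of the source
    // component, and spreads its tuples evenly at random across the bolt's tasks.
    builder.setBolt("split", new AnchoredSplitSentenceBolt(), 4)
           .shuffleGrouping("spout");

    Config conf = new Config();
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("word-count", conf, builder.createTopology());
  }
}

Here LocalCluster runs the topology in-process for testing; on a real cluster you would submit the same topology via StormSubmitter instead.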

Quiz time


Q.1 State whether the following statements are true or false:

  1. All components of reliable topologies use anchoring.

  2. In the event of a failure, all the tuples are played back again.

  3. Shuffle grouping does load balancing.

  4. Global grouping is like a broadcaster.

Q.2 Fill in the blanks:

  1. _______________ is the method to tell the framework that the tuple has been successfully processed.

  2. The _______________ method specifies the name of the stream.

  3. The ___________ method is used to push the tuple downstream in the DAG.

Make changes to the WordCount topology of the Storm-starter project to create a custom grouping so that all words starting with a particular letter always go to the same instance of the WordCount bolt.

Summary


In this chapter, we have understood the intricacies of Storm spouts. We created a custom file spout and integrated it with the WordCount topology, and we introduced the concepts of reliability, acking, and anchoring. Knowledge of the various groupings provided by the current version of Storm further enhances your ability to explore and experiment.

In the next chapter, we shall get you acquainted with the clustered setup of Storm and give you insight into the various monitoring tools available for clustered mode.
