
You're reading from  Mastering Apache Storm

Product type: Book
Published in: Aug 2017
Reading Level: Expert
ISBN-13: 9781787125636
Edition: 1st Edition
Author (1)
Ankit Jain

Ankit Jain holds a bachelor's degree in computer science and engineering. He has 6 years' experience in designing and architecting solutions for the big data domain and has been involved with several complex engagements. His technical strengths include Hadoop, Storm, S4, HBase, Hive, Sqoop, Flume, Elasticsearch, machine learning, Kafka, Spring, Java, and J2EE. He also shares his thoughts on his personal blog. You can follow him on Twitter at @mynameisanky. He spends most of his time reading books and playing with different technologies. When not at work, he spends time with his family and friends watching movies and playing games.

Chapter 6. Storm Scheduler

The previous chapters covered the basics of Storm, its installation, the development and deployment of topologies, and Trident topologies in Storm clusters. This chapter focuses on Storm schedulers.

We will cover the following points:

  • Introduction to Storm schedulers
  • Default scheduler
  • Isolation scheduler
  • Resource-aware scheduler
  • Custom scheduler

Introduction to Storm schedulers


As mentioned in the first two chapters, Nimbus is responsible for deploying the topology, and the supervisors are responsible for performing the computation tasks defined in the topology's spout and bolt components. As we have shown, we can configure the number of worker slots on each supervisor node, which are assigned to topologies according to the scheduler policy, as well as the number of workers allocated to a topology. In short, Storm schedulers help Nimbus decide how the workers of any given topology are distributed.

Default scheduler


The Storm default scheduler assigns component executors as evenly as possible between all the workers (supervisor slots) assigned to a given topology.

Let's consider a sample topology with one spout and one bolt, with both components having two executors. The following diagram shows the assignment of executors if we have submitted the topology by allocating two workers (supervisor slots):

As shown in the preceding diagram, each worker contains one executor for the spout and one executor for the bolt. An even distribution of executors across workers is only possible if the number of executors in each component is divisible by the number of workers assigned to the topology.
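The effect of the divisibility condition can be seen in a small model. The following is a minimal sketch, assuming a plain round-robin deal of executors to workers; the class and method names are hypothetical, and this is not Storm's own scheduler code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only; this is not Storm's scheduler implementation.
class EvenAssignmentSketch {

    // Deals executors out to workers in round-robin order.
    // componentExecutors maps a component name to its executor count.
    static List<List<String>> assign(Map<String, Integer> componentExecutors, int workers) {
        List<List<String>> slots = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            slots.add(new ArrayList<>());
        }
        int next = 0; // index of the next executor across all components
        for (Map.Entry<String, Integer> e : componentExecutors.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                slots.get(next % workers).add(e.getKey() + "-" + i);
                next++;
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        // One spout and one bolt with two executors each, as in the text.
        Map<String, Integer> topology = new LinkedHashMap<>();
        topology.put("spout", 2);
        topology.put("bolt", 2);
        System.out.println(assign(topology, 2)); // two workers: even split
        System.out.println(assign(topology, 3)); // three workers: uneven split
    }
}
```

With two workers, each worker receives one spout executor and one bolt executor. With three workers, the four executors cannot be split evenly, so one worker ends up with two executors and the other two with one each.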

Isolation scheduler


The isolation scheduler provides a mechanism for the easy and safe sharing of Storm cluster resources among many topologies. It lets you reserve dedicated sets of Storm nodes for specific topologies within the Storm cluster.

We need to define the following property in the Nimbus configuration file to switch to the isolation scheduler:

storm.scheduler: org.apache.storm.scheduler.IsolationScheduler 

We can reserve resources for any topology by specifying the topology name and the number of nodes inside the isolation.scheduler.machines property, as shown in the following snippet. This property must be defined in the Nimbus configuration, because Nimbus is responsible for distributing topology workers across Storm nodes:

isolation.scheduler.machines:  
  "Topology-Test1": 2 
  "Topology-Test2": 1 
  "Topology-Test3": 4 

In the preceding configuration, two nodes are assigned to Topology-Test1, one node is...
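The reservation arithmetic behind the configuration above can be sketched as follows. This is only an illustration: the class name, the method names, and the cluster size of 10 nodes are assumptions made for the example, and none of this is part of Storm's API. Machines that are not reserved remain available to topologies that are not listed in the map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative arithmetic only; not part of Storm's API.
class IsolationReservationSketch {

    // Sums the machine counts listed under isolation.scheduler.machines.
    static int reservedNodes(Map<String, Integer> isolationMachines) {
        int total = 0;
        for (int n : isolationMachines.values()) {
            total += n;
        }
        return total;
    }

    // Nodes left over for topologies that are not listed in the map.
    // This sketch simply rejects reservations larger than the cluster.
    static int sharedNodes(int clusterNodes, Map<String, Integer> isolationMachines) {
        int reserved = reservedNodes(isolationMachines);
        if (reserved > clusterNodes) {
            throw new IllegalArgumentException(
                "reservations (" + reserved + ") exceed cluster size (" + clusterNodes + ")");
        }
        return clusterNodes - reserved;
    }

    public static void main(String[] args) {
        Map<String, Integer> machines = new LinkedHashMap<>();
        machines.put("Topology-Test1", 2);
        machines.put("Topology-Test2", 1);
        machines.put("Topology-Test3", 4);
        int clusterNodes = 10; // hypothetical cluster size
        System.out.println("reserved=" + reservedNodes(machines));
        System.out.println("shared=" + sharedNodes(clusterNodes, machines));
    }
}
```

For the three topologies in the configuration, seven nodes are reserved in total, so a hypothetical 10-node cluster would have three nodes left for all other topologies.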

Resource-aware scheduler


A resource-aware scheduler helps users specify the amount of resources required for a single instance of any component (spout or bolt). We can enable the resource-aware scheduler by specifying the following property in the storm.yaml file:

storm.scheduler: "org.apache.storm.scheduler.resource.ResourceAwareScheduler" 

Component-level configuration

You can specify the memory requirement of any component. The following methods set the memory for a single instance of a component:

public T setMemoryLoad(Number onHeap, Number offHeap) 

Alternatively, you can use the following:

public T setMemoryLoad(Number onHeap) 

The following is the definition of each argument:

  • onHeap: The amount of on-heap space an instance of this component will consume, in megabytes
  • offHeap: The amount of off-heap space an instance of this component will consume, in megabytes

The data type of both onHeap and offHeap is Number, and the default value is 0.0.
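To make the per-instance semantics concrete, here is a minimal sketch that models the two signatures quoted above. The Component class is a hypothetical stand-in written for this illustration, not a Storm class, and the memory figures and instance counts are made-up values:

```java
// Hypothetical stand-in for a component declaration; not a Storm class.
class MemoryLoadSketch {

    static final class Component {
        final String name;
        final int instances;     // number of instances of this component
        double onHeapMb = 0.0;   // default value taken from the text above
        double offHeapMb = 0.0;

        Component(String name, int instances) {
            this.name = name;
            this.instances = instances;
        }

        // Mirrors the two-argument signature quoted above.
        Component setMemoryLoad(Number onHeap, Number offHeap) {
            this.onHeapMb = onHeap.doubleValue();
            this.offHeapMb = offHeap.doubleValue();
            return this;
        }

        // Single-argument form: in this sketch, off-heap is set to 0.0.
        Component setMemoryLoad(Number onHeap) {
            return setMemoryLoad(onHeap, 0.0);
        }

        // Memory requested by one instance, in megabytes.
        double perInstanceMb() {
            return onHeapMb + offHeapMb;
        }

        // Memory requested by all instances of the component, in megabytes.
        double totalMb() {
            return instances * perInstanceMb();
        }
    }

    public static void main(String[] args) {
        Component spout = new Component("spout", 2).setMemoryLoad(1024.0, 512.0);
        Component bolt = new Component("bolt", 3).setMemoryLoad(512.0);
        System.out.println(spout.name + " total MB: " + spout.totalMb());
        System.out.println(bolt.name + " total MB: " + bolt.totalMb());
    }
}
```

Because the values apply to a single instance, the total request of a component grows with its number of instances: two spout instances at 1,024 MB on-heap plus 512 MB off-heap add up to 3,072 MB.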

Memory usage example

Let's consider...

Custom scheduler


In Storm, Nimbus uses a scheduler to assign tasks to the supervisors. The default scheduler aims to allocate computing resources evenly among topologies. It works well in terms of fairness, but users cannot predict where topology components will be placed in the Storm cluster, that is, which component of a topology will be assigned to which supervisor node.

Let's consider an example. Say that we have a topology that has one spout and two bolts, and each of the components has one executor and one task. The following diagram shows the distribution of the topology if we submit the topology to a Storm cluster. Assume that the number of workers assigned to the topology is three and the number of supervisors in the Storm cluster is three:

Let's assume that the last bolt in our topology, Bolt2, needs to process some data using a GPU rather than the CPU, and there's only one of the supervisors with a GPU. We need to write our own custom scheduler to...
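The placement decision in this scenario can be modeled independently of Storm. The sketch below is a hypothetical illustration of the idea only: it does not use Storm's scheduler interface, and the supervisor names, the "gpu"/"cpu" tags, and the component names other than Bolt2 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the placement decision; not Storm's scheduler interface.
class GpuPlacementSketch {

    // components:     component names, in topology order
    // requiredTag:    component name -> tag it needs (absent means no requirement)
    // supervisorTags: supervisor name -> tag describing that node
    static Map<String, String> place(List<String> components,
                                     Map<String, String> requiredTag,
                                     Map<String, String> supervisorTags) {
        Map<String, String> placement = new LinkedHashMap<>();
        List<String> free = new ArrayList<>(supervisorTags.keySet());

        // First pass: pin every component that requires a specific tag.
        for (String c : components) {
            String tag = requiredTag.get(c);
            if (tag == null) {
                continue;
            }
            String chosen = null;
            for (String s : free) {
                if (tag.equals(supervisorTags.get(s))) {
                    chosen = s;
                    break;
                }
            }
            if (chosen == null) {
                throw new IllegalStateException("no free supervisor tagged " + tag + " for " + c);
            }
            placement.put(c, chosen);
            free.remove(chosen);
        }

        // Second pass: spread the remaining components over the leftover supervisors.
        int i = 0;
        for (String c : components) {
            if (placement.containsKey(c)) {
                continue;
            }
            if (free.isEmpty()) {
                throw new IllegalStateException("no supervisor left for " + c);
            }
            placement.put(c, free.get(i % free.size()));
            i++;
        }
        return placement;
    }

    public static void main(String[] args) {
        Map<String, String> supervisors = new LinkedHashMap<>();
        supervisors.put("supervisor1", "cpu");
        supervisors.put("supervisor2", "cpu");
        supervisors.put("supervisor3", "gpu");

        Map<String, String> required = new LinkedHashMap<>();
        required.put("Bolt2", "gpu");

        System.out.println(place(List.of("Spout", "Bolt1", "Bolt2"), required, supervisors));
    }
}
```

In this model, Bolt2 is pinned to the only GPU-tagged supervisor first, and the spout and the other bolt are then spread over the remaining supervisors. Pinning constrained components before unconstrained ones avoids the case where an unconstrained component takes the GPU node.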

Summary


In this chapter, we learned about the built-in Storm schedulers and covered how to write and configure a custom scheduler.

In the next chapter, we will be covering the monitoring of a Storm cluster using Graphite and Ganglia.
