You're reading from  Fast Data Processing Systems with SMACK Stack

Product type: Book
Published in: Dec 2016
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781786467201
Edition: 1st
Author: Raúl Estrada

Raúl Estrada has been a programmer since 1996 and a Java developer since 2001. He loves all topics related to computer science. With more than 15 years of experience in high-availability and enterprise software, he has been designing and implementing architectures since 2003. His specialization is in systems integration, and he mainly participates in projects related to the financial sector. He has been an enterprise architect for BEA Systems and Oracle Inc., but he also enjoys web, mobile, and game programming. Raúl is a supporter of free software and enjoys experimenting with new technologies, frameworks, languages, and methods. Raúl is the author of other Packt Publishing titles, such as Fast Data Processing Systems with SMACK and Apache Kafka Cookbook.

Chapter 6. The Manager - Apache Mesos

If we pause at this point in the book and look back, we know that we now have technologies that demand more memory and CPU than traditional computers can offer. How can this be achieved at low cost? The answer is Apache Mesos.

In addition to explaining the Apache Mesos architecture and principles, we explore how to run the principal frameworks, and how to run Spark, Cassandra, and Kafka on Apache Mesos.

This chapter has the following sections:

  • The Apache Mesos architecture
  • Resource allocation
  • Running a Mesos cluster on AWS
  • Running a Mesos cluster on a private data center
  • Scheduling and management frameworks
  • Apache Spark on Apache Mesos
  • Apache Cassandra on Apache Mesos
  • Apache Kafka on Apache Mesos

The Apache Mesos architecture


Mesos is an open source platform for sharing the resources of commodity machines among different distributed applications (called frameworks in the Mesos ecosystem, as we will see later), such as Spark, Cassandra, and Kafka, among others. Mesos's objective is to run as a centralized cluster manager that pools all the physical resources of each cluster member and makes them available as a single source of highly available resources for all the different applications.

Let's take a simple example: a startup has bought eight machines for its humble data center, each with 8 CPUs and 64 GB of RAM, and it previously had a four-node cluster where each machine had 4 CPUs and 16 GB of RAM. With Apache Mesos, we can build a virtual cluster that emulates a single machine with 80 CPUs (8*8 + 4*4) and 576 GB of RAM (8*64 + 4*16). Just like that, we have the power of the old mainframes at our fingertips, and on this cluster we can run multiple distributed applications. The sharing...
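The arithmetic behind the example can be checked in a couple of lines:

```python
# The pool from the example: eight new machines with 8 CPUs and
# 64 GB of RAM each, plus the old four-node cluster with 4 CPUs
# and 16 GB of RAM per machine.
machines = [(8, 64)] * 8 + [(4, 16)] * 4

total_cpus = sum(cpus for cpus, _ in machines)
total_ram_gb = sum(ram for _, ram in machines)

print(total_cpus, total_ram_gb)  # 80 576
```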

Resource allocation


Mesos has a resource allocation module that contains the policy the Mesos master uses to determine the quantity of resource offers made to each framework. As developers, we can customize the module to implement our own allocation policy; for example, we can manipulate the priority and weight of resources to meet business requirements. We can also develop entirely custom allocation modules.

One objective of the resource allocation module is to ensure fair resource distribution among the frameworks. The efficiency of a cluster manager lies in the choice of the correct sharing policy algorithm.

For example, Hadoop is governed by the max-min fairness allocation algorithm, in which resources are distributed equitably among competing demands. The effectiveness of this algorithm is proven in homogeneous environments. Unfortunately, fast data requires heterogeneous environments.
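The max-min fairness idea can be sketched for a single resource; the numbers and the water-filling loop below are illustrative, not Hadoop's actual scheduler code:

```python
# Illustrative max-min fair ("water-filling") division of a single
# resource among competing demands; the numbers are hypothetical.

def max_min_fair(capacity, demands):
    """Give every unsatisfied user an equal share of what remains;
    users demanding less than their share are fully satisfied."""
    alloc = {user: 0.0 for user in demands}
    remaining = dict(demands)
    while remaining and capacity > 1e-9:
        share = capacity / len(remaining)
        for user, want in list(remaining.items()):
            give = min(share, want)
            alloc[user] += give
            capacity -= give
            if give >= want - 1e-9:
                del remaining[user]          # demand fully met
            else:
                remaining[user] -= give
    return alloc

# 10 CPUs shared by demands of 2, 4, and 10 CPUs.
alloc = max_min_fair(10.0, {"a": 2.0, "b": 4.0, "c": 10.0})
print(alloc)  # roughly {'a': 2.0, 'b': 4.0, 'c': 4.0}
```

The small user gets everything it asked for, and the leftover capacity is split evenly between the larger demands.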

The distribution of resources between frameworks with heterogeneous demands for resources brings an interesting...
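Mesos handles heterogeneous demands with Dominant Resource Fairness (DRF), revisited in the Summary: each framework is measured by its dominant share, the largest fraction it holds of any single resource, and the next offer goes to the framework with the smallest dominant share. A minimal sketch with the classic illustrative numbers (not Mesos's implementation):

```python
# Illustrative Dominant Resource Fairness (DRF) allocator.
# Capacity and per-task demands are hypothetical: 9 CPUs and
# 18 GB of RAM shared by a memory-heavy and a CPU-heavy framework.

def drf_allocate(capacity, demands, max_tasks=100):
    """Offer resources task by task to the framework whose dominant
    share (largest fraction of any one resource) is lowest."""
    allocated = {f: {r: 0.0 for r in capacity} for f in demands}
    used = {r: 0.0 for r in capacity}
    for _ in range(max_tasks):
        shares = {
            f: max(allocated[f][r] / capacity[r] for r in capacity)
            for f in demands
        }
        f = min(shares, key=shares.get)      # most starved framework
        need = demands[f]
        if any(used[r] + need[r] > capacity[r] for r in capacity):
            break                            # next task no longer fits
        for r in capacity:
            used[r] += need[r]
            allocated[f][r] += need[r]
    return allocated

capacity = {"cpus": 9.0, "mem_gb": 18.0}
demands = {
    "framework_a": {"cpus": 1.0, "mem_gb": 4.0},  # memory-heavy tasks
    "framework_b": {"cpus": 3.0, "mem_gb": 1.0},  # CPU-heavy tasks
}
allocation = drf_allocate(capacity, demands)
print(allocation)
# framework_a ends with 3 tasks (3 CPUs, 12 GB),
# framework_b with 2 tasks (6 CPUs, 2 GB)
```

Each framework's dominant share ends up equalized (two thirds of memory versus two thirds of CPU), which is exactly the property DRF optimizes for.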

Running a Mesos cluster on AWS


For Amazon Web Services, Amazon has divided the world into 11 physical regions, each of which can be accessed remotely. The services have usage-based pricing and include EC2 (computing or processing), S3 (storage), DynamoDB (the Amazon database), RDS, EBS, and so on.

AWS includes an EC2 free trial to start developing on the platform. The free trial includes a machine with 700 MB of RAM for a year at no cost. We need to pay if we need more power (more CPUs or more RAM), or if we want to use the S3 storage service. Prices are listed at https://aws.amazon.com/ec2/pricing/.

  • Amazon account: To create an account we should go to http://aws.amazon.com and follow the instructions. The steps include phone and e-mail verification. The confirmation e-mail contains the account number needed in the following steps.
  • Key pairs: Amazon uses public-key authentication. We can choose the key pair from a drop-down list or we can create a new one when launching...

Running a Mesos cluster on a private data center


If we don't want to use cloud services from Amazon, Google, or Microsoft, we can set up the cluster in our own private data center. Here we assume that our data center has four machines and that we are going to install a Mesos cluster on all of them. We also assume that each machine runs the same operating system, CentOS 6.6 Linux.

For this example, our machines' IP addresses are:

machine-1: 192.168.2.190
machine-2: 192.168.2.191
machine-3: 192.168.2.192
machine-4: 192.168.2.193

For convenience, we choose machine-1 as the Mesos cluster master; machine-2, machine-3, and machine-4 will be slaves.

Mesos installation

To install a multi-node Mesos cluster in our data center, we follow these steps:

Setting up the environment

As we have already seen, we must download and install the libraries and dependencies needed to run Mesos on CentOS. We need to log on to each machine and run the following commands:

  • We need the wget command; to install it, we type:
$ sudo yum install -y tar wget 
  • Mesos >...

Scheduling and management frameworks


Here we mention some popular Mesos frameworks used to deploy, discover, load balance, and handle failure of services. Unlike other application frameworks, these frameworks are used for service management.

In this context, we define two concepts as follows:

  • Load balancing: Ensures an equitable workload distribution among the instances of a service
  • Service discovery: Keeps track of the instances on which a particular service is running
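The two concepts can be illustrated with a toy sketch (the registry, service name, and ports are hypothetical; real frameworks such as Consul or HAProxy do far more):

```python
from itertools import cycle

# Toy sketch, not a real Mesos framework: a registry maps service
# names to instance addresses (discovery), and a round-robin
# iterator spreads requests over those instances (balancing).
registry = {
    "web": ["192.168.2.191:8080", "192.168.2.192:8080", "192.168.2.193:8080"],
}

def discover(service):
    """Service discovery: look up the live instances of a service."""
    return registry[service]

backends = cycle(discover("web"))            # load balancing: rotate
targets = [next(backends) for _ in range(4)]
print(targets)  # the fourth request wraps back to the first instance
```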

Some important Mesos frameworks are:

  • Marathon: Framework to launch and manage long-running applications
  • Chronos: A cluster scheduler
  • Apache Aurora: Framework to manage long-running services and cron jobs
  • Singularity: Platform-as-a-service (PaaS) for running services
  • Marathoner: Service discovery for Marathon
  • Consul: Framework for orchestration and service discovery
  • HAProxy: Framework used for load balancing
  • Bamboo: Framework to automatically configure HAProxies
  • Netflix Fenzo: A task scheduler
  • Yelp's PaaSTA: A PaaS for running services

In this section...

Apache Aurora


The Apache Aurora key features are as follows:

  • It is a Mesos framework for cron jobs, long-running services, and job management
  • Conceived at Twitter Inc. and later open sourced under the Apache license
  • Keeps long-running jobs running across a shared resource pool over a long duration; if one machine fails, Aurora reschedules its jobs on other healthy machines
  • Not recommended for systems with specific scheduling requirements, since it is a scheduler itself
  • Provides coarse-grained resources for a specific job at any point in time
  • Supports multiple users
  • Its configuration is specified in a Domain Specific Language (DSL) to avoid configuration redundancy

Aurora and Marathon offer similar feature sets; both are classified as service schedulers. There are three main differences:

  • Ease of use: Aurora is not easy to install. It exposes a Thrift API, which means you'll need a Thrift client to interact with it programmatically. On the other hand, Marathon helps you run Hello World as quickly as possible...

Singularity


Singularity key points are as follows:

  • Acts as an API and a web application
  • Conceived at HubSpot and later open sourced under the Apache license
  • Used to launch and schedule long-running Mesos processes, scheduled jobs, and tasks
  • All its components can be considered as a PaaS to end users
  • Inexperienced users can use it to deploy tasks on Mesos without much prior knowledge
  • Shares Mesos features such as fault tolerance, scalability, and resource allocation
  • Can run a task scheduler for other Mesos frameworks

Singularity installation

To install Singularity we need Docker; to install Docker, follow the steps at:

https://docs.docker.com

Once Docker is installed, clone the Singularity repository with this command:

$ git clone https://github.com/HubSpot/Singularity 

Change to the Singularity directory:

$ cd Singularity 

Now, run the docker-compose pull and docker-compose up commands to test Singularity.

These commands set up the following in the container:

  • Mesos master and slave
  • Zookeeper
  • Singularity
  • Baragon service and Baragon...

Apache Spark on Apache Mesos


Here we explain in detail how to run Apache Spark on Mesos.

We have two options:

  • Upload the Spark binary package to a location accessible by Mesos and configure the Spark driver to connect to Mesos
  • Install Spark in the same location on all the Mesos slaves and set spark.mesos.executor.home to point to that location

Follow these steps for the first option:

The first time we run a Mesos task on a Mesos slave, that slave must have the Spark binary package in order to run the executor backend. The location accessible by Mesos can be HDFS, HTTP, or S3.

At the time of writing this book, Spark's version is 2.0.0. To download it and upload it to HDFS, we use the following commands:

$ wget http://apache.mirrors.ionfish.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
$ hadoop fs -put spark-2.0.0-bin-hadoop2.7.tgz /

In the Spark driver program, set the master URL to the Apache Mesos master URL in this form:

  • For a single Mesos master cluster: mesos://master-host:5050...

Apache Cassandra on Apache Mesos


The easiest way to deploy Apache Cassandra on Apache Mesos is through Marathon. Mesosphere has already packaged the Cassandra executor and the necessary JAR files in a tarball that can be submitted to Mesos through Marathon with the JSON definition located at: https://downloads.mesosphere.io/cassandra-mesos/artifacts/0.2.0-1/marathon.json

{ 
  "id": "/cassandra/dev-test", 
  "instances": 1, 
  "cpus": 0.5, 
  "mem": 512, 
  "ports": [ 
    0 
  ], 
  "uris": [ 
    "https://downloads.mesosphere.io/cassandra-mesos/artifacts/0.2.0-1/cassandra-mesos-0.2.0-1.tar.gz", 
    "https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz" 
  ], 
  "env": { 
    "MESOS_ZK": "zk://localhost:2181/mesos", 
    "JAVA_OPTS": "-Xms256m -Xmx256m", 
    "CASSANDRA_CLUSTER_NAME": "dev-test", 
    "CASSANDRA_ZK": "zk://localhost:2181/cassandra-mesos", 
    "CASSANDRA_NODE_COUNT": "3", 
    "CASSANDRA_RESOURCE_CPU_CORES": "2.0", 
    "CASSANDRA_RESOURCE_MEM_MB": "2048", 
    "CASSANDRA_RESOURCE_DISK_MB...

Apache Kafka on Apache Mesos


Ensure that the following applications are available on the machine:

To download the Kafka on Mesos project from the repository:

$ git clone https://github.com/mesos/kafka
$ cd kafka 
$ ./gradlew jar 

Use this command to download the Kafka executor:

$ wget https://archive.apache.org/dist/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz 

Set the following environment variable to point to the libmesos.so file:

$ export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so 

Use the kafka-mesos.sh script to launch and configure Kafka on Mesos. Before that, create the kafka-mesos.properties file with the following contents:

storage=file:kafka-mesos.json  
master=zk://master:2181/mesos  
zk=master:2181  
api=http://master:7000  

These properties configure kafka-mesos.sh so that we don't need to pass arguments to the scheduler every time. The scheduler supports the following...
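The properties file is plain key=value lines; a minimal sketch of how such a file could be parsed (illustrative only, not the kafka-mesos scheduler's own code):

```python
# Minimal parser for key=value files such as kafka-mesos.properties.

def parse_properties(text):
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

sample = """\
storage=file:kafka-mesos.json
master=zk://master:2181/mesos
zk=master:2181
api=http://master:7000
"""
props = parse_properties(sample)
print(props["api"])  # http://master:7000
```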

Summary


In this chapter, we have seen the Mesos architecture. We also reviewed how Mesos performs resource allocation with the DRF algorithm, and how to run Mesos on AWS and in a private data center.

We visited the most important Mesos frameworks: Marathon, Chronos, Aurora, and Singularity. We also reviewed the Frameworks API.

In the last sections, we reviewed how to run Spark, Cassandra, and Kafka on Apache Mesos, covering the setup, configuration, and management of these frameworks on a distributed infrastructure using Mesos.

I hope this chapter has armed you with the resources you need to manage the complexities of modern DevOps effectively.

The following chapters present case studies that consider Mesos as an infrastructure technology. Examples not focused on Mesos assume you are already running on it.
