You're reading from  Fast Data Processing Systems with SMACK Stack

Product type: Book
Published in: Dec 2016
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781786467201
Edition: 1st
Author: Raúl Estrada

Raúl Estrada has been a programmer since 1996 and a Java developer since 2001. He loves all topics related to computer science. With more than 15 years of experience in high-availability and enterprise software, he has been designing and implementing architectures since 2003. His specialization is in systems integration, and he mainly participates in projects related to the financial sector. He has been an enterprise architect for BEA Systems and Oracle Inc., but he also enjoys web, mobile, and game programming. Raúl is a supporter of free software and enjoys experimenting with new technologies, frameworks, languages, and methods. Raúl is the author of other Packt Publishing titles, such as Fast Data Processing Systems with SMACK and Apache Kafka Cookbook.

Chapter 6. The Manager - Apache Mesos

If we pause at this point in the book and look back, we know that we now have technologies that demand more memory and CPU than traditional computers can offer. How can this be achieved at low cost? The answer is Apache Mesos.

In addition to explaining the Apache Mesos architecture and principles, we explore how to run the principal frameworks, and how to run Spark, Cassandra, and Kafka on Apache Mesos.

This chapter has the following sections:

  • The Apache Mesos architecture
  • Resource allocation
  • Running a Mesos cluster on AWS
  • Running a Mesos cluster on a private data center
  • Scheduling and management frameworks
  • Apache Spark on Apache Mesos
  • Apache Cassandra on Apache Mesos
  • Apache Kafka on Apache Mesos

The Apache Mesos architecture


Mesos is an open source platform for sharing the resources of commodity machines among different distributed applications (called frameworks in the Mesos ecosystem, as we will see later), such as Spark, Cassandra, and Kafka, among others. Mesos's objective is to run as a centralized cluster manager that pools all the physical resources of each cluster member and makes them available as a single source of highly available resources for all the different applications.

Let's take a simple example: a startup has bought eight machines for its humble data center, each with 8 CPUs and 64 GB of RAM, and it previously had a four-node cluster where each machine had 4 CPUs and 16 GB of RAM. With Apache Mesos, we can build a virtual cluster that emulates a single machine with 80 CPUs (8*8 + 4*4) and 576 GB of RAM (8*64 + 4*16). Just like that, we have the power of the old mainframes at our fingertips, and on this cluster we can run multiple distributed applications. The sharing...
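The arithmetic behind the example can be checked in a couple of lines:

```python
# The pool from the example: eight new machines with 8 CPUs and
# 64 GB of RAM each, plus the old four-node cluster with 4 CPUs
# and 16 GB of RAM per machine.
machines = [(8, 64)] * 8 + [(4, 16)] * 4

total_cpus = sum(cpus for cpus, _ in machines)
total_ram_gb = sum(ram for _, ram in machines)

print(total_cpus, total_ram_gb)  # 80 576
```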

Resource allocation


Mesos has a resource allocation module that contains the policy the Mesos master uses to determine the quantity of resource offers made to each framework. As developers, we can customize the module to implement our own allocation policy; for example, we can manipulate the priority and weight of resources to meet business requirements. We can also develop entirely custom allocation modules.

One objective of the resource allocation module is to ensure fair resource distribution among the frameworks. The efficiency of a cluster manager lies in the choice of the correct sharing policy algorithm.

For example, Hadoop is governed by the max-min fairness allocation algorithm, in which resources are distributed equitably among competing demands. The effectiveness of this algorithm is proven in homogeneous environments. Unfortunately, fast data requires heterogeneous environments.
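The max-min fairness idea can be sketched for a single resource; the numbers and the water-filling loop below are illustrative, not Hadoop's actual scheduler code:

```python
# Illustrative max-min fair ("water-filling") division of a single
# resource among competing demands; the numbers are hypothetical.

def max_min_fair(capacity, demands):
    """Give every unsatisfied user an equal share of what remains;
    users demanding less than their share are fully satisfied."""
    alloc = {user: 0.0 for user in demands}
    remaining = dict(demands)
    while remaining and capacity > 1e-9:
        share = capacity / len(remaining)
        for user, want in list(remaining.items()):
            give = min(share, want)
            alloc[user] += give
            capacity -= give
            if give >= want - 1e-9:
                del remaining[user]          # demand fully met
            else:
                remaining[user] -= give
    return alloc

# 10 CPUs shared by demands of 2, 4, and 10 CPUs.
alloc = max_min_fair(10.0, {"a": 2.0, "b": 4.0, "c": 10.0})
print(alloc)  # roughly {'a': 2.0, 'b': 4.0, 'c': 4.0}
```

The small user gets everything it asked for, and the leftover capacity is split evenly between the larger demands.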

The distribution of resources between frameworks with heterogeneous demands for resources brings an interesting...
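Mesos handles heterogeneous demands with Dominant Resource Fairness (DRF), revisited in the Summary: each framework is measured by its dominant share, the largest fraction it holds of any single resource, and the next offer goes to the framework with the smallest dominant share. A minimal sketch with the classic illustrative numbers (not Mesos's implementation):

```python
# Illustrative Dominant Resource Fairness (DRF) allocator.
# Capacity and per-task demands are hypothetical: 9 CPUs and
# 18 GB of RAM shared by a memory-heavy and a CPU-heavy framework.

def drf_allocate(capacity, demands, max_tasks=100):
    """Offer resources task by task to the framework whose dominant
    share (largest fraction of any one resource) is lowest."""
    allocated = {f: {r: 0.0 for r in capacity} for f in demands}
    used = {r: 0.0 for r in capacity}
    for _ in range(max_tasks):
        shares = {
            f: max(allocated[f][r] / capacity[r] for r in capacity)
            for f in demands
        }
        f = min(shares, key=shares.get)      # most starved framework
        need = demands[f]
        if any(used[r] + need[r] > capacity[r] for r in capacity):
            break                            # next task no longer fits
        for r in capacity:
            used[r] += need[r]
            allocated[f][r] += need[r]
    return allocated

capacity = {"cpus": 9.0, "mem_gb": 18.0}
demands = {
    "framework_a": {"cpus": 1.0, "mem_gb": 4.0},  # memory-heavy tasks
    "framework_b": {"cpus": 3.0, "mem_gb": 1.0},  # CPU-heavy tasks
}
allocation = drf_allocate(capacity, demands)
print(allocation)
# framework_a ends with 3 tasks (3 CPUs, 12 GB),
# framework_b with 2 tasks (6 CPUs, 2 GB)
```

Each framework's dominant share ends up equalized (two thirds of memory versus two thirds of CPU), which is exactly the property DRF optimizes for.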

Running a Mesos cluster on AWS


For Amazon Web Services, Amazon has divided the world into 11 physical regions, each of which can be accessed remotely. The services have usage-based pricing and include EC2 (computing or processing), S3 (storage), DynamoDB (the Amazon database), RDS, EBS, and so on.

AWS includes an EC2 free trial to start developing on the platform. The free trial includes a machine with 700 MB of RAM for a year at no cost. We need to pay if we need more power (more CPUs or more RAM), or if we want to use the S3 storage service. Prices are listed at https://aws.amazon.com/ec2/pricing/.

  • Amazon account: To create an account we should go to http://aws.amazon.com and follow the instructions. The steps include phone and e-mail verification. The confirmation e-mail contains the account number needed in the following steps.
  • Key pairs: Amazon uses public-key authentication. We can choose the key pair from a drop-down list or we can create a new one when launching...

Running a Mesos cluster on a private data center


If we don't want to use cloud services from Amazon, Google, or Microsoft, we can set up the cluster in our own private data center. Here we assume that our data center has four machines and that we are going to install a Mesos cluster on all of them. We also assume that each machine runs the same operating system, CentOS 6.6 Linux.

For this example, our machines' IP addresses are:

machine-1: 192.168.2.190
machine-2: 192.168.2.191
machine-3: 192.168.2.192
machine-4: 192.168.2.193

For convenience, we choose machine-1 as the Mesos cluster master; machine-2, machine-3, and machine-4 will be slaves.

Mesos installation

To install a multi-node Mesos cluster in our data center, we follow these steps:

Setting up the environment

As we have already seen, we must download and install the libraries and dependencies needed to run Mesos on CentOS. We need to log on to each machine and run the following commands:

  • We need the wget command; to install it, we type:
$ sudo yum install -y tar wget 
  • Mesos >...

Scheduling and management frameworks


Here we mention some popular Mesos frameworks used to deploy, discover, load balance, and handle failure of services. Unlike other application frameworks, these frameworks are used for service management.

In this context, we define two concepts as follows:

  • Load balancing: Ensures an equitable workload distribution among the instances of a service
  • Service discovery: Keeps track of the instances on which a particular service is running
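The two concepts can be illustrated with a toy sketch (the registry, service name, and ports are hypothetical; real frameworks such as Consul or HAProxy do far more):

```python
from itertools import cycle

# Toy sketch, not a real Mesos framework: a registry maps service
# names to instance addresses (discovery), and a round-robin
# iterator spreads requests over those instances (balancing).
registry = {
    "web": ["192.168.2.191:8080", "192.168.2.192:8080", "192.168.2.193:8080"],
}

def discover(service):
    """Service discovery: look up the live instances of a service."""
    return registry[service]

backends = cycle(discover("web"))            # load balancing: rotate
targets = [next(backends) for _ in range(4)]
print(targets)  # the fourth request wraps back to the first instance
```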

Some important Mesos frameworks are:

  • Marathon: Framework to launch and manage long-running applications
  • Chronos: A cluster scheduler
  • Apache Aurora: Framework to manage long-running services and cron jobs
  • Singularity: Platform-as-a-service (PaaS) for running services
  • Marathoner: Service discovery for Marathon
  • Consul: Framework for orchestration and service discovery
  • HAProxy: Framework used for load balancing
  • Bamboo: Framework to automatically configure HAProxies
  • Netflix Fenzo: A task scheduler
  • Yelp's PaaSTA: A PaaS for running services

In this section...

Apache Aurora


The Apache Aurora key features are as follows:

  • It is a Mesos framework for cron jobs, long-running services, and job management
  • Conceived at Twitter Inc. and later open sourced under the Apache license
  • Keeps long-running jobs running across a shared resource pool over a long duration; if one machine fails, Aurora reschedules its jobs on other healthy machines
  • Not recommended for systems with specific scheduling requirements, since it is a scheduler itself
  • Provides coarse-grained resources for a specific job at any point in time
  • Supports multiple users
  • Its configuration is specified in a Domain Specific Language (DSL) to avoid configuration redundancy

Aurora and Marathon offer similar feature sets; both are classified as service schedulers. There are three main differences:

  • Ease of use: Aurora is not easy to install. It exposes a Thrift API, which means you'll need a Thrift client to interact with it programmatically. On the other hand, Marathon helps you run Hello World as quickly as possible...

Singularity


Singularity key points are as follows:

  • Acts as an API and a web application
  • Conceived at HubSpot and later open sourced under the Apache license
  • Used to launch and schedule long-running Mesos processes, scheduled jobs, and tasks
  • All its components can be considered as a PaaS to end users
  • Inexperienced users can use it to deploy tasks on Mesos without much prior knowledge
  • Shares Mesos features such as fault tolerance, scalability, and resource allocation
  • Can run a task scheduler for other Mesos frameworks

Singularity installation

To install Singularity we need Docker; to install Docker, follow the steps at:

https://docs.docker.com

Once Docker is installed, clone the Singularity repository with this command:

$ git clone https://github.com/HubSpot/Singularity 

Change to the Singularity directory:

$ cd Singularity 

Now, run the docker-compose pull and docker-compose up commands to test Singularity.

These commands set up the following in the container:

  • Mesos master and slave
  • Zookeeper
  • Singularity
  • Baragon service and Baragon...

Apache Spark on Apache Mesos


Here we explain in detail how to run Apache Spark on Mesos.

We have two options:

  • Upload the Spark binary package to a location accessible by Mesos and configure the Spark driver to connect to Mesos
  • Install Spark in the same location on all the Mesos slaves and set spark.mesos.executor.home to point to that location

Follow these steps for the first option:

The first time we run a Mesos task on a Mesos slave, that slave must have the Spark binary package in order to run the executor backend. The location accessible by Mesos can be HDFS, HTTP, or S3.

At the time of writing this book, Spark's version is 2.0.0. To download it and upload it to HDFS, we use the following commands:

$ wget http://apache.mirrors.ionfish.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
$ hadoop fs -put spark-2.0.0-bin-hadoop2.7.tgz /

In the Spark driver program, set the master URL to the Apache Mesos master URL in this form:

  • For a single Mesos master cluster: mesos://master-host:5050...

Apache Cassandra on Apache Mesos


The easiest way to deploy Apache Cassandra on Apache Mesos is through Marathon. Mesosphere has already packaged the Cassandra executor and the necessary JAR files in a tarball that can be submitted to Mesos through Marathon with the JSON definition located at: https://downloads.mesosphere.io/cassandra-mesos/artifacts/0.2.0-1/marathon.json

{ 
  "id": "/cassandra/dev-test", 
  "instances": 1, 
  "cpus": 0.5, 
  "mem": 512, 
  "ports": [ 
    0 
  ], 
  "uris": [ 
    "https://downloads.mesosphere.io/cassandra-mesos/artifacts/0.2.0-1/cassandra-mesos-0.2.0-1.tar.gz", 
    "https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz" 
  ], 
  "env": { 
    "MESOS_ZK": "zk://localhost:2181/mesos", 
    "JAVA_OPTS": "-Xms256m -Xmx256m", 
    "CASSANDRA_CLUSTER_NAME": "dev-test", 
    "CASSANDRA_ZK": "zk://localhost:2181/cassandra-mesos", 
    "CASSANDRA_NODE_COUNT": "3", 
    "CASSANDRA_RESOURCE_CPU_CORES": "2.0", 
    "CASSANDRA_RESOURCE_MEM_MB": "2048", 
    "CASSANDRA_RESOURCE_DISK_MB...

Apache Kafka on Apache Mesos


Ensure that the following applications are available on the machine:

To download the Kafka on Mesos project from the repository:

$ git clone https://github.com/mesos/kafka
$ cd kafka 
$ ./gradlew jar 

Use this command to download the Kafka executor:

$ wget https://archive.apache.org/dist/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz 

Set the following environment variable to point to the libmesos.so file:

$ export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so 

Use the kafka-mesos.sh script to launch and configure Kafka on Mesos. Before that, create the kafka-mesos.properties file with the following contents:

storage=file:kafka-mesos.json  
master=zk://master:2181/mesos  
zk=master:2181  
api=http://master:7000  

These properties configure kafka-mesos.sh so that we don't need to pass arguments to the scheduler every time. The scheduler supports the following...
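The properties file is plain key=value lines; a minimal sketch of how such a file could be parsed (illustrative only, not the kafka-mesos scheduler's own code):

```python
# Minimal parser for key=value files such as kafka-mesos.properties.

def parse_properties(text):
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

sample = """\
storage=file:kafka-mesos.json
master=zk://master:2181/mesos
zk=master:2181
api=http://master:7000
"""
props = parse_properties(sample)
print(props["api"])  # http://master:7000
```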

Summary


In this chapter, we have seen the Mesos architecture. We also reviewed how Mesos performs resource allocation with the DRF algorithm, and how to run Mesos on AWS and in a private data center.

We visited the most important Mesos frameworks: Marathon, Chronos, Aurora, and Singularity. We also reviewed the Frameworks API.

In the last sections, we reviewed how to run Spark, Cassandra, and Kafka on Apache Mesos, covering the setup, configuration, and management of these frameworks on a distributed infrastructure using Mesos.

I hope this chapter has armed you with the resources you need to manage the complexities of modern DevOps effectively.

The following chapters present case studies that consider Mesos as an infrastructure technology. Examples not focused on Mesos assume you are already running on it.
