This chapter introduces several Mesos-based scheduling and management frameworks that simplify the deployment, discovery, load balancing, and failure handling of long-running services. These so-called metaframeworks take care of the housekeeping activities of other frameworks and applications, such as service discovery (keeping track of the instances on which a particular service is running) and load balancing (ensuring an equitable workload distribution among the instances), along with configuration management, automated job scheduling, application scaling, and failure handling. The frameworks that we'll explore here include Marathon, Chronos, Aurora, Singularity, Fenzo, and PaaSTA, as well as service discovery and load balancing tools such as Mesos-consul, HAProxy-Marathon-bridge, Marathon-lb, and Bamboo.
Marathon is a commonly used Mesos framework for long-running applications. It can be considered a replacement for init or upstart in traditional systems, or as the init.d of your system.
Marathon has many features, such as support for high availability environments, application health checks, and so on. It also comes with a Representational State Transfer (REST) endpoint, which you can use to start, stop, and scale your applications. Marathon can scale applications up and down based on the load, and it starts a new instance whenever a running one goes down. Marathon is also designed to run other frameworks on top of it, such as Hadoop, Kafka, Storm, Chronos, and so on. Marathon makes sure that every application started through it keeps running even if a slave node goes down.
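As a quick sketch of how the REST endpoint can be used (the application name, command, resource figures, and the localhost:8080 address here are all illustrative assumptions, not values from a real deployment), an app definition is posted to Marathon's /v2/apps endpoint and later scaled with a PUT:

```shell
# Hypothetical Marathon app definition; the id, command, and resource
# figures are illustrative. $PORT0 is left for Marathon to substitute.
cat > /tmp/basic-app.json <<'EOF'
{
  "id": "basic-app",
  "cmd": "python3 -m http.server $PORT0",
  "cpus": 0.25,
  "mem": 64,
  "instances": 2
}
EOF

# Launch the application (assumes Marathon is reachable on localhost:8080):
curl -sf -X POST -H "Content-Type: application/json" \
     -d @/tmp/basic-app.json http://localhost:8080/v2/apps \
  || echo "Marathon is not reachable"

# Scale it to 4 instances:
curl -sf -X PUT -H "Content-Type: application/json" \
     -d '{"instances": 4}' http://localhost:8080/v2/apps/basic-app \
  || echo "Marathon is not reachable"
```

If the instance count is later reduced, Marathon kills the surplus tasks; if a slave running one of the instances fails, Marathon relaunches the task elsewhere to keep the declared instance count.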
Marathon runs in a highly available fashion, which implies that there can be multiple schedulers...
To set this up, a high availability Mesos cluster needs to be set up, which will be explained in detail in Chapter 5, Mesos Cluster Deployment. For the time being, we assume that you already have a high availability Mesos cluster up and running. We'll now take a look at how to install Marathon on all the master machines in the cluster.
Log in to all the Mesos master machines and type in the following commands to set up Marathon.
On Debian/Ubuntu machines, run the following commands:
# Setup the repository key
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E56151BF
$ DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
$ CODENAME=$(lsb_release -cs)

# Add the repository
$ echo "deb http://repos.mesosphere.com/${DISTRO} ${CODENAME} main" | \
  sudo tee /etc/apt/sources.list.d/mesosphere.list

# Update the repositories
$ sudo apt-get update

# Install Marathon
$ sudo apt-get -y install marathon
On RedHat/CentOS machines, execute the following command:
$ sudo...
One can consider Chronos a time-based job scheduler, like cron in a typical Unix environment. Chronos is distributed and fully fault tolerant, and it runs on top of Apache Mesos.
Just like cron, Chronos executes shell scripts (combined with Linux commands) by default, and it also supports Mesos executors.
Chronos can interact with systems such as Hadoop or Kafka even if the Mesos worker machine, on which the real execution happens, does not have the system installed. You can use Chronos to start a service or run a script on a remote machine in the background. The wrapper script can have an asynchronous callback to alert Chronos to the job status, such as whether it is completed or failed and so on. For the most part, people use Chronos to run dockerized applications. A detailed explanation of dockerized applications is provided in Chapter 7, Mesos Containerizers.
Chronos comes with a Web UI in which you can see the job status, statistics of the job's history...
The combination of Chronos and Marathon can be utilized as building blocks to create production-ready distributed applications. You already know that Chronos can be used to fire up tasks at scheduled intervals, like cron, while Marathon keeps your jobs running continuously, like init or upstart in typical Linux environments. As mentioned before, both schedulers come with a REST endpoint that allows the user to manage jobs. You can use this endpoint to start, manage, and terminate running jobs. We will now take a look at how this is achieved.
As mentioned before, you can communicate with Chronos using its REST JSON API over HTTP. By default, nodes that have Chronos up and running listen on port 8080 for API requests. This section covers how to perform the following tasks using the REST endpoint:
Listing the running jobs
Manually starting a job
Adding a scheduled job
Deleting a job
For more information, visit http://mesos.github.io/chronos...
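The four tasks above map onto Chronos's standard endpoints: /scheduler/jobs, /scheduler/iso8601, and /scheduler/job/&lt;name&gt;. The following sketch assumes Chronos is reachable on localhost:8080; the job name, command, and schedule are illustrative:

```shell
CHRONOS=http://localhost:8080   # assumed Chronos host and port

# List the running jobs:
curl -sf "$CHRONOS/scheduler/jobs" || echo "Chronos is not reachable"

# Add a scheduled job; the name, command, and schedule are illustrative.
# "R/2016-01-01T00:00:00Z/PT24H" is an ISO 8601 repeating interval:
# repeat forever, starting at the given instant, every 24 hours.
cat > /tmp/backup-job.json <<'EOF'
{
  "name": "nightly-backup",
  "command": "bash /opt/scripts/backup.sh",
  "schedule": "R/2016-01-01T00:00:00Z/PT24H",
  "epsilon": "PT30M"
}
EOF
curl -sf -X POST -H "Content-Type: application/json" \
     -d @/tmp/backup-job.json "$CHRONOS/scheduler/iso8601" \
  || echo "Chronos is not reachable"

# Manually start the job right away:
curl -sf -X PUT "$CHRONOS/scheduler/job/nightly-backup" \
  || echo "Chronos is not reachable"

# Delete the job:
curl -sf -X DELETE "$CHRONOS/scheduler/job/nightly-backup" \
  || echo "Chronos is not reachable"
```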
Apache Aurora is a powerful Mesos framework for long-running services, cron jobs, and ad hoc jobs. It was originally designed at Twitter and was later open sourced under the Apache license. Using Aurora, you can turn your Mesos cluster into a private cloud. Aurora is responsible for keeping jobs running across a shared pool of resources over a long duration. If any of the machines in the pool fails, Aurora can intelligently reschedule those jobs on other healthy machines in the pool.
Aurora is not useful if you try to build an application with specific requirements for scheduling or if the job itself is a scheduler.
Managing long-running applications is one of the key features of Aurora. Apart from this, Aurora can be used to provide coarse-grained (that is, fixed) resources for your job so that at any point of time, the job always has a specified amount of resources. It also supports multiple users, and the configuration is templated with DSL...
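To give a flavor of that DSL, a minimal Aurora job definition might look like the following sketch. The cluster, role, and resource values are assumptions for illustration; Process, Task, Resources, and Service are the standard primitives that Aurora's configuration loader provides:

```python
# hello_world.aurora -- a hypothetical Aurora job definition
hello = Process(
    name = 'hello',
    cmdline = 'echo "hello world"; sleep 60')

hello_task = Task(
    processes = [hello],
    resources = Resources(cpu = 0.1, ram = 16*MB, disk = 8*MB))

jobs = [Service(
    cluster = 'devcluster',   # assumed cluster name
    environment = 'devel',
    role = 'www-data',        # assumed role
    name = 'hello',
    task = hello_task)]
```

Because this is a templated configuration rather than a standalone program, it is evaluated by Aurora's client tooling, which injects the primitives above; the same Task can be reused across multiple jobs or environments.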
Singularity was originally designed at HubSpot and later open sourced under the Apache license. It acts as an API and web application that can be used to launch and schedule long-running Mesos processes, scheduled jobs, and tasks. One can consider Singularity and the components that come with it a Platform as a Service (PaaS) for end users: a novice user can use Singularity to deploy tasks on Mesos without having to understand Mesos in detail.
Singularity takes advantage of Apache Mesos features such as fault tolerance, scalability, and resource allocation, and runs as a task scheduler for Mesos frameworks.
Before installing Singularity, make sure you have Docker installed on your machine. If you haven't installed it yet, you can do so by following the steps mentioned in the official website at https://docs.docker.com.
The first step is to clone the Singularity repository, which can be done as follows:
$ git clone https://github...
Modern distributed applications require a way to communicate with each other, which means that an application should be aware of the presence of the other applications on the same network. This is called service discovery. In this section, we will take a look at service discovery for web services that run on Marathon. One can adopt this approach for most stateless applications running on top of Marathon.
We will use the popular HAProxy TCP/HTTP load balancer in combination with a script that queries Marathon's REST API, which was covered in the previous topics, to regenerate HAProxy's configuration file for the service discovery of Marathon applications. When a task is spawned on one of the Mesos slaves, it is configured to bind to an arbitrary port within the default range of 31,000-32,000.
Service discovery lets the applications running on Marathon communicate with others running alongside Marathon through their configured Marathon...
Mesos-consul is used to register and deregister services that run as Mesos tasks. For example, if you have a Mesos task called myapp, this program will register the application in Consul, which will then expose it via DNS as myapp.service.consul. Mesos-consul also performs Mesos leader discovery through the leader.mesos.service.consul DNS name, which points to the active leader.
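A quick way to verify these entries is with dig, as sketched below; the agent address, the myapp name, and port 8600 (Consul's default DNS port) are assumptions about your setup:

```shell
# Query Consul's DNS interface (default port 8600) for a registered task
# and for the current Mesos leader; fall back gracefully if no agent
# is reachable or dig is not installed.
app_record=$(dig @localhost -p 8600 myapp.service.consul +short 2>/dev/null \
  || echo "no Consul agent reachable")
leader_record=$(dig @localhost -p 8600 leader.mesos.service.consul +short 2>/dev/null \
  || echo "no Consul agent reachable")
echo "myapp:  $app_record"
echo "leader: $leader_record"
```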
How is this different from other service discovery software?
Mesos-DNS is a similar project. Like Mesos-DNS, Mesos-consul polls Mesos for information about the running tasks; however, instead of exposing this information via a built-in DNS server, it populates Consul's service discovery with it. The services are then exposed by Consul through DNS and its REST endpoint.
The HAProxy-Marathon-bridge script is shipped with the Marathon installation. You can also use Marathon-lb for the same purpose. Both create a configuration file for HAProxy, a lightweight TCP/HTTP proxy, by looking up the running tasks via Marathon's REST API.
HAProxy-Marathon-bridge is a simple script providing a minimal set of functionality, and it is easier for novice users to understand. Marathon-lb, on the other hand, supports advanced features such as SSL offloading, VHost-based load balancing, and sticky connections.
First, you need to generate an HAProxy configuration from the running Marathon instance, which, by default, runs on port 8080 of the machine. You can use the HAProxy-Marathon-bridge script for this, as follows:
$ ./bin/haproxy-marathon-bridge localhost:8080 > /etc/haproxy/haproxy.cfg
Note that here we specified localhost:8080
because we ran the Marathon instance and HAProxy...
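Since Marathon's task placement changes over time, this configuration must be regenerated periodically. One simple, hypothetical way to do so is a crontab entry that reruns the script every minute and reloads HAProxy gracefully; the script path and PID file location are assumptions that may differ on your system:

```
# Hypothetical crontab entry: regenerate the HAProxy configuration from
# Marathon every minute, then reload HAProxy. The -sf flag tells the new
# HAProxy process to let the old one finish existing connections and exit.
* * * * * /usr/local/bin/haproxy-marathon-bridge localhost:8080 > /etc/haproxy/haproxy.cfg && haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
```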
Bamboo runs as a web daemon and automatically configures HAProxy for the web services deployed on Mesos and Marathon.
Bamboo comes with the following:
A Web UI to configure HAProxy Access Control List (ACL) rules for each of the Marathon applications
A REST endpoint to do the same
A preconfigured HAProxy configuration file based on your template, which you can customize to enable SSL, expose the HAProxy stats interface, or configure load balancing strategies
A health check endpoint if the Marathon application is configured with health checks
A stateless daemon, which enables scalability and horizontal replication
No additional dependencies (as it is developed in Golang)
Integration with StatsD to monitor configuration reload events
Bamboo can be deployed on each of the Mesos slaves with HAProxy. As Bamboo is primarily used for web services deployed on Mesos, the service discovery is as simple as connecting to...
Netflix recently open sourced Fenzo, its scheduler library for Apache Mesos frameworks written in Java, which supports scheduling optimizations and cluster autoscaling. At the time of writing, Fenzo is available in the official Netflix OSS suite repository, which can be found at the following URL: https://github.com/Netflix/Fenzo
There were basically two motivations for developing a framework such as Fenzo, which set it apart from the schedulers and frameworks discussed earlier: scheduling optimizations and autoscaling the cluster based on usage.
When there is huge variation in the amount of data that your cluster handles from time to time, provisioning the cluster for peak usage is wasteful, as the resources will be idle most of the time. This is the main reason behind autoscaling the application depending on the load, that is, providing more machines to increase the cluster resources when there is peak usage...
PaaSTA is Yelp's highly available, distributed Platform as a Service used to build, deploy, and run services using Docker containers and Apache Mesos. PaaSTA was designed and developed by Yelp and has recently been open sourced. You can take a look at the repository at the following URL: https://github.com/yelp/paasta
It is a suite that lets developers specify how they want the code in their Git repository to be built, deployed, routed, and monitored. Yelp has used PaaSTA for more than a year to power its production-level services. PaaSTA is best suited to a strict production environment such as Yelp's, which consists of many tiny microservices and where rolling out a new piece of code must be seamless and must not disturb production systems. PaaSTA helps automate this entire process.
This section will give you a brief comparison and use cases for the different scheduling frameworks that we discussed in this chapter.
Marathon is a PaaS built on Mesos that makes sure a job runs forever, even if a few machines in the cluster go down. It can seamlessly handle hardware and software failures and ensures that the application is always running. Frameworks of this type are useful in production environments where your application should be available all the time, for example, a web server hosting a website. In such cases, you can deploy it as a Marathon application, and Marathon will take care of all these aspects.
Chronos can be considered a distributed, fault-tolerant replacement for typical Linux cron jobs, which are used to fire up scheduled jobs, take periodic backups, check the health of the system, and so on. Both Chronos and Marathon come with a Web UI and a REST endpoint for the management...
In this chapter, we dived deep into some of the most important Mesos frameworks that make job scheduling and load balancing easier and more efficient. Frameworks such as Marathon and Chronos and their REST endpoints were explained, along with other tools such as HAProxy, Consul, Marathoner, Bamboo, Fenzo, and PaaSTA.
In the next chapter, we'll discuss how system administrators and DevOps professionals can deploy a Mesos cluster using standard tools such as Ansible, Chef, Puppet, Salt, Terraform, and CloudFormation, along with monitoring it using Nagios and Satellite.