Setting up multiple High Availability (HA) masters
Hadoop and HBase are designed to handle the failover of their slave nodes automatically. Because there may be many nodes in a large cluster, a hardware failure of a server or the shutdown of a slave node is considered normal in the cluster.
For the master nodes, HBase itself has no SPOF. HBase uses ZooKeeper as its central coordination service. A ZooKeeper ensemble is typically clustered with three or more servers; as long as more than half of the servers in the cluster are online, ZooKeeper can provide its service normally.
HBase saves its active master node, root region server location, and other important running data in ZooKeeper. Therefore, we can just start two or more HMaster daemons on separate servers, and the one started first will be the active master server of the HBase cluster.
But, NameNode of HDFS is the SPOF of the cluster. NameNode keeps the entire HDFS filesystem image in its local memory. If NameNode is down, HDFS cannot function anymore, and consequently HBase is down too. As you may have noticed, HDFS has a Secondary NameNode. Note that the Secondary NameNode is not a standby for NameNode; it just provides a checkpoint function for NameNode. So, the challenge of building a highly available cluster is to make NameNode highly available.
In this recipe, we will describe the setup of two highly available master nodes, which will use Heartbeat to monitor each other. Heartbeat is a widely used HA solution to provide communication and membership for a Linux cluster. Heartbeat needs to be combined with a Cluster Resource Manager (CRM) to start/stop services for that cluster. Pacemaker is the preferred cluster resource manager for Heartbeat. We will set up a Virtual IP (VIP) address using Heartbeat and Pacemaker, and then associate it with the active master node. Because EC2 does not support static IP addresses, we cannot demonstrate it on EC2, but we will discuss an alternative way of using Elastic IP (EIP) to achieve our purpose.
We will focus on setting up NameNode and HBase; you can simply use a similar method to set up two JobTracker nodes as well.
You should already have HDFS and HBase installed. We will set up a standby master node (master2), so you need another server ready to use. Make sure all the dependencies have been configured properly. Sync your Hadoop and HBase root directory from the active master (master1) to the standby master.
We will need NFS in this recipe as well. Set up your NFS server, and mount the same NFS directory from both master1 and master2. Make sure the hadoop user has write permission to the NFS directory. Create a directory on NFS to store Hadoop's metadata. We assume the directory is /mnt/nfs/hadoop/dfs/name.
We will set up VIP for the two masters, and assume you have the following IP addresses and DNS mapping:
master1: This has the IP address 10.174.14.11.
master2: This has the IP address 10.174.14.12.
master: This has the IP address 10.174.14.10. This is the VIP that will be set up later.
The following instructions describe how to set up two highly available master nodes.
Install and configure Heartbeat and Pacemaker
First, we will install Heartbeat and Pacemaker, and make some basic configurations:
1. Install Heartbeat and Pacemaker on master1 and master2:
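The exact installation commands depend on your distribution; as a sketch, on a Debian/Ubuntu system the packages might be installed like this (the package names are assumptions and may differ on your platform):

```
# Run on both master1 and master2 (Debian/Ubuntu assumed)
sudo apt-get update
sudo apt-get install -y heartbeat pacemaker
```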
2. To configure Heartbeat, make the following changes to both master1 and master2:
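As a sketch, the Heartbeat configuration file (/etc/ha.d/ha.cf) might look like the following; the network interface, timeouts, and node names are assumptions for this setup:

```
# /etc/ha.d/ha.cf -- a minimal sketch; adjust the interface (eth0),
# node names, and timeouts for your environment
logfacility local0
keepalive 2          # heartbeat interval, in seconds
deadtime 15          # how long until a peer is considered dead
warntime 5
initdead 60
udpport 694
bcast eth0           # broadcast heartbeats on this interface
node master1
node master2
crm respawn          # hand cluster resource management to Pacemaker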
3. Create an authkeys file. Execute the following script as the root user on master1 and master2:
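A sketch of such a script follows; the key format is Heartbeat's standard authkeys syntax. Note that both nodes must share the same key, so generate it once and copy the resulting file to the other master rather than generating a new random key on each:

```
# Generate /etc/ha.d/authkeys with a random SHA-1 shared secret;
# run as root, then copy the identical file to the other master
( echo -n "auth 1
1 sha1 "; \
  dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | cut -d' ' -f1 ) \
  > /etc/ha.d/authkeys
chmod 600 /etc/ha.d/authkeys   # Heartbeat refuses to start otherwise
```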
Create and install a NameNode resource agent
Pacemaker depends on resource agents to manage the cluster. A resource agent is an executable that manages a cluster resource. In our case, the VIP address and the HDFS NameNode service are the cluster resources we want to manage using Pacemaker. Pacemaker ships with an IPaddr resource agent to manage the VIP, so we only need to create our own namenode resource agent:
1. Add environment variables to the .bashrc file of the root user on master1 and master2. Don't forget to apply the changes:
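The following lines are a sketch; the actual paths depend on where Java, Hadoop, and the OCF tree are installed in your environment:

```
# Appended to /root/.bashrc on master1 and master2
# (all paths are assumptions -- adjust to your installation)
export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/usr/local/hadoop/current
export OCF_ROOT=/usr/lib/ocf
```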
Invoke the following command to apply the previous changes:
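Assuming the variables were appended to the root user's .bashrc, reloading the file applies them to the current shell:

```
source ~/.bashrc
```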
2. Create a standard Open Clustering Framework (OCF) resource agent file called namenode, with the following content.
The namenode resource agent starts by including standard OCF functions, such as the following:
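The opening of the script might look like the following sketch; the path to the ocf-shellfuncs library follows a common Heartbeat layout and is an assumption that may differ on your system:

```
#!/bin/sh
# namenode -- an OCF resource agent sketch for the HDFS NameNode.
# Source Heartbeat's standard OCF shell functions (path assumed).
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
```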
3. Add a meta_data() function, as shown in the following code. The meta_data() function dumps the resource agent metadata to standard output. Every resource agent must have a set of XML metadata describing its own purpose and supported parameters:
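A minimal meta_data() implementation might print XML like the following; the description text and timeout values are assumptions:

```shell
# Dump the agent's XML metadata to standard output
meta_data() {
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="namenode">
  <version>0.1</version>
  <longdesc lang="en">
  Manages the Hadoop HDFS NameNode daemon as an HA cluster resource.
  </longdesc>
  <shortdesc lang="en">Manage a NameNode daemon</shortdesc>
  <parameters/>
  <actions>
    <action name="start" timeout="120" />
    <action name="stop" timeout="120" />
    <action name="status" depth="0" timeout="120" interval="120" />
    <action name="monitor" depth="0" timeout="120" interval="120" />
    <action name="meta-data" timeout="10" />
    <action name="validate-all" timeout="5" />
  </actions>
</resource-agent>
END
}
```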
4. Add a namenode_start() function. This function is used by Pacemaker to actually start the NameNode daemon on the server. In the namenode_start() function, we first check whether NameNode is already started on the server; if it is not started, we invoke hadoop-daemon.sh as the hadoop user to start it:
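A sketch of the function follows; it assumes the HADOOP_HOME variable set earlier and the standard hadoop-daemon.sh script location:

```
namenode_start() {
    # Nothing to do if NameNode is already running on this node
    namenode_status
    if [ $? -eq $OCF_SUCCESS ]; then
        ocf_log info "NameNode is already running"
        return $OCF_SUCCESS
    fi
    # Start the daemon as the hadoop user via hadoop-daemon.sh
    su - hadoop -c "${HADOOP_HOME}/bin/hadoop-daemon.sh start namenode"
    if [ $? -ne 0 ]; then
        return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}
```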
5. Add a namenode_stop() function. This function is used by Pacemaker to actually stop the NameNode daemon on the server. In the namenode_stop() function, we first check whether NameNode is already stopped on the server; if it is running, we invoke hadoop-daemon.sh as the hadoop user to stop it:
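The stop function mirrors the start function; again, paths are assumptions based on a standard Hadoop layout:

```
namenode_stop() {
    # Nothing to do if NameNode is already stopped on this node
    namenode_status
    if [ $? -eq $OCF_NOT_RUNNING ]; then
        ocf_log info "NameNode is already stopped"
        return $OCF_SUCCESS
    fi
    # Stop the daemon as the hadoop user via hadoop-daemon.sh
    su - hadoop -c "${HADOOP_HOME}/bin/hadoop-daemon.sh stop namenode"
    if [ $? -ne 0 ]; then
        return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}
```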
6. Add a namenode_status() function. This function is used by Pacemaker to monitor the status of the NameNode daemon on the server. In the namenode_status() function, we use the jps command to show all running Java processes owned by the hadoop user, and then grep the name of the NameNode daemon to see whether it has started:
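A sketch of the monitor logic, assuming jps lives under JAVA_HOME:

```
namenode_status() {
    # List Java processes owned by the hadoop user, then look for
    # the NameNode daemon among them
    su - hadoop -c "${JAVA_HOME}/bin/jps" | grep -q "NameNode"
    if [ $? -eq 0 ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```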
7. Add a namenode_validateAll() function to make sure the environment variables are set properly before we run other functions:
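The validation might simply check the variables the other functions rely on; the exact checks below are assumptions:

```
namenode_validateAll() {
    # Fail early if the required environment is not in place
    if [ -z "$JAVA_HOME" ]; then
        ocf_log err "JAVA_HOME is not set"
        exit $OCF_ERR_INSTALLED
    fi
    if [ -z "$HADOOP_HOME" ]; then
        ocf_log err "HADOOP_HOME is not set"
        exit $OCF_ERR_INSTALLED
    fi
    return $OCF_SUCCESS
}
```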
8. Add the following main routine. Here, we will simply call the previous functions to implement the required standard OCF resource agent actions:
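A sketch of such a main routine, dispatching on the action name Pacemaker passes as the first argument:

```
# Main routine: dispatch on the action passed by Pacemaker as $1
case $1 in
    meta-data)  meta_data
                exit $OCF_SUCCESS;;
esac
namenode_validateAll
case $1 in
    start)          namenode_start;;
    stop)           namenode_stop;;
    status|monitor) namenode_status;;
    validate-all)   ;;
    *)  echo "Usage: $0 {start|stop|status|monitor|meta-data|validate-all}"
        exit $OCF_ERR_UNIMPLEMENTED;;
esac
exit $?
```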
9. Change the namenode file permission and test it on master1 and master2:
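As a sketch, the agent can be made executable and exercised with the ocf-tester utility shipped with the cluster resource agents; the agent path below is a placeholder:

```
chmod 755 namenode
# ocf-tester drives start/stop/monitor against the agent; replace
# the path with the actual location of your namenode file
sudo ocf-tester -v -n namenode-test /path/to/namenode
```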
10. Make sure all the tests pass before proceeding to the next step, or the HA cluster will behave unexpectedly.
11. Install the namenode resource agent under the hac provider on master1 and master2:
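Installing an agent under a custom provider is just a matter of placing it in the matching directory under OCF_ROOT; the "hac" directory name comes from the provider we chose:

```
# Install the agent under the custom "hac" provider directory
sudo mkdir -p ${OCF_ROOT}/resource.d/hac
sudo cp namenode ${OCF_ROOT}/resource.d/hac
sudo chmod 755 ${OCF_ROOT}/resource.d/hac/namenode
```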
Configure highly available NameNode
We are now ready to configure a highly available NameNode using Heartbeat and Pacemaker. We will set up a VIP address and configure Hadoop and HBase to use this VIP address as their master node. NameNode will be started on the active master, to which the VIP is assigned. If the active master crashes, Heartbeat and Pacemaker will detect it, assign the VIP address to the standby master node, and then start NameNode there.
1. Start Heartbeat on master1 and master2:
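A sketch, assuming a SysV init script; your distribution may use the service command instead:

```
# Run on both master1 and master2
sudo /etc/init.d/heartbeat start
```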
2. Change the default crm configuration. All resource-related commands are executed only once, from either master1 or master2:
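As a sketch, the two settings that matter here are disabling STONITH (we have no fencing hardware in this setup) and the resource stickiness mentioned later in this recipe; on newer Pacemaker versions the latter is expressed as rsc_defaults resource-stickiness instead:

```
# Disable fencing and turn off automatic failback of resources
sudo crm configure property stonith-enabled=false
sudo crm configure property default-resource-stickiness=1
```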
3. Add a VIP resource using our VIP address:
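A sketch using the bundled IPaddr agent with our VIP address 10.174.14.10; the netmask and monitor interval are assumptions:

```
sudo crm configure primitive VIP ocf:heartbeat:IPaddr \
  params ip="10.174.14.10" cidr_netmask="24" \
  op monitor interval="10s"
```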
4. Make the following changes to configure Hadoop to use our VIP address. Sync to all masters, clients, and slaves after you've made the changes:
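The key change is to point HDFS at the VIP hostname (master) in core-site.xml; the port shown is Hadoop's default NameNode RPC port and is an assumption for your setup:

```xml
<!-- core-site.xml: address HDFS via the VIP hostname -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:8020</value>
</property>
```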
5. Make the following changes to configure HBase to use our VIP address. Sync to all masters, clients, and slaves after you've made the changes:
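Likewise, HBase's root directory should reference the VIP hostname in hbase-site.xml; the port is again an assumption matching the Hadoop configuration above:

```xml
<!-- hbase-site.xml: store HBase data under the VIP hostname -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:8020/hbase</value>
</property>
```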
6. To configure Hadoop to write its metadata to a local disk and NFS, make the following changes and sync to all masters, clients, and slaves:
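A sketch of the hdfs-site.xml change; dfs.name.dir takes a comma-separated list of directories, and the local path below is an assumption while the NFS path is the one we created earlier:

```xml
<!-- hdfs-site.xml: write NameNode metadata to both the local
     disk and the shared NFS directory -->
<property>
  <name>dfs.name.dir</name>
  <value>/usr/local/hadoop/var/dfs/name,/mnt/nfs/hadoop/dfs/name</value>
</property>
```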
7. Add the namenode resource agent we created in the previous section to Pacemaker. We will use NAMENODE as its resource name:
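A sketch referencing the agent under the hac provider; the timeout and interval values are assumptions:

```
sudo crm configure primitive NAMENODE ocf:hac:namenode \
  op monitor interval="120s" timeout="120s" \
  op start timeout="120s" \
  op stop timeout="120s"
```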
8. Configure the VIP resource and the NAMENODE resource as a resource group:
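The group name VIP-AND-NAMENODE is the one referred to later in this recipe:

```
sudo crm configure group VIP-AND-NAMENODE VIP NAMENODE
```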
9. Configure colocation of the VIP resource and the NAMENODE resource:
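A sketch of the colocation constraint (the constraint name is an assumption); the infinity score forces both resources onto the same node:

```
sudo crm configure colocation VIP-WITH-NAMENODE inf: VIP NAMENODE
```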
10. Configure the resource order of the VIP resource and the NAMENODE resource:
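A sketch of the ordering constraint (the constraint name is an assumption), ensuring the VIP is brought up before NameNode starts:

```
sudo crm configure order IP-BEFORE-NAMENODE inf: VIP NAMENODE
```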
11. Verify the previous Heartbeat and resource configurations by using the crm_mon command. If everything is configured properly, you should see an output like the following:
12. Make sure that the VIP and NAMENODE resources are started on the same server.
13. Now stop Heartbeat on master1; VIP-AND-NAMENODE should be started on master2 after several seconds.
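A sketch of forcing the failover, assuming a SysV init script as before:

```
# On master1: stopping Heartbeat triggers failover to master2
sudo /etc/init.d/heartbeat stop
```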
14. Restart Heartbeat on master1; VIP-AND-NAMENODE should remain started on master2. The resources should NOT fail back to master1.
Start DataNode, HBase cluster, and backup HBase master
We have confirmed that our HA configuration works as expected, so we can start HDFS and HBase now. Note that NameNode has already been started by Pacemaker, so we need only start DataNode here:
1. If everything works well, we can start DataNode now:
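A sketch, assuming a standard Hadoop layout where hadoop-daemons.sh starts the given daemon on every host listed in the conf/slaves file (run as the hadoop user on the active master):

```
$HADOOP_HOME/bin/hadoop-daemons.sh start datanode
```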
2. Start your HBase cluster from master, which is the active master server that the VIP address is associated with:
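A sketch, assuming an HBASE_HOME variable pointing at your HBase installation (run as the hadoop user):

```
$HBASE_HOME/bin/start-hbase.sh
```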
3. Start a standby HMaster from the standby master server, master2 in this case:
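A sketch, again assuming HBASE_HOME; this HMaster waits in standby until the active master registered in ZooKeeper goes away:

```
$HBASE_HOME/bin/hbase-daemon.sh start master
```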
The previous steps finally leave us with a cluster structure like the following diagram:
At first, we installed Heartbeat and Pacemaker on the two masters and then configured Heartbeat to enable Pacemaker.
In step 2 of the Create and install a NameNode resource agent section, we created the namenode script, which is implemented as a standard OCF resource agent. The most important function of the namenode script is namenode_status, which monitors the status of the NameNode daemon. Here, we use the jps command to show all running Java processes owned by the hadoop user, and then grep the name of the NameNode daemon to see if it has started. The namenode resource agent is used by Pacemaker to start/stop/monitor the NameNode daemon. In the namenode script, as you can see in the namenode_start and namenode_stop methods, we actually start/stop NameNode by using hadoop-daemon.sh, which starts/stops a Hadoop daemon on a single server. You can find the full code in the source shipped with this book.
We started Heartbeat after our namenode resource agent was tested and installed. Then, we made some changes to the default crm configurations. The default-resource-stickiness=1 setting is very important, as it turns off the automatic failback of a resource.
We added a VIP resource to Pacemaker and configured Hadoop and HBase to use it in steps 3 to 5 of the Configure highly available NameNode section. By using the VIP in their configuration, Hadoop and HBase can switch to communicating with the standby master if the active one goes down.
In step 6 of the same section, we configured Hadoop (HDFS NameNode) to write its metadata to both the local disk and NFS. If the active master goes down, NameNode will be started on the standby master. Because both masters mount the same NFS directory, NameNode started on the standby master can apply the latest metadata from NFS and restore HDFS to the state it was in before the original active master went down.
In steps 7 to 10, we added the NAMENODE resource using the namenode resource agent we created in step 2 of the Create and install a NameNode resource agent section. Then we set up the VIP and NAMENODE resources as a group (step 8), and made sure they always run on the same server (step 9), with the right start-up order (step 10). We did this because we didn't want the VIP running on master1 while NameNode was running on master2.
Because Pacemaker will start NameNode for us via the namenode resource agent, we need to start DataNode separately, which is what we did in step 1 of the Start DataNode, HBase cluster, and backup HBase master section.
After starting HBase normally, we started our standby HBase master (HMaster) on the standby master server. If you check your HBase master log, you will find output like the following, which shows that it is running as a standby HMaster:
Finally, we got NameNode and HMaster running on two servers with an active-standby configuration. The single point of failure of the cluster was avoided.
However, this leaves us with a lot of work to do in production. You need to test your HA cluster under every rare condition you can, such as a server power-off, an unplugged network cable, a network switch shutdown, or anything else you can think of.
On the other hand, the SPOF of the cluster may not be as critical as you think. Based on our experience, almost all cluster downtime is due to operational mistakes or software upgrades. It's better to keep your cluster simple.
It is more complex to set up a highly available HBase cluster on Amazon EC2, because EC2 does not support static IP addresses, and so we can't use a VIP on EC2. An alternative is to use an Elastic IP (EIP) address. An Elastic IP address plays the role of a static IP address on EC2; it is associated with your account, not with a particular instance. We can use Heartbeat to associate the EIP with the standby master automatically if the active one goes down. Then, we configure Hadoop and HBase to use the instance's public DNS name associated with the EIP to find the active master. Also, in the namenode resource agent, we have to start/stop not only NameNode, but also all DataNodes. This is because the IP address of the active master has changed, but a DataNode cannot find the new active master unless it is restarted.
We will skip the details because they are out of the scope of this book. We created an elastic-ip resource agent to achieve this purpose. You can find it in the source shipped with this book.