You're reading from HBase Administration Cookbook
Published in Aug 2012 by Packt (1st Edition), ISBN-13: 9781849517140
Author: Yifeng Jiang

Yifeng Jiang is a Hadoop and HBase administrator and developer at Rakuten, the largest e-commerce company in Japan. After graduating from the University of Science and Technology of China with a B.S. in Information Management Systems, he started his career as a professional software engineer, focusing on Java development. In 2008, he started looking into the Hadoop project. In 2009, he led the development of his previous company's display advertisement data infrastructure using Hadoop and Hive. In 2010, he joined his current employer, where he designed and implemented a Hadoop- and HBase-based, large-scale item ranking system. He is also a member of the company's Hadoop team, which operates several Hadoop/HBase clusters.

Chapter 5. Monitoring and Diagnosis

In this chapter, we will focus on:

  • Showing the disk utilization of HBase tables

  • Setting up Ganglia to monitor an HBase cluster

  • OpenTSDB—using HBase to monitor an HBase cluster

  • Setting up Nagios to monitor HBase processes

  • Using Nagios to check Hadoop/HBase logs

  • Simple scripts to report the status of the cluster

  • Hot region—write diagnosis

Introduction


It is vital to monitor the status of an HBase cluster to ensure that it is operating as expected. The challenge of monitoring a distributed system is that, besides taking care of each server individually, you also need to watch the overall status of the cluster.

HBase inherits its monitoring APIs from Hadoop's metrics framework. It exposes a large number of metrics that give insight into the state of the cluster. These metrics can be configured to be exposed to other monitoring systems, such as Ganglia or OpenTSDB, which gather them and make them visible through graphs. Ganglia/OpenTSDB graphs help us understand what is happening inside the cluster, for both a single server and the cluster as a whole.

Graphs are good for getting an overview of historical status, but we also need a mechanism that checks the current state of the cluster, and sends us notifications or takes automatic action if the cluster has a problem. A good solution for this kind of monitoring task is Nagios. Nagios...

Showing the disk utilization of HBase tables


In this recipe, we will show the answer to the following simple question:

How much space is HBase or a single HBase table using on HDFS?

It is a really simple task, but you might need to answer this question frequently. We will give you a tip to make it a bit easier.

Getting ready

Start your HBase cluster and log in to your HBase client node. We assume your HBase root directory on HDFS is /hbase.

How to do it...

The instructions to show the disk utilization of HBase tables are as follows:

  1. Show the disk utilization for all HBase objects by executing the following command:

    $ $HADOOP_HOME/bin/hadoop fs -dus /hbase
    hdfs://master1:8020/hbase 1016842660
    
  2. Show the disk utilization of a particular HBase table (hly_temp) by executing the following command:

    $ $HADOOP_HOME/bin/hadoop fs -dus /hbase/hly_temp
    hdfs://master1:8020/hbase/hly_temp 54738763
    
  3. Show a list of the regions of an HBase table and their disk utilization, by executing the following command...
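The raw byte counts printed by `hadoop fs -du` (one line per region directory, size first) are hard to read at a glance. The following is a small sketch that reformats such output into megabytes; the region directory names in the sample input are fabricated for illustration, and in real use you would pipe the live command through the helper instead:

```shell
# Reformat "hadoop fs -du" output (size in bytes, then path) into megabytes.
# Real usage would be:
#   $HADOOP_HOME/bin/hadoop fs -du /hbase/hly_temp | format_du
format_du() {
  # Skip non-data lines such as "Found N items"; print size in MB, then path
  awk '$1 ~ /^[0-9]+$/ { printf "%10.1f MB  %s\n", $1 / (1024 * 1024), $2 }'
}

# Fabricated sample output (region directory names are made up):
format_du <<'EOF'
Found 2 items
27369381 hdfs://master1:8020/hbase/hly_temp/0ef2ecb954df5f2d31867df21bbbe718
27369382 hdfs://master1:8020/hbase/hly_temp/9d2e3aa31f51dd2e06eab9c0e6db1f84
EOF
```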

Setting up Ganglia to monitor an HBase cluster


One of the most important HBase operation tasks is monitoring the cluster to make sure it is running as expected. HBase inherits its monitoring APIs from Hadoop. It exposes many metrics that give insight into the cluster's current status, including region-based statistics, RPC details, and Java Virtual Machine (JVM) memory and garbage collection data.

These metrics can then be exposed to JMX and Ganglia, which makes them visible through graphs. Ganglia is the recommended tool for monitoring large-scale clusters. Ganglia itself is a scalable, distributed system; it is said to be able to handle clusters of 2000 nodes.

We will describe how to use Ganglia to monitor an HBase cluster in this recipe. We will install the Ganglia Monitoring Daemon (Gmond) on each node in the cluster, which will gather the server and HBase metrics of that node. These metrics are then polled...
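As a concrete sketch of the wiring, HBase's metrics are pointed at Ganglia through conf/hadoop-metrics.properties. The snippet below is a minimal example: `gmond-host` is a placeholder for your collector host, 8649 is gmond's default port, and GangliaContext31 is the context class for Ganglia 3.1 and later (use GangliaContext for older Ganglia):

```shell
# Write a minimal hadoop-metrics.properties that emits HBase and JVM metrics
# to a gmond. "gmond-host" is a placeholder -- substitute your own host.
METRICS_FILE=${METRICS_FILE:-hadoop-metrics.properties}
cat > "$METRICS_FILE" <<'EOF'
# HBase region/master metrics, emitted every 10 seconds
hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
hbase.period=10
hbase.servers=gmond-host:8649

# JVM memory and garbage collection metrics
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=gmond-host:8649
EOF
```

After placing such a file in $HBASE_HOME/conf on every node and restarting the HBase daemons, the metrics show up in gmond and, through it, in the Ganglia web frontend.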

OpenTSDB—using HBase to monitor an HBase cluster


OpenTSDB is an extremely scalable Time Series Database (TSDB) built on top of HBase. Like Ganglia, OpenTSDB can be used to monitor various systems, including HBase itself. Compared to Ganglia, which stores its data in RRDtool, OpenTSDB leverages HBase's scalability to monitor at a much larger scale. The following is an introduction from the OpenTSDB homepage (http://opentsdb.net/):

Thanks to HBase's scalability, OpenTSDB allows you to collect many thousands of metrics from thousands of hosts and applications, at a high rate (every few seconds). OpenTSDB will never delete or downsample data and can easily store billions of data points.

To use OpenTSDB, we need to write small scripts to collect data from our systems and push it into OpenTSDB every few seconds. Tcollector is a framework that collects metrics from Linux, MySQL, Hadoop, HBase, and so on, for OpenTSDB. Interestingly, OpenTSDB uses HBase (to store metrics) to monitor HBase...
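To make the push model concrete, here is a sketch of what a tcollector-style collector boils down to: print data points in OpenTSDB's text protocol, one `put <metric> <timestamp> <value> <tags>` line per sample. The metric name and host tag below are illustrative choices, not names prescribed by OpenTSDB:

```shell
# Emit one data point in OpenTSDB's "put" text protocol.
# Metric name "sys.loadavg.1min" and the host tag are illustrative.
emit_loadavg() {
  ts=$(date +%s)
  # 1-minute load average; fall back to 0 where /proc/loadavg is absent
  load=$(awk '{ print $1 }' /proc/loadavg 2>/dev/null || echo 0)
  echo "put sys.loadavg.1min $ts $load host=$(hostname)"
}
emit_loadavg
```

Tcollector keeps such collectors running and forwards their stdout to the TSD; by hand you could achieve the same with something like `emit_loadavg | nc tsd-host 4242` (4242 is the TSD's default listening port).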

Setting up Nagios to monitor HBase processes


Monitoring the HBase-related processes in the cluster is an important part of operating HBase. Basic monitoring consists of running health checks on the HBase processes and notifying the administrators if any process is down.

Nagios is popular, open source monitoring software used to watch hosts, services, and resources, and to alert users when something goes wrong and again when it recovers. Nagios can be easily extended with custom modules, called plugins. The check_tcp plugin ships with the Nagios installation. We can use this plugin to ping a Hadoop/HBase daemon's RPC port, to check whether the daemon is alive.

In this recipe, we will set up a monitor server running Nagios to watch all the HBase-related processes in the entire cluster. We will configure Nagios to send us e-mail notifications if any Hadoop/HBase/ZooKeeper process is down.
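What check_tcp does can be approximated in a few lines of bash, which also makes a handy standalone probe while setting things up. This sketch follows the Nagios plugin convention of an OK/CRITICAL status line with exit codes 0/2; the master1:60000 example target (the default HMaster RPC port in this HBase generation) is an assumption about your cluster:

```shell
# Probe a TCP port the way check_tcp does, using bash's /dev/tcp device.
# Prints a Nagios-style status line; returns 0 (OK) or 2 (CRITICAL).
check_port() {
  host=$1; port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "TCP OK - $host:$port is accepting connections"
    return 0
  else
    echo "TCP CRITICAL - cannot connect to $host:$port"
    return 2
  fi
}

# Example: check the HMaster RPC port (hostname/port are assumptions)
# check_port master1 60000
```

In the actual deployment you would keep the stock plugin (for example `check_tcp -H master1 -p 60000`) and let Nagios schedule it; the sketch only shows the mechanism.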

Getting ready

You will need a monitor server to run Nagios on. We assume that...

Using Nagios to check Hadoop/HBase logs


Hadoop, ZooKeeper, and HBase all produce logs. These logs include information about normal operations, as well as warning/error output and internal diagnostic data. Ideally, we would have a system that gathers and processes all these logs to extract useful insight into the cluster. The most basic task is to check these logs and get notified if anything abnormal shows up in them. The NRPE and check_log Nagios plugins can be used to achieve this goal in a few simple steps.

The description from NRPE plugin's homepage (http://exchange.nagios.org/directory/Addons/Monitoring-Agents/NRPE--2D-Nagios-Remote-Plugin-Executor/details) is as follows:

NRPE allows you to remotely execute Nagios plugins on other Linux/Unix machines. This allows you to monitor remote machine metrics (disk usage, CPU load, etc.).

Using NRPE, we can remotely execute the check_log Nagios plugin on a cluster node to check the Hadoop/HBase logs generated by that node.
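check_log's core trick is simple enough to sketch: keep a copy of the log from the previous run, diff against it, and alert only on new lines matching the query. The helper below mimics that behavior (the real plugin is invoked along the lines of `check_log -F <logfile> -O <oldlog> -q <query>`); file paths in the example are placeholders:

```shell
# Minimal re-implementation of check_log's approach: compare the log against
# the copy saved on the previous run, and alert on new lines matching $query.
scan_log() {
  logfile=$1; oldlog=$2; query=$3
  touch "$oldlog"
  # Lines added since the last run appear on the "> "-prefixed side of the diff
  matches=$(diff "$oldlog" "$logfile" | sed -n 's/^> //p' | grep -c -- "$query" || true)
  cp "$logfile" "$oldlog"
  if [ "$matches" -gt 0 ]; then
    echo "LOG CRITICAL - $matches new line(s) matching '$query'"
    return 2
  fi
  echo "LOG OK - no new lines matching '$query'"
}

# Example (paths are placeholders):
# scan_log /usr/local/hbase/logs/regionserver.log /var/tmp/rs.log.old ERROR
```

Via NRPE, Nagios would run such a check periodically on each node against that node's Hadoop/HBase log files.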

The check_log...

Simple scripts to report the status of the cluster


Besides the health of the HBase-related daemons and their logs, you might also want to monitor an overview of the cluster's current status. This status basically includes:

  • The HBase hbck result showing whether the HBase tables are consistent

  • The Hadoop fsck result showing whether HDFS is healthy

  • The remaining HDFS space

In this recipe, we will create a check_hbase Nagios plugin to perform these monitoring tasks. We will install our check_hbase plugin on the master node of the cluster and execute it remotely with Nagios from the monitor server, using the NRPE Nagios plugin.
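As a sketch of what such a plugin can look like, the helper below splits the status checks into small parsers so they can be exercised against canned output. The strings it matches ("Status: OK" from `hbase hbck` and "is HEALTHY" from `hadoop fsck /`) are what the tools of this era print, but verify them against your own versions; $HBASE_HOME and $HADOOP_HOME are assumed to be set:

```shell
# Parsers for the two consistency reports (each reads from stdin):
hbck_ok() { grep -q "^Status: OK"; }   # summary line of `hbase hbck`
fsck_ok() { grep -q "is HEALTHY"; }    # verdict line of `hadoop fsck /`

# Putting them together as a Nagios-style check (requires a live cluster):
check_hbase() {
  "$HBASE_HOME/bin/hbase" hbck 2>/dev/null | hbck_ok ||
    { echo "HBASE CRITICAL - hbck reports inconsistencies"; return 2; }
  "$HADOOP_HOME/bin/hadoop" fsck / 2>/dev/null | fsck_ok ||
    { echo "HBASE CRITICAL - fsck reports HDFS is not healthy"; return 2; }
  echo "HBASE OK - hbck and fsck passed"
}
```

A fuller version would also parse the remaining HDFS space out of `hadoop dfsadmin -report` and compare it against a threshold, as the bullet list above suggests.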

Getting ready

We assume that you have installed and configured the Nagios NRPE plugin on your monitor and master server. If you have not installed it yet, refer to the previous recipe for detailed installation instructions.

How to do it...

The following are instructions to get the status of the HBase cluster and monitor it by Nagios:

  1. Create a check_hbase script...

Hot region—write diagnosis


As data keeps growing, an HBase cluster may become unbalanced due to a poorly designed table schema or row keys, or for other reasons. Many requests may go to a small subset of a table's regions. This is usually called the hot spot region issue.

There are two types of hot spot region issues: hot write and hot read. Hot writes are generally more important to us, because hot reads benefit greatly from HBase's internal caching mechanism. A solution for the hot write region issue is to find the hot regions, split them manually, and then distribute the split regions to other region servers.

An HBase edit is first written to the region server's Write-Ahead Log (WAL). The actual update to the table data occurs once the append to the WAL has succeeded. This architecture makes it possible to obtain an approximate write diagnosis easily.

We will create a WriteDiagnosis.java program to extract write diagnostic information from the WAL, in this...
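The full WriteDiagnosis.java listing is truncated here, but its core idea, tallying how many WAL edits each region received so that the busiest regions stand out, can be sketched in a few lines of shell. The input format assumed below (one `region=<name>` token per edit line) is a simplification for illustration, not the actual WAL dump format:

```shell
# Count edits per region in a WAL-dump-like text stream and list the
# busiest regions first. Input format is hypothetical (see note above).
tally_regions() {
  grep -o 'region=[^ ]*' | sort | uniq -c | sort -rn
}

# Fabricated sample input: region r1 received two edits, r2 one
printf 'region=r1 put\nregion=r1 put\nregion=r2 delete\n' | tally_regions
```

The Java version reads the WAL files directly and is more precise, but this kind of quick tally is often enough to spot a hot write region worth splitting.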

