HBase Administration Cookbook

Product type: Book
Published in: Aug 2012
Publisher: Packt
ISBN-13: 9781849517140
Pages: 332
Edition: 1st
Author: Yifeng Jiang

Table of Contents (16 chapters)

HBase Administration Cookbook
Credits
About the Author
Acknowledgement
About the Reviewers
www.PacktPub.com
Preface
Setting Up HBase Cluster
Data Migration
Using Administration Tools
Backing Up and Restoring HBase Data
Monitoring and Diagnosis
Maintenance and Security
Troubleshooting
Basic Performance Tuning
Advanced Configurations and Tuning

Chapter 6. Maintenance and Security

In this chapter, we will focus on:

  • Enabling HBase RPC DEBUG-level logging

  • Graceful node decommissioning

  • Adding nodes to the cluster

  • Rolling restart

  • Simple script for managing HBase processes

  • Simple script for making deployment easier

  • Kerberos authentication for Hadoop and HBase

  • Configuring HDFS security with Kerberos

  • HBase security configuration

Introduction


After a cluster is delivered for operation, maintenance becomes a necessary, ongoing task for as long as the cluster is in use. Typical maintenance tasks include finding and correcting faults, changing the cluster size, making configuration changes, and so on.

One of the most important HBase features is that it is extremely easy to scale in and out. As your service and data keep growing, you might need to add nodes to the cluster.

Graceful node decommissioning and rolling restarts will also become necessary. Minimizing the time regions are offline during a decommission or restart is an important goal. It is also important to keep the region distribution the same as it was before the restart, to retain data locality.

Another maintenance task is to manage HBase deployment. There are many ways to deploy your HBase to the cluster. The simplest way is to use a script-based approach to sync HBase installations and configurations across the cluster.

We will cover these topics in the first six recipes...

Enabling HBase RPC DEBUG-level logging


Hadoop and HBase use the log4j library to write their logs. The logging level is set in the log4j.properties file. In production, the logging level is usually set to INFO, which is adequate for most situations. However, there will be cases where you want to see the debug output of a particular Hadoop/HBase daemon.

HBase inherits its online logging level change capability from Hadoop. It is possible to change an HBase daemon's logging level from its web UI without restarting the daemon.

This feature is useful when you need the debug information of an HBase daemon but cannot restart it, a typical situation being troubleshooting a production HBase cluster.

We will describe how to enable HBase RPC DEBUG-level logging in this recipe.

Getting ready

Start the HBase cluster and open the HBase web UI from the following URL:

http://<master_host>:60010/master.jsp

How to do it...

The instructions to enable HBase RPC DEBUG-level logging without...
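As a complementary, persistent approach, DEBUG logging for the RPC classes can also be enabled in conf/log4j.properties. The logger names below are illustrative assumptions (they vary between Hadoop/HBase versions), and unlike the web UI approach, this change only takes effect after restarting the daemon:

```properties
# Hypothetical log4j.properties additions; verify the RPC class names
# against your Hadoop/HBase release before relying on them.
log4j.logger.org.apache.hadoop.ipc=DEBUG
log4j.logger.org.apache.hadoop.ipc.HBaseServer=DEBUG
```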

Graceful node decommissioning


We will describe how to stop a region server gracefully in this recipe.

It is possible to simply invoke the following command to stop the RegionServer daemon on a region server:

hadoop@slave1$ $HBASE_HOME/bin/hbase-daemon.sh stop regionserver

However, this approach has the disadvantage that the regions deployed on the stopped region server go offline for a while during the stopping process. In production, especially on clusters handling online requests, it is desirable to stop a region server gracefully, to minimize the time its regions are offline.

We will describe how HBase supports its graceful node decommissioning feature in this recipe.

Getting ready

Start your HBase cluster and log in to the master node as the user who started the cluster (the hadoop user in our demonstration).

How to do it...

The instructions to gracefully decommission a region server are as follows:

  1. Gracefully stop a region server by invoking the following command:

    hadoop@master1$ $HBASE_HOME/bin...
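For illustration, the graceful stop can be sketched as follows. The graceful_stop.sh helper ships with HBase and moves regions off the target server one at a time before stopping the RegionServer daemon; the host name and installation path here are placeholders:

```shell
# Sketch only: TARGET_RS and HBASE_HOME are placeholders.
HBASE_HOME=${HBASE_HOME:-/usr/local/hbase}
TARGET_RS=slave1

# graceful_stop.sh turns off the load balancer, moves each region off
# the target server, then stops the RegionServer daemon there.
CMD="$HBASE_HOME/bin/graceful_stop.sh $TARGET_RS"
echo "$CMD"   # shown here as a dry run; execute it on the master node
```

Because the load balancer is switched off during the move, remember to re-enable it afterwards (balance_switch true in the HBase shell) if the script does not do so for you.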

Adding nodes to the cluster


One of the most important HBase features is that it is extremely scalable. HBase scales out linearly by adding nodes to the cluster. It is easy to start with a small cluster and scale it out as your service and data grow. Adding a region server to an HBase cluster is an important maintenance task for administrators.

An HBase cluster can have only one active master node. However, we can add a backup master node to the cluster to make the HBase master highly available (HA).

In this recipe, we will describe how to add a backup master node to an HBase cluster. We will also describe adding region servers to a cluster after that.

Getting ready

Download and install HBase on the new master or region server first. Make sure the HBase configuration on that node is synced with other nodes in the cluster.

A region server usually runs on a node that also runs Hadoop's DataNode and TaskTracker. You might want to install Hadoop and start the DataNode and TaskTracker on that node too.

We assume...
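As a sketch of the backup-master part of this recipe: ZooKeeper elects the active master, so a backup master is simply a second master process. The conf/backup-masters file (read by start-hbase.sh) and the host names below are assumptions based on a standard HBase layout:

```shell
# Scratch directory so this sketch is safe to run anywhere; in practice
# you would edit $HBASE_HOME/conf/backup-masters on the master node.
CONF_DIR=$(mktemp -d)
echo "master2" >> "$CONF_DIR/backup-masters"

# Commands to run on the new nodes themselves (shown, not executed):
echo '$HBASE_HOME/bin/hbase-daemon.sh start master'         # on master2
echo '$HBASE_HOME/bin/hbase-daemon.sh start regionserver'   # on a new slave
```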

Rolling restart


You might want to invoke a rolling restart when upgrading to a new HBase version, or when you want to apply some configuration changes. As described in the Graceful node decommissioning recipe, a rolling restart minimizes downtime because we only take a single region offline at a time rather than a whole cluster. A rolling restart keeps the region distribution the same as what it was before the restart. This is important to retain data locality.

Note

New HBase versions are not always backward compatible. You can invoke a rolling restart to upgrade between minor releases (for example, from 0.92.1 to 0.92.2), but not across major versions (for example, from 0.92.x to 0.94.x), because the protocol has changed between these versions. This will change in HBase 0.96, when old clients will be able to talk to new servers and vice versa.

Please check the following link for details about upgrading from one version to another:

http://hbase.apache.org/book.html#upgrading

A rolling restart...
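As a rough sketch, HBase bundles a rolling-restart helper script; the path and flags below are assumptions, so check the bin/ directory of your release:

```shell
# Sketch only: HBASE_HOME is a placeholder.
HBASE_HOME=${HBASE_HOME:-/usr/local/hbase}

# --rs-only restarts region servers one by one and leaves the master
# alone; omit it to roll the master as well (flag names vary by version).
CMD="$HBASE_HOME/bin/rolling-restart.sh --rs-only"
echo "$CMD"   # shown as a dry run; execute it on the master node
```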

Simple script for managing HBase processes


As the cluster keeps growing, you might want tools to show and manage the HBase-related processes running on it. Because the hadoop user is configured to SSH from the master node to each slave node without a password, it is easy to write a simple script that logs in to every node via SSH and shows or manages the HBase processes running there.

As Hadoop/HBase processes run in a Java Virtual Machine (JVM), our task is to manage these Java processes in the cluster.

In this recipe, we will create a simple script to show all the running Java processes owned by the hadoop user in an HBase cluster.

Getting ready

Start your HBase cluster. Log in to the master node as the user who started the cluster.

We assume that you are running HDFS and HBase as the same user (the hadoop user here).

How to do it...

The instructions to create a simple script to manage HBase processes are as follows...
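As an independent sketch of the idea (the script name, host-list location, and the RUN dry-run switch are assumptions, not the book's exact script), a cluster-wide process listing can look like this:

```shell
#!/bin/bash
# Sketch of a cluster-wide "jps" script. Assumes passwordless SSH from
# the master node as the hadoop user, per the book's setup.
HOSTS_FILE=${1:-/usr/local/hbase/conf/regionservers}
RUN=${RUN:-ssh}   # set RUN=echo to dry-run the sketch

list_java_procs() {
  local host="$1"
  echo "==== $host ===="
  $RUN "$host" jps 2>/dev/null   # jps lists Java processes on that node
}

if [ -f "$HOSTS_FILE" ]; then
  while read -r host; do
    [ -n "$host" ] && list_java_procs "$host"
  done < "$HOSTS_FILE"
fi
```

Running it with RUN=echo prints the SSH commands instead of executing them, which is handy for checking the host list before touching the cluster.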

Simple script for making deployment easier


There are many ways to deploy your HBase to the cluster. As Hadoop and HBase are written in Java, most of the deployment is done by simply copying all the files to the nodes in the cluster.

The simplest way is to use a script-based approach to sync the HBase installation and configurations across the cluster. It may not be as sophisticated as modern deployment management tools, but it works well for small and even medium-sized clusters.

In this recipe, we will create a simple script to sync an HBase installation from its master node to all region servers in the cluster. This approach can be used to deploy Hadoop as well.

Getting ready

Log in to the master node as the user who started the cluster. We assume that you have set up passwordless SSH from the master node to the region servers for that user.

How to do it...

The instructions to create a simple script to make HBase deployment easier are as follows:

  1. Create a cluster-deploy.sh script, shown...
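As a sketch of what such a script can look like (the rsync options, paths, and the RSYNC dry-run switch are assumptions, not the book's exact listing):

```shell
#!/bin/bash
# Sketch of cluster-deploy.sh: mirror the local HBase installation to
# every region server with rsync over SSH.
HBASE_HOME=${HBASE_HOME:-/usr/local/hbase}
HOSTS_FILE="$HBASE_HOME/conf/regionservers"
RSYNC=${RSYNC:-rsync}   # set RSYNC=echo to dry-run the sketch

deploy_to() {
  local host="$1"
  # --delete keeps the remote tree an exact mirror; logs and pid files
  # are per-node state, so exclude them from the sync.
  $RSYNC -az --delete --exclude 'logs/*' --exclude 'pids/*' \
    "$HBASE_HOME/" "$host:$HBASE_HOME/"
}

if [ -f "$HOSTS_FILE" ]; then
  while read -r host; do
    [ -n "$host" ] && deploy_to "$host"
  done < "$HOSTS_FILE"
fi
```

The same loop works for a Hadoop installation by pointing it at HADOOP_HOME and the appropriate host list.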

Kerberos authentication for Hadoop and HBase


Security support has been added to the recently released Hadoop 1.0 and HBase 0.92. With security enabled, only authenticated users can access a Hadoop and HBase cluster. Authentication is provided by a separate authentication service managed by trusted administrators. This makes HBase a viable option for storing sensitive data, such as financial data.

Hadoop relies on the Kerberos authentication service for its security support. A secure HBase must run on HDFS with security enabled, so HBase also relies on Kerberos for its security support.

The following is the description of Kerberos on Wikipedia:

Kerberos is a computer network authentication protocol which works on the basis of "tickets" to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner.

The most widely used Kerberos implementation is MIT Kerberos. We will describe how to install and set up MIT Kerberos in...
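As a small sketch of the principal setup that follows installation (the realm, host names, service names, and keytab path are all placeholders; kadmin.local must be run on the KDC host as an administrator):

```shell
# Sketch only: realm, hosts, and keytab path are placeholders.
KADMIN=${KADMIN:-kadmin.local}
command -v "$KADMIN" >/dev/null 2>&1 || KADMIN=echo   # dry-run fallback
REALM=HADOOP.EXAMPLE.COM

# Create one service principal per daemon/host pair; -randkey generates
# a random key instead of prompting for a password, which is what you
# want for service (non-human) principals.
for svc in hadoop/master1.example.com hbase/master1.example.com; do
  $KADMIN -q "addprinc -randkey $svc@$REALM"
done

# Export the key to a keytab file the daemon can read at startup:
$KADMIN -q "ktadd -k /etc/hadoop/conf/hadoop.keytab hadoop/master1.example.com@$REALM"
```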

Configuring HDFS security with Kerberos


Newer releases of Hadoop (0.20.203 and above) support optional Kerberos authentication of clients. This security support includes secure HDFS and secure MapReduce configurations.

The motivation for Hadoop security is not to defend against hackers, as large Hadoop clusters are typically behind firewalls that only allow employees to access them. Its purpose is simply to allow sensitive data, such as financial data, to be stored on a shared cluster.

Prior releases of Hadoop already had file ownership and permissions in HDFS; the limitation was that they had no mechanism for verifying user identity. With this Kerberos security support, user identities are verified by Kerberos, and only authenticated users are allowed to access the HDFS cluster.

As secure HBase access is expected to run on top of a secured HDFS cluster, setting up HDFS security is a prerequisite for HBase security configuration. In this recipe, we will focus on how to configure HDFS security...
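As a minimal sketch, the central switches live in core-site.xml; the property names below follow Hadoop 1.0, so verify them against your release (a full secure setup also needs keytab and principal properties in hdfs-site.xml):

```xml
<!-- Sketch: core switches for Hadoop security in core-site.xml.
     Property names per Hadoop 1.0; verify against your release. -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```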

HBase security configuration


As HBase becomes more and more popular, different users and groups may store more data in a shared HBase cluster. You might not want all users to have full permission on every HBase table. This puts your data at risk, for example, from security breaches or accidental data operations.

Newer HBase releases (0.92 and above) have Kerberos-based security support. With this, user identities are verified by Kerberos, and only authenticated users are allowed to access data in a secured HBase cluster.

We will describe how to configure secure client access to HBase in this recipe.

Getting ready

Make sure you are using the security-enabled HBase release. If you are downloading from the official HBase site, the filename should look like hbase-0.92.1-security.tar.gz.

We assume that you have a working Kerberos Key Distribution Center (KDC) and have a realm set up. For more information about installing and configuring Kerberos, see the Kerberos authentication for Hadoop and HBase recipe...
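As a minimal sketch, the client-facing switches live in hbase-site.xml; the property names below follow the HBase 0.92 security builds, so verify them against your release (a full setup also needs keytab and principal properties on the server side):

```xml
<!-- Sketch: security switches in hbase-site.xml for HBase 0.92
     security builds; verify names against your release. -->
<property>
  <name>hbase.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbase.rpc.engine</name>
  <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
```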
