HBase Administration Cookbook

By Yifeng Jiang
Published in Aug 2012 by Packt
ISBN-13: 9781849517140 | 332 pages | 1st Edition


Chapter 3. Using Administration Tools

In this chapter, we will focus on:

  • HBase Master Web UI

  • Using HBase Shell to manage tables

  • Using HBase Shell to access data in HBase

  • Using HBase Shell to manage the cluster

  • Executing Java methods from HBase Shell

  • Row counter

  • WAL tool—manually splitting and dumping WALs

  • HFile tool—viewing textualized HFile content

  • HBase hbck—checking the consistency of an HBase cluster

  • Hive on HBase—querying HBase using a SQL-like language

Introduction


Everyone expects their HBase administrator to keep the cluster running smoothly, storing huge amounts of data and serving perhaps millions of requests quickly, reliably, and simultaneously. Keeping the large amount of data in HBase accessible, manageable, and easy to query is a critical task for an administrator.

Besides a solid knowledge of the cluster you are operating, the tools you use are just as important. HBase ships with several administration tools to make life easier. There is a web-based administration page, on which you can view the status of the cluster and execute simple administration tasks such as region splitting. More powerful than the HBase web UI, however, is the HBase Shell tool. This command-line tool has features to create and manage HBase tables, to insert and view data in the tables, and also methods to manage the cluster itself.

HBase also provides a bunch of Java utilities with its installation. You can import and use these utilities directly from...

HBase Master web UI


The HBase Master web UI is a simple but useful tool for getting an overview of the current status of the cluster. From this page, you can see the version of the running HBase, its basic configuration (including the HBase root HDFS path and the ZooKeeper quorum), the cluster's load average, and lists of tables, regions, and region servers.

Furthermore, you can manually split a region using a particular boundary row key. This is useful when you turn off the automatic region splitting of your cluster.

Getting ready

Make sure the port for your master page, which defaults to 60010, is open to your client computer in your network firewall. If you are running your cluster on Amazon EC2, you can open the port from AWS Management Console | Amazon EC2 | NETWORK & SECURITY | Security Groups | Inbound.

How to do it...

Access the following URL from your web browser:

http://hbase_master_server:60010/master.jsp

Note

You need to change hbase_master_server to the hostname of your HBase master...
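To quickly verify that the port is reachable from your client before opening a browser, you can request the page from the command line. The hostname master1 below is a placeholder for your own master server:

    hac@client1$ curl -s -o /dev/null -w "%{http_code}\n" http://master1:60010/master.jsp

A 200 response code indicates the master page is being served; a connection error suggests the firewall is still blocking the port.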

Using HBase Shell to manage tables


HBase Shell is a command-line tool shipped with HBase. It provides basic functions to manage tables, access data in HBase, and manage the cluster. HBase Shell has several groups of commands. The group for managing tables is called Data Definition Language (DDL). Using DDL group commands, you can create, drop, and change HBase tables. You can also disable/enable tables from HBase Shell.

Getting ready

Start your HBase cluster.

How to do it...

The following steps will show you how to use DDL commands to manage HBase tables:

  1. Execute the following command from the client node to start an HBase Shell prompt:

    hac@client1$ $HBASE_HOME/bin/hbase shell
    
  2. Create a table (t1) with a single column family (f1) from HBase Shell, using the create command:

    hbase> create 't1', 'f1'
    
  3. Show the table list by using the list command:

    hbase> list
    TABLE
    hly_temp
    t1
    
  4. Show the table properties using the describe command:

    hbase> describe 't1'
    DESCRIPTION ENABLED...
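As a sketch of the rest of the DDL lifecycle, assuming the t1 table and f1 column family created above, the following commands change a column family property and finally drop the table; the VERSIONS value is only an example:

    hbase> disable 't1'
    hbase> alter 't1', {NAME => 'f1', VERSIONS => 5}
    hbase> enable 't1'
    hbase> disable 't1'
    hbase> drop 't1'

Note that a table must be disabled before it can be altered or dropped.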

Using HBase Shell to access data in HBase


HBase Shell provides Data Manipulation Language (DML) group commands to manipulate data in HBase. The DML group includes the commands count, delete, deleteall, get, get_counter, incr, put, scan, and truncate. As their names suggest, these commands provide basic access and update operations on data in HBase.

Note

HBase has a feature called counters, which is useful for building a metrics gathering system on HBase. The get_counter and incr commands are used for counter operations.

The count, scan, and truncate commands may take a long time to finish when run on a huge amount of data in HBase.

To count a big table, you should use the rowcounter MapReduce job, which is shipped with HBase. We will describe it in the Row counter recipe, later in this chapter.
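As a quick sketch of these commands, assuming the t1 table with column family f1 from the previous recipe, a DML session might look like the following; the row key, column names, and values are placeholders:

    hbase> put 't1', 'row1', 'f1:c1', 'value1'
    hbase> get 't1', 'row1'
    hbase> scan 't1', {LIMIT => 10}
    hbase> incr 't1', 'row1', 'f1:cnt', 1
    hbase> get_counter 't1', 'row1', 'f1:cnt'
    hbase> delete 't1', 'row1', 'f1:c1'
    hbase> count 't1'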

Getting ready

Start your HBase cluster, connect to the cluster from your client, and create a table called t1, if it does not exist.

How to do it...

The following steps are demonstrations of how to use DML commands...

Using HBase Shell to manage the cluster


There are a bunch of HBase Shell commands for managing the cluster. These commands belong to the tools group.

Note

Many of these commands are for advanced users only, as their misuse can cause unexpected damage to an HBase installation.

The tools group commands provide an interface to manage HBase regions manually. Their features include:

  • Region deployment

  • Region splitting

  • Cluster balancing

  • Region flushing and compaction

Although HBase performs all these operations automatically by default, there are situations where you may want to balance your region servers' load manually. This is especially true when the default balancing algorithm does not work well for your data access pattern.

In this recipe, we will describe how to manually flush, compact, split, balance, and move HBase regions.
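The shell commands involved are sketched below; the region and server names are placeholders that you would copy from the HBase web UI, and the exact arguments are covered in the steps of this recipe:

    hbase> flush 'hly_temp'
    hbase> major_compact 'hly_temp'
    hbase> split 'hly_temp'
    hbase> balance_switch true
    hbase> balancer
    hbase> move 'ENCODED_REGION_NAME', 'SERVER_NAME'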

Getting ready

Start your HBase cluster, create a table, and put some data into it. We will use the hly_temp table we created in Chapter 2, Data Migration, for...

Executing Java methods from HBase Shell


HBase Shell is written in JRuby. As JRuby runs within the Java Virtual Machine (JVM), it is very easy to execute Java methods from HBase Shell. HBase ships with many Java utility classes; the ability to execute Java methods from HBase Shell makes it possible to import and use these utilities directly from HBase Shell.

We will demonstrate two examples of how to call Java methods from HBase Shell in this recipe. The first one converts the timestamp in the HBase Shell output into a readable date format. The second one imports an HBase filter class and applies the filter to the scanner of the scan command.
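As a sketch of both examples, the JRuby session below imports java.util.Date to convert an epoch timestamp (in milliseconds, as printed by HBase Shell) into a readable date, and then imports a filter class to restrict a scan; the timestamp value and the row key prefix are only examples:

    hbase> import java.util.Date
    hbase> Date.new(1328082575346).toString()
    hbase> import org.apache.hadoop.hbase.filter.PrefixFilter
    hbase> import org.apache.hadoop.hbase.util.Bytes
    hbase> scan 'hly_temp', {FILTER => PrefixFilter.new(Bytes.toBytes('AQW'))}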

Getting ready

Start your HBase cluster, create a table, and put some data into it. We will use the hly_temp table we created in Chapter 2, for demonstration purposes.

Connect to your cluster via HBase Shell, before you start.

How to do it...

To convert the timestamp of an HBase Shell output into a readable date format:

  1. Enter the following command...

Row counter


The count command in HBase Shell is a straightforward way to count the number of rows in an HBase table. However, running the count command on a table with a huge amount of data might take a long time to complete. A better approach in this case is to use the RowCounter class. This class kicks off a MapReduce job to count the rows of a table, which is much more efficient than the count command.

We will describe the usage of RowCounter in this recipe.
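The invocation itself is a single command of the following form; the HBase JAR file name depends on your installed version, so treat the path below as a sketch rather than an exact command:

    hac@client1$ $HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-<version>.jar rowcounter hly_temp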

Getting ready

Make sure your Hadoop and HBase clusters are running. MapReduce is also required; if it is not running, start it by using the following command on your JobTracker server:

hadoop@master1$ $HADOOP_HOME/bin/start-mapred.sh

Log in to your HBase client node.

How to do it...

To run a row counter MapReduce job on the hly_temp table, follow these steps:

  1. Add a ZooKeeper JAR file to the Hadoop class path on your client node:

    hadoop@client1$ vi $HADOOP_HOME/conf/hadoop-env.sh
    HBASE_HOME=/usr/local/hbase/current
    export HADOOP_CLASSPATH...

WAL tool—manually splitting and dumping WALs


An HBase edit is first written to a region server's Write Ahead Log (WAL). After the log is written successfully, the region server's MemStore is updated. As the WAL is a sequence file on HDFS, it is automatically replicated to two other DataNodes by default, so that a single region server crash will not cause a loss of the data stored in it.

As a WAL is shared by all regions deployed on a region server, it must first be split so that it can be replayed on each of its regions, in order to recover from a region server crash. HBase handles region server failover automatically by using this mechanism.

HBase has a WAL tool providing manual WAL splitting and dumping facilities. We will describe how to use this tool in this recipe.
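As a sketch, the tool is invoked by passing the HLog class to the hbase command; the paths below are placeholders for a WAL file and a region server log directory under your cluster's /hbase/.logs directory:

    hac@client1$ $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump <path-to-wal-file>
    hac@client1$ $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split <path-to-region-server-log-dir>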

Getting ready

We will need to put some data into an HBase table to have HBase generate WAL files for our demonstration. Again, we will use the hly_temp table in this recipe. We will put the...

HFile tool—viewing textualized HFile content


HFile is the internal file format for HBase to store its data. These are the first two lines of the description of HFile from its source code:

File format for hbase.

A file of sorted key/value pairs. Both keys and values are byte arrays.

We don't need to know the details of HFile for our administration tasks. However, by using the HFile tool, we can get some useful information from HFile.

The HFile tool provides the facility to view a textualized version of HFile content.

We can also get the metadata of an HFile file by using this tool. Some metadata, such as entry count and average Key/Value size, are important indicators of performance tuning.

We will describe how to use an HFile tool to show textualized content and metadata of HFile files.
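As a sketch, the tool is invoked by passing the HFile class to the hbase command; -p prints the key/value entries, -m prints the file metadata, and the file path below is a placeholder:

    hac@client1$ $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -p -m -f <path-to-hfile>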

Getting ready

Log in to your HBase client node.

Pick a region name or HFile file path to be viewed. A region name can be found in the Table Regions section of your HBase web UI. HFile files are stored under...

HBase hbck—checking the consistency of an HBase cluster


HBase provides the hbck command to check for various inconsistencies. The name hbck comes from the HDFS fsck command, which is the tool to check HDFS for inconsistencies. The following is a very easy-to-understand description from the source of hbck:

Check consistency among the in-memory states of the master and the region server(s) and the state of data in HDFS.

HBase hbck not only checks for inconsistencies, but can also fix them.

In production, we recommend that you run hbck frequently, so that inconsistencies are found early and can be fixed easily.

In this recipe, we will describe how to use hbck to check for inconsistencies. We will also introduce some inconsistencies into the cluster and then demonstrate how to use hbck to fix them.
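In its simplest form, the command is run without arguments to report inconsistencies, and with the -fix option to attempt repairs; both invocations below assume the standard hbase script under $HBASE_HOME:

    hac@client1$ $HBASE_HOME/bin/hbase hbck
    hac@client1$ $HBASE_HOME/bin/hbase hbck -fix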

Getting ready

Start up your HBase cluster, and log in to your HBase client node.

How to do it...

The instructions to check and fix the inconsistencies of an HBase cluster using...

Hive on HBase—querying HBase using a SQL-like language


HBase supports several interfaces to access data in its tables, such as the following:

  • HBase Shell

  • Java Client API

  • REST, Thrift, and Avro

HBase Shell is straightforward, but a little too simple to perform complex queries with. The other interfaces require programming, which is not suitable for ad hoc queries.

As data keeps growing, people might want an easy way to analyze the large amount of data stored in HBase. The analysis should be efficient, ad hoc, and it should not require programming. Hive is currently the best approach for this purpose.

Hive is a data warehouse infrastructure built on Hadoop. Hive is used for ad hoc querying and analyzing large data sets without having to write a MapReduce program. Hive supports a SQL-like query language called HiveQL (HQL) to access data in its tables.

We can integrate HBase and Hive, so that we can use HQL statements to access HBase tables, both to read and write.
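As a sketch of the integration, the following HiveQL creates an external Hive table mapped onto the t1 HBase table and queries it; the table name, column names, and the column mapping are examples, not a prescription:

    hive> CREATE EXTERNAL TABLE hbase_t1 (key string, value string)
        > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
        > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f1:c1")
        > TBLPROPERTIES ("hbase.table.name" = "t1");
    hive> SELECT COUNT(*) FROM hbase_t1;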

In this recipe, we will describe how...
