HBase Administration Cookbook

You're reading from HBase Administration Cookbook

Product type: Book
Published: Aug 2012
Publisher: Packt
ISBN-13: 9781849517140
Pages: 332
Edition: 1st
Author: Yifeng Jiang

Table of Contents (16 chapters)

HBase Administration Cookbook
Credits
About the Author
Acknowledgement
About the Reviewers
www.PacktPub.com
Preface
Setting Up HBase Cluster
Data Migration
Using Administration Tools
Backing Up and Restoring HBase Data
Monitoring and Diagnosis
Maintenance and Security
Troubleshooting
Basic Performance Tuning
Advanced Configurations and Tuning

Chapter 7. Troubleshooting

In this chapter, we will cover:

  • Troubleshooting tools

  • Handling the XceiverCount error

  • Handling the "too many open files" error

  • Handling the "unable to create new native thread" error

  • Handling the "HBase ignores HDFS client configuration" issue

  • Handling the ZooKeeper client connection error

  • Handling the ZooKeeper session expired error

  • Handling the HBase startup error on EC2

Introduction


Everyone expects their HBase cluster to run smoothly and steadily, but sometimes it does not work as expected, especially when the cluster has not been well configured. This chapter describes what you can do to troubleshoot a cluster that is behaving unexpectedly.

Before you start troubleshooting a cluster, it is best to become familiar with the tools that will help you restore it. Good tools are as important as a deep knowledge of HBase and of the cluster you are operating. We will introduce several recommended tools and their sample usage in the first recipe.

Problems usually occur on a cluster that is missing basic configuration. If you encounter problems with your cluster, the first thing you should do is analyze the master log file, as the master acts as the coordinator service of the cluster. Hopefully, you will be able to identify the root cause once you have found the WARN- or ERROR-level entries in the log file. The region server log...
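
Log file names and paths vary by installation; as a minimal sketch (the file name below is an assumption, not taken from this book), you can scan the master log for WARN- and ERROR-level entries like this:

    # -E enables extended regular expressions; tail shows the most recent matches
    hadoop@master1$ grep -E "WARN|ERROR" \
      $HBASE_HOME/logs/hbase-hadoop-master-master1.log | tail -20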

Troubleshooting tools


In order to troubleshoot an HBase cluster, the tools you use are as important as a solid knowledge of the cluster you are operating. We would like to recommend the following troubleshooting tools:

  • ps: This can be used to find the top processes consuming the most memory and CPU

  • ClusterSSH tool: This tool is used to control multiple SSH sessions simultaneously

  • jps: This tool shows the Java processes for the current user

  • jmap: This tool prints the Java heap summary

  • jstat: This is the Java Virtual Machine statistics monitoring tool

  • hbase hbck: This tool is used for checking and repairing region consistency and table integrity

  • hadoop fsck: This tool is used for checking HDFS consistency

We will describe sample usage of these tools in this recipe.

Getting ready

Start your HBase cluster.

How to do it...

The following are the troubleshooting tools we will describe:

  • ps: This tool is used to find the top processes occupying large amounts of memory. The following...
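
The description is truncated in this excerpt; as a hedged sketch of typical usage (not the book's exact commands), the tools listed above are commonly invoked as follows:

    # top 10 processes sorted by resident memory usage
    hadoop@master1$ ps aux --sort -rss | head -10
    # list the Java processes of the current user
    hadoop@master1$ jps
    # print JVM garbage collection statistics every 5 seconds
    hadoop@master1$ jstat -gcutil <pid> 5000
    # check region consistency and table integrity
    hadoop@master1$ hbase hbck
    # check HDFS consistency under the HBase root directory
    hadoop@master1$ hadoop fsck /hbase

Replace <pid> with a process ID reported by jps.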

Handling the XceiverCount error


In this recipe, we will describe how to troubleshoot the following XceiverCount error shown in the DataNode logs:

2012-02-18 17:08:10,695 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.166.111.191:50010, storageID=DS-2072496811-10.168.130.82-50010-1321345166369, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 257 exceeds the limit of concurrent xcievers 256
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:92)
at java.lang.Thread.run(Thread.java:662)

Getting ready

Log in to your master node.

How to do it...

The following are the steps to fix the XceiverCount error:

  1. Add the following snippet to the HDFS setting file (hdfs-site.xml):

    hadoop@master1$ vi $HADOOP_HOME/conf/hdfs-site.xml
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>
    
    
  2. Sync the hdfs-site.xml file across the cluster:

    hadoop@master1$ for slave...
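
The loop in this step is truncated in this excerpt; as a hedged sketch of the same idea (the slaves file and paths are assumptions, not the book's exact command), the sync might look like this:

    hadoop@master1$ for slave in `cat $HADOOP_HOME/conf/slaves`
    do
      # copy the updated hdfs-site.xml to each DataNode
      rsync -avz $HADOOP_HOME/conf/hdfs-site.xml $slave:$HADOOP_HOME/conf/
    done

The DataNodes need to be restarted to pick up the new dfs.datanode.max.xcievers value.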

Handling the "too many open files" error


In this recipe, we will describe how to troubleshoot the error shown in the following DataNode logs:

2012-02-18 17:43:18,009 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.166.111.191:50010, storageID=DS-2072496811-10.168.130.82-50010-1321345166369, infoPort=50075, ipcPort=50020):DataXceiver
java.io.FileNotFoundException: /usr/local/hadoop/var/dfs/data/current/subdir6/blk_-8839555124496884481 (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockInputStream(FSDataset.java:1068)

Getting ready

To fix this issue, you will need root privileges on every node of the cluster. We assume you use the hadoop user to start your HDFS cluster.

How to do it...

To fix the "too many open files" error, execute the following steps on every node of the cluster:

  1. Increase the open file number of...
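
The step is truncated in this excerpt; as a hedged sketch (the limit values are assumptions, not the book's exact numbers), raising the open file limit for the hadoop user usually means adding nofile entries to /etc/security/limits.conf:

    $ vi /etc/security/limits.conf
    hadoop soft nofile 32768
    hadoop hard nofile 32768

Log in again (or restart the daemons) for the new limits to take effect; running ulimit -n as the hadoop user should then report the new value.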

Handling the "unable to create new native thread" error


In this recipe, we will describe how to troubleshoot the error shown in the following RegionServer logs:

2012-02-18 18:46:04,907 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2830)

Getting ready

To fix this error, you will need root privileges on every node of the cluster. We assume you use the hadoop user to start your HDFS and HBase clusters.

How to do it...

To fix the "unable to create new native thread" error, execute the following steps on every node of the cluster:

  1. Increase the maximum number of processes of the hadoop user by adding the following properties to the /etc/security/limits.conf file:

    $ vi /etc/security/limits.conf
    hadoop soft nproc 32000
    hadoop hard nproc 32000
    
    
  2. Add the...
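
The rest of this step is truncated in this excerpt; as a hedged sketch of what such a step typically involves (an assumption, not the book's exact text), the limits set in limits.conf only take effect when the pam_limits module is enabled for login sessions:

    $ vi /etc/pam.d/common-session
    session required pam_limits.so

On Red Hat-style systems the equivalent file is usually /etc/pam.d/login.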

Handling the "HBase ignores HDFS client configuration" issue


You may have noticed that HBase ignores your HDFS client configuration, such as the dfs.replication setting. In the following example, we have set a replication factor of 2 for our HDFS client:

$ grep -A 1 "dfs.replication" $HADOOP_HOME/conf/hdfs-site.xml
<name>dfs.replication</name>
<value>2</value>

However, the HBase files on HDFS show a factor of 3, which is the default replication factor of HDFS.
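
The original output is not shown in this excerpt; as a hedged illustration (the path is an assumption), you can check the replication factor of HBase's files with a recursive listing:

    hadoop@master1$ hadoop fs -lsr /hbase
    # for each file entry, the second column is its replication factor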

This is not what we expected: the replication factor should be 2, but the actual value is 3.

We will describe why this happens and how to fix it, in this recipe.

Getting ready

Log in to your master node as the user who starts HDFS and HBase. We assume you are using the hadoop user for HDFS and HBase.

How to do it...

The following are the steps to apply your HDFS client configurations to HBase:

  1. Add a symbolic link of the HDFS setting file (hdfs-site.xml) under the HBase configuration directory:

    $ hadoop...
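
The command is truncated in this excerpt; as a hedged sketch (the paths follow the conventions used elsewhere in this book and are assumptions), the link would typically be created like this:

    hadoop@master1$ ln -s $HADOOP_HOME/conf/hdfs-site.xml \
      $HBASE_HOME/conf/hdfs-site.xml

With hdfs-site.xml on HBase's classpath, the HDFS client embedded in HBase picks up client settings such as dfs.replication. Restart HBase afterward; note that files already written keep their old replication factor.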

Handling the ZooKeeper client connection error


In this recipe, we will describe how to troubleshoot the ZooKeeper client connection error shown in the following RegionServer logs:

2012-02-19 15:17:06,199 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server ip-10-168-47-220.us-west-1.compute.internal /10.168.47.220:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:166)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:66)

Getting ready

Log in to your ZooKeeper quorum nodes.

How to do it...

The following are the steps to fix the ZooKeeper client connection error:

  1. Add the following to the ZooKeeper configuration file (zoo...
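
The file name is truncated above (it is the ZooKeeper configuration file); this error is commonly associated with ZooKeeper's per-client connection limit, so as a hedged sketch (the value and path are assumptions, not necessarily the book's), the usual change is:

    $ vi $ZOOKEEPER_HOME/conf/zoo.cfg
    # allow more concurrent connections from a single client IP
    maxClientCnxns=60

Restart each ZooKeeper quorum node after changing this setting.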

Handling the ZooKeeper session expired error


In this recipe, we will describe how to troubleshoot the following ZooKeeper session expired error shown in the RegionServer logs:

2012-02-19 16:49:15,405 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/ip-10-168-37-91.us-west-1.compute.internal,60020,1329635463251
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:868)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107)

This issue is critical, because the master and region servers will shut themselves down if they lose their connection to the ZooKeeper quorum.

Getting ready

Log in to the server where this error occurred.

How to do it...

The following are the steps...
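
The steps are truncated in this excerpt; as a hedged sketch of one common mitigation (the value is an assumption), session expiry caused by long JVM garbage collection pauses is often addressed by raising the client-side session timeout in hbase-site.xml:

    hadoop@master1$ vi $HBASE_HOME/conf/hbase-site.xml
    <property>
      <name>zookeeper.session.timeout</name>
      <value>120000</value>
    </property>

The effective timeout is still capped by the ZooKeeper server's maxSessionTimeout, so that may need raising as well.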

Handling the HBase startup error on EC2


In this recipe, we will describe how to troubleshoot the HBase startup error shown in the following Master logs:

2011-12-10 14:04:57,422 ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of ip-10-166-219-206.us-west-1.compute.internal
2011-12-10 14:04:57,423 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.lang.IllegalArgumentException: hostname can't be null
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:121)
at org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:108)
at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:64)

This error usually happens after stopping and restarting EC2 instances. The reason is that HBase stores region locations in its "system" -ROOT- and .META. tables, and the location information contains the internal EC2 DNS name. Stopping an EC2 instance changes this DNS name. Due to the...
