HBase Administration Cookbook

You're reading from HBase Administration Cookbook

Product type: Book
Published: Aug 2012
Publisher: Packt
ISBN-13: 9781849517140
Pages: 332
Edition: 1st
Author: Yifeng Jiang

Table of Contents (16 chapters)

HBase Administration Cookbook
Credits
About the Author
Acknowledgement
About the Reviewers
www.PacktPub.com
Preface
Setting Up HBase Cluster
Data Migration
Using Administration Tools
Backing Up and Restoring HBase Data
Monitoring and Diagnosis
Maintenance and Security
Troubleshooting
Basic Performance Tuning
Advanced Configurations and Tuning

Chapter 7. Troubleshooting

In this chapter, we will cover:

  • Troubleshooting tools

  • Handling the XceiverCount error

  • Handling the "too many open files" error

  • Handling the "unable to create new native thread" error

  • Handling the "HBase ignores HDFS client configuration" issue

  • Handling the ZooKeeper client connection error

  • Handling the ZooKeeper session expired error

  • Handling the HBase startup error on EC2

Introduction


Everyone expects their HBase cluster to run smoothly and steadily, but sometimes it does not work as expected, especially when the cluster has not been well configured. This chapter describes what you can do to troubleshoot a cluster that is behaving unexpectedly.

Before you start troubleshooting a cluster, it is best to become familiar with the tools that will help you restore it. Good tools are as important as a deep knowledge of HBase and of the cluster you are operating. We will introduce several recommended tools and their sample usage in the first recipe.

Problems usually occur on a cluster that is missing basic configuration. If you encounter problems with your cluster, the first thing you should do is analyze the master log file, as the master acts as the coordinator service of the cluster. Hopefully, you will be able to identify the root cause once you have found the WARN- or ERROR-level entries in the log file. The region server log...
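
Log file names and paths vary by installation; as a minimal sketch (the file name below is an assumption, not taken from this book), you can scan the master log for WARN- and ERROR-level entries like this:

    # -E enables extended regular expressions; tail shows the most recent matches
    hadoop@master1$ grep -E "WARN|ERROR" \
      $HBASE_HOME/logs/hbase-hadoop-master-master1.log | tail -20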

Troubleshooting tools


In order to troubleshoot an HBase cluster, the tools you use are as important as a solid knowledge of the cluster you are operating. We would like to recommend the following troubleshooting tools:

  • ps: This can be used to find the top processes consuming the most memory and CPU

  • ClusterSSH tool: This tool is used to control multiple SSH sessions simultaneously

  • jps: This tool shows the Java processes for the current user

  • jmap: This tool prints the Java heap summary

  • jstat: This is the Java Virtual Machine statistics monitoring tool

  • hbase hbck: This tool is used for checking and repairing region consistency and table integrity

  • hadoop fsck: This tool is used for checking HDFS consistency

We will describe sample usage of these tools in this recipe.

Getting ready

Start your HBase cluster.

How to do it...

The following are the troubleshooting tools we will describe:

  • ps: This tool is used to find the top processes occupying large amounts of memory. The following...
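
The description is truncated in this excerpt; as a hedged sketch of typical usage (not the book's exact commands), the tools listed above are commonly invoked as follows:

    # top 10 processes sorted by resident memory usage
    hadoop@master1$ ps aux --sort -rss | head -10
    # list the Java processes of the current user
    hadoop@master1$ jps
    # print JVM garbage collection statistics every 5 seconds
    hadoop@master1$ jstat -gcutil <pid> 5000
    # check region consistency and table integrity
    hadoop@master1$ hbase hbck
    # check HDFS consistency under the HBase root directory
    hadoop@master1$ hadoop fsck /hbase

Replace <pid> with a process ID reported by jps.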

Handling the XceiverCount error


In this recipe, we will describe how to troubleshoot the following XceiverCount error shown in the DataNode logs:

2012-02-18 17:08:10,695 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.166.111.191:50010, storageID=DS-2072496811-10.168.130.82-50010-1321345166369, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 257 exceeds the limit of concurrent xcievers 256
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:92)
at java.lang.Thread.run(Thread.java:662)

Getting ready

Log in to your master node.

How to do it...

The following are the steps to fix the XceiverCount error:

  1. Add the following snippet to the HDFS setting file (hdfs-site.xml):

    hadoop@master1$ vi $HADOOP_HOME/conf/hdfs-site.xml
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>
    
    
  2. Sync the hdfs-site.xml file across the cluster:

    hadoop@master1$ for slave...
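
The loop in this step is truncated in this excerpt; as a hedged sketch of the same idea (the slaves file and paths are assumptions, not the book's exact command), the sync might look like this:

    hadoop@master1$ for slave in `cat $HADOOP_HOME/conf/slaves`
    do
      # copy the updated hdfs-site.xml to each DataNode
      rsync -avz $HADOOP_HOME/conf/hdfs-site.xml $slave:$HADOOP_HOME/conf/
    done

The DataNodes need to be restarted to pick up the new dfs.datanode.max.xcievers value.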

Handling the "too many open files" error


In this recipe, we will describe how to troubleshoot the error shown in the following DataNode logs:

2012-02-18 17:43:18,009 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.166.111.191:50010, storageID=DS-2072496811-10.168.130.82-50010-1321345166369, infoPort=50075, ipcPort=50020):DataXceiver
java.io.FileNotFoundException: /usr/local/hadoop/var/dfs/data/current/subdir6/blk_-8839555124496884481 (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockInputStream(FSDataset.java:1068)

Getting ready

To fix this issue, you will need root privileges on every node of the cluster. We assume you use the hadoop user to start your HDFS cluster.

How to do it...

To fix the "too many open files" error, execute the following steps on every node of the cluster:

  1. Increase the open file number of...
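
The step is truncated in this excerpt; as a hedged sketch (the limit values are assumptions, not the book's exact numbers), raising the open file limit for the hadoop user usually means adding nofile entries to /etc/security/limits.conf:

    $ vi /etc/security/limits.conf
    hadoop soft nofile 32768
    hadoop hard nofile 32768

Log in again (or restart the daemons) for the new limits to take effect; running ulimit -n as the hadoop user should then report the new value.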

Handling the "unable to create new native thread" error


In this recipe, we will describe how to troubleshoot the error shown in the following RegionServer logs:

2012-02-18 18:46:04,907 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2830)

Getting ready

To fix this error, you will need root privileges on every node of the cluster. We assume you use the hadoop user to start your HDFS and HBase clusters.

How to do it...

To fix the "unable to create new native thread" error, execute the following steps on every node of the cluster:

  1. Increase the maximum number of processes of the hadoop user by adding the following properties to the /etc/security/limits.conf file:

    $ vi /etc/security/limits.conf
    hadoop soft nproc 32000
    hadoop hard nproc 32000
    
    
  2. Add the...
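
The rest of this step is truncated in this excerpt; as a hedged sketch of what such a step typically involves (an assumption, not the book's exact text), the limits set in limits.conf only take effect when the pam_limits module is enabled for login sessions:

    $ vi /etc/pam.d/common-session
    session required pam_limits.so

On Red Hat-style systems the equivalent file is usually /etc/pam.d/login.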

Handling the "HBase ignores HDFS client configuration" issue


You may have noticed that HBase ignores your HDFS client configuration, such as the dfs.replication setting. In the following example, we have set a replication factor of 2 for our HDFS client:

$ grep -A 1 "dfs.replication" $HADOOP_HOME/conf/hdfs-site.xml
<name>dfs.replication</name>
<value>2</value>

However, the HBase files on HDFS show a factor of 3, which is the default replication factor of HDFS.
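
The original output is not shown in this excerpt; as a hedged illustration (the path is an assumption), you can check the replication factor of HBase's files with a recursive listing:

    hadoop@master1$ hadoop fs -lsr /hbase
    # for each file entry, the second column is its replication factor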

This is not what we expected: the replication factor should be 2, but the actual value is 3.

We will describe why this happens and how to fix it, in this recipe.

Getting ready

Log in to your master node as the user who starts HDFS and HBase. We assume you are using the hadoop user for HDFS and HBase.

How to do it...

The following are the steps to apply your HDFS client configurations to HBase:

  1. Add a symbolic link of the HDFS setting file (hdfs-site.xml) under the HBase configuration directory:

    $ hadoop...
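
The command is truncated in this excerpt; as a hedged sketch (the paths follow the conventions used elsewhere in this book and are assumptions), the link would typically be created like this:

    hadoop@master1$ ln -s $HADOOP_HOME/conf/hdfs-site.xml \
      $HBASE_HOME/conf/hdfs-site.xml

With hdfs-site.xml on HBase's classpath, the HDFS client embedded in HBase picks up client settings such as dfs.replication. Restart HBase afterward; note that files already written keep their old replication factor.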

Handling the ZooKeeper client connection error


In this recipe, we will describe how to troubleshoot the ZooKeeper client connection error shown in the following RegionServer logs:

2012-02-19 15:17:06,199 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server ip-10-168-47-220.us-west-1.compute.internal /10.168.47.220:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:166)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:66)

Getting ready

Log in to your ZooKeeper quorum nodes.

How to do it...

The following are the steps to fix the ZooKeeper client connection error:

  1. Add the following to the ZooKeeper configuration file (zoo...
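
The file name is truncated above (it is the ZooKeeper configuration file); this error is commonly associated with ZooKeeper's per-client connection limit, so as a hedged sketch (the value and path are assumptions, not necessarily the book's), the usual change is:

    $ vi $ZOOKEEPER_HOME/conf/zoo.cfg
    # allow more concurrent connections from a single client IP
    maxClientCnxns=60

Restart each ZooKeeper quorum node after changing this setting.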

Handling the ZooKeeper session expired error


In this recipe, we will describe how to troubleshoot the following ZooKeeper session expired error shown in the RegionServer logs:

2012-02-19 16:49:15,405 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/ip-10-168-37-91.us-west-1.compute.internal,60020,1329635463251
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:868)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107)

This issue is critical, because the master and region servers will shut themselves down if they lose their connection to the ZooKeeper quorum.

Getting ready

Log in to the server where this error occurred.

How to do it...

The following are the steps...
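
The steps are truncated in this excerpt; as a hedged sketch of one common mitigation (the value is an assumption), session expiry caused by long JVM garbage collection pauses is often addressed by raising the client-side session timeout in hbase-site.xml:

    hadoop@master1$ vi $HBASE_HOME/conf/hbase-site.xml
    <property>
      <name>zookeeper.session.timeout</name>
      <value>120000</value>
    </property>

The effective timeout is still capped by the ZooKeeper server's maxSessionTimeout, so that may need raising as well.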

Handling the HBase startup error on EC2


In this recipe, we will describe how to troubleshoot the HBase startup error shown in the following Master logs:

2011-12-10 14:04:57,422 ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of ip-10-166-219-206.us-west-1.compute.internal
2011-12-10 14:04:57,423 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.lang.IllegalArgumentException: hostname can't be null
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:121)
at org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:108)
at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:64)

This error usually happens after stopping and restarting EC2 instances. The reason is that HBase stores region locations in its "system" -ROOT- and .META. tables, and the location information contains the internal EC2 DNS name. Stopping an EC2 instance changes this DNS name. Due to the...
