You're reading from HBase Administration Cookbook
1st Edition, published in Aug 2012 by Packt (ISBN-13: 9781849517140)
Yifeng Jiang is a Hadoop and HBase Administrator and Developer at Rakuten, the largest e-commerce company in Japan. After graduating from the University of Science and Technology of China with a B.S. in Information Management Systems, he started his career as a professional software engineer, focusing on Java development. In 2008, he began looking into the Hadoop project. In 2009, he led the development of his previous company's display advertisement data infrastructure using Hadoop and Hive. In 2010, he joined his current employer, where he designed and implemented a large-scale item ranking system based on Hadoop and HBase. He is also a member of the company's Hadoop team, which operates several Hadoop/HBase clusters.

Basic Hadoop/ZooKeeper/HBase configurations


There are some basic settings we should tune before moving forward. These are very basic and important Hadoop (HDFS), ZooKeeper, and HBase settings that you should consider changing immediately after setting up your cluster.

Some of these settings are required for data durability or cluster availability and must be configured, while others are recommended configurations for running HBase smoothly.

Configuration settings depend on your hardware, data, and cluster size. We will describe a guideline in this recipe. You may need to change the settings to fit your environment.

Every time you make changes, you need to sync the modified files to all client and slave nodes, and then restart the respective daemons to apply the changes.
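For example, distributing a changed configuration directory can be scripted with rsync. The host names below (slave1 to slave3) are placeholders for this sketch, and the commands are printed rather than executed so you can review them first; drop the echo to perform the actual copy:

```shell
# Placeholder host list -- substitute your own client/slave nodes.
SLAVES="slave1 slave2 slave3"

sync_conf() {
  # Print one rsync command per node for the given conf directory.
  # Remove the leading "echo" to actually perform the sync.
  conf_dir=$1
  for host in $SLAVES; do
    echo rsync -az "$conf_dir/" "$host:$conf_dir/"
  done
}

sync_conf /usr/local/hadoop/conf
```

After syncing, remember to restart the affected daemons on each node so the new settings take effect.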

How to do it...

The configurations that should be considered for change are as follows:

  1. Turn on dfs.support.append for HDFS. The dfs.support.append property determines whether HDFS should support the append (sync) feature. The default value is false. It must be set to true, or you may lose data if a region server crashes:

    hadoop$ vi $HADOOP_HOME/conf/hdfs-site.xml
    
    <property>
      <name>dfs.support.append</name>
      <value>true</value>
    </property>
    
  2. Increase the dfs.datanode.max.xcievers value so that the DataNode keeps more threads open to handle more concurrent requests:

    hadoop$ vi $HADOOP_HOME/conf/hdfs-site.xml
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>
    
  3. Increase ZooKeeper's heap memory size so that it does not swap:

    hadoop$ vi $ZK_HOME/conf/java.env
    export JAVA_OPTS="-Xms1000m -Xmx1000m"
    
  4. Increase ZooKeeper's maximum client connection number to handle more concurrent requests:

    hadoop$ echo "maxClientCnxns=60" >> $ZK_HOME/conf/zoo.cfg
    
  5. Increase HBase's heap memory size to run HBase smoothly:

    hadoop$ vi $HBASE_HOME/conf/hbase-env.sh
    
    export HBASE_HEAPSIZE=8000
    
  6. Decrease the zookeeper.session.timeout value so that HBase can detect a crashed region server quickly and recover it in a short time:

    hadoop$ vi $HBASE_HOME/conf/hbase-site.xml
    <property>
      <name>zookeeper.session.timeout</name>
      <value>60000</value>
    </property>
    
  7. To change Hadoop/ZooKeeper/HBase log settings, edit the log4j.properties file and the hadoop-env.sh/hbase-env.sh file under the conf directory of the Hadoop/ZooKeeper/HBase installation. It is better to move the log directory outside the installation folder. For example, the following configures HBase to generate its logs under the /usr/local/hbase/logs directory:

    hadoop$ vi $HBASE_HOME/conf/hbase-env.sh
    export HBASE_LOG_DIR=/usr/local/hbase/logs
    

How it works...

In step 1, turning on dfs.support.append enables the HDFS flush (sync) feature. With this feature enabled, a writer to HDFS can guarantee that data is persisted by invoking a flush call. HBase can therefore guarantee that when a region server dies, its data can be recovered and replayed on other region servers from its Write-Ahead Log (WAL).

To verify whether HDFS append is supported, check the HMaster log from the HBase startup. If append is not enabled, you will find a log entry like the following:

$ grep -i "HDFS-200" hbase-hadoop-master-master1.log
...syncFs -- HDFS-200 -- not available, dfs.support.append=false

For step 2, we configured the dfs.datanode.max.xcievers setting, which specifies the upper bound on the number of files HDFS DataNode will serve at any one time.

Note

Note that the property name really is xcievers, a long-standing misspelling (of xceivers) in Hadoop. Its default value is 256, which is too low for running HBase on HDFS.
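To read back the value actually set in hdfs-site.xml, a quick shell sketch can help. The get_prop helper below is written for this example (it is not a Hadoop tool) and assumes the <name> and <value> elements sit on consecutive lines, as in the snippets above:

```shell
# get_prop NAME FILE -- print the <value> that follows the given
# <name> in a Hadoop *-site.xml file (name/value on adjacent lines).
get_prop() {
  grep -F -A1 "<name>$1</name>" "$2" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Against a real cluster you would run, for example:
#   get_prop dfs.datanode.max.xcievers $HADOOP_HOME/conf/hdfs-site.xml
```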

Steps 3 and 4 are about ZooKeeper settings. ZooKeeper is very sensitive to swapping, which will seriously degrade its performance. ZooKeeper's heap size is set in the java.env file. ZooKeeper has an upper bound on the number of connections it will serve at any one time. Its default is 10, which is too low for HBase, especially when running MapReduce on it. We would suggest setting it to 60.
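To check how many connections a running ZooKeeper server is actually handling, ZooKeeper's built-in four-letter-word commands such as stat and cons can be sent over a plain TCP connection. The sketch below only builds the probe commands (it assumes nc is installed and that ZooKeeper listens on the default client port 2181); pipe its output to sh to run the probes against a live server:

```shell
ZK_HOST=${ZK_HOST:-localhost}   # assumed host; override as needed
ZK_PORT=${ZK_PORT:-2181}        # ZooKeeper's default client port

zk_probe() {
  # Build (but do not run) a four-letter-word probe command.
  echo "echo $1 | nc $ZK_HOST $ZK_PORT"
}

zk_probe stat   # server mode, latency, and current connection count
zk_probe cons   # one line per connected client
```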

In step 5, we configured HBase's heap memory size. HBase ships with a heap size of 1 GB, which is too low for modern machines. A reasonable value for large machines is 8 GB or larger, but under 16 GB.
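As a rough illustration of that guideline, the helper below picks half of the machine's RAM and caps the result at 16000 MB. The halving rule is an assumption made for this sketch, not a rule from the recipe; adjust it to your workload:

```shell
# suggest_heapsize RAM_MB -- illustrative only: half of RAM,
# capped at 16000 MB to stay under the 16 GB guideline above.
suggest_heapsize() {
  heap=$(($1 / 2))
  [ "$heap" -gt 16000 ] && heap=16000
  echo "$heap"
}

suggest_heapsize 24576   # a 24 GB machine -> 12288
suggest_heapsize 65536   # a 64 GB machine -> capped at 16000
```

Whatever value you pick, set it via HBASE_HEAPSIZE in hbase-env.sh as shown in step 5.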

In step 6, we changed ZooKeeper's session timeout to a lower value. A lower timeout means HBase can detect crashed region servers faster and thus recover the crashed regions on other servers in a short time. On the other hand, with a very short session timeout there is a risk that the HRegionServer daemon may kill itself under heavy cluster load, because it may not be able to send a heartbeat to ZooKeeper before the timeout expires.

See also

  • Chapter 8, Basic Performance Tuning

  • Chapter 9, Advanced Configurations and Performance Tuning

