HBase Administration Cookbook

eBook: $29.99
Formats: PDF, PacktLib, ePub and Mobi formats
save 15%!
Print + free eBook + free PacktLib access to the book: $79.98    Print cover: $49.99
save 37%!
Free Shipping!
UK, US, Europe and selected countries in Asia.
  • Move large amounts of data into HBase and learn how to manage it efficiently
  • Set up HBase on the cloud, get it ready for production, and run it smoothly with high performance
  • Get the most out of HBase with the Hadoop ecosystem, including HDFS, MapReduce, ZooKeeper, and Hive

Book Details

Language : English
Paperback : 332 pages [ 235mm x 191mm ]
Release Date : August 2012
ISBN : 1849517142
ISBN 13 : 9781849517140
Author(s) : Yifeng Jiang
Topics and Technologies : All Books, Big Data and Business Intelligence, Cookbooks, Open Source, Web Development

Table of Contents

  • Chapter 1: Setting Up HBase Cluster
    • Introduction
    • Quick start
    • Getting ready on Amazon EC2
    • Setting up Hadoop
    • Setting up ZooKeeper
    • Changing the kernel settings
    • Setting up HBase
    • Basic Hadoop/ZooKeeper/HBase configurations
    • Setting up multiple High Availability (HA) masters
  • Chapter 2: Data Migration
    • Introduction
    • Importing data from MySQL via single client
    • Importing data from TSV files using the bulk load tool
    • Writing your own MapReduce job to import data
    • Precreating regions before moving data into HBase
  • Chapter 3: Using Administration Tools
    • Introduction
    • HBase Master web UI
    • Using HBase Shell to manage tables
    • Using HBase Shell to access data in HBase
    • Using HBase Shell to manage the cluster
    • Executing Java methods from HBase Shell
    • Row counter
    • WAL tool—manually splitting and dumping WALs
    • HFile tool—viewing textualized HFile content
    • HBase hbck—checking the consistency of an HBase cluster
    • Hive on HBase—querying HBase using a SQL-like language
  • Chapter 4: Backing Up and Restoring HBase Data
    • Introduction
    • Full shutdown backup using distcp
    • Using CopyTable to copy data from one table to another
    • Exporting an HBase table to dump files on HDFS
    • Restoring HBase data by importing dump files from HDFS
    • Backing up NameNode metadata
    • Backing up region starting keys
    • Cluster replication
  • Chapter 5: Monitoring and Diagnosis
    • Introduction
    • Showing the disk utilization of HBase tables
    • Setting up Ganglia to monitor an HBase cluster
    • OpenTSDB—using HBase to monitor an HBase cluster
    • Setting up Nagios to monitor HBase processes
    • Using Nagios to check Hadoop/HBase logs
    • Simple scripts to report the status of the cluster
    • Hot region—write diagnosis
  • Chapter 6: Maintenance and Security
    • Introduction
    • Enabling HBase RPC DEBUG-level logging
    • Graceful node decommissioning
    • Adding nodes to the cluster
    • Rolling restart
    • Simple script for managing HBase processes
    • Simple script for making deployment easier
    • Kerberos authentication for Hadoop and HBase
    • Configuring HDFS security with Kerberos
    • HBase security configuration
  • Chapter 7: Troubleshooting
    • Introduction
    • Troubleshooting tools
    • Handling the XceiverCount error
    • Handling the "too many open files" error
    • Handling the "unable to create new native thread" error
    • Handling the "HBase ignores HDFS client configuration" issue
    • Handling the ZooKeeper client connection error
    • Handling the ZooKeeper session expired error
    • Handling the HBase startup error on EC2
  • Chapter 8: Basic Performance Tuning
    • Introduction
    • Setting up Hadoop to spread disk I/O
    • Using network topology script to make Hadoop rack-aware
    • Mounting disks with noatime and nodiratime
    • Setting vm.swappiness to 0 to avoid swap
    • Java GC and HBase heap settings
    • Using compression
    • Managing compactions
    • Managing a region split
  • Chapter 9: Advanced Configurations and Tuning
    • Introduction
    • Benchmarking HBase cluster with YCSB
    • Increasing region server handler count
    • Precreating regions using your own algorithm
    • Avoiding update blocking on write-heavy clusters
    • Tuning memory size for MemStores
    • Client-side tuning for low latency systems
    • Configuring block cache for column families
    • Increasing block cache size on read-heavy clusters
    • Client side scanner setting
    • Tuning block size to improve seek performance
    • Enabling Bloom Filter to improve the overall throughput

Yifeng Jiang

Yifeng Jiang is a Hadoop and HBase Administrator and Developer at Rakuten, the largest e-commerce company in Japan. After graduating from the University of Science and Technology of China with a B.S. in Information Management Systems, he started his career as a professional software engineer, focusing on Java development. In 2008, he started looking into the Hadoop project. In 2009, he led the development of his previous company's display advertisement data infrastructure using Hadoop and Hive. In 2010, he joined his current employer, where he designed and implemented a large-scale item ranking system based on Hadoop and HBase. He is also a member of the company's Hadoop team, which operates several Hadoop/HBase clusters.

What you will learn from this book

  • Set up a fully distributed, highly available HBase cluster and load data into it using the normal client API or your own MapReduce job
  • Access data in HBase via HBase Shell or Hive using its SQL-like query language
  • Back up and restore an HBase table, along with its data distribution, and move or replicate data between different HBase clusters
  • Gather metrics and show them in graphs, monitor the cluster's status, and get notified if thresholds are exceeded
  • Tune your kernel settings, JVM GC, Hadoop, and HBase configuration to maximize performance
  • Discover troubleshooting tools and tips to avoid the most commonly found problems with HBase
  • Gain optimum performance with data compression, region splits, and manually managed compaction
  • Learn advanced configuration and tuning for read-heavy and write-heavy clusters

In Detail

As an open source, distributed big data store, HBase scales to billions of rows with millions of columns and runs on clusters of commodity machines. If you are looking for a way to store and access huge amounts of data in real time, look no further than HBase.

HBase Administration Cookbook provides practical examples and simple step-by-step instructions for you to administer HBase with ease. The recipes cover a wide range of processes for managing a fully distributed, highly available HBase cluster on the cloud. Working with such a huge amount of data means that an organized and manageable process is key, and this book will help you achieve that.

The recipes in this practical cookbook start with setting up a fully distributed HBase cluster and moving data into it. You will learn how to use the tools for day-to-day administration tasks as well as for efficiently managing and monitoring the cluster to achieve the best performance possible. Understanding the relationship between Hadoop and HBase will allow you to get the best out of HBase, so the book shows you how to set up Hadoop clusters, configure Hadoop to cooperate with HBase, and tune its performance.

As part of Packt’s cookbook series, each recipe offers a practical, step-by-step solution to common problems found in HBase administration.

Who this book is for

This book is for HBase administrators and developers, and it will also help Hadoop administrators. You are not required to have HBase experience, but you are expected to have a basic understanding of Hadoop and MapReduce.
