HBase Administration Cookbook

Yifeng Jiang

Master HBase configuration and administration for optimum database performance with this book and ebook

Book Details

ISBN 13: 9781849517140
Paperback: 332 pages

About This Book

  • Move large amounts of data into HBase and learn how to manage it efficiently
  • Set up HBase in the cloud, get it ready for production, and run it smoothly with high performance
  • Get the most out of HBase with the Hadoop ecosystem, including HDFS, MapReduce, ZooKeeper, and Hive

Who This Book Is For

This book is for HBase administrators and developers, and will also help Hadoop administrators. You are not required to have HBase experience, but you are expected to have a basic understanding of Hadoop and MapReduce.

Table of Contents

Chapter 1: Setting Up HBase Cluster
Quick start
Getting ready on Amazon EC2
Setting up Hadoop
Setting up ZooKeeper
Changing the kernel settings
Setting up HBase
Basic Hadoop/ZooKeeper/HBase configurations
Setting up multiple High Availability (HA) masters
Chapter 2: Data Migration
Importing data from MySQL via single client
Importing data from TSV files using the bulk load tool
Writing your own MapReduce job to import data
Precreating regions before moving data into HBase
Chapter 3: Using Administration Tools
HBase Master web UI
Using HBase Shell to manage tables
Using HBase Shell to access data in HBase
Using HBase Shell to manage the cluster
Executing Java methods from HBase Shell
Row counter
WAL tool—manually splitting and dumping WALs
HFile tool—viewing textualized HFile content
HBase hbck—checking the consistency of an HBase cluster
Hive on HBase—querying HBase using a SQL-like language
Chapter 4: Backing Up and Restoring HBase Data
Full shutdown backup using distcp
Using CopyTable to copy data from one table to another
Exporting an HBase table to dump files on HDFS
Restoring HBase data by importing dump files from HDFS
Backing up NameNode metadata
Backing up region starting keys
Cluster replication
Chapter 5: Monitoring and Diagnosis
Showing the disk utilization of HBase tables
Setting up Ganglia to monitor an HBase cluster
OpenTSDB—using HBase to monitor an HBase cluster
Setting up Nagios to monitor HBase processes
Using Nagios to check Hadoop/HBase logs
Simple scripts to report the status of the cluster
Hot region—write diagnosis
Chapter 6: Maintenance and Security
Enabling HBase RPC DEBUG-level logging
Graceful node decommissioning
Adding nodes to the cluster
Rolling restart
Simple script for managing HBase processes
Simple script for making deployment easier
Kerberos authentication for Hadoop and HBase
Configuring HDFS security with Kerberos
HBase security configuration
Chapter 7: Troubleshooting
Troubleshooting tools
Handling the XceiverCount error
Handling the "too many open files" error
Handling the "unable to create new native thread" error
Handling the "HBase ignores HDFS client configuration" issue
Handling the ZooKeeper client connection error
Handling the ZooKeeper session expired error
Handling the HBase startup error on EC2
Chapter 8: Basic Performance Tuning
Setting up Hadoop to spread disk I/O
Using network topology script to make Hadoop rack-aware
Mounting disks with noatime and nodiratime
Setting vm.swappiness to 0 to avoid swap
Java GC and HBase heap settings
Using compression
Managing compactions
Managing a region split
Chapter 9: Advanced Configurations and Tuning
Benchmarking HBase cluster with YCSB
Increasing region server handler count
Precreating regions using your own algorithm
Avoiding update blocking on write-heavy clusters
Tuning memory size for MemStores
Client-side tuning for low latency systems
Configuring block cache for column families
Client side scanner setting
Tuning block size to improve seek performance
Enabling Bloom Filter to improve the overall throughput

What You Will Learn

  • Set up a fully distributed, highly available HBase cluster and load data into it using the normal client API or your own MapReduce job
  • Access data in HBase via HBase Shell or Hive using its SQL-like query language
  • Back up and restore HBase tables, along with their data distribution, and move or replicate data between different HBase clusters
  • Gather metrics, show them in graphs, monitor the cluster's status, and get notified if thresholds are exceeded
  • Tune kernel settings, JVM GC, Hadoop, and HBase configuration to maximize performance
  • Discover troubleshooting tools and tips to avoid the most common problems with HBase
  • Gain optimum performance by using data compression, managing region splits, and manually managing compaction
  • Learn advanced configuration and tuning for read-heavy and write-heavy clusters

In Detail

As an open source distributed big data store, HBase scales to billions of rows and millions of columns, and runs on clusters of commodity machines. If you are looking for a way to store and access huge amounts of data in real time, look no further than HBase.

HBase Administration Cookbook provides practical examples and simple step-by-step instructions for administering HBase with ease. The recipes cover a wide range of processes for managing a fully distributed, highly available HBase cluster in the cloud. Working with such large amounts of data requires an organized and manageable process, and this book will help you achieve that.

The recipes in this practical cookbook start with setting up a fully distributed HBase cluster and moving data into it. You will learn how to use the tools for day-to-day administration tasks, as well as how to manage and monitor the cluster efficiently to achieve the best possible performance. Understanding the relationship between Hadoop and HBase will allow you to get the best out of HBase, so the book also shows you how to set up Hadoop clusters, configure Hadoop to cooperate with HBase, and tune its performance.

