HBase Design Patterns

Mark Kerzner, Sujee Maniyam

Design and implement successful patterns to develop scalable applications with HBase

Book Details

ISBN 13: 9781783981045
Paperback: 150 pages

About This Book

  • Design HBase schemas for the most demanding functional and scalability requirements
  • Optimize HBase's handling of single entities, time series, large files, and complex events by utilizing design patterns
  • Written in an easy-to-follow style, with plenty of examples and numerous hints and tips

Who This Book Is For

If you are an intermediate NoSQL developer or have a few big data projects under your belt, this book will help you improve your chances of building a successful and useful NoSQL application by mastering the design patterns it describes. The HBase design patterns also apply to other NoSQL databases such as Cassandra and MongoDB.

Table of Contents

Chapter 1: Starting Out with HBase
  • Installing HBase
  • Selecting an instance
  • Adding storage
  • Security groups
  • Starting the instance

Chapter 2: Reading, Writing, and Using SQL
  • Inspecting the cluster
  • HBase tables, families, and cells
  • The HBase shell
  • Project Phoenix – a SQL for HBase

Chapter 3: Using HBase Tables for Single Entities
  • Storing user information
  • Sets, maps, and lists
  • Generating the test data
  • Analyzing your query

Chapter 4: Dealing with Large Files
  • Storing files using keys
  • Using UUID
  • What to do when your binary files grow larger

Chapter 5: Time Series Data
  • Using time-based keys to store time series data
  • Avoiding region hotspotting
  • Tall and narrow rows versus wide rows
  • OpenTSDB principles

Chapter 6: Denormalization Use Cases
  • Storing all the objects for a user
  • Dealing with lost usernames and passwords
  • Tables for storing videos
  • A popularity contest
  • The section tag index

Chapter 7: Advanced Patterns for Data Modeling
  • Many-to-many relationships in HBase
  • Applying the many-to-many relationship techniques for a video site
  • Event time data – keeping track of what is going on
  • Dealing with transactions
  • Trafodion – transactional SQL on HBase

Chapter 8: Performance Optimization
  • Loading bulk data into HBase
  • Importing data into HBase using MapReduce
  • Importing data from HDFS into HBase
  • Profiling HBase applications
  • Benchmarking or load testing HBase
  • Monitoring HBase

What You Will Learn

  • Install and configure a Hadoop cluster and HBase
  • Write Java code to read from and write to HBase
  • Explore the open source Phoenix project to query HBase in SQL
  • Store single entities, generate keys, and use lists, maps, and sets
  • Utilize UUID for generic key generation to store data and deal with large files
  • Use denormalization to optimize performance
  • Represent one-to-many and many-to-many relationships and deal with transactions
  • Troubleshoot and optimize your application

In Detail

As NoSQL in general and HBase in particular see wider use, building practical applications depends on applying design patterns. These patterns, distilled from extensive practical experience on multiple demanding projects, help ensure the correctness and scalability of an HBase application. They are also generally applicable to most NoSQL databases.

Starting with the basics, this book shows you how to install HBase in different node settings. You are then introduced to key generation and management and to the storage of large files in HBase. Moving on, the book delves into the principles of using time-based data in HBase and walks through several denormalization use cases. Finally, you will learn how to translate familiar SQL design practices into the NoSQL world. With this concise guide, you will get a better idea of typical storage patterns, application design templates, HBase explorer in multiple scenarios with minimum effort, and reading data from multiple region servers.

