You're reading from HBase Essentials

Product type: Book
Author: Nishant Garg
Published: Nov 2014
ISBN-13: 9781783987245
Pages: 164
Edition: 1st Edition
Language: Java
Concepts: Databases

Chapter 2. Defining the Schema

In this chapter, we will learn some of the basic concepts of HBase, a column-family database, and cover the following topics:

Data modeling
Designing tables
CRUD operations

Let's dive in and start off by taking a look at how we can model data in HBase.

Data modeling in HBase

In the RDBMS world, data modeling follows principles built around tables, columns, data types, sizes, and so on, and the only supported format is structured data. HBase is quite different in this respect: each row can store a different number of columns, with different data types, which makes it well suited to storing so-called semi-structured data. Storing semi-structured data affects not only the physical schema but also the logical schema of HBase. For the same reason, some features, such as relational constraints, are not present in HBase.

As in a typical RDBMS, tables are composed of rows, and rows are composed of columns. Each row in HBase is identified by a unique rowkey, which plays a role similar to a primary key in an RDBMS. Rowkeys are compared with each other at the byte level.
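Byte-level comparison has a practical consequence for how rowkeys sort. The following is a minimal plain-Java sketch of that ordering, not HBase client code: the class RowKeyOrder, its methods, and the sample keys are hypothetical names introduced for illustration, and the sketch assumes the keys are encoded as UTF-8 strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; this is not the HBase client API.
final class RowKeyOrder {

    // Compares two rowkeys byte by byte, treating each byte as unsigned (0..255).
    // When one key is a prefix of the other, the shorter key sorts first.
    static int compareRowKeys(byte[] left, byte[] right) {
        int common = Math.min(left.length, right.length);
        for (int i = 0; i < common; i++) {
            int a = left[i] & 0xFF;
            int b = right[i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return left.length - right.length;
    }

    // Returns a copy of the given string keys, sorted by their UTF-8 bytes.
    static List<String> sortAsRowKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((x, y) -> compareRowKeys(
                x.getBytes(StandardCharsets.UTF_8),
                y.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical keys: unpadded numbers do not sort numerically.
        System.out.println(sortAsRowKeys(Arrays.asList("row2", "row10", "row1")));
        // prints [row1, row10, row2]

        // Zero-padded keys keep numeric order and byte order aligned.
        System.out.println(sortAsRowKeys(Arrays.asList("row02", "row10", "row01")));
        // prints [row01, row02, row10]
    }
}
```

Because the comparison sees only bytes, "row10" sorts before "row2". Fixed-width or zero-padded key components are one common way to keep the byte order aligned with the intended order.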

In HBase, columns are organized into column families. There is no restriction on the number of columns that can be grouped together in a single column family. This column family is part of the data definition statement...
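One way to picture this logical model is as nested sorted maps: rowkey to column family to column to value. The sketch below uses plain Java collections rather than the HBase API. The class LogicalTable, the family name "profile", and the sample data are all hypothetical, and using String keys instead of raw bytes is a simplification.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model only: plain Java collections, not the HBase API.
final class LogicalTable {

    // rowkey -> column family -> column -> value
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    // Stores a value; the row, family, and column are created on first use.
    void put(String rowKey, String family, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(column, value);
    }

    // Returns the columns stored for one row in one family (empty if none).
    Map<String, String> columns(String rowKey, String family) {
        Map<String, Map<String, String>> families = rows.get(rowKey);
        if (families == null) {
            return Collections.emptyMap();
        }
        Map<String, String> columns = families.get(family);
        if (columns == null) {
            return Collections.emptyMap();
        }
        return columns;
    }

    public static void main(String[] args) {
        LogicalTable customers = new LogicalTable();
        // Two rows in the same (hypothetical) family carry different columns.
        customers.put("cust-001", "profile", "name", "Asha");
        customers.put("cust-001", "profile", "email", "asha@example.com");
        customers.put("cust-002", "profile", "name", "Ravi");
        customers.put("cust-002", "profile", "phone", "555-0100");

        System.out.println(customers.columns("cust-001", "profile"));
        System.out.println(customers.columns("cust-002", "profile"));
    }
}
```

In this model, the two rows share a column family but not a fixed set of columns, which is the semi-structured property described earlier.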

Designing tables

In HBase, when modeling the schema for any table, a designer should also keep in mind the following, among other things:

The number of column families and which data goes to which column family
The maximum number of columns in each column family
The type of data to be stored in the column
The number of historical values that need to be maintained for each column
The structure of a rowkey
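The point about historical values can be pictured as a cell that keeps a bounded number of timestamped versions. This is a plain-Java sketch only: VersionedCell and its methods are hypothetical names, and trimming old versions eagerly on every write is a simplifying assumption, not a description of how HBase removes them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Conceptual model of one cell with a bounded version history; not the HBase API.
final class VersionedCell {

    private final int maxVersions;
    // timestamp -> value, ordered with the newest timestamp first
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.<Long>reverseOrder());

    VersionedCell(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be at least 1");
        }
        this.maxVersions = maxVersions;
    }

    // Writes a new version and drops the oldest ones beyond the limit.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry holds the oldest timestamp
        }
    }

    // Returns the newest value, or null if nothing has been written.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Returns the retained values, newest first.
    List<String> history() {
        return new ArrayList<>(versions.values());
    }

    public static void main(String[] args) {
        VersionedCell price = new VersionedCell(2); // keep two versions (hypothetical limit)
        price.put(100L, "9.99");
        price.put(200L, "10.49");
        price.put(300L, "10.99");
        System.out.println(price.latest());   // prints 10.99
        System.out.println(price.history());  // prints [10.99, 10.49]
        System.out.println(price.history().equals(Arrays.asList("10.99", "10.49"))); // prints true
    }
}
```

The limit chosen here determines how much history each column keeps, which in turn affects storage size.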

Once these points are settled, certain practices help ensure a good table design. Some of these design practices are as follows:

Data for a given column family goes into a single store on HDFS. This store might consist of multiple HFiles, which eventually get converted to a single HFile using compaction techniques.
Columns in a column family are also stored together on disk, so columns with different access patterns should be kept in different column families.
If we design tables with fewer columns and many rows (a tall table), we might achieve O(1) operations but also compromise on atomicity...
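To make the tall-table idea concrete, the sketch below stores the same data in two layouts using plain Java sorted maps. It is an illustration under stated assumptions, not HBase code: the class TallVersusWide, the '#' separator in the composite rowkey, and the user and message identifiers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Conceptual comparison of two layouts; plain Java maps, not the HBase API.
final class TallVersusWide {

    // Wide layout: one row per user; each message is a column in that row.
    final TreeMap<String, TreeMap<String, String>> wide = new TreeMap<>();

    // Tall layout: one row per message; the rowkey is userId + '#' + messageId.
    final TreeMap<String, String> tall = new TreeMap<>();

    // Writes the same message into both layouts.
    void addMessage(String userId, String messageId, String body) {
        wide.computeIfAbsent(userId, k -> new TreeMap<>()).put(messageId, body);
        tall.put(userId + "#" + messageId, body);
    }

    // Tall layout: one message is read with a single exact-key lookup.
    String getFromTall(String userId, String messageId) {
        return tall.get(userId + "#" + messageId);
    }

    // Tall layout: all messages of a user come from a range over the key prefix.
    List<String> scanTall(String userId) {
        String start = userId + "#";
        String stop = userId + "$"; // '$' is the character immediately after '#'
        return new ArrayList<>(tall.subMap(start, stop).values());
    }

    // Wide layout: all messages of a user live in a single row.
    List<String> readWide(String userId) {
        TreeMap<String, String> row = wide.get(userId);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.values());
    }

    public static void main(String[] args) {
        TallVersusWide store = new TallVersusWide();
        store.addMessage("u1", "m1", "hello");
        store.addMessage("u1", "m2", "world");
        store.addMessage("u10", "m1", "other user");

        System.out.println(store.getFromTall("u1", "m2")); // prints world
        System.out.println(store.scanTall("u1"));          // prints [hello, world]
        System.out.println(store.readWide("u1"));          // prints [hello, world]
    }
}
```

In the tall layout each message is its own row, so a user's messages span several rows. In the wide layout they all sit in one row.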

Accessing HBase

In the previous chapter, we saw how to create a table and perform simple data operations using the HBase shell. HBase can be accessed through a variety of clients, such as REST clients, Thrift clients, and object-mapper frameworks such as Kundera. HBase clients are discussed in detail in Chapter 6, HBase Clients.

HBase also offers an advanced Java-based API for working with tables and column families. (The HBase shell is a wrapper around this Java API.) This API also supports metadata management, for example, data compression for a column family, region splits, and so on. In addition to schema definition, the API provides an interface for table scans, with options such as limiting the number of columns returned or limiting the number of versions of each cell returned. For data manipulation, the HBase API supports create, read, update, and delete operations on individual rows. This API comes with many advanced features, which will be discussed throughout this book.

Note

In most...

Summary

In this chapter, we learned the basics of modeling data and some strategies to consider when designing a table in HBase. We also learned how to perform basic CRUD operations on a table using the various APIs provided by HBase. In the next chapter, we will look into HBase table keys, table scans, and some other advanced features such as filters.