HBase Essentials

Product type: Book
Published in: Nov 2014
ISBN-13: 9781783987245
Pages: 164
Edition: 1st Edition
Author: Nishant Garg

Chapter 2. Defining the Schema

In this chapter, we are going to learn some of the basic concepts of HBase, a column family database, and cover the following topics:

  • Data modeling

  • Designing tables

  • CRUD operations

Let's dive in and start off by taking a look at how we can model data in HBase.

Data modeling in HBase


In the RDBMS world, data modeling follows principles around tables, columns, data types, sizes, and so on, and the only supported format is structured data. HBase is quite different in this respect: each row can store a different number of columns and different data types, which makes it well suited to storing so-called semi-structured data. Storing semi-structured data affects not only the physical schema but also the logical schema of HBase. For the same reason, some features, such as relational constraints, are not present in HBase.

As in a typical RDBMS, tables are composed of rows, and rows are composed of columns. Each row in HBase is identified by a unique rowkey, which resembles a primary key in an RDBMS. Rowkeys are compared with each other at the byte level.
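
To make the byte-level comparison concrete, here is a minimal sketch in plain Java. It does not use the HBase client library; the class name `RowKeyOrder` and its methods are hypothetical, and we assume that byte-level comparison means a lexicographic comparison in which each byte is treated as an unsigned value.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: orders rowkeys byte by byte.
class RowKeyOrder {
    // Lexicographic comparison that treats each byte as unsigned (0..255).
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal, so the shorter key sorts first.
        return a.length - b.length;
    }

    // Sorts string rowkeys by their UTF-8 bytes.
    static List<String> sortKeys(List<String> keys) {
        List<byte[]> raw = new ArrayList<>();
        for (String k : keys) {
            raw.add(k.getBytes(StandardCharsets.UTF_8));
        }
        raw.sort(RowKeyOrder::compareKeys);
        List<String> out = new ArrayList<>();
        for (byte[] r : raw) {
            out.add(new String(r, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [row1, row10, row2]
        System.out.println(sortKeys(Arrays.asList("row2", "row10", "row1")));
    }
}
```

Note that `row10` sorts before `row2`, because the comparison looks at bytes rather than numeric values. This is one reason why numeric parts of a rowkey are often zero-padded or stored in a fixed-width binary form.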

In HBase, columns are organized into column families. There is no restriction on the number of columns that can be grouped together in a single column family. This column family is part of the data definition statement...
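
One way to picture this logical model is as a set of nested sorted maps: rowkey to column family to column to value. The following sketch is only an in-memory illustration, not how HBase stores data. The class `LogicalModel`, the family name `info`, and the sample rows are hypothetical, and plain strings stand in for the byte arrays that HBase actually uses.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory picture of the logical model, not an HBase API.
class LogicalModel {
    // rowkey -> column family -> column -> value
    final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    void put(String rowkey, String family, String column, String value) {
        rows.computeIfAbsent(rowkey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(column, value);
    }

    // Number of columns a given row holds in a given column family.
    int columnCount(String rowkey, String family) {
        Map<String, Map<String, String>> families = rows.get(rowkey);
        if (families == null || !families.containsKey(family)) {
            return 0;
        }
        return families.get(family).size();
    }

    public static void main(String[] args) {
        LogicalModel table = new LogicalModel();
        // Two rows in the same family with different sets of columns.
        table.put("user1", "info", "name", "Alice");
        table.put("user1", "info", "email", "alice@example.com");
        table.put("user2", "info", "name", "Bob");
        System.out.println(table.columnCount("user1", "info")); // 2
        System.out.println(table.columnCount("user2", "info")); // 1
    }
}
```

The two rows share the column family `info` but hold different columns, which is the semi-structured behavior described earlier.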

Designing tables


In HBase, when modeling the schema for any table, a designer should also keep in mind the following, among other things:

  • The number of column families and which data goes to which column family

  • The maximum number of columns in each column family

  • The type of data to be stored in the column

  • The number of historical values that need to be maintained for each column

  • The structure of a rowkey

Once we have answers to these questions, we can follow certain practices to ensure optimal table design. Some of the design practices are as follows:

  • Data for a given column family goes into a single store on HDFS. This store might consist of multiple HFiles, which eventually get converted to a single HFile using compaction techniques.

  • Columns in a column family are also stored together on the disk, so columns with different access patterns should be kept in different column families.

  • If we design tables with fewer columns and many rows (a tall table), we might achieve O(1) operations but also compromise on atomicity...
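
The tall-table idea can be sketched with a composite rowkey. The example below is an in-memory illustration only. The class `TallTable`, the `#` separator, and the user and message identifiers are hypothetical, and a `TreeMap` of strings stands in for a table sorted by rowkey.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical tall layout: one row per (user, message) pair.
class TallTable {
    // rowkey -> message body, kept sorted by rowkey
    final TreeMap<String, String> rows = new TreeMap<>();

    // Composite rowkey: userId + '#' + messageId (an assumed convention).
    static String rowkey(String userId, String messageId) {
        return userId + "#" + messageId;
    }

    void put(String userId, String messageId, String body) {
        rows.put(rowkey(userId, messageId), body);
    }

    // Point lookup of a single message by its full rowkey.
    String get(String userId, String messageId) {
        return rows.get(rowkey(userId, messageId));
    }

    // Range scan over every row whose key starts with "userId#".
    SortedMap<String, String> scanUser(String userId) {
        String start = userId + "#";
        String stop = start + Character.MAX_VALUE; // exclusive upper bound
        return rows.subMap(start, stop);
    }

    public static void main(String[] args) {
        TallTable table = new TallTable();
        table.put("user1", "m1", "hello");
        table.put("user1", "m2", "world");
        table.put("user10", "m1", "other user");
        System.out.println(table.get("user1", "m2"));   // world
        System.out.println(table.scanUser("user1").size()); // 2
    }
}
```

Each message is addressed directly by its own rowkey, but a user's messages are now spread over several rows. Since HBase guarantees atomicity only within a single row, a change that touches several of these rows is no longer atomic.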

Accessing HBase


In the previous chapter, we saw how to create a table and perform simple data operations using the HBase shell. HBase can be accessed using a variety of clients, such as REST clients, Thrift clients, object mapper frameworks such as Kundera, and so on. HBase clients are discussed in detail in Chapter 6, HBase Clients. HBase also offers advanced Java-based APIs for working with tables and column families. (The HBase shell is a wrapper around this Java API.) This API also supports metadata management, for example, data compression for a column family, region split, and so on. In addition to schema definition, the API provides an interface for a table scan with various functions, such as limiting the number of columns returned or limiting the number of versions of each cell returned. For data manipulation, the HBase API supports create, read, update, and delete operations on individual rows. This API comes with many advanced features, which will be discussed throughout this book.
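
Before looking at the real API, the semantics of these operations can be sketched with a small in-memory model. This is not the HBase client API (in the Java client, the corresponding operations are expressed with classes such as `Put`, `Get`, `Scan`, and `Delete`). The class `VersionedTable` and its method signatures are hypothetical, and the model simplifies deletion to removing the row outright.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of row-level CRUD with cell versions.
class VersionedTable {
    // rowkey -> "family:column" -> timestamp (newest first) -> value
    private final Map<String, Map<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    // Create and update are the same operation: write a new timestamped version.
    void put(String rowkey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowkey, k -> new TreeMap<>())
            .computeIfAbsent(column,
                k -> new TreeMap<Long, String>(Collections.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Read: return at most maxVersions values of one cell, newest first.
    List<String> get(String rowkey, String column, int maxVersions) {
        List<String> out = new ArrayList<>();
        Map<String, TreeMap<Long, String>> columns = rows.get(rowkey);
        if (columns == null || !columns.containsKey(column)) {
            return out;
        }
        for (String value : columns.get(column).values()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    // Delete: remove the whole row (a simplification of real deletes).
    void delete(String rowkey) {
        rows.remove(rowkey);
    }

    public static void main(String[] args) {
        VersionedTable table = new VersionedTable();
        table.put("row1", "cf:a", 1L, "v1"); // create
        table.put("row1", "cf:a", 2L, "v2"); // update adds a newer version
        System.out.println(table.get("row1", "cf:a", 1)); // [v2]
        System.out.println(table.get("row1", "cf:a", 2)); // [v2, v1]
        table.delete("row1");
        System.out.println(table.get("row1", "cf:a", 1)); // []
    }
}
```

An update here does not overwrite the old value. It adds a newer version, and the reader decides how many versions to get back.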

Note

In most...

Summary


In this chapter, we learned the basics of modeling data and some strategies to consider when designing a table in HBase. We also learned how to perform basic CRUD operations on a table using the various APIs provided by HBase. In the next chapter, we will look into HBase table keys, table scans, and some other advanced features such as filters.
