HBase Essentials

Product type: Book
Published: Nov 2014
ISBN-13: 9781783987245
Pages: 164
Edition: 1st Edition
Author: Nishant Garg

Chapter 3. Advanced Data Modeling

So far, we have learned the basic building blocks of HBase schema design and the CRUD operations over the designed schema. In this chapter, we are going to dive deeper and learn the advanced concepts of HBase, covering the following topics:

  • Understanding keys

  • HBase table scans

  • Implementing filters

Let's get an insight into the listed advanced concepts of HBase.

Understanding keys


In HBase, we primarily have the following keys to handle data within the tables:

  • Row Key: This provides a logical representation of an entire row, containing all the column families and column qualifiers

  • Column Key: This is formed by combining the column family and the column qualifier

Logically, the data stored in cells is arranged in a tabular format, but physically, these tabular rows are stored as linear sets of the actual cells. These linear sets of cells contain all the real data inside them.

Additionally, each version of the same cell is stored as a separate cell in these linear sets, and a timestamp is stored along with the cell data. The cells are sorted in descending order by their timestamp so that the HBase client always fetches the most recent value of the cell data.
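The ordering described above can be illustrated with a small, self-contained Java sketch. It is not HBase code: the Cell class, the "family:qualifier" column key format, and the use of String keys instead of byte arrays are all assumptions made for illustration. It only shows how sorting cells by row key, then column key, then descending timestamp puts the newest version of a cell first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CellOrderSketch {
    // Hypothetical stand-in for a stored cell; this is not an HBase class.
    public static final class Cell {
        public final String row, family, qualifier, value;
        public final long timestamp;

        public Cell(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        // Column key = column family combined with column qualifier (assumed separator ':').
        public String columnKey() {
            return family + ":" + qualifier;
        }
    }

    // Row key ascending, column key ascending, timestamp descending (newest first).
    public static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.columnKey().compareTo(b.columnKey());
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };

    // Returns a sorted copy, mimicking the linear layout of cells.
    public static List<Cell> sorted(List<Cell> cells) {
        List<Cell> copy = new ArrayList<>(cells);
        copy.sort(ORDER);
        return copy;
    }

    // The first matching cell in sorted order is the most recent version.
    public static String latestValue(List<Cell> cells, String row, String columnKey) {
        for (Cell c : sorted(cells)) {
            if (c.row.equals(row) && c.columnKey().equals(columnKey)) {
                return c.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell("row1", "cf", "name", 1L, "old"),
                new Cell("row1", "cf", "name", 3L, "new"),
                new Cell("row1", "cf", "name", 2L, "mid"));
        System.out.println(latestValue(cells, "row1", "cf:name"));
    }
}
```

Because the newest version sorts first, a reader that stops at the first match for a row and column never has to look at older versions.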

The following diagram represents how data is stored physically on the disk:

In HBase, the entire cell, along with the added structural information such as...

HBase table scans


In the previous chapter, we took a look at CRUD operations in HBase. Now, let's take a step further and discuss table scans in HBase. Table scans are similar to iterators in Java or nonscrollable cursors in the RDBMS world. A table scan is useful for querying the data to access the complete set of records for a specific value by applying filters. The scan() operation reads a defined portion of the data, similar to the get() operation, and the filters are then applied to that portion to narrow down the results further.
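As a rough mental model of a scan reading a defined portion of a sorted table, consider the following stdlib-only Java sketch. It does not use the HBase client; ScanRangeSketch and its methods are hypothetical, and String row keys stand in for HBase's byte arrays. The three scan methods mirror a full scan, a scan from a start row, and a scan over a range in which the start row is inclusive and the stop row is exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ScanRangeSketch {
    // Rows kept sorted by row key, as a simplified model of a table.
    private final NavigableMap<String, String> rows = new TreeMap<>();

    public void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Full scan: every row, in row key order.
    public List<String> scan() {
        return new ArrayList<>(rows.keySet());
    }

    // Scan starting at startRow (inclusive) through the end of the table.
    public List<String> scan(String startRow) {
        return new ArrayList<>(rows.tailMap(startRow, true).keySet());
    }

    // Range scan: startRow inclusive, stopRow exclusive.
    public List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        ScanRangeSketch table = new ScanRangeSketch();
        table.put("row-001", "a");
        table.put("row-002", "b");
        table.put("row-003", "c");
        table.put("row-010", "d");
        System.out.println(table.scan("row-002", "row-010")); // [row-002, row-003]
    }
}
```

Like an iterator, each scan walks the rows in key order and never jumps backward, which is why choosing row keys that keep related rows adjacent makes range scans cheap.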

The org.apache.hadoop.hbase.client package provides the Scan class with the following constructors:

Implementing filters


We apply column families, column qualifiers, timestamps, or ranges with the get() and scan() operations to limit the data retrieved. Designing a row key that matches the access pattern in every case is not possible, as at times we need only a subset of the data retrieved. Filters provide this level of fine-grained access, for example, filtering the dataset based on a regular expression. The HBase API provides a Filter interface and an abstract class, FilterBase, under the org.apache.hadoop.hbase.filter package, which is further extended by many classes such as CompareFilter, PageFilter, SkipFilter, TimestampsFilter, and so on, to provide additional functionality. The following method defined in the Scan class is used to set an instance of the filter:

  • setFilter(Filter filter): Applies the specified server-side filter when performing the query
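The idea of narrowing a scanned range with a filter can be sketched in plain Java as follows. This is only an in-process analogy, not the HBase filter API: RowFilterSketch, scanWithFilter, and rowKeyMatches are hypothetical names, and a java.util.function.Predicate plays the role that a server-side filter would play in HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class RowFilterSketch {
    // Reads the range [startRow, stopRow) first, then keeps only rows the filter accepts.
    public static List<String> scanWithFilter(NavigableMap<String, String> table,
                                              String startRow, String stopRow,
                                              Predicate<String> rowFilter) {
        List<String> matches = new ArrayList<>();
        for (String rowKey : table.subMap(startRow, true, stopRow, false).keySet()) {
            if (rowFilter.test(rowKey)) {
                matches.add(rowKey);
            }
        }
        return matches;
    }

    // Builds a filter that accepts row keys containing a match for the regular expression.
    public static Predicate<String> rowKeyMatches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return key -> pattern.matcher(key).find();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("user-001-a", "v1");
        table.put("user-002-b", "v2");
        table.put("user-003-a", "v3");
        table.put("user-004-a", "v4");
        System.out.println(
                scanWithFilter(table, "user-001", "user-004", rowKeyMatches("-a$")));
    }
}
```

The range restricts which rows are read at all, and the predicate then discards rows within that range. In HBase the filter is evaluated on the server, so rejected rows are not sent to the client, which this in-memory sketch cannot show.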

Typically, filters can be categorized into multiple types, as follows:

  • Utility filters

  • Comparison filters

  • Custom filters

Utility filters...

Summary


In this chapter, we learned about the advanced data modeling concepts such as understanding keys in HBase. We learned the basics of table scanning in HBase and the types of available filters. We also covered the application of these filters for table scan operations using examples.

In the next chapter, we will take a look at HBase storage and the replication architecture and also cover HBase over MapReduce in detail.


Constructor and description:

  • Scan(): The default scan constructor reads the entire HBase table, including all the column families and the respective columns

  • Scan(byte[] startRow): Creates a scan operation starting at the specified row

  • Scan(byte[] startRow, byte[] stopRow): Creates a scan operation for the range of rows specified...