You're reading from  HBase Essentials

Product type: Book
Published in: Nov 2014
Reading Level: Intermediate
ISBN-13: 9781783987245
Edition: 1st Edition
Author (1)
Nishant Garg

Nishant Garg has over 17 years' software architecture and development experience in various technologies, such as Java Enterprise Edition, SOA, Spring, Hadoop, Hive, Flume, Sqoop, Oozie, Spark, Shark, YARN, Impala, Kafka, Storm, Solr/Lucene, NoSQL databases (such as HBase, Cassandra, and MongoDB), and MPP databases (such as Greenplum). He received his MS in software systems from the Birla Institute of Technology and Science, Pilani, India, and is currently working as a technical architect for the Big Data R&D Group with Impetus Infotech Pvt. Ltd. Previously, Nishant has enjoyed working with some of the most recognizable names in IT services and financial industries, employing full software life cycle methodologies such as Agile and Scrum. Nishant has also undertaken many speaking engagements on big data technologies and is the author of Apache Kafka and HBase Essentials, Packt Publishing.

Use cases of HBase


HBase can serve as the storage system in a number of use cases. This section discusses a few of the popular ones and the well-known companies that have adopted HBase. Let's discuss the use cases first:

  • Handling content: In today's world, a wide variety of content is available for users to consume. Also, the variety of application clients, such as browsers and mobile devices, leads to an additional requirement: each client needs the same content in a different format. Users not only consume content but also generate a variety of content in large volumes and at high velocity, such as tweets, Facebook posts, images, blog posts, and more. HBase is the perfect choice as the backend of such applications; for example, many scalable content management solutions use HBase as their backend.

  • Handling incremental data: In many use cases, trickled data is added to a data store for further usage, such as analytics, processing, and serving. This trickled data could come from sources such as advertisement impressions, clickstreams, and user interaction data, or it could be time series data. HBase is used for storage in all such cases. For example, Open Time Series Database (OpenTSDB) uses HBase for data storage and metrics generation. The counters feature (discussed in Chapter 5, The HBase Advanced API) is used by Facebook for counting and storing the "likes" for a particular page/image/post.
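The counter pattern mentioned above can be illustrated with a small in-memory sketch. This is not HBase client code: the class `LikeCounterSketch`, its methods, and the `"post:42"` row key are hypothetical names chosen for illustration. It only models the contract that makes counters useful for "likes", namely that the increment and the read of the new value happen in one atomic call. In the real client, the equivalent operation is an increment call on the table (for example, `incrementColumnValue`), which Chapter 5 covers.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for an HBase counter column; NOT HBase client code.
final class LikeCounterSketch {
    // Key is "rowKey/qualifier" (a hypothetical layout); value is the running count.
    private final ConcurrentMap<String, AtomicLong> cells = new ConcurrentHashMap<>();

    // Atomically adds `amount` to the counter and returns the new value,
    // so callers never need a separate read followed by a write.
    long increment(String rowKey, String qualifier, long amount) {
        return cells.computeIfAbsent(rowKey + "/" + qualifier, k -> new AtomicLong())
                    .addAndGet(amount);
    }

    // Returns the current count, or 0 if the counter was never incremented.
    long get(String rowKey, String qualifier) {
        AtomicLong cell = cells.get(rowKey + "/" + qualifier);
        return cell == null ? 0L : cell.get();
    }

    public static void main(String[] args) {
        LikeCounterSketch counters = new LikeCounterSketch();
        counters.increment("post:42", "likes", 1);
        counters.increment("post:42", "likes", 1);
        System.out.println(counters.get("post:42", "likes")); // prints 2
    }
}
```

Because each increment is a single atomic operation, many clients can "like" the same post concurrently without losing updates.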

Some of the companies that are using HBase in their respective use cases are as follows:

  • Facebook (www.facebook.com): Facebook uses HBase to power its messaging infrastructure. Facebook opted for HBase to scale beyond its old messaging infrastructure, which handled over 350 million users sending over 15 billion person-to-person messages per month. HBase was selected for its excellent scalability and performance on big workloads, along with features such as automatic load balancing and failover. Facebook also uses HBase for counting and storing the "likes" contributed by users.

  • Meetup (www.meetup.com): Meetup uses HBase to power a site-wide, real-time activity feed system for all of its members and groups. In its architecture, group activity is written directly to HBase and indexed per member, with the member's custom feed served directly from HBase for incoming requests.

  • Twitter (www.twitter.com): Twitter uses HBase to provide a distributed, read/write backup of all the transactional tables in Twitter's production backend. This backup is later used to run MapReduce jobs over the data. Additionally, its operations team uses HBase as a time series database for cluster-wide monitoring and performance data.

  • Yahoo (www.yahoo.com): Yahoo uses HBase to store document fingerprints for detecting near-duplicates. With millions of rows in the HBase table, Yahoo runs queries to find duplicate documents against real-time traffic.
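A feed that is "indexed per member" and served straight from the store, as in the Meetup case, relies on HBase keeping rows sorted by row key. The following is a minimal in-memory sketch of that idea, not HBase client code and not Meetup's actual schema: the class `MemberFeedSketch` and the key layout (member ID followed by a reversed timestamp) are assumptions made for illustration, with a `TreeMap` standing in for the sorted table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a sorted HBase table; NOT HBase client code.
final class MemberFeedSketch {
    // TreeMap keeps keys sorted, much as HBase keeps rows sorted by row key.
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Hypothetical key layout: "<memberId>|<Long.MAX_VALUE - timestamp>",
    // zero-padded to 19 digits so that string order matches numeric order
    // and the newest activity sorts first within a member's rows.
    static String rowKey(String memberId, long timestampMillis) {
        return memberId + "|"
                + String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - timestampMillis);
    }

    void write(String memberId, long timestampMillis, String activity) {
        rows.put(rowKey(memberId, timestampMillis), activity);
    }

    // Emulates a prefix scan: start at the member's first key and stop at the
    // first key of another member, or once `limit` rows have been read.
    List<String> latest(String memberId, int limit) {
        String prefix = memberId + "|";
        List<String> feed = new ArrayList<>();
        for (Map.Entry<String, String> e : rows.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix) || feed.size() >= limit) {
                break;
            }
            feed.add(e.getValue());
        }
        return feed;
    }

    public static void main(String[] args) {
        MemberFeedSketch table = new MemberFeedSketch();
        table.write("member1", 1000L, "joined group A");
        table.write("member1", 3000L, "RSVPed to event B");
        table.write("member1", 2000L, "commented on event B");
        // Newest two activities first.
        System.out.println(table.latest("member1", 2));
    }
}
```

With this layout, reading a member's latest activity is one short, contiguous scan rather than a query followed by a sort, which is why such a feed can be served directly from the store.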

Tip

The source of the preceding information is http://wiki.apache.org/hadoop/Hbase/PoweredBy.
