Learning HBase

Learn
  • Understand the fundamentals of HBase
  • Understand the prerequisites necessary to get started with HBase
  • Install and configure a new HBase cluster
  • Optimize an HBase cluster using different Hadoop and HBase parameters
  • Make clusters more reliable using different troubleshooting and maintenance techniques
  • Get to grips with the HBase data model and its operations
  • Get to know the benefits of using Hadoop tools/JARs for HBase
About

Apache HBase is a nonrelational (NoSQL) database management system that runs on top of HDFS. It is an open source, distributed, versioned, column-oriented store. It provides random, real-time read/write access to your Big Data, with the benefit of linear scalability.

This book will take you through a series of core tasks in HBase. The introductory chapter will give you all the information you need about the HBase ecosystem. Furthermore, you'll learn how to configure, create, verify, and test clusters. The book also explores the different Hadoop and HBase parameters that need to be considered for optimization and trouble-free operation of the cluster. It then focuses on HBase's data model, storage, and structure layout. You will also get to know the different options that can be used to speed up the operation and functioning of HBase. The book also teaches basic and advanced Java coding for HBase. By the end of the book, you will have learned how to use HBase with large data sets and integrate them with Hadoop.

Features
  • Learn how HBase works with large data sets and integrates them with Hadoop
  • Understand the layout and structure of HBase
  • A step-by-step guide accompanied by practical examples that will focus on the core tasks of HBase
Page Count 326
Course Length 9 hours 46 minutes
ISBN 9781783985944
Date Of Publication 25 Nov 2014

Authors

Shashwat Shriparv

Shashwat Shriparv was born in Muzaffarpur, Bihar. He did his schooling in Muzaffarpur and Shillong, Meghalaya. He received his BCA degree from IGNOU, Delhi, and his MCA degree from Cochin University of Science and Technology, Kerala (C-DAC Trivandrum).

He was introduced to Big Data technologies in early 2010, when he was asked to perform a proof of concept (POC) on using Big Data technologies to store and process logs. He was also given another project that required storing huge binary files with variable headers and processing them. At this time, he started configuring, setting up, and testing Hadoop/HBase clusters and writing sample code for them. After a successful POC, he began serious development using Java REST and SOAP web services, building a system that stored and processed logs in Hadoop through web services, then stored these logs in HBase using a custom schema and read the data using HBase APIs and HBase-Hive mapped queries. Shashwat successfully implemented the project and then moved on to work on huge binary files of 1 to 3 TB, processing the headers and storing the metadata in HBase and the files on HDFS.

Shashwat started his career as a software developer at C-DAC Cyber Forensics, Trivandrum, building mobile-related software for forensic analysis. Then, he moved to Genilok Computer Solutions, where he worked on cluster computing, HPC technologies, and web technologies. After this, he moved from Trivandrum to Bangalore and joined PointCross, where he started working with Big Data technologies, developing software using Java and web services on a Big Data platform. At PointCross, he worked on many projects revolving around Big Data technologies such as Hadoop, HBase, Hive, Pig, Sqoop, and Flume. From here, he moved to HCL Infosystems Ltd. to work on the UIDAI project, one of the most prestigious projects in India, which provides a unique identification number to every resident of India. Here, he worked with technologies such as HBase, Hive, Hadoop, Pig, and Linux, managing HBase/Hadoop clusters, writing scripts, automating tasks and processes, and building dashboards for monitoring clusters.

Currently, he is working with Cognilytics, Inc. on Big Data technologies, HANA, and other high-performance technologies.

You can find out more about him at https://github.com/shriparv and http://helpmetocode.blogspot.com. You can connect with him on LinkedIn at http://www.linkedin.com/pub/shashwat-shriparv/19/214/2a9. You can also e-mail him at dwivedishashwat@gmail.com.

Shashwat has worked as a reviewer on the book Pig Design Patterns by Pradeep Pasupuleti (Packt Publishing). He also contributed to his college magazine, InfinityTech, as an editor.