More Information
  • Start with the basics of loading data and writing your first query
  • Use de-normalized data efficiently by manipulating complex data types
  • Structure your data and queries to take advantage of Hive’s optimizations
  • Bring your own data files to Hive and teach Hive how to understand them
  • Access the specialized functions built into Hive to manipulate your data
  • Use Hive streaming to integrate code written in any language into your queries
  • Extend Hive with user-defined functions
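
To make the streaming item above concrete, here is a minimal sketch of the kind of script Hive streaming can call. It is not taken from the book. It assumes a hypothetical table with `user_id` and `url` columns, and it relies on Hive's default streaming format: one row per line on standard input, with tab-separated fields and NULL written as `\N`. The script name `extract_host.py` and the function `transform_line` are illustrative names only.

```python
import sys
from urllib.parse import urlparse

# Hive's default text encoding of NULL in streamed rows.
HIVE_NULL = r"\N"


def transform_line(line: str) -> str:
    """Turn one 'user_id<TAB>url' row into 'user_id<TAB>host'.

    The two-column layout is an assumption made for this example.
    """
    user_id, url = line.rstrip("\n").split("\t")
    if url == HIVE_NULL:
        return f"{user_id}\t{HIVE_NULL}"
    # urlparse returns an empty netloc when the URL has no host part.
    host = urlparse(url).netloc or HIVE_NULL
    return f"{user_id}\t{host}"


def main() -> None:
    # Hive writes rows to stdin and reads transformed rows from stdout.
    for line in sys.stdin:
        if line.strip():
            print(transform_line(line))


if __name__ == "__main__":
    main()
```

A script like this would typically be registered with `ADD FILE extract_host.py;` and invoked with a query such as `SELECT TRANSFORM (user_id, url) USING 'python extract_host.py' AS (user_id, host) FROM page_views;`, where `page_views` is a hypothetical table name.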

Hadoop provides a robust framework for building distributed applications, but working directly with Hadoop requires writing a lot of code. Adding structure to data and using a higher-level language such as SQL makes working with Hadoop both easier and faster.

"Instant Apache Hive Essentials How-to" contains a series of practical recipes that introduce the power and flexibility of Hive. Starting with your first query, this book provides step-by-step instructions and behind-the-scenes explanations of how to write MapReduce jobs effectively with SQL.

This book looks at how Hive transforms SQL statements into MapReduce jobs and demonstrates how you can extend Hive to support your own use cases. Its recipes will teach you how to leverage the scale of Hadoop while retaining the benefits of using a structured query language. You will learn how Hive translates a query into MapReduce jobs and explore how to structure your queries for better performance. You will extend Hive to understand your own file formats, simplifying the loading of data into the warehouse. Finally, you will add your own custom functions to Hive to support whatever use cases you may have.

"Instant Apache Hive Essentials How-to" is a quick introduction to adding Hive to your data toolkit. It is packed with high-level instructions for making Hive work, and it draws connections to the underlying Hadoop framework to explain how things happen.

  • Learn something new in an Instant! A short, fast, focused guide delivering immediate results
  • Learn to use SQL to write Hadoop jobs
  • Add support for data to Hive in your own file formats
  • Understand how the Hive query processor works to optimize common queries
Page Count: 76
Course Length: 2 hours 16 minutes
ISBN: 9781782169482
Date Of Publication: 2 Jun 2013


Darren Lee

Darren Lee studied mathematics at Stanford and New York University. He is currently the lead engineer on the data team at Bizo in San Francisco, CA. He has been using Hive since 2009 and occasionally blogs about it at