Instant Apache Hive Essentials How-to [Instant]


This title is available as an eBook only
eBook: $14.99
Formats: PDF, PacktLib, ePub and Mobi formats
Learn in an Instant - Short, Fast, Focused
  • Learn something new in an Instant! A short, fast, focused guide delivering immediate results
  • Learn to use SQL to write Hadoop jobs
  • Add support for data to Hive in your own file formats
  • Understand how the Hive query processor works to optimize common queries

Book Details

Language : English
eBook : 76 pages
Release Date : June 2013
ISBN : 1782169474
ISBN 13 : 9781782169475
Author(s) : Darren Lee
Topics and Technologies : All Books, Big Data and Business Intelligence, Instant, Open Source

Table of Contents

Preface
Instant Apache Hive Essentials How-to
  • Instant Apache Hive Essentials How-to
    • Tables and queries (Simple)
    • Understanding complex data types (Simple)
    • Using Hive non-interactively (Simple)
    • Join optimizations (Medium)
    • Setting the file format (Simple)
    • Writing a custom SerDe (Intermediate)
    • Using static partitions (Intermediate)
    • Using dynamic partitions (Intermediate)
    • Using functions (Simple)
    • Adding custom logic with streaming (Intermediate)
    • Simple user-defined functions (Intermediate)
    • Advanced user-defined functions (Advanced)
    • User-defined table-generating functions (Advanced)
    • User-defined aggregation functions (Advanced)

Darren Lee

Darren Lee studied mathematics at Stanford and New York University. He is currently the lead engineer on the data team at Bizo in San Francisco, CA. He has been using Hive since 2009 and occasionally blogs about it at http://dev.bizo.com.

Code Downloads

Download the code and support files for this book.


Submit Errata

Please let us know if you have found any errors not listed here by completing our errata submission form. Our editors will check them and add them to this list. Thank you.


Errata

- 2 submitted: last submission 29 May 2014

Errata type: Code | Page number: 9

Code in Bullet point 6

> num_bidsint) ;

Should be: > num_bids int) ;

Errata type: Technical | Page number: Preface

Link to the Olympic TSV data dump in the preface.

The data for the exercises is available as part of the code download from the Packt website. Please go to the book's webpage at http://www.packtpub.com/apache-hive-essentials-how-to/book, click on the "Support" tab, then "Download Now". The Olympics data is in the /data/ subdirectory of the archive.



What you will learn from this book

  • Start with the basics of loading data and writing your first query
  • Use de-normalized data efficiently by manipulating complex data types
  • Structure your data and queries to take advantage of Hive’s optimizations
  • Bring your own data files to Hive and teach Hive how to understand them
  • Access the specialized functions built into Hive to manipulate your data
  • Use Hive streaming to integrate code written in any language
  • Extend Hive with user-defined functions

In Detail

Hadoop provides a robust framework for building distributed applications, but working directly with Hadoop requires writing a lot of code. Adding structure to data and using a higher-level language such as SQL makes working with Hadoop both easier and faster.

"Instant Apache Hive Essentials How-to" contains a series of practical recipes that introduce the power and flexibility of Hive. Starting with your first query, this book will provide step-by-step instructions and behind-the-scenes explanations for how to effectively write MapReduce jobs with SQL.

This book looks at how Hive transforms SQL statements into MapReduce jobs and demonstrates how you can extend Hive to support your own use cases. Its recipes will teach you how to leverage the scale of Hadoop while retaining the benefits of using a structured query language. You will learn how Hive translates a query into MapReduce jobs and explore how to structure your queries for better performance. You will extend Hive to understand your own file formats, simplifying the loading of data into the warehouse. Finally, you will add your own custom functions to Hive to support whatever use cases you may have.
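As a rough illustration of the idea that a SQL query becomes a MapReduce job, the sketch below simulates, in plain Python, how a simple aggregation such as `SELECT country, SUM(medals) FROM t GROUP BY country` could be split into map, shuffle, and reduce phases. It is a conceptual model only, not Hive's actual query planner and not code from the book; the table, column names, and rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table t(country, medals).
# These values are invented for illustration; they are not the book's data.
ROWS = [
    ("USA", 3),
    ("Norway", 2),
    ("USA", 1),
    ("Norway", 4),
    ("Japan", 5),
]


def map_phase(rows):
    """Emit one (key, value) pair per row, keyed by the GROUP BY column."""
    for country, medals in rows:
        yield country, medals


def shuffle_phase(pairs):
    """Collect all values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Apply the aggregate function (SUM here) to each key's values."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(rows):
    # Conceptually: SELECT country, SUM(medals) FROM t GROUP BY country
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))
```

In this model the GROUP BY column plays the role of the shuffle key and the aggregate function plays the role of the reducer, which is one way to picture why the structure of a query affects how much data moves between phases.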

"Instant Apache Hive Essentials How-to" is a quick introduction for adding Hive to your data toolkit. It is packed with high-level instructions for making Hive work as well as drawing connections to the underlying Hadoop framework to explain how things happen.

Approach

Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. This book provides quick recipes for using Hive to read data in various formats, query this data efficiently, and extend Hive with any custom functions you may need to insert your own logic into the data pipeline.

Who this book is for

This book is written for data analysts and developers who want to use their current knowledge of SQL to be more productive with Hadoop. It assumes that readers are comfortable writing SQL queries and are familiar with Hadoop at the level of the classic WordCount example.
