Learning Cloudera Impala


  • Step-by-step guidance to get you started with Impala on your Hadoop cluster
  • Manipulate your data rapidly by writing proper SQL statements
  • Explore the concepts of Impala security, administration, and troubleshooting in detail to maintain your Impala cluster

Book Details

Language : English
Paperback : 150 pages [ 235mm x 191mm ]
Release Date : December 2013
ISBN : 1783281278
ISBN 13 : 9781783281275
Author(s) : Avkash Chauhan
Topics and Technologies : All Books, Big Data and Business Intelligence, Open Source


Table of Contents

Preface
Chapter 1: Getting Started with Impala
Chapter 2: The Impala Shell Commands and Interface
Chapter 3: The Impala Query Language and Built-in Functions
Chapter 4: Impala Walkthrough with an Example
Chapter 5: Impala Administration and Performance Improvements
Chapter 6: Troubleshooting Impala
Chapter 7: Advanced Impala Concepts
Appendix: Technology Behind Impala and Integration with Third-party Applications
Index
  • Chapter 1: Getting Started with Impala
    • Impala requirements
      • Dependency on Hive for Impala
      • Dependency on Java for Impala
      • Hardware dependency
      • Networking requirements
      • User account requirements
    • Installing Impala
      • Installing Impala with Cloudera Manager
      • Installing Impala without Cloudera Manager
    • Configuring Impala after installation
    • Starting Impala
    • Stopping Impala
    • Restarting Impala
    • Upgrading Impala
      • Upgrading Impala using parcels with Cloudera Manager
      • Upgrading Impala using packages with Cloudera Manager
      • Upgrading Impala without Cloudera Manager
    • Impala core components
      • Impala daemon
      • Impala statestore
      • Impala metadata and metastore
      • The Impala programming interface
    • The Impala execution architecture
      • Working with Apache Hive
      • Working with HDFS
      • Working with HBase
    • Impala security
      • Authorization
        • The SELECT privilege
        • The INSERT privilege
        • The ALL privilege
      • Authentication through Kerberos
      • Auditing
    • Impala security guidelines for a higher level of protection
    • Summary
  • Chapter 2: The Impala Shell Commands and Interface
    • Using Cloudera Manager for Impala
    • Launching Impala shell
    • Connecting impala-shell to the remotely located impalad daemon
    • Impala-shell command-line options with brief explanations
      • General command-line options
      • Connection-specific options
      • Query-specific options
      • Secure connectivity-specific options
    • Impala-shell command reference
      • General commands
      • Query-specific commands
      • Table- and database-specific commands
    • Summary
  • Chapter 3: The Impala Query Language and Built-in Functions
    • Impala SQL language statements
      • Database-specific statements
        • The CREATE DATABASE statement
        • The DROP DATABASE statement
        • The SHOW DATABASES statement
        • Using database-specific query sentence in an example
      • Table-specific statements
        • The CREATE TABLE statement
        • The CREATE EXTERNAL TABLE statement
        • The ALTER TABLE statement
        • The DROP TABLE statement
        • The SHOW TABLES statement
        • The DESCRIBE statement
        • The INSERT statement
        • The SELECT statement
        • Internal and external tables
    • Data types
    • Operators
    • Functions
    • Clauses
    • Query-specific SQL statements in Impala
    • Defining VIEWS in Impala
    • Loading data from HDFS using the LOAD DATA statement
    • Comments in Impala SQL statements
    • Built-in function support in Impala
      • The type conversion function
    • Unsupported SQL statements in Impala
    • Summary
  • Chapter 4: Impala Walkthrough with an Example
    • Creating an example scenario
      • Example dataset one – automobiles (automobiles.txt)
      • Example dataset two – motorcycles (motorcycles.txt)
      • Data and schema considerations
    • Commands for loading data into Impala tables
      • HDFS specific commands
      • Loading data into the Impala table from HDFS
    • Launching the Impala shell
      • Database and table specific commands
    • SQL queries against the example database
    • SQL join operation with the example database
      • Using various types of SQL statements
    • Summary
  • Chapter 5: Impala Administration and Performance Improvements
    • Impala administration
      • Administration with Cloudera Manager
      • The Impala statestore UI
    • Impala High Availability
    • Single point of failure in Impala
    • Improving performance
      • Enabling block location tracking
      • Enabling native checksumming
      • Enabling Impala to perform short-circuit read on DataNode
      • Adding more Impala nodes to achieve higher performance
      • Optimizing memory usage during query execution
      • Query execution dependency on memory
      • Using resource isolation
    • Testing query performance
      • Benchmarking queries
      • Verifying data locality
    • Choosing an appropriate file format and compression type for better performance
    • Fine-tuning Impala performance
      • Partitioning
      • Join queries
      • Table and column statistics
    • Summary
  • Chapter 6: Troubleshooting Impala
    • Troubleshooting various problems
      • Impala configuration-related issues
        • The block locality issue
        • Native checksumming issues
      • Various connectivity issues
        • Connectivity between Impala shell and Impala daemon
        • ODBC/JDBC-specific connectivity issues
      • Query-specific issues
      • Issues specific to User Access Control (UAC)
      • Platform-specific issues
        • Impala port mapping issues
        • HDFS-specific problems
      • Input file format-specific issues
    • Using Cloudera Manager to troubleshoot problems
      • Impala log analysis using Cloudera Manager
      • Using the Impala web interface for monitoring and troubleshooting
      • Using the Impala statestore web interface
      • Using the Impala Maintenance Mode
      • Checking Impala events
    • Summary
  • Chapter 7: Advanced Impala Concepts
    • Impala and MapReduce
    • Impala and Hive
      • Key differences between Impala and Hive
    • Impala and Extract, Transform, Load (ETL)
    • Why Impala is faster than Hive in query processing
    • Impala processing strategy
    • Impala and HBase
      • Using Impala to query HBase tables
    • File formats and compression types supported in Impala
    • Processing different file and compression types in Impala
      • The regular text file format with Impala tables
      • The Avro file format with Impala tables
      • The RCFile file format with Impala tables
      • The SequenceFile file format with Impala tables
      • The Parquet file format with Impala tables
    • The unsupported features in Impala
    • Impala resources
    • Summary

Avkash Chauhan

Avkash Chauhan is a software technology veteran with more than 12 years of industry experience in disciplines such as embedded engineering, cloud computing, big data analytics, data processing, and data visualization. He has extensive global work experience with Fortune 100 companies. He spent eight years at Microsoft before moving to Silicon Valley to work with a big data and analytics start-up. He started his career as an embedded engineer, and during his eight years at Microsoft he worked on Windows CE, Windows Phone, Windows Azure, and HDInsight. He spent several years working with the Windows Azure team to develop cloud technology, and his last project there was Apache Hadoop on Windows Azure, also known as HDInsight. He worked on the HDInsight project from its incubation at Microsoft, and helped with its early development and its deployment on the cloud. For the past three years, he has been working on big data and Hadoop-related technologies, developing applications that make Hadoop easy to use for large and mid-market companies. He is a prolific blogger and very active on social networking sites. You can contact him directly through the following:

  • LinkedIn: https://www.linkedin.com/in/avkashchauhan
  • Blog: http://cloudcelebrity.wordpress.com/
  • Twitter: @avkashchauhan

What you will learn from this book

  • Understand the various ways of installing Impala in your Hadoop cluster
  • Use the Impala shell API to interact with Impala components
  • Utilize Impala Query Language and built-in functions to play with data
  • Administer and fine-tune Impala for high availability
  • Identify and troubleshoot problems in a variety of ways
  • Get acquainted with various input data formats in Hadoop and how to use them with Impala
  • Comprehend how third-party applications can connect with Impala to provide data visualization and various other enhancements

In Detail

If you have always wanted to crunch billions of rows of raw data on Hadoop in a couple of seconds, then Cloudera Impala is the number one choice for you. Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Hue Beeswax) as Apache Hive. This provides a familiar and unified platform for batch-oriented or real-time queries.

In this practical, example-oriented book, you will learn what you need to know about Cloudera Impala to get started on your very own project. The book covers Cloudera Impala from installation, administration, and query processing all the way to connectivity with third-party applications. With this book in your hand, you will find yourself empowered to play with your data in Hadoop.

As a reader of this book, you will learn about the origin of Impala and the technology behind it that allows it to run on thousands of machines. You will learn how to install, run, manage, and troubleshoot Impala in your own Hadoop cluster using the step-by-step guidance provided in the book. The book covers tenets of data processing such as loading data stored in Hadoop into Impala tables and querying data using Impala SQL statements, all with various code illustrations and a real-world example.

The book is written to get you started with Impala by providing rich information so you can understand what Impala is, what it can do for you, and finally how you can use it to achieve your objective.

Approach

This book is an easy-to-follow, step-by-step tutorial where each chapter takes your knowledge to the next level. The book covers practical knowledge with tips to implement this knowledge in real-world scenarios. A chapter with a real-life example is included to help you understand the concepts in full.

Who this book is for

Learning Cloudera Impala is for those who really want to take advantage of their Hadoop cluster by processing extremely large amounts of raw data in Hadoop at real-time speed. Prior knowledge of Hadoop and some exposure to Hive and MapReduce are expected.
