Learning Cloudera Impala

Avkash Chauhan

Everything you need to know about Cloudera Impala is here – from installation onwards. Your raw data processing in Hadoop takes on new dimensions of speed and volume with this hands-on tutorial.
RRP $20.99
RRP $34.99
Print + eBook

Book Details

ISBN 13: 9781783281275
Paperback: 150 pages

About This Book

  • Step-by-step guidance to get you started with Impala on your Hadoop cluster
  • Manipulate your data rapidly by writing proper SQL statements
  • Explore the concepts of Impala security, administration, and troubleshooting in detail to maintain your Impala cluster

Who This Book Is For

This book is for those who want to take full advantage of their Hadoop cluster by processing extremely large amounts of raw data in Hadoop at real-time speed. Prior knowledge of Hadoop and some exposure to Hive and MapReduce are expected.

Table of Contents

Chapter 1: Getting Started with Impala
  • Impala requirements
  • Installing Impala
  • Configuring Impala after installation
  • Starting Impala
  • Stopping Impala
  • Restarting Impala
  • Upgrading Impala
  • Impala core components
  • The Impala execution architecture
  • Impala security
  • Impala security guidelines for a higher level of protection

Chapter 2: The Impala Shell Commands and Interface
  • Using Cloudera Manager for Impala
  • Launching Impala shell
  • Connecting impala-shell to the remotely located impalad daemon
  • Impala-shell command-line options with brief explanations
  • Impala-shell command reference

Chapter 3: The Impala Query Language and Built-in Functions
  • Impala SQL language statements
  • Data types
  • Query-specific SQL statements in Impala
  • Defining VIEWS in Impala
  • Loading data from HDFS using the LOAD DATA statement
  • Comments in Impala SQL statements
  • Built-in function support in Impala
  • Unsupported SQL statements in Impala

Chapter 4: Impala Walkthrough with an Example
  • Creating an example scenario
  • Commands for loading data into Impala tables
  • Launching the Impala shell
  • SQL queries against the example database
  • SQL join operation with the example database

Chapter 5: Impala Administration and Performance Improvements
  • Impala administration
  • Impala High Availability
  • Single point of failure in Impala
  • Improving performance
  • Testing query performance
  • Choosing an appropriate file format and compression type for better performance
  • Fine-tuning Impala performance

Chapter 6: Troubleshooting Impala
  • Troubleshooting various problems
  • Using Cloudera Manager to troubleshoot problems

Chapter 7: Advanced Impala Concepts
  • Impala and MapReduce
  • Impala and Hive
  • Impala and Extract, Transform, Load (ETL)
  • Why Impala is faster than Hive in query processing
  • Impala processing strategy
  • Impala and HBase
  • File formats and compression types supported in Impala
  • Processing different file and compression types in Impala
  • The unsupported features in Impala
  • Impala resources

What You Will Learn

  • Understand the various ways of installing Impala in your Hadoop cluster
  • Use the Impala shell API to interact with Impala components
  • Utilize Impala Query Language and built-in functions to play with data
  • Administer and fine-tune Impala for high availability
  • Identify and troubleshoot problems in a variety of ways
  • Get acquainted with various input data formats in Hadoop and how to use them with Impala
  • Understand how third-party applications can connect with Impala to provide data visualization and other enhancements

In Detail

If you have always wanted to crunch billions of rows of raw data on Hadoop in a couple of seconds, then Cloudera Impala is the number one choice for you. Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Hue Beeswax) as Apache Hive. This provides a familiar and unified platform for batch-oriented or real-time queries.

In this practical, example-oriented book, you will learn everything you need to know about Cloudera Impala so that you can get started on your own project. The book covers Cloudera Impala from installation, administration, and query processing through to connectivity with third-party applications. With this book in hand, you will be equipped to work with your data in Hadoop.

As a reader of this book, you will learn about the origin of Impala and the technology behind it that allows it to run on thousands of machines. You will learn how to install, run, manage, and troubleshoot Impala in your own Hadoop cluster using the step-by-step guidance provided in the book. The book covers tenets of data processing such as loading data stored in Hadoop into Impala tables and querying data using Impala SQL statements, all with various code illustrations and a real-world example.

The book is written to get you started with Impala by providing rich information so you can understand what Impala is, what it can do for you, and finally how you can use it to achieve your objective.
