Cloudera Administration Handbook

  • Understand the CDH architecture and its components and successfully set up a Hadoop cluster
  • Maintain, troubleshoot, and secure your cluster using Cloudera Manager
  • Easy-to-follow administrator’s guide with step-by-step explanations to help you master Apache Hadoop

Book Details

Language: English
Paperback: 254 pages (235 mm x 191 mm)
Release Date: July 2014
ISBN-10: 1783558962
ISBN-13: 9781783558964
Author(s): Rohit Menon
Topics and Technologies: All Books, Big Data and Business Intelligence, Other

Table of Contents

Chapter 1: Getting Started with Apache Hadoop
Chapter 2: HDFS and MapReduce
Chapter 3: Cloudera's Distribution Including Apache Hadoop
Chapter 4: Exploring HDFS Federation and Its High Availability
Chapter 5: Using Cloudera Manager
Chapter 6: Implementing Security Using Kerberos
Chapter 7: Managing an Apache Hadoop Cluster
Chapter 8: Cluster Monitoring Using Events and Alerts
Chapter 9: Configuring Backups
  • Chapter 1: Getting Started with Apache Hadoop
    • History of Apache Hadoop and its trends
    • Components of Apache Hadoop
    • Understanding the Apache Hadoop daemons
      • Namenode
      • Secondary namenode
      • Jobtracker
      • Tasktracker
      • ResourceManager
      • NodeManager
      • Job submission in YARN
    • Introducing Cloudera
    • Introducing CDH
    • Responsibilities of a Hadoop administrator
    • Summary
  • Chapter 2: HDFS and MapReduce
    • Essentials of HDFS
      • Configuring HDFS
    • The read/write operational flow in HDFS
      • Writing files in HDFS
      • Reading files in HDFS
    • Understanding the namenode UI
    • Understanding the secondary namenode UI
    • Exploring HDFS commands
      • Commonly used HDFS commands
      • Commands to administer HDFS
    • Getting acquainted with MapReduce
      • Understanding the map phase
      • Understanding the reduce phase
      • Learning all about the MapReduce job flow
        • Configuring MapReduce
      • Understanding the jobtracker UI
      • Getting MapReduce job information
    • Summary
  • Chapter 3: Cloudera's Distribution Including Apache Hadoop
    • Getting started with CDH
    • Understanding the CDH components
      • Apache Hadoop
      • Apache Flume NG
      • Apache Sqoop
      • Apache Pig
      • Apache Hive
      • Apache ZooKeeper
      • Apache HBase
      • Apache Whirr
      • Snappy – previously known as Zippy
      • Apache Mahout
      • Apache Avro
      • Apache Oozie
      • Cloudera Search
      • Cloudera Impala
      • Cloudera Hue
        • Beeswax – Hive UI
        • Cloudera Impala UI
        • Pig UI
        • File Browser
        • Metastore Manager
        • Sqoop Jobs
        • Job Browser
        • Job Designs
        • Dashboard
        • Collection Manager
        • Hue Shell
        • HBase Browser
    • Installing CDH
      • Stopping Hadoop services
      • Understanding a YARN cluster
    • Installing the CDH components
      • Installing Apache Flume
      • Installing Apache Sqoop
      • Installing Apache Sqoop 2
      • Installing Apache Pig
      • Installing Apache Hive
      • Installing Apache Oozie
      • Installing Apache ZooKeeper
    • Summary
  • Chapter 4: Exploring HDFS Federation and Its High Availability
    • Implementing HDFS Federation
      • Configuring HDFS Federation
        • Configuring ViewFS for a federated HDFS
    • Implementing HDFS High Availability
      • The Quorum-based storage
        • Configuring HDFS high availability by the Quorum-based storage
      • Shared storage using NFS
        • Configuring HDFS high availability by shared storage using NFS
      • Configuring automatic failover for HDFS high availability
    • Jobtracker high availability
      • Configuring jobtracker high availability
      • Configuring automatic failover for jobtracker high availability
    • Summary
  • Chapter 5: Using Cloudera Manager
    • Introducing Cloudera Manager
    • Understanding the Cloudera Manager architecture
    • Installing Cloudera Manager
    • Navigating the Cloudera Manager Web console
      • Navigating the Home screen
      • Navigating the Clusters menu
      • Exploring the Hosts menu
      • Understanding the Diagnostics menu
      • Understanding the Audits screen
      • Understanding the Charts menu
      • Understanding the Backup menu
      • Understanding the Administration menu
    • Configuring High Availability using Cloudera Manager
    • Summary
  • Chapter 6: Implementing Security Using Kerberos
    • Understanding authentication and authorization
    • Introducing Kerberos
    • Understanding the Kerberos Architecture
      • Authenticating a user
      • Accessing a secure file server
      • Understanding important Kerberos terms
    • Installing Kerberos
      • Configuring the KDC Server
      • Testing the KDC installation
      • Configuring the Kerberos clients
    • Configuring Kerberos for Apache Hadoop
      • Configuring Kerberos principal for Cloudera Manager Server
      • Configuring the Cloudera Manager Server for Kerberos
    • Authorization in Apache Hadoop
      • Configuring access control lists in Hadoop
    • Summary
  • Chapter 7: Managing an Apache Hadoop Cluster
    • Configuring Hadoop services using Cloudera Manager
      • Adding a service to the cluster
      • Removing a service from the cluster
    • Role management in Cloudera Manager
      • Adding a role instance to a host
        • Adding a DataNode role to a host
        • Adding a TaskTracker role to a host
    • Managing hosts using Cloudera Manager
      • Adding a new host
      • Removing an existing host
    • Managing multiple clusters with Cloudera Manager
    • Rebalancing a Hadoop cluster from Cloudera Manager
      • Adding the Balancer service to the cluster
      • Rebalancing the cluster
    • Summary
  • Chapter 8: Cluster Monitoring Using Events and Alerts
  • Chapter 9: Configuring Backups
    • Understanding backups
      • Types of backups
      • Types of storage media for backups
      • Using cloud services for backups
    • Understanding HDFS backups
    • Using the distributed copy (DistCp)
    • Configuring backups using Cloudera Manager
      • Configuring HDFS replication
      • Configuring Hive replication
      • Configuring snapshots
        • Enabling snapshot paths in HDFS
        • Configuring a snapshot policy
    • Summary

Rohit Menon

Rohit Menon is a senior system analyst living in Denver, Colorado. He has over 7 years of experience in information technology, beginning in 2006 as a real-time applications developer. He now works for a product-based company specializing in software for large telecom operators.

He graduated with a master's degree in Computer Applications from the University of Pune, where he built an autonomous maze-solving robot as his final year project. He later joined a software consulting company in India, where he worked on C#, SQL Server, C++, and RTOS to provide software solutions to reputable organizations in the USA and Japan. After this, he started working for a product-based company where most of his time was dedicated to programming the finer details of products using C++, Oracle, Linux, and Java.

He always enjoys learning new technologies, and this got him interested in web application development. He picked up Ruby, Ruby on Rails, HTML, JavaScript, and CSS, and built a Netflix search engine that makes searching for titles on Netflix much easier.

On the Hadoop front, he is a Cloudera Certified Apache Hadoop Developer. He blogs mainly on topics related to Apache Hadoop and its components. To share what he has learned, he has also started a website that teaches Apache Hadoop using simple, short, and easy-to-follow screencasts. He is well versed in a wide variety of tools and techniques such as MapReduce, Hive, Pig, Sqoop, Oozie, and Talend Open Studio.

What you will learn from this book

  • Understand the Apache Hadoop architecture and the future of distributed processing frameworks
  • Use HDFS and MapReduce for all file-related operations
  • Install and configure CDH to bring up an Apache Hadoop cluster
  • Configure HDFS High Availability and HDFS Federation to prevent single points of failure
  • Install and configure Cloudera Manager to perform administrator operations
  • Implement security by installing and configuring Kerberos for all services in the cluster
  • Add, remove, and rebalance nodes in a cluster using cluster management tools
  • Understand and configure the different backup options to back up your HDFS

In Detail

Apache Hadoop is an open source distributed computing technology that helps users process large volumes of data with relative ease and gain valuable insights from it. Cloudera, with its open source distribution of Hadoop, has made big data analytics possible and accessible to anyone interested.

This book fully prepares you to be a Hadoop administrator, with special emphasis on Cloudera's CDH. It provides step-by-step instructions on setting up and managing a robust Hadoop cluster running CDH5. This book will also equip you with an understanding of tools such as Cloudera Manager, which is currently being used by many companies to manage Hadoop clusters with hundreds of nodes. You will learn how to set up security using Kerberos. You will also use Cloudera Manager to set up alerts and events that will help you monitor and troubleshoot cluster issues.


An easy-to-follow Apache Hadoop administrator’s guide filled with practical screenshots and explanations for each step and configuration.

Who this book is for

This book is great for administrators interested in setting up and managing a large Hadoop cluster. If you are an administrator, or want to be an administrator, and you are ready to build and maintain a production-level cluster running CDH5, then this book is for you.
