More Information
  • Understand the Apache Hadoop architecture and the future of distributed processing frameworks
  • Use HDFS and MapReduce for all file-related operations
  • Install and configure CDH to bring up an Apache Hadoop cluster
  • Configure HDFS High Availability and HDFS Federation to prevent single points of failure
  • Install and configure Cloudera Manager to perform administrator operations
  • Implement security by installing and configuring Kerberos for all services in the cluster
  • Add, remove, and rebalance nodes in a cluster using cluster management tools
  • Understand and configure the different backup options to back up your HDFS

Apache Hadoop is an open source distributed computing technology that helps users process large volumes of data with relative ease and draw insights from that data. Cloudera, with its open source distribution of Hadoop, has made data analytics on big data possible and accessible to anyone interested.

This book fully prepares you to be a Hadoop administrator, with special emphasis on Cloudera's CDH. It provides step-by-step instructions on setting up and managing a robust Hadoop cluster running CDH5. This book will also equip you with an understanding of tools such as Cloudera Manager, which is currently being used by many companies to manage Hadoop clusters with hundreds of nodes. You will learn how to set up security using Kerberos. You will also use Cloudera Manager to set up alerts and events that will help you monitor and troubleshoot cluster issues.

  • Understand the CDH architecture and its components and successfully set up a Hadoop cluster
  • Maintain, troubleshoot, and secure your cluster using Cloudera Manager
  • Easy-to-follow administrator’s guide with step-by-step explanations to help you master Apache Hadoop
Page Count 254
Course Length 7 hours 37 minutes
ISBN 9781783558964
Date Of Publication 17 Jul 2014


Rohit Menon

Rohit Menon is a senior system analyst living in Denver, Colorado. He has over 7 years of experience in the field of Information Technology, which started with the role of a real-time applications developer back in 2006. He now works for a product-based company specializing in software for large telecom operators.

He graduated with a master's degree in Computer Applications from the University of Pune, where he built an autonomous maze-solving robot as his final year project. He later joined a software consulting company in India, where he worked on C#, SQL Server, C++, and RTOS to provide software solutions to reputable organizations in the USA and Japan. After this, he started working for a product-based company where most of his time was dedicated to programming the finer details of products using C++, Oracle, Linux, and Java.

He always likes to learn new technologies, and this got him interested in web application development. He picked up Ruby, Ruby on Rails, HTML, JavaScript, and CSS, and built a Netflix search engine that makes searching for titles on Netflix much easier.

On the Hadoop front, he is a Cloudera Certified Apache Hadoop Developer. He blogs mainly on topics related to Apache Hadoop and its components. To share his learning, he has also started a website that teaches Apache Hadoop using simple, short, and easy-to-follow screencasts. He is well versed in a wide variety of tools and techniques such as MapReduce, Hive, Pig, Sqoop, Oozie, and Talend Open Studio.