Hadoop Operations and Cluster Management Cookbook

Over 60 recipes showing you how to design, configure, manage, monitor, and tune a Hadoop cluster

Shumin Guo

Book Details

ISBN-13: 9781782165163
Paperback: 368 pages

Book Description

We are facing an avalanche of data. The unstructured data we gather can contain insights that may hold the key to business success or failure. The ability to analyze and process this data with Hadoop is one of the most highly sought-after skills in today's job market. By combining the computing and storage power of a large number of commodity machines, Hadoop solves this problem in an elegant way!

Hadoop Operations and Cluster Management Cookbook is a practical and hands-on guide for designing and managing a Hadoop cluster. It will help you understand how Hadoop works and guide you through cluster management tasks.

This book explains real-world big data problems and the features of Hadoop that enable it to handle them. It demystifies the Hadoop cluster and guides you through a number of clear, practical recipes that will help you manage one.

We will start by installing and configuring a Hadoop cluster, while explaining hardware selection and networking considerations. We will also cover securing a Hadoop cluster with Kerberos, configuring cluster high availability, and monitoring a cluster. And if you want to know how to build a Hadoop cluster on the Amazon EC2 cloud, this is the book for you.

Table of Contents

Chapter 1: Big Data and Hadoop
Introduction
Defining a Big Data problem
Building a Hadoop-based Big Data platform
Choosing from Hadoop alternatives
Chapter 2: Preparing for Hadoop Installation
Introduction
Choosing hardware for cluster nodes
Designing the cluster network
Configuring the cluster administrator machine
Creating the kickstart file and boot media
Installing the Linux operating system
Installing Java and other tools
Configuring SSH
Chapter 3: Configuring a Hadoop Cluster
Introduction
Choosing a Hadoop version
Configuring Hadoop in pseudo-distributed mode
Configuring Hadoop in fully-distributed mode
Validating Hadoop installation
Configuring ZooKeeper
Installing HBase
Installing Hive
Installing Pig
Installing Mahout
Chapter 4: Managing a Hadoop Cluster
Introduction
Managing the HDFS cluster
Configuring SecondaryNameNode
Managing the MapReduce cluster
Managing TaskTracker
Decommissioning DataNode
Replacing a slave node
Managing MapReduce jobs
Checking job history from the web UI
Importing data to HDFS
Manipulating files on HDFS
Configuring the HDFS quota
Configuring CapacityScheduler
Configuring Fair Scheduler
Configuring Hadoop daemon logging
Configuring Hadoop audit logging
Upgrading Hadoop
Chapter 5: Hardening a Hadoop Cluster
Introduction
Configuring service-level authentication
Configuring job authorization with ACL
Securing a Hadoop cluster with Kerberos
Configuring web UI authentication
Recovering from NameNode failure
Configuring NameNode high availability
Configuring HDFS federation
Chapter 6: Monitoring a Hadoop Cluster
Introduction
Monitoring a Hadoop cluster with JMX
Monitoring a Hadoop cluster with Ganglia
Monitoring a Hadoop cluster with Nagios
Monitoring a Hadoop cluster with Ambari
Monitoring a Hadoop cluster with Chukwa
Chapter 7: Tuning a Hadoop Cluster for Best Performance
Introduction
Benchmarking and profiling a Hadoop cluster
Analyzing job history with Rumen
Benchmarking a Hadoop cluster with GridMix
Using Hadoop Vaidya to identify performance problems
Balancing data blocks for a Hadoop cluster
Choosing a proper block size
Using compression for input and output
Configuring speculative execution
Setting proper number of map and reduce slots for the TaskTracker
Tuning the JobTracker configuration
Tuning the TaskTracker configuration
Tuning shuffle, merge, and sort parameters
Configuring memory for a Hadoop cluster
Setting proper number of parallel copies
Tuning JVM parameters
Configuring JVM Reuse
Configuring the reducer initialization time
Chapter 8: Building a Hadoop Cluster with Amazon EC2 and S3
Introduction
Registering with Amazon Web Services (AWS)
Managing AWS security credentials
Preparing a local machine for EC2 connection
Creating an Amazon Machine Image (AMI)
Using S3 to host data
Configuring a Hadoop cluster with the new AMI

What You Will Learn

  • Defining your big data problem
  • Designing and configuring a pseudo-distributed Hadoop cluster
  • Configuring a fully distributed Hadoop cluster and tuning your Hadoop cluster for better performance
  • Managing the HDFS and MapReduce clusters
  • Configuring Hadoop logging, auditing, and job scheduling
  • Hardening the Hadoop cluster with security and access control methods
  • Monitoring a Hadoop cluster with tools such as Chukwa, Ganglia, Nagios, and Ambari
  • Setting up a Hadoop cluster on the Amazon cloud
