Hadoop Operations and Cluster Management Cookbook

Over 60 recipes showing you how to design, configure, manage, monitor, and tune a Hadoop cluster

Shumin Guo

Book Details

ISBN 13: 9781782165163
Paperback: 368 pages

Book Description

We are facing an avalanche of data. The unstructured data we gather can contain many insights that could hold the key to business success or failure. The ability to analyze and process this data with Hadoop is one of the most sought-after skills in today's job market. Hadoop, by combining the computing and storage power of a large number of commodity machines, solves this problem in an elegant way!

Hadoop Operations and Cluster Management Cookbook is a practical and hands-on guide for designing and managing a Hadoop cluster. It will help you understand how Hadoop works and guide you through cluster management tasks.

This book explains real-world big data problems and the features of Hadoop that enable it to handle such problems. It breaks down the mystery of a Hadoop cluster and guides you through a number of clear, practical recipes that will help you manage one.

We will start by installing and configuring a Hadoop cluster, while explaining hardware selection and networking considerations. We will also cover securing a Hadoop cluster with Kerberos, configuring cluster high availability, and monitoring a cluster. And if you want to know how to build a Hadoop cluster on the Amazon EC2 cloud, then this is the book for you.

Table of Contents

Chapter 1: Big Data and Hadoop
Introduction
Defining a Big Data problem
Building a Hadoop-based Big Data platform
Choosing from Hadoop alternatives
Chapter 2: Preparing for Hadoop Installation
Introduction
Choosing hardware for cluster nodes
Designing the cluster network
Configuring the cluster administrator machine
Creating the kickstart file and boot media
Installing the Linux operating system
Installing Java and other tools
Configuring SSH
Chapter 3: Configuring a Hadoop Cluster
Introduction
Choosing a Hadoop version
Configuring Hadoop in pseudo-distributed mode
Configuring Hadoop in fully-distributed mode
Validating Hadoop installation
Configuring ZooKeeper
Installing HBase
Installing Hive
Installing Pig
Installing Mahout
Chapter 4: Managing a Hadoop Cluster
Introduction
Managing the HDFS cluster
Configuring SecondaryNameNode
Managing the MapReduce cluster
Managing TaskTracker
Decommissioning DataNode
Replacing a slave node
Managing MapReduce jobs
Checking job history from the web UI
Importing data to HDFS
Manipulating files on HDFS
Configuring the HDFS quota
Configuring CapacityScheduler
Configuring Fair Scheduler
Configuring Hadoop daemon logging
Configuring Hadoop audit logging
Upgrading Hadoop
Chapter 5: Hardening a Hadoop Cluster
Introduction
Configuring service-level authentication
Configuring job authorization with ACL
Securing a Hadoop cluster with Kerberos
Configuring web UI authentication
Recovering from NameNode failure
Configuring NameNode high availability
Configuring HDFS federation
Chapter 6: Monitoring a Hadoop Cluster
Introduction
Monitoring a Hadoop cluster with JMX
Monitoring a Hadoop cluster with Ganglia
Monitoring a Hadoop cluster with Nagios
Monitoring a Hadoop cluster with Ambari
Monitoring a Hadoop cluster with Chukwa
Chapter 7: Tuning a Hadoop Cluster for Best Performance
Introduction
Benchmarking and profiling a Hadoop cluster
Analyzing job history with Rumen
Benchmarking a Hadoop cluster with GridMix
Using Hadoop Vaidya to identify performance problems
Balancing data blocks for a Hadoop cluster
Choosing a proper block size
Using compression for input and output
Configuring speculative execution
Setting proper number of map and reduce slots for the TaskTracker
Tuning the JobTracker configuration
Tuning the TaskTracker configuration
Tuning shuffle, merge, and sort parameters
Configuring memory for a Hadoop cluster
Setting proper number of parallel copies
Tuning JVM parameters
Configuring JVM Reuse
Configuring the reducer initialization time
Chapter 8: Building a Hadoop Cluster with Amazon EC2 and S3
Introduction
Registering with Amazon Web Services (AWS)
Managing AWS security credentials
Preparing a local machine for EC2 connection
Creating an Amazon Machine Image (AMI)
Using S3 to host data
Configuring a Hadoop cluster with the new AMI

What You Will Learn

  • Defining your big data problem
  • Designing and configuring a pseudo-distributed Hadoop cluster
  • Configuring a fully distributed Hadoop cluster and tuning your Hadoop cluster for better performance
  • Managing the DFS and MapReduce cluster
  • Configuring Hadoop logging, auditing, and job scheduling
  • Hardening the Hadoop cluster with security and access control methods
  • Monitoring a Hadoop cluster with tools such as Chukwa, Ganglia, Nagios, and Ambari
  • Setting up a Hadoop cluster on the Amazon cloud
