
Optimizing Hadoop for MapReduce

Progressing
Khaled Tannir

This book is the perfect introduction to sophisticated concepts in MapReduce and will ensure you have the knowledge to optimize job performance. This is not an academic treatise; it’s an example-driven tutorial for the real world.
eBook: $20.99 (RRP $20.99)
Print + eBook: $34.99 (RRP $34.99)


Book Details

ISBN-13: 9781783285655
Paperback: 120 pages

About This Book

  • Optimize your MapReduce job performance
  • Identify your Hadoop cluster’s weaknesses
  • Tune your MapReduce configuration

Who This Book Is For

If you are a Hadoop administrator, developer, MapReduce user, or beginner who wants to optimize clusters and applications, this book is the best choice available. Prior knowledge of creating MapReduce applications is not necessary, but it will help you better understand the concepts and the snippets of MapReduce class template code.

Table of Contents

Chapter 1: Understanding Hadoop MapReduce
The MapReduce model
An overview of Hadoop MapReduce
Hadoop MapReduce internals
Factors affecting the performance of MapReduce
Summary
Chapter 2: An Overview of the Hadoop Parameters
Investigating the Hadoop parameters
Hadoop MapReduce metrics
Performance monitoring tools
Summary
Chapter 3: Detecting System Bottlenecks
Performance tuning
Creating a performance baseline
Identifying resource bottlenecks
Summary
Chapter 4: Identifying Resource Weaknesses
Identifying cluster weakness
Sizing your Hadoop cluster
Configuring your cluster correctly
Summary
Chapter 5: Enhancing Map and Reduce Tasks
Enhancing map tasks
Enhancing reduce tasks
Tuning map and reduce parameters
Summary
Chapter 6: Optimizing MapReduce Tasks
Using Combiners
Using compression
Using appropriate Writable types
Reusing types smartly
Optimizing mappers and reducers code
Summary
Chapter 7: Best Practices and Recommendations
Hardware tuning and OS recommendations
Hadoop best practices and recommendations
Summary

What You Will Learn

  • Learn about the factors that affect MapReduce performance
  • Utilize the Hadoop MapReduce performance counters to identify resource bottlenecks
  • Size your Hadoop cluster’s nodes
  • Set the number of mappers and reducers correctly
  • Optimize mapper and reducer task throughput and code size using compression and Combiners
  • Understand the various tuning properties and best practices to optimize clusters

In Detail

MapReduce is the programming model that the Hadoop MapReduce engine uses to distribute work around a cluster by processing smaller data sets in parallel. It is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log statistics, inverted index construction, document clustering, machine learning, and statistical machine translation.
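To make the idea of working in parallel on smaller data sets concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps for a word count. It is not Hadoop code and is not taken from the book: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the `num_splits` parameter are hypothetical, and threads merely stand in for cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key, between the map and reduce steps."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines, num_splits=2):
    # Split the input into smaller data sets, one per map task.
    splits = [lines[i::num_splits] for i in range(num_splits)]
    # Run the map tasks in parallel (threads are an assumption of this
    # sketch; a real cluster schedules tasks across nodes).
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        mapped = list(pool.map(map_phase, splits))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["a b a", "b c"])` splits the two lines into two map tasks, groups the emitted pairs by word, and sums each group.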

This book introduces you to advanced MapReduce concepts and teaches you everything from identifying the factors that affect MapReduce job performance to tuning the MapReduce configuration. Based on real-world experience, this book will help you to fully utilize your cluster’s node resources to run MapReduce jobs optimally.

This book details the Hadoop MapReduce job performance optimization process. Through a number of clear and practical steps, it will help you to fully utilize your cluster’s node resources.

Starting with how MapReduce works and the factors that affect MapReduce performance, you will be given an overview of Hadoop metrics and several performance monitoring tools. Further on, you will explore performance counters that help you identify resource bottlenecks, check cluster health, and size your Hadoop cluster. You will also learn about optimizing map and reduce tasks by using Combiners and compression.

The book ends with best practices and recommendations on how to use your Hadoop cluster optimally.
