You're reading from Hadoop 2.x Administration Cookbook

Product type: Book
Published in: May 2017
Publisher: Packt
ISBN-13: 9781787126732
Edition: 1st Edition
Author (1)
Aman Singh

Gurmukh Singh is a seasoned technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo. He is the author of Monitoring Hadoop, published by Packt Publishing.

Configuring MapReduce for performance


In this recipe, we will touch upon MapReduce parameters and see how we can optimize them.

Getting ready

For this recipe, you will again need a running cluster with HDFS and YARN. You should also have completed the Configuring YARN for performance recipe.

How to do it...

  1. Connect to the master node master1.cyrus.com and switch to the hadoop user.

  2. The file where these changes will be made is mapred-site.xml.

  3. The first thing to adjust is the sort buffer size, which should be set with the HDFS block size in mind. The value is given in megabytes and must always be greater than the value of dfs.blocksize. It can be configured as follows:

    <property>
      <name>mapreduce.task.io.sort.mb</name>
      <value>200</value>
    </property>

  4. The next value to tune is the number of streams merged at once while sorting. Each mapper will keep this many file handles open:

    <property>
      <name>mapreduce.task.io.sort.factor</name>
      <value>24</value>
    </property>

  5. Another important thing to take...
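
As a quick sanity check for steps 3 and 4, the rule that the sort buffer must exceed the block size can be verified with a small script. The following is only a minimal sketch: the helper names `read_properties` and `check_sort_buffer` are hypothetical, the XML is an inline copy of the two properties shown above rather than the real mapred-site.xml, and the 128 MB block size is an assumed value that you should replace with your cluster's actual dfs.blocksize in bytes.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for mapred-site.xml, holding the two properties from the steps above.
MAPRED_SITE = """
<configuration>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>200</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>24</value>
  </property>
</configuration>
"""

# Assumption: dfs.blocksize is 128 MB, expressed in bytes. Use your cluster's value.
BLOCK_SIZE_BYTES = 128 * 1024 * 1024


def read_properties(xml_text):
    """Return {name: value} for every <property> element in a *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def check_sort_buffer(mapred_props, block_size_bytes):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    raw = mapred_props.get("mapreduce.task.io.sort.mb")
    if raw is None:
        problems.append("mapreduce.task.io.sort.mb is not set")
    else:
        # The property is expressed in megabytes; convert to bytes to compare.
        sort_bytes = int(raw) * 1024 * 1024
        if sort_bytes <= block_size_bytes:
            problems.append(
                "mapreduce.task.io.sort.mb (%s MB) is not greater than the block size" % raw
            )
    return problems


props = read_properties(MAPRED_SITE)
problems = check_sort_buffer(props, BLOCK_SIZE_BYTES)
print("sort factor:", props.get("mapreduce.task.io.sort.factor"))
print("problems:", problems)
```

To run it against a real cluster, read the contents of mapred-site.xml from your Hadoop configuration directory into a string and pass that to `read_properties` in place of the inline sample.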
