Tuning the following configuration properties to match the hardware and software setup of your Hadoop cluster helps it make better use of the available resources, such as CPU and memory.
The important properties in the mapred-site.xml file are as follows:
Set the maximum number of map tasks and reduce tasks that a single TaskTracker can run simultaneously:
mapreduce.tasktracker.map.tasks.maximum
mapreduce.tasktracker.reduce.tasks.maximum
Set the number of map and reduce tasks for a job according to the number of cores available:
mapreduce.job.maps
mapreduce.job.reduces
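As a rough illustration, these properties could be set in mapred-site.xml along the lines of the following fragment. The values are assumptions chosen for a hypothetical cluster of worker nodes with eight cores each, not recommendations. Note also that mapreduce.job.maps is only a hint to the framework, because the actual number of map tasks depends on the number of input splits.

```xml
<configuration>
  <!-- Illustrative values only; assumes a hypothetical 8-core worker node. -->
  <property>
    <!-- Map tasks that one TaskTracker may run at the same time -->
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>6</value>
  </property>
  <property>
    <!-- Reduce tasks that one TaskTracker may run at the same time -->
    <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <!-- Hint for the number of map tasks per job; input splits decide the real count -->
    <name>mapreduce.job.maps</name>
    <value>24</value>
  </property>
  <property>
    <!-- Default number of reduce tasks per job -->
    <name>mapreduce.job.reduces</name>
    <value>8</value>
  </property>
</configuration>
```

The two per-job properties can also be overridden for an individual job, so the values in the file act as cluster-wide defaults.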
The important properties in the hdfs-site.xml file are as follows:
Set the block size for the files according to the storage requirements of your problem:
dfs.blocksize
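A minimal sketch of the corresponding hdfs-site.xml entry follows. The value is given in bytes, and the 128 MB shown here is an assumed example rather than a recommendation for any particular workload.

```xml
<configuration>
  <property>
    <!-- Default block size for newly created files, in bytes (illustrative: 128 MB) -->
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
```

This setting applies to files written after the change; files that already exist in HDFS keep the block size they were created with.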
However, discussing the performance-tuning approaches for Hadoop in detail is beyond the scope of this book.