Reader small image

You're reading from  Optimizing Hadoop for MapReduce

Product typeBook
Published inFeb 2014
Publisher
ISBN-139781783285655
Edition1st Edition
Tools
Right arrow
Author (1)
Khaled Tannir
Khaled Tannir
author image
Khaled Tannir

Khaled Tannir has been working with computers since 1980. He began programming with the legendary Sinclair Zx81 and later with Commodore home computer products (Vic 20, Commodore 64, Commodore 128D, and Amiga 500). He has a Bachelor's degree in Electronics, a Master's degree in System Information Architectures, in which he graduated with a professional thesis, and completed his education with a Master of Research degree. He is a Microsoft Certified Solution Developer (MCSD) and has more than 20 years of technical experience leading the development and implementation of software solutions and giving technical presentations. He now works as an independent IT consultant and has worked as an infrastructure engineer, senior developer, and enterprise/solution architect for many companies in France and Canada. With significant experience in Microsoft .Net, Microsoft Server Systems, and Oracle Java technologies, he has extensive skills in online/offline applications design, system conversions, and multilingual applications in both domains: Internet and Desktops. He is always researching new technologies, learning about them, and looking for new adventures in France, North America, and the Middle-east. He owns an IT and electronics laboratory with many servers, monitors, open electronic boards such as Arduino, Netduino, RaspBerry Pi, and .Net Gadgeteer, and some smartphone devices based on Windows Phone, Android, and iOS operating systems. In 2012, he contributed to the EGC 2012 (International Complex Data Mining forum at Bordeaux University, France) and presented, in a workshop session, his work on "how to optimize data distribution in a cloud computing environment". This work aims to define an approach to optimize the use of data mining algorithms such as k-means and Apriori in a cloud computing environment. He is the author of RavenDB 2.x Beginner's Guide, Packt Publishing. He aims to get a PhD in Cloud Computing and Big Data and wants to learn more and more about these technologies. He enjoys taking landscape and night time photos, travelling, playing video games, creating funny electronic gadgets with Arduino/.Net Gadgeteer, and of course, spending time with his wife and family. You can reach him at contact@khaledtannir.net.
Read more about Khaled Tannir

Right arrow

Identifying cluster weakness


Adapting the Hadoop framework's configuration based on a cluster's hardware and number of nodes has proven to give increased performance. In order to ensure that your Hadoop framework is using your hardware efficiently and you have defined the number of mappers and reducers correctly, you need to check your environment to identify whether there are nodes, CPU, or network weaknesses. Then you can decide whether the Hadoop framework should behave as a new set of configuration, or needs to be optimized.

In the following sections, we will go through common scenarios that may cause your job to perform poorly. Each scenario has its own technique that shows how to identify the problem. The scenario covers the cluster node's health, the input data size, massive I/O and network traffic, insufficient concurrent tasks, and CPU contention (which occurs when all lower priority tasks have to wait when any higher priority CPU-bound task is running, and there are no other CPUs...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Optimizing Hadoop for MapReduce
Published in: Feb 2014Publisher: ISBN-13: 9781783285655

Author (1)

author image
Khaled Tannir

Khaled Tannir has been working with computers since 1980. He began programming with the legendary Sinclair Zx81 and later with Commodore home computer products (Vic 20, Commodore 64, Commodore 128D, and Amiga 500). He has a Bachelor's degree in Electronics, a Master's degree in System Information Architectures, in which he graduated with a professional thesis, and completed his education with a Master of Research degree. He is a Microsoft Certified Solution Developer (MCSD) and has more than 20 years of technical experience leading the development and implementation of software solutions and giving technical presentations. He now works as an independent IT consultant and has worked as an infrastructure engineer, senior developer, and enterprise/solution architect for many companies in France and Canada. With significant experience in Microsoft .Net, Microsoft Server Systems, and Oracle Java technologies, he has extensive skills in online/offline applications design, system conversions, and multilingual applications in both domains: Internet and Desktops. He is always researching new technologies, learning about them, and looking for new adventures in France, North America, and the Middle-east. He owns an IT and electronics laboratory with many servers, monitors, open electronic boards such as Arduino, Netduino, RaspBerry Pi, and .Net Gadgeteer, and some smartphone devices based on Windows Phone, Android, and iOS operating systems. In 2012, he contributed to the EGC 2012 (International Complex Data Mining forum at Bordeaux University, France) and presented, in a workshop session, his work on "how to optimize data distribution in a cloud computing environment". This work aims to define an approach to optimize the use of data mining algorithms such as k-means and Apriori in a cloud computing environment. He is the author of RavenDB 2.x Beginner's Guide, Packt Publishing. He aims to get a PhD in Cloud Computing and Big Data and wants to learn more and more about these technologies. He enjoys taking landscape and night time photos, travelling, playing video games, creating funny electronic gadgets with Arduino/.Net Gadgeteer, and of course, spending time with his wife and family. You can reach him at contact@khaledtannir.net.
Read more about Khaled Tannir