Hadoop MapReduce Cookbook

More Information
Learn
  • How to install Hadoop MapReduce and HDFS to begin running examples
  • How to configure and administer Hadoop and HDFS securely
  • How Hadoop works internally and how it can be extended to suit your needs
  • How to use HBase, Hive, Pig, Mahout, and Nutch to get things done easily and efficiently
  • How to use MapReduce to solve many types of analytics problems
  • How to solve complex problems such as classification, finding relationships, online marketing, and recommendations
  • How to use MapReduce for massive text data processing
  • How to use cloud environments to perform Hadoop computations
About

We are facing an avalanche of data. The unstructured data we gather can contain many insights that might hold the key to business success or failure. The ability to analyze and process this data with Hadoop MapReduce is one of the most highly sought-after skills in today's job market.

"Hadoop MapReduce Cookbook" is a one-stop guide to processing large and complex data sets using the Hadoop ecosystem. The book introduces you to simple examples and then works through in-depth big data use cases.

"Hadoop MapReduce Cookbook" presents more than 50 ready-to-use Hadoop MapReduce recipes in a simple and straightforward manner, with step-by-step instructions and real-world examples.

Start by installing Hadoop, then configure, extend, and administer it. Then write simple examples, learn MapReduce patterns, harness the Hadoop landscape, and finally jump to the cloud.

The book covers topics such as setting up Hadoop security and using MapReduce for analytics, classification, online marketing, recommendations, and search use cases. You will learn how to harness components from the Hadoop ecosystem, including HBase, Hive, Pig, and Mahout, and then learn how to set up cloud environments to perform Hadoop MapReduce computations.

"Hadoop MapReduce Cookbook" teaches you how to process large and complex data sets using real examples, providing a comprehensive guide to getting things done with Hadoop MapReduce.

Features
  • Learn to process large and complex data sets, starting simply, then diving in deep
  • Solve complex big data problems such as classification, finding relationships, online marketing, and recommendations
  • More than 50 Hadoop MapReduce recipes, presented in a simple and straightforward manner, with step-by-step instructions and real-world examples
Page Count 300
Course Length 9 hours 0 minutes
ISBN 9781849517287
Date Of Publication 25 Jan 2013
Introduction
Choosing appropriate Hadoop data types
Implementing a custom Hadoop Writable data type
Implementing a custom Hadoop key type
Emitting data of different value types from a mapper
Choosing a suitable Hadoop InputFormat for your input data format
Adding support for new input data formats – implementing a custom InputFormat
Formatting the results of MapReduce computations – using Hadoop OutputFormats
Hadoop intermediate (map to reduce) data partitioning
Broadcasting and distributing shared resources to tasks in a MapReduce job – Hadoop DistributedCache
Using Hadoop with legacy applications – Hadoop Streaming
Adding dependencies between MapReduce jobs
Hadoop counters for reporting custom metrics

Authors

Srinath Perera

Srinath Perera is a senior software architect at WSO2 Inc., where he oversees the overall WSO2 platform architecture with the CTO. He also serves as a research scientist at the Lanka Software Foundation and teaches as a visiting faculty member at the Department of Computer Science and Engineering, University of Moratuwa. He is a co-founder of the Apache Axis2 open source project, has been involved with the Apache Web Service project since 2002, and is a member of the Apache Software Foundation and the Apache Web Service project PMC. He is also a committer on the Apache open source projects Axis, Axis2, and Geronimo. He received his Ph.D. and M.Sc. in Computer Sciences from Indiana University, Bloomington, USA, and his Bachelor of Science in Computer Science and Engineering from the University of Moratuwa, Sri Lanka. He has authored many technical and peer-reviewed research articles; more details can be found on his website. He is also a frequent speaker at technical venues. He has worked with large-scale distributed systems for a long time and works closely with Big Data technologies such as Hadoop and Cassandra daily. He also teaches a graduate class on parallel programming at the University of Moratuwa, which is primarily based on Hadoop.

Thilina Gunarathne

Thilina Gunarathne is a senior data scientist at KPMG LLP. He led the Hadoop-related efforts at Link Analytics before its acquisition by KPMG LLP. He has extensive experience in using Apache Hadoop and its related technologies for large-scale data-intensive computations. He coauthored the first edition of this book, Hadoop MapReduce Cookbook, with Dr. Srinath Perera.

Thilina has contributed to several open source projects at the Apache Software Foundation as a member, committer, and PMC member. He has also published many peer-reviewed research articles on how to extend the MapReduce model to perform efficient data mining and data analytics computations in the cloud. Thilina received his PhD and MSc degrees in computer science from Indiana University, Bloomington, USA, and his bachelor of science degree in computer science and engineering from the University of Moratuwa, Sri Lanka.