Hadoop Real-World Solutions Cookbook

More Information
  • Data ETL, compression, serialization, and import/export
  • Simple and advanced aggregate analysis
  • Graph analysis
  • Machine learning
  • Troubleshooting and debugging
  • Scalable persistence
  • Cluster administration and configuration

This book helps developers become more comfortable and proficient at solving problems in the Hadoop space. Readers will become more familiar with a wide variety of Hadoop-related tools and best practices for implementation.

Hadoop Real-World Solutions Cookbook will teach readers how to build solutions using tools such as Apache Hive, Pig, MapReduce, Mahout, Giraph, HDFS, Accumulo, Redis, and Ganglia.

Hadoop Real-World Solutions Cookbook provides in-depth explanations and code examples. Each chapter contains a set of recipes that pose, then solve, technical challenges, and the recipes can be completed in any order. A recipe breaks a single problem down into discrete steps that are easy to follow. The book covers loading data into and unloading data from HDFS, graph analytics with Giraph, batch data analysis using Hive, Pig, and MapReduce, machine learning approaches with Mahout, debugging and troubleshooting MapReduce, and columnar storage and retrieval of structured data using Apache Accumulo.

Hadoop Real-World Solutions Cookbook will give readers the examples they need to apply Hadoop technology to their own problems.

  • Solutions to common problems when working in the Hadoop environment
  • Recipes for loading and unloading data, analytics, and troubleshooting
  • In-depth code examples demonstrating various analytic models, analytic solutions, and common best practices

Page Count: 316
Course Length: 9 hours 28 minutes
ISBN: 9781849519120
Date of Publication: 7 Feb 2013


Jonathan R. Owens

Jonathan R. Owens has a background in Java and C++, and has worked in both the private and public sectors as a software engineer. Most recently, he has been working with Hadoop and related distributed processing technologies. Currently, he works for comScore, Inc., a widely regarded digital measurement and analytics company. At comScore, he is a member of the core-processing team, which uses Hadoop and other custom distributed systems to aggregate, analyze, and manage more than 40 billion transactions per day.

Brian Femiano

Brian Femiano has a B.S. in Computer Science and has been programming professionally for over 6 years, the last 2 of which have been spent building advanced analytics and big data capabilities using Apache Hadoop. He has worked in the commercial sector in the past, but the majority of his experience comes from the government contracting space. He currently works for Potomac Fusion in the DC/Virginia area, which develops scalable algorithms to study and enhance some of the most advanced and complex datasets in the government space. Within Potomac Fusion, he has led courses and training sessions on Apache Hadoop and related cloud-scale technology.

Jon Lentz

Jon Lentz is a software engineer on the Core Processing team at comScore, an online audience measurement and analytics company. He prefers to do most of his coding in Pig. Before working at comScore, he wrote software to optimize supply chains and to allocate fixed-income securities.