Programming MapReduce with Scalding

A practical guide to designing, testing, and implementing complex MapReduce applications in Scala

Antonios Chalkiopoulos

eBook: $16.99
Print + eBook: $27.99

Book Details

ISBN 13: 9781783287017
Paperback: 148 pages

About This Book

  • Develop MapReduce applications in a functional programming language in a lightweight, high-performance, and testable way
  • Learn how Scalding communicates with external data stores and performs machine learning operations
  • Full of illustrations, diagrams, practical examples, and tips for a deeper understanding of MapReduce application development

Who This Book Is For

This book is for developers who want to learn how to develop MapReduce applications effectively. Prior knowledge of Hadoop or Scala is not required; however, investing some time in those topics would certainly be beneficial.

Table of Contents

Chapter 1: Introduction to MapReduce
  • The Hadoop platform
  • MapReduce
  • MapReduce abstractions
  • Introducing Cascading
  • Summary
Chapter 2: Get Ready for Scalding
  • Why Scala?
  • Scala basics
  • Scala build tools
  • Hello World in Scala
  • Development editors
  • Installing Hadoop in five minutes
  • Running our first Scalding job
  • Submitting a Scalding job in Hadoop
  • Summary
Chapter 3: Scalding by Example
  • Reading and writing files
  • Understanding the core capabilities of Scalding
  • Operations on groups
  • A simple example
  • Typed API
  • Summary
Chapter 4: Intermediate Examples
  • Logfile analysis
  • Exploring ad targeting
  • Summary
Chapter 5: Scalding Design Patterns
  • The external operations pattern
  • The dependency injection pattern
  • The late bound dependency pattern
  • Summary
Chapter 6: Testing and TDD
  • Introduction to testing
  • MapReduce testing challenges
  • Development lifecycle with testing strategy
  • TDD for Scalding developers
  • Black box testing
  • Summary
Chapter 7: Running Scalding in Production
  • Executing Scalding in a Hadoop cluster
  • Scheduling execution
  • Coordinating job execution
  • Configuring using a property file
  • Configuring using Hadoop parameters
  • Monitoring Scalding jobs
  • Using slim JAR files
  • Scalding execution throttling
  • Summary
Chapter 8: Using External Data Stores
  • Interacting with external systems
  • SQL databases
  • NoSQL databases
  • Search platforms
  • Summary
Chapter 9: Matrix Calculations and Machine Learning
  • Text similarity using TF-IDF
  • Setting a similarity using the Jaccard index
  • K-Means using Mahout
  • Other libraries
  • Summary

What You Will Learn

  • Set up an environment to execute jobs in local and Hadoop mode
  • Preview the complete Scalding API through examples and illustrations
  • Learn about Scalding capabilities, testing, and pipelining jobs
  • Understand the concepts of MapReduce patterns and the applications of its ecosystem
  • Implement logfile analysis and ad-targeting applications using best practices
  • Apply a test-driven development (TDD) methodology and structure Scalding applications in a modular and testable way
  • Interact with external NoSQL and SQL data stores from Scalding
  • Deploy, schedule, monitor, and maintain production systems

In Detail

Programming MapReduce with Scalding is a practical guide to setting up a development environment and implementing simple and complex MapReduce transformations in Scalding, using a test-driven development methodology and other best practices.

This book first introduces how the Cascading framework lets you reason about MapReduce applications at a higher level of abstraction, and then dives into how Scalding, a Scala DSL, enables us to develop elegant and testable applications. It then teaches you how to test Scalding jobs and how to define specifications and apply behavior-driven development (BDD) with Scalding. The book also demonstrates how to monitor and maintain cluster stability and how to efficiently access SQL, NoSQL, and search platforms.

Programming MapReduce with Scalding provides hands-on information, starting from proof-of-concept applications and progressing to production-ready implementations.
