Programming MapReduce with Scalding

  • Develop MapReduce applications using a functional development language in a lightweight, high-performance, and testable way
  • Learn how Scalding communicates with external data stores and performs machine learning operations
  • Full of illustrations and diagrams, practical examples, and tips for deeper understanding of MapReduce application development

Book Details

Language : English
Paperback : 148 pages [ 235mm x 191mm ]
Release Date : June 2014
ISBN : 1783287012
ISBN 13 : 9781783287017
Author(s) : Antonios Chalkiopoulos
Topics and Technologies : All Books, Application Development, Open Source

Table of Contents

Chapter 1: Introduction to MapReduce
Chapter 2: Get Ready for Scalding
Chapter 3: Scalding by Example
Chapter 4: Intermediate Examples
Chapter 5: Scalding Design Patterns
Chapter 6: Testing and TDD
Chapter 7: Running Scalding in Production
Chapter 8: Using External Data Stores
Chapter 9: Matrix Calculations and Machine Learning
  • Chapter 1: Introduction to MapReduce
    • The Hadoop platform
    • MapReduce
      • A MapReduce example
    • MapReduce abstractions
    • Introducing Cascading
      • What happens inside a pipe
      • Pipe assemblies
      • Cascading extensions
    • Summary
  • Chapter 2: Get Ready for Scalding
    • Why Scala?
    • Scala basics
    • Scala build tools
    • Hello World in Scala
    • Development editors
    • Installing Hadoop in five minutes
    • Running our first Scalding job
    • Submitting a Scalding job in Hadoop
    • Summary
  • Chapter 3: Scalding by Example
    • Reading and writing files
      • Best practices to read and write files
      • TextLine parsing
      • Executing in the local and Hadoop modes
    • Understanding the core capabilities of Scalding
      • Map-like operations
      • Join operations
      • Pipe operations
      • Grouping/reducing functions
    • Operations on groups
      • Composite operations
    • A simple example
    • Typed API
    • Summary
  • Chapter 4: Intermediate Examples
    • Logfile analysis
      • Completing the implementation
    • Exploring ad targeting
      • Calculating daily points
      • Calculating historic points
      • Generating targeted ads
    • Summary
  • Chapter 6: Testing and TDD
    • Introduction to testing
    • MapReduce testing challenges
    • Development lifecycle with testing strategy
    • TDD for Scalding developers
      • Implementing the TDD methodology
        • Decomposing the algorithm
        • Defining acceptance tests
        • Implementing integration tests
        • Implementing unit tests
        • Implementing the MapReduce logic
        • Defining and performing system tests
    • Black box testing
    • Summary
  • Chapter 7: Running Scalding in Production
    • Executing Scalding in a Hadoop cluster
    • Scheduling execution
    • Coordinating job execution
    • Configuring using a property file
    • Configuring using Hadoop parameters
    • Monitoring Scalding jobs
    • Using slim JAR files
    • Scalding execution throttling
    • Summary
  • Chapter 8: Using External Data Stores
    • Interacting with external systems
    • SQL databases
    • NoSQL databases
      • Understanding HBase
      • Reading from HBase
      • Writing in HBase
      • Using advanced HBase features
    • Search platforms
      • Elastic search
    • Summary

Antonios Chalkiopoulos

Antonios Chalkiopoulos is a developer living in London and a professional working with Hadoop and Big Data technologies. He has delivered a number of complex MapReduce applications written in Scalding to a production HDFS cluster of more than 40 nodes. He is a contributor to Scalding and other open source projects, and he is interested in cloud technologies, NoSQL databases, distributed real-time computation systems, and machine learning.

He was involved in a number of Big Data projects before discovering Scala and Scalding. Most of the content of this book comes from his experience and knowledge accumulated while working with a great team of engineers.

What you will learn from this book

  • Set up an environment to execute jobs in local and Hadoop mode
  • Preview the complete Scalding API through examples and illustrations
  • Learn about Scalding capabilities, testing, and pipelining jobs
  • Understand the concepts of MapReduce patterns and the applications of its ecosystem
  • Implement logfile analysis and ad-targeting applications using best practices
  • Apply a test-driven development (TDD) methodology and structure Scalding applications in a modular and testable way
  • Interact with external NoSQL and SQL data stores from Scalding
  • Deploy, schedule, monitor, and maintain production systems

In Detail

Programming MapReduce with Scalding is a practical guide to setting up a development environment and implementing simple and complex MapReduce transformations in Scalding, using a test-driven development methodology and other best practices.

This book first introduces how the Cascading framework lets you reason about MapReduce applications at a higher level of abstraction, and then dives into how Scalding, a Scala DSL, enables us to develop elegant and testable applications. It then teaches you how to test Scalding jobs and how to define specifications and apply behavior-driven development (BDD) with Scalding. The book also demonstrates how to monitor and maintain cluster stability and how to efficiently access SQL, NoSQL, and search platforms.

Programming MapReduce with Scalding provides hands-on information, starting from proof-of-concept applications and progressing to production-ready implementations.


This book is an easy-to-understand, practical guide to designing, testing, and implementing complex MapReduce applications in Scala using the Scalding framework. It is packed with examples featuring log-processing, ad-targeting, and machine learning.

Who this book is for

This book is for developers who want to learn how to develop MapReduce applications effectively. Prior knowledge of Hadoop or Scala is not required; however, investing some time in those topics would certainly be beneficial.
