Learning Hadoop 2

Design and implement data processing, lifecycle management, and analytic workflows with the cutting-edge toolbox of Hadoop 2

Garry Turkington, Gabriele Modena

Book Details

ISBN-13: 9781783285518
Paperback: 382 pages

Book Description

This book introduces you to building data-processing applications with the wide range of tools supported by Hadoop 2. Starting with the core components of the framework, HDFS and YARN, it guides you through building applications using a variety of approaches.

You will learn how YARN fundamentally changes the relationship between MapReduce and Hadoop, allowing Hadoop to support more varied processing approaches and a broader array of applications. These include real-time processing with Apache Samza and iterative computation with Apache Spark. Next, we discuss Apache Pig and the dataflow model it provides, and you will see how to use Pig to analyze a Twitter dataset.

You will also learn to simplify common tasks with tools such as Apache Hive, Apache Oozie, Hadoop Streaming, Apache Crunch, and the Kite SDK. The final part of the book discusses the likely future direction of major Hadoop components and how to get involved with the Hadoop community.

Table of Contents

Chapter 4: Real-time Computation with Samza

What You Will Learn

  • Write distributed applications using the MapReduce framework
  • Go beyond MapReduce and process data in real time with Samza and iteratively with Spark
  • Familiarize yourself with data mining approaches that work with very large datasets
  • Prototype applications on a VM and deploy them to a local cluster or to a cloud infrastructure (Amazon Web Services)
  • Conduct batch and real-time data analysis using SQL-like tools
  • Build data processing flows using Apache Pig and see how it enables the easy incorporation of custom functionality
  • Define and orchestrate complex workflows and pipelines with Apache Oozie
  • Manage your data lifecycle and changes over time

Recommended for You

  • Hadoop 2.x Administration Cookbook
  • Hadoop: Data Processing and Modelling
  • Hadoop Essentials
  • Python Machine Learning - Second Edition