You're reading from Apache Oozie Essentials

Product type: Book
Published in: Dec 2015
Reading level: Intermediate
ISBN-13: 9781785880384
Edition: 1st Edition
Author (1)
Jagat Singh

Running a MapReduce streaming job


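In this section, we will learn how to run Hadoop Streaming jobs using Oozie. Hadoop Streaming lets you write MapReduce code in languages other than Java, such as Python, C++, and Ruby: any executable that reads records from standard input and writes records to standard output can act as the mapper or the reducer.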

Note

As an exercise, read the Map-Reduce action section of the Oozie documentation at https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.2_Map-Reduce_Action and write a Workflow that runs a Streaming job, then schedule that Workflow using a Coordinator. Sample Python mapper and reducer code is available at http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/.

Save the Python mapper and reducer code from the preceding tutorial link as mapper.py and reducer.py in the streaming folder.
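If the linked tutorial is not at hand, the following minimal word-count sketch shows roughly what such a mapper and reducer do. It is not the tutorial's code: the function names map_lines and reduce_lines are hypothetical, and both steps are kept in one file only so that they can be tried locally without a cluster.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer step: sum the counts for each word.

    The input must already be sorted by key; Hadoop sorts mapper output
    by key before handing it to the reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local simulation of the map -> sort -> reduce pipeline on two input lines.
mapped = sorted(map_lines(["foo bar foo", "bar baz"]))
counts = list(reduce_lines(mapped))
```

To turn each function into a standalone streaming script, one option is to feed it sys.stdin and print every record it yields, since Hadoop Streaming talks to the script only through standard input and standard output.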

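The <file> tag makes our mapper and reducer files available to the job: Oozie ships each listed file to the working directory of the tasks. The <mapper> and <reducer> tags inside <streaming> then name the commands that Hadoop Streaming runs.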
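As a rough illustration of how these tags fit together in a map-reduce action, the short Python script below holds a sketch of a streaming action and checks that it is well-formed XML. The action name and the Kill transition follow the names used in this section's Workflow; the ${...} parameters, the input and output properties, the End node name, and the file paths are assumptions for illustration only, not the book's listing.

```python
import xml.etree.ElementTree as ET

# Sketch of a streaming map-reduce action. The ${...} parameters, the
# configuration properties, and the "End" node name are assumptions.
STREAMING_ACTION = """\
<action name="streaming-c097">
  <map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <streaming>
      <mapper>python mapper.py</mapper>
      <reducer>python reducer.py</reducer>
    </streaming>
    <configuration>
      <property>
        <name>mapred.input.dir</name>
        <value>${inputDir}</value>
      </property>
      <property>
        <name>mapred.output.dir</name>
        <value>${outputDir}</value>
      </property>
    </configuration>
    <file>mapper.py#mapper.py</file>
    <file>reducer.py#reducer.py</file>
  </map-reduce>
  <ok to="End"/>
  <error to="Kill"/>
</action>
"""

# Parse the fragment to confirm it is well-formed, then list what it declares.
action = ET.fromstring(STREAMING_ACTION)
shipped = [f.text.split("#")[0] for f in action.iter("file")]
commands = [action.findtext(".//mapper"), action.findtext(".//reducer")]
```

Here shipped holds the scripts that the <file> elements distribute to the tasks, and commands holds the command lines that the <streaming> element asks Hadoop Streaming to run.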

The Workflow looks like this:

<workflow-app name="Mapreduce_Streaming_example" xmlns="uri:oozie:workflow:0.5">
  <start to="streaming-c097"/>
    <kill name="Kill">
      <message>Action failed, error message...