Hadoop Real-World Solutions Cookbook - Second Edition

Product type: Book
Published: Mar 2016
Publisher: Packt Publishing
ISBN-13: 9781784395506
Pages: 290
Edition: 2nd
Author: Tanmay Deshpande

Table of Contents (18 chapters)

Hadoop Real-World Solutions Cookbook Second Edition
Credits
About the Author
Acknowledgements
About the Reviewer
www.PacktPub.com
Preface
1. Getting Started with Hadoop 2.X
2. Exploring HDFS
3. Mastering Map Reduce Programs
4. Data Analysis Using Hive, Pig, and HBase
5. Advanced Data Analysis Using Hive
6. Data Import/Export Using Sqoop and Flume
7. Automation of Hadoop Tasks Using Oozie
8. Machine Learning and Predictive Analytics Using Mahout and R
9. Integration with Apache Spark
10. Hadoop Use Cases
Index

Chapter 7. Automation of Hadoop Tasks Using Oozie

In this chapter, we'll take a look at the following recipes:

  • Implementing a Sqoop action job using Oozie

  • Implementing a Map Reduce action job using Oozie

  • Implementing a Java action job using Oozie

  • Implementing a Hive action job using Oozie

  • Implementing a Pig action job using Oozie

  • Implementing an e-mail action job using Oozie

  • Executing parallel jobs using Oozie (fork)

  • Scheduling a job in Oozie

Introduction


In the previous chapter, we talked about two very important tools, Sqoop and Flume, which help us seamlessly import and export data in and out of Hadoop. Now that we have covered most of the Hadoop ecosystem tools and their advanced usage, it's time to understand how to automate these tasks using another interesting tool called Oozie. Oozie is a workflow scheduler that helps us execute a series of Hadoop tasks as a single workflow.

Implementing a Sqoop action job using Oozie


In the previous chapter, we took a look at how to use Sqoop to import and export data between an RDBMS and HDFS. In this recipe, you are going to learn how to automate these Sqoop imports and exports using Oozie.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Sqoop and Oozie installed on it.

How to do it...

Any Oozie job execution involves two important files: workflow.xml and a properties file. The workflow.xml file is where we specify the flow of execution. The following is an example of workflow.xml that uses the Sqoop action (the Sqoop command shown is illustrative; replace it with your own import or export):

<workflow-app xmlns="uri:oozie:workflow:0.2" name="sqoop-wf">
    <start to="sqoop-node"/>

    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>

            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- the Sqoop command line, exactly as you would pass it to the sqoop CLI -->
            <command>import --connect ${jdbcUri} --table employee --target-dir /output/sqoop -m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Sqoop action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
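The companion properties file supplies values for the `${...}` variables and tells Oozie where the workflow application lives on HDFS. A minimal sketch, assuming the workflow directory has been copied to `/user/hadoop/apps/sqoop-wf` (the host names, ports, and paths are assumptions; adjust them for your cluster):

```shell
# Write a minimal job.properties for the sqoop-wf workflow.
# All host names, ports, and HDFS paths below are illustrative.
cat > job.properties <<'EOF'
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/hadoop/apps/sqoop-wf
EOF

# Submit and start the workflow (requires a running Oozie server):
# oozie job -oozie http://localhost:11000/oozie -config job.properties -run
```

On submission, Oozie prints a job ID that you can pass to `oozie job -info <id>` to watch the action's progress.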

Implementing a Map Reduce action job using Oozie


In the previous recipe, we talked about how to use a Sqoop action to import data to HDFS. In this recipe, we are going to take a look at how to execute Map Reduce jobs using Oozie.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie installed on it.

How to do it...

Any Oozie job execution involves two important files: workflow.xml and a properties file. The workflow.xml file is where we specify the flow of execution. The following is an example of workflow.xml that uses the Map Reduce action. Here, we also need to provide the jar file that contains the Map Reduce code:

<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/map-reduce"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <!-- the mapper and reducer classes from the workflow's lib/ jar, plus
                     mapred.input.dir and mapred.output.dir, are set as further properties here -->
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map Reduce action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Implementing a Java action job using Oozie


In the previous recipe, we talked about how to use Oozie to execute the Map Reduce job. In this recipe, we are going to take a look at how to execute any Java class using Oozie.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie installed on it.

How to do it...

Any Oozie job execution involves two important files: workflow.xml and a properties file. The workflow.xml file is where we specify the flow of execution. The following is an example of workflow.xml that uses a Java action. Here, we need to provide the jar file in which the class is present (the main class name is illustrative):

<workflow-app xmlns="uri:oozie:workflow:0.2" name="java-main-wf">
    <start to="java-node"/>
    <action name="java-node">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- fully qualified name of the class whose main() should run -->
            <main-class>com.example.DemoMain</main-class>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Java action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
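On HDFS, a workflow application is just a directory: workflow.xml at the top level, and any jars in a lib/ subdirectory, which Oozie places on the action's classpath automatically. A sketch of staging such a directory (the jar name and HDFS path are assumptions):

```shell
# Stage a workflow application directory locally before pushing it to HDFS.
mkdir -p java-main-wf/lib

# workflow.xml sits at the top level of the application directory:
#   cp workflow.xml java-main-wf/
# jars in lib/ are added to the Java action's classpath automatically:
#   cp target/demo-main.jar java-main-wf/lib/
# Finally, push the directory to HDFS and point oozie.wf.application.path at it:
#   hdfs dfs -put -f java-main-wf /user/hadoop/apps/java-main-wf
```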

Implementing a Hive action job using Oozie


In this recipe, we are going to take a look at how to use a Hive action in order to automate Hive query executions.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie and Hive installed on it.

How to do it...

Any Oozie job execution involves two important files: workflow.xml and a properties file. The workflow.xml file is where we specify the flow of execution. The following is an example of workflow.xml that uses the Hive action (the script name is illustrative):

<workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
    <start to="hive-node"/>

    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- the Hive script to execute, stored alongside workflow.xml on HDFS -->
            <script>script.q</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Hive action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Implementing a Pig action job using Oozie


In this recipe, we are going to take a look at how to use a Pig action in order to automate Pig script executions.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie and Pig installed on it.

How to do it...

An Oozie job execution involves two important files: workflow.xml and a properties file. The workflow.xml file is where we specify the flow of execution. The following is an example of workflow.xml that uses the Pig action (the script name is illustrative):

<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- the Pig script to execute, stored alongside workflow.xml on HDFS -->
            <script>id.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Implementing an e-mail action job using Oozie


In this recipe, we are going to take a look at how to use an e-mail action in order to notify users about job executions in Oozie.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie installed on it.

How to do it...

An Oozie job execution involves two important files: workflow.xml and a properties file. The workflow.xml file is where we specify the flow of execution. The following is an example of workflow.xml that uses the e-mail action (the addresses are illustrative; the Oozie server must have its SMTP settings configured):

<workflow-app xmlns="uri:oozie:workflow:0.2" name="email-wf">
    <start to="notify"/>
    <action name="notify">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>a@b.com</to>
            <cc>b@b.com</cc>
            <subject>Email notifications for ${wf:id()}</subject>
            <body>The wf ${wf:id()} successfully completed.</body>
        </email>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Email action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Executing parallel jobs using Oozie (fork)


In this recipe, we are going to take a look at how to execute parallel jobs using the Oozie fork node. Here, we will be executing one Hive and one Pig job in parallel.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie, Hive, and Pig installed on it.

How to do it...

For parallel execution, we need to use the fork node provided by Oozie. Every fork must eventually be closed by a matching join node, which waits for all forked paths to finish before the workflow continues. The following is a sample workflow that executes Hive and Pig jobs in parallel (the script names are illustrative):

<workflow-app xmlns="uri:oozie:workflow:0.2" name="demo-wf">

    <start to="fork-node"/>

    <fork name="fork-node">
        <path start="pig-node"/>
        <path start="hive-node"/>
    </fork>

    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>id.pig</script>
        </pig>
        <ok to="join-node"/>
        <error to="fail"/>
    </action>

    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>script.q</script>
        </hive>
        <ok to="join-node"/>
        <error to="fail"/>
    </action>

    <join name="join-node" to="end"/>

    <kill name="fail">
        <message>Job failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Scheduling a job in Oozie


In this recipe, we are going to take a look at how to schedule recurring jobs using the Oozie coordinator.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie installed on it.

How to do it...

Oozie provides one more type of job called a coordinator job. This type of job is used to schedule workflow executions, typically at a time-based frequency. The following is an example of a coordinator job that runs a workflow once a day:

<coordinator-app name="sample-coordinator"
    frequency="${coord:days(1)}"
    start="2016-01-01T18:56Z" end="2017-01-01T18:56Z" timezone="UTC"
    xmlns="uri:oozie:coordinator:0.2">

    <controls>
        <concurrency>1</concurrency>
        <execution>FIFO</execution>
        <throttle>5</throttle>
    </controls>

    <action>
        <workflow>
            <app-path>${applicationPath}</app-path>
            <configuration>
                <!-- properties passed through to the workflow,
                     for example jobTracker, nameNode, and queueName -->
            </configuration>
        </workflow>
    </action>
</coordinator-app>
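A coordinator is submitted with a properties file just like a workflow, except that oozie.coord.application.path points at the directory containing coordinator.xml. A minimal sketch (the host names, ports, and paths are assumptions; adjust them for your cluster):

```shell
# Write a minimal properties file for the coordinator.
# All host names, ports, and HDFS paths below are illustrative.
cat > coordinator.properties <<'EOF'
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
queueName=default
applicationPath=${nameNode}/user/hadoop/apps/sample-wf
oozie.coord.application.path=${nameNode}/user/hadoop/apps/sample-coordinator
EOF

# Submit it (requires a running Oozie server); Oozie then materializes one
# workflow run per day between the start and end instants:
# oozie job -oozie http://localhost:11000/oozie -config coordinator.properties -run
```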