Hadoop Real-World Solutions Cookbook - Second Edition

Product type: Book
Published: Mar 2016
Publisher: Packt Publishing
ISBN-13: 9781784395506
Pages: 290
Edition: 2nd
Author: Tanmay Deshpande

Table of Contents (18 chapters)

Hadoop Real-World Solutions Cookbook Second Edition
Credits
About the Author
Acknowledgements
About the Reviewer
www.PacktPub.com
Preface
1. Getting Started with Hadoop 2.X
2. Exploring HDFS
3. Mastering Map Reduce Programs
4. Data Analysis Using Hive, Pig, and HBase
5. Advanced Data Analysis Using Hive
6. Data Import/Export Using Sqoop and Flume
7. Automation of Hadoop Tasks Using Oozie
8. Machine Learning and Predictive Analytics Using Mahout and R
9. Integration with Apache Spark
10. Hadoop Use Cases
Index

Chapter 7. Automation of Hadoop Tasks Using Oozie

In this chapter, we'll take a look at the following recipes:

  • Implementing a Sqoop action job using Oozie

  • Implementing a Map Reduce action job using Oozie

  • Implementing a Java action job using Oozie

  • Implementing a Hive action job using Oozie

  • Implementing a Pig action job using Oozie

  • Implementing an e-mail action job using Oozie

  • Executing parallel jobs using Oozie (fork)

  • Scheduling a job in Oozie

Introduction


In the previous chapter, we talked about two very important tools, Sqoop and Flume, which help us seamlessly import and export data in and out of Hadoop. Now that we have covered most of the Hadoop ecosystem tools and their advanced usage, it's time to understand how to automate these tasks using another interesting tool called Oozie. Oozie is a workflow scheduler that helps us execute a series of Hadoop tasks as a single workflow.

Implementing a Sqoop action job using Oozie


In the previous chapter, we took a look at how to use Sqoop to import and export data between an RDBMS and HDFS. In this recipe, you are going to learn how to automate these Sqoop imports and exports using Oozie.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Sqoop and Oozie installed on it.

How to do it...

Any Oozie job execution involves two important files: workflow.xml and a properties file. The workflow.xml file is where we specify the flow of execution. The following is an example of workflow.xml that uses the Sqoop action (the Sqoop command shown is illustrative; replace it with your own import or export):

<workflow-app xmlns="uri:oozie:workflow:0.2" name="sqoop-wf">
    <start to="sqoop-node"/>

    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>

            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- the Sqoop command line, exactly as you would pass it to the sqoop CLI -->
            <command>import --connect ${jdbcUri} --table employee --target-dir /output/sqoop -m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Sqoop action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
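The companion properties file supplies values for the `${...}` variables and tells Oozie where the workflow application lives on HDFS. A minimal sketch, assuming the workflow directory has been copied to `/user/hadoop/apps/sqoop-wf` (the host names, ports, and paths are assumptions; adjust them for your cluster):

```shell
# Write a minimal job.properties for the sqoop-wf workflow.
# All host names, ports, and HDFS paths below are illustrative.
cat > job.properties <<'EOF'
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/hadoop/apps/sqoop-wf
EOF

# Submit and start the workflow (requires a running Oozie server):
# oozie job -oozie http://localhost:11000/oozie -config job.properties -run
```

On submission, Oozie prints a job ID that you can pass to `oozie job -info <id>` to watch the action's progress.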

Implementing a Map Reduce action job using Oozie


In the previous recipe, we talked about how to use a Sqoop action to import data to HDFS. In this recipe, we are going to take a look at how to execute Map Reduce jobs using Oozie.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie installed on it.

How to do it...

Any Oozie job execution involves two important files: workflow.xml and a properties file. The workflow.xml file is where we specify the flow of execution. The following is an example of workflow.xml that uses the Map Reduce action. Here, we also need to provide the jar file that contains the Map Reduce code:

<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/map-reduce"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <!-- the mapper and reducer classes from the workflow's lib/ jar, plus
                     mapred.input.dir and mapred.output.dir, are set as further properties here -->
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map Reduce action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Implementing a Java action job using Oozie


In the previous recipe, we talked about how to use Oozie to execute the Map Reduce job. In this recipe, we are going to take a look at how to execute any Java class using Oozie.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie installed on it.

How to do it...

Any Oozie job execution involves two important files: workflow.xml and a properties file. The workflow.xml file is where we specify the flow of execution. The following is an example of workflow.xml that uses a Java action. Here, we need to provide the jar file in which the class is present (the main class name is illustrative):

<workflow-app xmlns="uri:oozie:workflow:0.2" name="java-main-wf">
    <start to="java-node"/>
    <action name="java-node">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- fully qualified name of the class whose main() should run -->
            <main-class>com.example.DemoMain</main-class>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Java action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
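On HDFS, a workflow application is just a directory: workflow.xml at the top level, and any jars in a lib/ subdirectory, which Oozie places on the action's classpath automatically. A sketch of staging such a directory (the jar name and HDFS path are assumptions):

```shell
# Stage a workflow application directory locally before pushing it to HDFS.
mkdir -p java-main-wf/lib

# workflow.xml sits at the top level of the application directory:
#   cp workflow.xml java-main-wf/
# jars in lib/ are added to the Java action's classpath automatically:
#   cp target/demo-main.jar java-main-wf/lib/
# Finally, push the directory to HDFS and point oozie.wf.application.path at it:
#   hdfs dfs -put -f java-main-wf /user/hadoop/apps/java-main-wf
```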

Implementing a Hive action job using Oozie


In this recipe, we are going to take a look at how to use a Hive action in order to automate Hive query executions.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie and Hive installed on it.

How to do it...

Any Oozie job execution involves two important files: workflow.xml and a properties file. The workflow.xml file is where we specify the flow of execution. The following is an example of workflow.xml that uses the Hive action (the script name is illustrative):

<workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
    <start to="hive-node"/>

    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- the Hive script to execute, stored alongside workflow.xml on HDFS -->
            <script>script.q</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Hive action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Implementing a Pig action job using Oozie


In this recipe, we are going to take a look at how to use a Pig action in order to automate Pig script executions.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie and Pig installed on it.

How to do it...

An Oozie job execution involves two important files: workflow.xml and a properties file. The workflow.xml file is where we specify the flow of execution. The following is an example of workflow.xml that uses the Pig action (the script name is illustrative):

<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- the Pig script to execute, stored alongside workflow.xml on HDFS -->
            <script>id.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Implementing an e-mail action job using Oozie


In this recipe, we are going to take a look at how to use an e-mail action in order to notify users about job executions in Oozie.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie installed on it.

How to do it...

An Oozie job execution involves two important files: workflow.xml and a properties file. The workflow.xml file is where we specify the flow of execution. The following is an example of workflow.xml that uses the e-mail action (the addresses are illustrative; the Oozie server must have its SMTP settings configured):

<workflow-app xmlns="uri:oozie:workflow:0.2" name="email-wf">
    <start to="notify"/>
    <action name="notify">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>a@b.com</to>
            <cc>b@b.com</cc>
            <subject>Email notifications for ${wf:id()}</subject>
            <body>The wf ${wf:id()} successfully completed.</body>
        </email>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Email action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Executing parallel jobs using Oozie (fork)


In this recipe, we are going to take a look at how to execute parallel jobs using the Oozie fork node. Here, we will be executing one Hive and one Pig job in parallel.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie, Hive, and Pig installed on it.

How to do it...

For parallel execution, we need to use the fork node provided by Oozie. Every fork must eventually be closed by a matching join node, which waits for all forked paths to finish before the workflow continues. The following is a sample workflow that executes Hive and Pig jobs in parallel (the script names are illustrative):

<workflow-app xmlns="uri:oozie:workflow:0.2" name="demo-wf">

    <start to="fork-node"/>

    <fork name="fork-node">
        <path start="pig-node"/>
        <path start="hive-node"/>
    </fork>

    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>id.pig</script>
        </pig>
        <ok to="join-node"/>
        <error to="fail"/>
    </action>

    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>script.q</script>
        </hive>
        <ok to="join-node"/>
        <error to="fail"/>
    </action>

    <join name="join-node" to="end"/>

    <kill name="fail">
        <message>Job failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Scheduling a job in Oozie


In this recipe, we are going to take a look at how to schedule recurring jobs using the Oozie coordinator.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Oozie installed on it.

How to do it...

Oozie provides one more type of job called a coordinator job. This type of job is used to schedule workflow executions, typically at a time-based frequency. The following is an example of a coordinator job that runs a workflow once a day:

<coordinator-app name="sample-coordinator"
    frequency="${coord:days(1)}"
    start="2016-01-01T18:56Z" end="2017-01-01T18:56Z" timezone="UTC"
    xmlns="uri:oozie:coordinator:0.2">

    <controls>
        <concurrency>1</concurrency>
        <execution>FIFO</execution>
        <throttle>5</throttle>
    </controls>

    <action>
        <workflow>
            <app-path>${applicationPath}</app-path>
            <configuration>
                <!-- properties passed through to the workflow,
                     for example jobTracker, nameNode, and queueName -->
            </configuration>
        </workflow>
    </action>
</coordinator-app>
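A coordinator is submitted with a properties file just like a workflow, except that oozie.coord.application.path points at the directory containing coordinator.xml. A minimal sketch (the host names, ports, and paths are assumptions; adjust them for your cluster):

```shell
# Write a minimal properties file for the coordinator.
# All host names, ports, and HDFS paths below are illustrative.
cat > coordinator.properties <<'EOF'
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
queueName=default
applicationPath=${nameNode}/user/hadoop/apps/sample-wf
oozie.coord.application.path=${nameNode}/user/hadoop/apps/sample-coordinator
EOF

# Submit it (requires a running Oozie server); Oozie then materializes one
# workflow run per day between the start and end instants:
# oozie job -oozie http://localhost:11000/oozie -config coordinator.properties -run
```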