Wednesday, July 10, 2013

Apache Oozie - Part 9c: Coordinator job - dataset availability triggered


1.0. What's covered in the blog?

1) Apache documentation on coordinator jobs that execute workflows upon availability of datasets
2) A sample program that includes the components of an Oozie dataset-availability-initiated coordinator job - scripts/code, sample dataset and commands; Oozie actions covered: hdfs action, email action, sqoop action (mysql database);

Version:
Oozie 3.3.0;

Related blogs:

Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action 
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered 
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with oozie workflows
Blog 12: Oozie workflow - shell action +passing output from one action to another
Blog 13: Oozie workflow - SSH action


2.0. Apache documentation on dataset availability triggered coordinator jobs

http://oozie.apache.org/docs/3.3.0/CoordinatorFunctionalSpec.html

3.0. Sample coordinator application


Highlights:
The coordinator application has a start time; once the start time condition is met, it transitions to a waiting state in which it polls for the availability of a dataset. Once the dataset is available, it runs the workflow specified.
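A dataset-availability trigger boils down to three pieces in the coordinator definition: a dataset, an input event bound to it, and the workflow to run. A minimal sketch, using the same property names as the sample further down (values elided with "..." are placeholders; the complete, working files appear below):

<coordinator-app name="..." frequency="${coord:days(1)}"
                 start="${startTime}" end="${endTime}" timezone="${timeZoneDef}"
                 xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <!-- the dataset whose availability Oozie polls for -->
    <dataset name="inputDS" frequency="${coord:days(1)}"
             initial-instance="${startTime}" timezone="${timeZoneDef}">
      <uri-template>${triggerDatasetDir}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- the materialized action stays in WAITING until this instance is available -->
    <data-in name="input" dataset="inputDS">
      <instance>${startTime}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${sqoopWorkflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>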


Sample application - pictorial overview:

[Diagram: pictorial overview of the coordinator application - see the link under 'Pictorial overview of job' below]


Coordinator application components:

[Diagram: coordinator application components]

Coordinator application details:

This gist includes the components of an Oozie dataset-availability-initiated coordinator job -
scripts/code, sample data and commands; Oozie actions covered: hdfs action, email action,
sqoop action (mysql database); Oozie controls covered: decision;
Use case
--------
Pipe report data available in HDFS to a mysql database;
Pictorial overview of job:
--------------------------
http://hadooped.blogspot.com/p/ooziecooridnatorjobdatasetdep-pix.html
Includes:
---------
Data and script download: 02-DataAndScriptDownload
Data load commands: 03-HdfsLoadCommands
Mysql database setup: 04-mysqlDBSetup
Sqoop command - test: 05-SqoopStandaloneTryout
Oozie configuration for email 06-OozieSMTPconfiguration
Oozie coordinator properties file 07-OozieCoordinatorProperties
Oozie coordinator conf file 08-OozieCoordinatorXML
Sqoop workflow conf file 09-SqoopWorkflowXML
Oozie commands 10-OozieJobExecutionCommands
Output in mysql 11-Output
Oozie web console - screenshots 12-OozieWebConsoleScreenshots
*********************************
Data download
*********************************
Github:
https://github.com/airawat/OozieSamples
Email me at airawat.blog@gmail.com if you encounter any issues.
*********************************
Directory structure
*********************************
oozieProject
  sampleCoordinatorJobDatasetDep
    coordinatorConf/
      coordinator.properties
      coordinator.xml
    sqoopWorkflowApp
      workflow.xml
  datasetGeneratorApp
    outputDir
      part-r-00000
      _SUCCESS
      _logs
        history
          cdh-jt01_1372261353326_job_201306261042_0536_conf.xml
          job_201306261042_0536_1373407670448_akhanolk_Syslog+Event+Rollup
---------------------------------------------------------------------------------------
The sampleCoordinatorJobDatasetDep directory
--------------------------------------------
This holds the coordinator application - the coordinator configuration and the sqoop workflow app.
The datasetGeneratorApp directory
---------------------------------
This is essentially what we will use to trigger the coordinator job. While the _logs are not important for the simulation, the presence of the _SUCCESS file is required; without it, the job will not get triggered.
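By default, Oozie deems a directory-based dataset instance available only once it contains a _SUCCESS flag file; the optional <done-flag> element of the dataset definition controls this. A hedged sketch (the sample's coordinator.xml relies on the default and therefore omits the element):

<dataset name="inputDS" frequency="${coord:days(1)}"
         initial-instance="${startTime}" timezone="${timeZoneDef}">
  <uri-template>${triggerDatasetDir}</uri-template>
  <!-- If omitted, Oozie waits for _SUCCESS in the directory (the default);
       if specified but empty, the directory's existence alone suffices. -->
  <done-flag>_SUCCESS</done-flag>
</dataset>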
*********************************
Hdfs data load commands
*********************************
$ hadoop fs -mkdir oozieProject
$ hadoop fs -put oozieProject/* oozieProject/
Run the command below to validate the load against the expected directory structure shown in section 02-DataAndScriptDownload:
$ hadoop fs -ls -R oozieProject/sampleCoordinatorJobDatasetDep | awk '{print $8}'
Remove the dataset directory - we will load it later, when we want to trigger the job:
$ hadoop fs -rm -R oozieProject/datasetGeneratorApp
*********************************
Mysql database setup tasks
*********************************
a) Create database:
mysql>
create database airawat;
b) Switch to database created:
mysql>
use airawat;
c) Create destination table for sqoop export from hdfs:
mysql>
CREATE TABLE IF NOT EXISTS Logged_Process_Count_By_Year (
  year_and_process VARCHAR(100),
  occurrence INTEGER
);
d) Ensure your sqoop user has access to database created:
mysql>
grant all on airawat.* to myUser@'myMachine';
*************************************************************
Sqoop command - try out the command to see if it works
*************************************************************
Prerequisites:
1. The dataset to be exported must exist on HDFS.
2. The mysql table that is the destination of the export must exist.
Command:
--Run on the node that acts as the sqoop client;
$ sqoop export \
--connect jdbc:mysql://cdh-dev01/airawat \
--username devUser \
--password myPwd \
--table Logged_Process_Count_By_Year \
--direct \
--export-dir "oozieProject/datasetGeneratorApp/outputDir" \
--fields-terminated-by "\t"
*********************************
Results in mysql
*********************************
mysql> select * from Logged_Process_Count_By_Year order by occurrence desc;
+----------------------------+------------+
| year_and_process           | occurrence |
+----------------------------+------------+
| 2013-ntpd_initres          |       4133 |
| 2013-kernel                |        810 |
| 2013-init                  |        166 |
| 2013-pulseaudio            |         18 |
| 2013-spice-vdagent         |         15 |
| 2013-gnome-session         |         11 |
| 2013-sudo                  |          8 |
| 2013-polkit-agent-helper-1 |          8 |
| 2013-console-kit-daemon    |          7 |
| 2013-NetworkManager        |          7 |
| 2013-udevd                 |          6 |
| 2013-sshd                  |          6 |
| 2013-nm-dispatcher.action  |          4 |
| 2013-login                 |          2 |
+----------------------------+------------+
14 rows in set (0.00 sec)
*************************************************
--Cleanup
*************************************************
mysql>
delete from Logged_Process_Count_By_Year;
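For a table of any size, truncating is typically faster than an unfiltered delete and achieves the same cleanup:
mysql>
truncate table Logged_Process_Count_By_Year;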
*************************
Oozie SMTP configuration
*************************
The following needs to be added to oozie-site.xml, after updating it for your environment and configuration:
<!-- SMTP params -->
<property>
  <name>oozie.email.smtp.host</name>
  <value>cdh-dev01</value>
</property>
<property>
  <name>oozie.email.smtp.port</name>
  <value>25</value>
</property>
<property>
  <name>oozie.email.from.address</name>
  <value>oozie@cdh-dev01</value>
</property>
<property>
  <name>oozie.email.smtp.auth</name>
  <value>false</value>
</property>
<property>
  <name>oozie.email.smtp.username</name>
  <value></value>
</property>
<property>
  <name>oozie.email.smtp.password</name>
  <value></value>
</property>
***************************************************************************
Oozie coordinator properties file: coordinator.properties
***************************************************************************
nameNode=hdfs://cdh-nn01.hadoop.com:8020
jobTracker=cdh-jt01:8021
queueName=default
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
appRoot=${oozieProjectRoot}/sampleCoordinatorJobDatasetDep
oozie.coord.application.path=${appRoot}/coordinatorConf
sqoopWorkflowAppPath=${appRoot}/sqoopWorkflowApp
oozieLibPath=${nameNode}/user/oozie/share/lib
oozie.libpath=${oozieLibPath}
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
triggerDatasetDir=${oozieProjectRoot}/datasetGeneratorApp/outputDir
triggerDataFiles=${triggerDatasetDir}/part*
mysqlServer=cdh-dev01
mysqlServerDB=airawat
mysqlServerDBUID=myUID
mysqlServerDBPwd=myPWD
toEmailAddress=akhanolk@cdh-dev01
startTime=2013-07-11T00:00Z
endTime=2013-07-15T00:00Z
timeZoneDef=UTC
sqoopInputRecordCount=`cat ${triggerDataFiles} | wc -l`
minRequiredRecordCount=1
<!--***************************************************************************
Oozie coordinator xml file: coordinator.xml
*****************************************************************************-->
<coordinator-app name="AirawatCoordJobDataTrig"
                 frequency="${coord:days(1)}"
                 start="${startTime}"
                 end="${endTime}"
                 timezone="${timeZoneDef}"
                 xmlns="uri:oozie:coordinator:0.1"
                 xmlns:sla="uri:oozie:sla:0.1">
  <controls>
    <timeout>20</timeout>
    <concurrency>6</concurrency>
    <execution>FIFO</execution>
  </controls>
  <datasets>
    <dataset name="inputDS" frequency="${coord:days(1)}" initial-instance="${startTime}" timezone="${timeZoneDef}">
      <uri-template>${triggerDatasetDir}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="AirawatCoordTrigDepInput" dataset="inputDS">
      <instance>${startTime}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${sqoopWorkflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
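Note that the uri-template above points at a fixed directory, so every materialized action waits on the same path. More commonly, the template is parameterized with the instance's nominal time, so that each daily instance resolves to its own directory; a hedged sketch with an illustrative path:

<dataset name="dailyLogs" frequency="${coord:days(1)}"
         initial-instance="${startTime}" timezone="${timeZoneDef}">
  <!-- ${YEAR}/${MONTH}/${DAY} are resolved from each instance's nominal time -->
  <uri-template>${oozieProjectRoot}/logs/${YEAR}/${MONTH}/${DAY}</uri-template>
</dataset>

The corresponding input event would then reference an instance such as <instance>${coord:current(0)}</instance> rather than a fixed timestamp.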
<!--***************************************************************************
Oozie workflow xml file: workflow.xml
*****************************************************************************-->
<workflow-app name="AirawatSampleCoordJobDSDep" xmlns="uri:oozie:workflow:0.1">
  <start to="inputAvailableCheckDecision"/>
  <decision name="inputAvailableCheckDecision">
    <switch>
      <case to="sqoopAction">
        ${sqoopInputRecordCount gt minRequiredRecordCount}
      </case>
      <default to="end"/>
    </switch>
  </decision>
  <action name="sqoopAction">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>oozie.libpath</name>
          <value>${oozieLibPath}</value>
        </property>
      </configuration>
      <command>export --connect jdbc:mysql://${mysqlServer}/${mysqlServerDB} --username ${mysqlServerDBUID} --password ${mysqlServerDBPwd} --table Logged_Process_Count_By_Year --direct --export-dir ${triggerDatasetDir} --fields-terminated-by "\t"</command>
    </sqoop>
    <ok to="end"/>
    <error to="sendErrorEmail"/>
  </action>
  <action name="sendErrorEmail">
    <email xmlns="uri:oozie:email-action:0.1">
      <to>${toEmailAddress}</to>
      <subject>Status of workflow ${wf:id()}</subject>
      <body>The workflow ${wf:name()} with id ${wf:id()} had issues and will be killed; the error logged is: ${wf:errorMessage(wf:lastErrorNode())}</body>
    </email>
    <ok to="killJobAction"/>
    <error to="killJobAction"/>
  </action>
  <kill name="killJobAction">
    <message>"Killed job due to error: ${wf:errorMessage(wf:lastErrorNode())}"</message>
  </kill>
  <end name="end"/>
</workflow-app>
****************************************
Oozie job commands
****************************************
a) Prep
Modify the start-end time of the job in the coordinator properties file, as needed.
Then run the following command.
b) Submit job
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/sampleCoordinatorJobDatasetDep/coordinatorConf/coordinator.properties -run
A job ID is displayed.
The job should not trigger until the dataset is loaded.
It should be in waiting state - see oozie web console screenshots in the last section.
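To inspect the coordinator and the state of its materialized actions from the command line, substitute the job ID returned at submission:
$ oozie job -oozie http://cdh-dev01:11000/oozie -info <<Job ID>>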
If you need to kill the job...
$ oozie job -oozie http://cdh-dev01:11000/oozie -kill <<Job ID>>
c) Publish trigger
Ideally, this dataset would be generated upon completion of some upstream map-reduce job.
For simplicity, I have provided the output of one of the jobs from one of my blogs/gists.
$ hadoop fs -put oozieProject/datasetGeneratorApp/ oozieProject/
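The provided outputDir already contains the _SUCCESS flag file. If you substitute trigger data of your own and it lacks the flag, you can create it by hand (assuming the default done-flag of _SUCCESS):
$ hadoop fs -touchz oozieProject/datasetGeneratorApp/outputDir/_SUCCESS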
****************************************
Output - data export from hdfs to mysql
****************************************
mysql> select * from Logged_Process_Count_By_Year order by occurrence desc;
+----------------------------+------------+
| year_and_process           | occurrence |
+----------------------------+------------+
| 2013-ntpd_initres          |       4133 |
| 2013-kernel                |        810 |
| 2013-init                  |        166 |
| 2013-pulseaudio            |         18 |
| 2013-spice-vdagent         |         15 |
| 2013-gnome-session         |         11 |
| 2013-sudo                  |          8 |
| 2013-polkit-agent-helper-1 |          8 |
| 2013-console-kit-daemon    |          7 |
| 2013-NetworkManager        |          7 |
| 2013-udevd                 |          6 |
| 2013-sshd                  |          6 |
| 2013-nm-dispatcher.action  |          4 |
| 2013-login                 |          2 |
+----------------------------+------------+
14 rows in set (0.00 sec)
Oozie web console output:
Screenshots from the execution of the sample program are available at:
http://hadooped.blogspot.com/p/ooziecooridnatorjobtimedep-pix_10.html

[Screenshots: coordinator job in waiting state]

Upon availability of the dataset...

[Screenshots: coordinator job triggered]