Wednesday, July 10, 2013

Apache Oozie - Part 9c: Coordinator job - dataset availability triggered


1.0. What's covered in the blog?

1) Apache documentation on coordinator jobs that execute workflows upon availability of datasets
2) A sample program that includes the components of an Oozie coordinator job triggered by dataset availability: scripts/code, sample dataset and commands. Oozie actions covered: hdfs action, email action, sqoop action (mysql database).

Version:
Oozie 3.3.0

Related blogs:

Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action 
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered 
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with oozie workflows
Blog 12: Oozie workflow - shell action + passing output from one action to another
Blog 13: Oozie workflow - SSH action


2.0. Apache documentation on dataset availability triggered coordinator jobs

http://oozie.apache.org/docs/3.3.0/CoordinatorFunctionalSpec.html

3.0. Sample coordinator application


Highlights:
The coordinator application has a start time. Once the start time is reached, the coordinator transitions to a waiting state, in which it checks for the availability of a dataset. Once the dataset is available, it runs the specified workflow.
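To make the flow above concrete, here is a minimal sketch of what a dataset-triggered coordinator.xml can look like. It is not the sample application's actual file: the application name, the dataset name, and the properties ${startTime}, ${endTime}, ${triggerDatasetDir} and ${workflowAppPath} are all assumptions for illustration, and would be supplied through a job properties file.

```xml
<!-- Hypothetical sketch; names and ${...} properties are illustrative assumptions. -->
<coordinator-app name="SampleDatasetTriggeredCoord"
                 frequency="${coord:days(1)}"
                 start="${startTime}" end="${endTime}"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">

  <!-- Describes where an instance of the dataset lives for each day. -->
  <datasets>
    <dataset name="inputDS" frequency="${coord:days(1)}"
             initial-instance="${startTime}" timezone="UTC">
      <uri-template>${triggerDatasetDir}/${YEAR}-${MONTH}-${DAY}</uri-template>
      <!-- Empty done-flag: the existence of the directory itself marks the
           instance as available. If the element is omitted, Oozie waits for
           a _SUCCESS file inside the directory instead. -->
      <done-flag></done-flag>
    </dataset>
  </datasets>

  <!-- The coordinator action waits until the current dataset instance exists. -->
  <input-events>
    <data-in name="input" dataset="inputDS">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>

  <!-- The workflow that runs once the input event is satisfied. -->
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

With a definition along these lines, each coordinator action that becomes due stays in the waiting state until the directory resolved from the uri-template appears in HDFS, and only then is the workflow launched.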


Sample application - pictorial overview:


Coordinator application components:

Coordinator application details:



Oozie web console output:
Screenshots from the execution of the sample program.

Upon availability of the dataset...