1.0. What's covered in the blog?
1) Apache documentation on coordinator jobs that execute workflows upon availability of datasets
2) A sample program that includes the components of an Oozie dataset-availability-initiated coordinator job - scripts/code, sample dataset and commands; Oozie actions covered: hdfs action, email action, sqoop action (mysql database)
Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with oozie workflows
Blog 12: Oozie workflow - shell action +passing output from one action to another
Blog 13: Oozie workflow - SSH action
2.0. Apache documentation on dataset availability triggered coordinator jobs
3.0. Sample coordinator application
The coordinator application has a start time; once the start-time condition is met, it transitions to a waiting state in which it checks for the availability of a dataset. Once the dataset is available, it runs the specified workflow.
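The behavior above maps directly onto the elements of a coordinator definition: the start/end times and frequency on the app, a dataset whose URI template the coordinator polls, an input-event tying a dataset instance to each materialized action, and the workflow to run. Below is a minimal sketch of such a coordinator.xml; the app name, dates, paths, and property names (nameNode, workflowAppPath, inputDir) are illustrative placeholders, not from the sample program itself.

```xml
<coordinator-app name="coord-dataset-demo" frequency="${coord:days(1)}"
                 start="2013-07-09T00:00Z" end="2013-07-12T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <!-- The dataset the coordinator waits on; one instance per day -->
    <dataset name="inputDS" frequency="${coord:days(1)}"
             initial-instance="2013-07-09T00:00Z" timezone="UTC">
      <uri-template>${nameNode}/user/${coord:user()}/data/${YEAR}${MONTH}${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- The action stays in WAITING until this dataset instance exists -->
    <data-in name="input" dataset="inputDS">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
      <configuration>
        <property>
          <!-- Pass the resolved dataset URI into the workflow -->
          <name>inputDir</name>
          <value>${coord:dataIn('input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

The coord:dataIn('input') EL function resolves to the concrete HDFS URI of the dataset instance that became available, so the workflow receives the input path without hard-coding dates.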
Sample application - pictorial overview:
Coordinator application components:
Coordinator application details:
Oozie web console output:
Screenshots from the execution of the sample program.
Upon availability of the dataset...
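For context on what "availability" means here: by default Oozie considers a dataset instance available only when a done-flag file (by default _SUCCESS, the marker Hadoop jobs write on completion) exists in the instance directory. A dataset definition can override this; an empty done-flag element makes the existence of the directory itself signal availability. The dataset name and URI template below are illustrative, not taken from the sample program.

```xml
<dataset name="inputDS" frequency="${coord:days(1)}"
         initial-instance="2013-07-09T00:00Z" timezone="UTC">
  <uri-template>${nameNode}/user/${coord:user()}/data/${YEAR}${MONTH}${DAY}</uri-template>
  <!-- Empty done-flag: the directory's existence alone marks the
       instance available; omit this element to require _SUCCESS -->
  <done-flag></done-flag>
</dataset>
```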