1.0. What's covered in the blog?
A sample application that includes the components of an Oozie coordinator job triggered by the availability of a trigger file: scripts/code, sample data (Syslog-generated log files), and commands.
Oozie actions covered: hdfs action, email action, java main action, hive action.
Oozie controls covered: decision, fork-join.
The workflow includes a sub-workflow that runs two hive actions concurrently. The hive table is partitioned.
Parsing: hive-regex and Java-regex. Also, the Java mapper gets the input directory path and includes part of it in the key.
Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with Oozie workflows
Blog 12: Oozie workflow - shell action + passing output from one action to another
Blog 13: Oozie workflow - SSH action
If you want to share your thoughts/updates, email me at firstname.lastname@example.org.
2.0. Sample coordinator application
The coordinator application starts executing once the defined trigger file becomes available in HDFS, and then initiates the two workflows. Both workflows generate reports from data in HDFS.
The java main action parses the log files with a Java regex and generates a report.
The hive actions in the hive sub-workflow run reports from hive tables defined over the same log files in HDFS.
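To make the Java-regex parsing step more concrete, here is a minimal sketch of what a log-parsing report could look like. It is not the code used in this application: the class name SyslogReportSketch, the method countByMonthAndProcess, the assumed line layout (month, day, time, host, process with optional pid, message) and the sample lines are all illustrative assumptions, and it reads from an in-memory list rather than from HDFS.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: counts syslog events per month and process.
public class SyslogReportSketch {

    // Assumed line layout: "<Mon> <day> <HH:mm:ss> <host> <process>[<pid>]: <message>"
    // Groups: 1=month, 2=day, 3=time, 4=host, 5=process, 6=pid (optional), 7=message
    private static final Pattern SYSLOG_LINE = Pattern.compile(
        "^(\\w{3})\\s+(\\d{1,2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+"
        + "([^\\s\\[:]+)(?:\\[(\\d+)\\])?:\\s*(.*)$");

    // Returns a sorted map of "<month>-<process>" to the number of matching lines.
    // Lines that do not match the assumed layout are skipped.
    public static Map<String, Integer> countByMonthAndProcess(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = SYSLOG_LINE.matcher(line);
            if (!m.matches()) {
                continue;
            }
            String key = m.group(1) + "-" + m.group(5);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> sample = Arrays.asList(
            "May  3 11:52:54 host-a init: tty (/dev/tty6) main process (1208) killed by TERM signal",
            "May  3 11:53:31 host-a kernel: registered taskstats version 1",
            "May  3 11:53:31 host-a kernel: sr0: scsi3-mmc drive",
            "Apr 23 16:14:10 host-b sudo[2214]: pam_unix(sudo:session): session opened",
            "not a syslog line");

        // Print one tab-separated report row per key.
        countByMonthAndProcess(sample)
            .forEach((key, count) -> System.out.println(key + "\t" + count));
    }
}
```

In a java main action, a class like this would be invoked through its main method, with the input and output locations passed as arguments; the in-memory sample list here only stands in for those inputs.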
Pictorial overview of coordinator application:
Coordinator application details:
Oozie web console - screenshots: