1.0. What's covered in the blog?
1. Documentation on the Oozie SSH action
2. A sample Oozie workflow application that demonstrates the SSH action: it connects over SSH to a specific node as a specified user and executes a shell script on that node, which loads a local file to HDFS.
Getting this action working was tricky, and the solution is not covered in the Apache documentation. The issues and their resolution are documented below.
Version:
Oozie 3.3.0
Related blogs:
Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with oozie workflows
Blog 12: Oozie workflow - shell action +passing output from one action to another
Blog 13: Oozie workflow - SSH action
2.0. Documentation on the Oozie SSH Action
Apache documentation is available at http://oozie.apache.org/docs/3.3.0/WorkflowFunctionalSpec.html#a3.2.5_Ssh_Action
Note: The SSH action was at one point slated for removal, but it was later decided to keep it. So, disregard any mention of deprecation.
3.0. Sample workflow application
3.0.1. Highlights:
Oozie server is running on node cdh-dev01 in my environment.
With the sample workflow application, I am going to submit an Oozie job while logged in as myself (akhanolk), on this machine (Oozie server - cdh-dev01) from the CLI.
The workflow executes a shell script on cdh-dn01 as user akhanolk. The shell script loads a local file to HDFS. If the file load completes successfully, the workflow sends an email to me.
3.0.2. Pictorial overview:
3.0.3. SSH setup:
1. Passphrase-less SSH for akhanolk from cdh-dev01 (Oozie server) to cdh-dn01 (remote node) and vice versa
2. Passphrase-less SSH for the user ID that runs the Oozie server (oozie in my case) from cdh-dev01 to cdh-dn01, logging in as akhanolk
[Running ps -ef | grep oozie on the Oozie server shows which user ID the Oozie server runs as]
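The two requirements above can be met with the standard OpenSSH tools. The following is a minimal sketch rather than the exact commands used for this post. It assumes `ssh-keygen` and `ssh-copy-id` are installed and that the key lives at `~/.ssh/id_rsa`. The `DRY_RUN`, `KEY_FILE` and `REMOTE_LOGIN` variables and the `run` helper are illustrative names, and the script defaults to a dry run so the commands can be reviewed first.

```shell
#!/bin/sh
# Sketch: passphrase-less SSH from the Oozie server (cdh-dev01) to the remote
# node (cdh-dn01). Run it once as akhanolk and once as the Oozie user.
# Assumptions: OpenSSH's ssh-keygen and ssh-copy-id are installed, and the key
# path below is the one to use. DRY_RUN and run() are illustrative helpers.

REMOTE_LOGIN="${REMOTE_LOGIN:-akhanolk@cdh-dn01}"
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_rsa}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print a command; execute it only when DRY_RUN=0.
# (An empty argument, such as the empty passphrase, prints as a blank.)
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

setup_passphraseless_ssh() {
  # Generate a key pair with an empty passphrase (-N '') unless one exists.
  if [ ! -f "$KEY_FILE" ]; then
    run ssh-keygen -t rsa -N '' -f "$KEY_FILE"
  fi
  # Append the public key to the remote user's ~/.ssh/authorized_keys.
  run ssh-copy-id -i "$KEY_FILE.pub" "$REMOTE_LOGIN"
  # BatchMode=yes makes ssh fail rather than prompt, which verifies the setup.
  run ssh -o BatchMode=yes "$REMOTE_LOGIN" hostname
}

setup_passphraseless_ssh
```

One option for the Oozie user is `sudo -u oozie -H env DRY_RUN=0 sh setup_ssh.sh` (the script name is hypothetical). For the reverse direction, the same script could be run on cdh-dn01 with `REMOTE_LOGIN=akhanolk@cdh-dev01`.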
3.0.4. Workflow application components:
workflow definition (workflow.xml - in HDFS)
job properties file (job.properties from node submitting job)
Shell script (uploadFile.sh) on the remote node (cdh-dn01; at /home/akhanolk/scripts)
Data file (employees_data) on the remote node (cdh-dn01; at /home/akhanolk/data)
3.0.5. Desired result:
Upon execution of the workflow, the employees_data file on cdh-dn01 should be loaded into a specified directory in HDFS.
3.0.6. Subsequent sections cover:
- Data and script download
- Oozie job properties file
- Oozie workflow file
- Shell script - uploadFile.sh
- Data load commands
- Oozie SMTP configuration
- SSH setup
- Oozie commands
- Output in HDFS
- Output email
- Oozie web console - screenshots
- Issues encountered and resolution
3.0.7. Data and script download:
3.0.8. Oozie job.properties file:
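As a rough sketch of what a properties file for this kind of workflow might contain, the shell function below prints one. It is not the original file: the property names `sshLogin`, `shellScriptPath`, `localFilePath`, `hdfsTargetDir` and `emailToAddress` are hypothetical, and the NameNode URI, application path, target directory and email address are left as parameters because they depend on the environment. Only the remote login and the two local paths come from the component list above.

```shell
#!/bin/sh
# Sketch: print a job.properties file for the SSH action workflow.
# Assumptions: all custom property names are hypothetical; the NameNode URI,
# HDFS paths and email address are supplied by the caller.

print_job_properties() {
  name_node="$1"        # e.g. hdfs://<namenode-host>:8020
  app_path="$2"         # HDFS directory that holds workflow.xml
  hdfs_target_dir="$3"  # HDFS directory the data file is loaded into
  email_to="$4"
  # \${nameNode} is escaped so that Oozie, not the shell, resolves it.
  cat <<EOF
nameNode=${name_node}
oozie.wf.application.path=\${nameNode}${app_path}
sshLogin=akhanolk@cdh-dn01
shellScriptPath=/home/akhanolk/scripts/uploadFile.sh
localFilePath=/home/akhanolk/data/employees_data
hdfsTargetDir=${hdfs_target_dir}
emailToAddress=${email_to}
EOF
}

# Print the file only when all four values are supplied.
if [ "$#" -eq 4 ]; then
  print_job_properties "$1" "$2" "$3" "$4"
fi
```

Redirecting the output to a file named job.properties on the node that submits the job would give the CLI something to pass with `-config`.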
3.0.9. Oozie workflow.xml:
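The flow described in the highlights (run a script over SSH, then send an email on success) could be expressed roughly as follows. This is a sketch and not the original workflow definition: the workflow, action and kill node names are made up, and the `${sshLogin}`, `${shellScriptPath}`, `${localFilePath}`, `${hdfsTargetDir}` and `${emailToAddress}` properties are hypothetical names that would have to be defined in job.properties. The `ssh` and `email` elements use the action schemas from the Oozie 3.3.0 documentation. The XML is wrapped in a shell function so it can be printed and redirected to a file.

```shell
#!/bin/sh
# Sketch: print a workflow.xml with an SSH action followed by an email action.
# Assumptions: node names and ${...} property names are hypothetical.

print_workflow_xml() {
  # The quoted 'EOF' keeps the shell from expanding Oozie's ${...} expressions.
  cat <<'EOF'
<workflow-app name="WorkflowWithSshAction" xmlns="uri:oozie:workflow:0.1">
    <start to="sshAction"/>
    <action name="sshAction">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>${sshLogin}</host>
            <command>${shellScriptPath}</command>
            <args>${localFilePath}</args>
            <args>${hdfsTargetDir}</args>
        </ssh>
        <ok to="sendEmail"/>
        <error to="killAction"/>
    </action>
    <action name="sendEmail">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Workflow ${wf:id()} completed</subject>
            <body>The file load over SSH completed successfully.</body>
        </email>
        <ok to="end"/>
        <error to="killAction"/>
    </action>
    <kill name="killAction">
        <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF
}

print_workflow_xml
```

The output would be saved as workflow.xml and placed in the HDFS directory that `oozie.wf.application.path` points to.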
3.0.10. Shell script (uploadFile.sh):
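A minimal sketch of what an upload script of this kind might look like is given below. It is hypothetical and not the author's original script. It assumes the `hadoop` CLI is reachable for the remote user (non-interactive SSH sessions often have a reduced PATH, hence the overridable `HADOOP_CMD`, an illustrative name) and that `hadoop fs -mkdir -p` is supported, as in Hadoop 2.x. The local file and the HDFS target directory are passed as arguments.

```shell
#!/bin/sh
# Hypothetical sketch of the upload script run by the SSH action on cdh-dn01;
# not the author's original. Assumes the hadoop CLI is available and that
# `hadoop fs -mkdir -p` is supported (Hadoop 2.x).

HADOOP_CMD="${HADOOP_CMD:-hadoop}"   # overridable, e.g. with a full path

# Copy one local file into an HDFS directory, creating the directory if needed.
# Note: `hadoop fs -put` fails if the file already exists at the destination.
upload_to_hdfs() {
  src="$1"
  dest_dir="$2"
  if [ ! -f "$src" ]; then
    echo "Local file not found: $src" >&2
    return 1
  fi
  "$HADOOP_CMD" fs -mkdir -p "$dest_dir" || return 1
  "$HADOOP_CMD" fs -put "$src" "$dest_dir/" || return 1
}

# Run only when both arguments are supplied. A non-zero exit status should
# make the SSH action fail, so the workflow follows its error transition.
if [ "$#" -eq 2 ]; then
  upload_to_hdfs "$1" "$2" || exit 1
fi
```

Invoked by hand on cdh-dn01, this would look like `sh uploadFile.sh /home/akhanolk/data/employees_data <hdfs-target-dir>`, where the target directory is whatever the workflow is configured to use.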
3.0.11. HDFS load commands:
3.0.12. Oozie SMTP configuration:
3.0.13. Oozie SSH setup:
3.0.14. Oozie commands:
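Submitting and monitoring the workflow from the CLI on cdh-dev01 can be sketched with the standard `oozie job` sub-commands. The sketch assumes the Oozie client is installed and that the server listens on Oozie's default port, 11000. The `OOZIE_URL` default, the `OOZIE_CMD` hook and the function names are illustrative, not taken from the original post.

```shell
#!/bin/sh
# Sketch: submit the workflow from the Oozie CLI and check on it.
# Assumptions: the oozie client is installed on cdh-dev01 and the server
# listens on the default port, 11000. OOZIE_CMD is an illustrative hook.

OOZIE_URL="${OOZIE_URL:-http://cdh-dev01:11000/oozie}"
OOZIE_CMD="${OOZIE_CMD:-oozie}"

# `oozie job -run` prints a line of the form "job: <job-id>"; keep the ID only.
parse_job_id() {
  sed -n 's/^job: *//p'
}

# Submit and start the workflow described by a job.properties file.
submit_workflow() {
  "$OOZIE_CMD" job -oozie "$OOZIE_URL" -config "$1" -run | parse_job_id
}

# Show the status of the workflow and of each of its actions.
job_info() {
  "$OOZIE_CMD" job -oozie "$OOZIE_URL" -info "$1"
}

# Fetch the job log, which helps when the SSH action fails.
job_log() {
  "$OOZIE_CMD" job -oozie "$OOZIE_URL" -log "$1"
}

# Submit only when a properties file is given as the single argument.
if [ "$#" -eq 1 ]; then
  job_id=$(submit_workflow "$1")
  if [ -z "$job_id" ]; then
    echo "Submission failed" >&2
    exit 1
  fi
  echo "Submitted workflow: $job_id"
  job_info "$job_id"
fi
```

Run as akhanolk with the path to job.properties as the only argument, this submits the job and prints its initial status. `job_log` can be called with the returned ID if the action ends in error.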
3.0.15. Output in HDFS:
3.0.16. Output email:
3.0.17. Issues encountered:
3.0.18. Oozie web console - screenshots:
Any additional insights are greatly appreciated.
Cheers!