1.0. What's covered in the blog?
1. Documentation on the Oozie SSH action
2. Sample Oozie workflow application that demonstrates the SSH action: it connects over SSH to a specific node as a specified user and executes a shell script on that node, which loads a local file to HDFS.
It was tricky getting this action working - and the solution is not something covered in the Apache documentation. Issues and resolution are documented below.
Version:
Oozie 3.3.0
Related blogs:
Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with oozie workflows
Blog 12: Oozie workflow - shell action +passing output from one action to another
Blog 13: Oozie workflow - SSH action
2.0. Documentation on the Oozie SSH Action
Apache documentation is available at - http://oozie.apache.org/docs/3.3.0/WorkflowFunctionalSpec.html#a3.2.5_Ssh_Action
Note: The SSH action was at one point slated for removal, but it was later decided that it would remain.
So, disregard any mention of deprecation.
3.0. Sample workflow application
3.0.1. Highlights:
Oozie server is running on node cdh-dev01 in my environment.
With the sample workflow application, I am going to submit an Oozie job while logged in as myself (akhanolk), on this machine (Oozie server - cdh-dev01) from the CLI.
The workflow executes a shell script on cdh-dn01 as user akhanolk. The shell script loads a local file to HDFS. If the file load completes successfully, the workflow sends an email to me.
3.0.2. Pictorial overview:
3.0.3. SSH setup:
1. Passphrase-less SSH for akhanolk from cdh-dev01 (Oozie server) to cdh-dn01 (remote node) and vice versa
2. Passphrase-less SSH for oozie user ID (oozie in my case) on cdh-dev01 to cdh-dn01 as akhanolk
[Running ps -ef | grep oozie on Oozie server will give you the configured Oozie user ID]
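The two requirements above can be sketched as the commands below. This is a minimal sketch, assuming OpenSSH's ssh-keygen and ssh-copy-id are installed and that the account running it has a writable home directory; the `run` helper, the `DRY_RUN` switch and the function name are illustrative additions, not part of the original setup. It covers only the cdh-dev01 to cdh-dn01 direction.

```shell
# Sketch only: host and user names come from the post; `run`, DRY_RUN and
# setup_passphraseless_ssh are hypothetical helpers added for illustration.
REMOTE_USER=akhanolk
REMOTE_HOST=cdh-dn01

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a passphrase-less key pair (if none exists) for the current user
# and authorize it for REMOTE_USER on REMOTE_HOST.
setup_passphraseless_ssh() {
    key_file="$HOME/.ssh/id_rsa"
    if [ ! -f "$key_file" ]; then
        run ssh-keygen -t rsa -N "" -f "$key_file"
    fi
    run ssh-copy-id -i "$key_file.pub" "$REMOTE_USER@$REMOTE_HOST"
    # BatchMode makes ssh fail instead of prompting for a password,
    # so this confirms that the login is truly non-interactive.
    run ssh -o BatchMode=yes "$REMOTE_USER@$REMOTE_HOST" true
}

# Usage: call the function once as akhanolk and once as the Oozie
# service user on cdh-dev01.
# setup_passphraseless_ssh
```

Setting DRY_RUN=1 before calling the function prints the commands without running them, which is a cheap way to review them first.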
3.0.4. Workflow application components:
Workflow definition (workflow.xml, in HDFS)
Job properties file (job.properties, on the node submitting the job)
Shell script (uploadFile.sh) on the remote node (cdh-dn01, at /home/akhanolk/scripts)
Data file (employees_data) on the remote node (cdh-dn01, at /home/akhanolk/data)
3.0.5. Desired result:
Upon execution of the workflow, the employees_data file on cdh-dn01 should be loaded into a specified directory in HDFS.
3.0.6. Subsequent sections cover-
- Data and script download
- Oozie job properties file
- Oozie workflow file
- Shell script - uploadFile.sh
- Data load commands
- Oozie SMTP configuration
- SSH setup
- Oozie commands
- Output in HDFS
- Output email
- Oozie web console - screenshots
- Issues encountered and resolution
3.0.7. Data and script download:
3.0.8. Oozie job.properties file:
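A minimal sketch of what such a properties file could look like, written out with a shell heredoc. It is not the exact file used for this post: the NameNode and JobTracker addresses, the application path, the e-mail address and the property names focusNodeLogin, shellScriptPath and emailToAddress are all assumptions chosen for illustration. Only the user, the remote host and the script location come from the text above.

```shell
# Sketch only: PROJECT_DIR is a hypothetical scratch location for the file.
PROJECT_DIR=${PROJECT_DIR:-$(mktemp -d)}

# The quoted 'EOF' keeps ${...} literal so that Oozie, not the shell,
# resolves those references.
cat > "$PROJECT_DIR/job.properties" <<'EOF'
# Cluster endpoints (placeholders, adjust to your environment)
nameNode=hdfs://cdh-nn01:8020
jobTracker=cdh-jt01:8021
queueName=default

# HDFS directory that holds workflow.xml (placeholder path)
oozie.wf.application.path=${nameNode}/user/${user.name}/oozieProject/workflowSshAction

# Values consumed by the workflow (property names are assumptions)
focusNodeLogin=akhanolk@cdh-dn01
shellScriptPath=/home/akhanolk/scripts/uploadFile.sh
emailToAddress=akhanolk@example.com
EOF
```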
3.0.9. Oozie workflow.xml:
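A minimal sketch of a workflow definition matching the flow described earlier (SSH action, then an e-mail on success), written out with a shell heredoc. It is not the exact file used for this post: the workflow and node names, the properties focusNodeLogin, shellScriptPath and emailToAddress, and the STATUS key are assumptions. It also assumes the remote script prints a line such as STATUS=SUCCESS on stdout so that capture-output has something to expose.

```shell
# Sketch only: PROJECT_DIR is a hypothetical scratch location for the file.
PROJECT_DIR=${PROJECT_DIR:-$(mktemp -d)}

# The quoted 'EOF' keeps the Oozie EL expressions (${...}) literal.
cat > "$PROJECT_DIR/workflow.xml" <<'EOF'
<workflow-app name="WorkflowWithSshAction" xmlns="uri:oozie:workflow:0.1">
    <start to="sshAction"/>
    <action name="sshAction">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>${focusNodeLogin}</host>
            <command>${shellScriptPath}</command>
            <capture-output/>
        </ssh>
        <ok to="sendEmail"/>
        <error to="killAction"/>
    </action>
    <action name="sendEmail">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Output of workflow ${wf:id()}</subject>
            <body>Status of the file load: ${wf:actionData('sshAction')['STATUS']}</body>
        </email>
        <ok to="end"/>
        <error to="end"/>
    </action>
    <kill name="killAction">
        <message>SSH action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF
```

The host element takes the user@host form, which is why passphrase-less SSH must work for that exact login from the Oozie server.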
3.0.10. Shell script (uploadFile.sh):
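A minimal sketch of what an upload script along these lines could look like. It is not the author's original script: the function name, the HADOOP_CMD override, the STATUS key and the HDFS target directory are assumptions, and the -mkdir -p flag assumes Hadoop 2.x shell syntax. The result is printed as KEY=VALUE because Oozie expects Java-properties-style output when capture-output is used.

```shell
# Hypothetical sketch of an upload script; not the author's original.
# HADOOP_CMD can be overridden (for example in tests); it defaults to `hadoop`.
HADOOP_CMD=${HADOOP_CMD:-hadoop}

# Load one local file into an HDFS directory and report the result on stdout
# as a single KEY=VALUE line.
upload_file() {
    local_file=$1
    hdfs_dir=$2
    if [ ! -f "$local_file" ]; then
        echo "STATUS=INPUT_MISSING"
        return 1
    fi
    # Hadoop's own output goes to stderr so that stdout carries only
    # the KEY=VALUE line.
    if "$HADOOP_CMD" fs -mkdir -p "$hdfs_dir" >&2 &&
       "$HADOOP_CMD" fs -put "$local_file" "$hdfs_dir/" >&2; then
        echo "STATUS=SUCCESS"
    else
        echo "STATUS=FAILURE"
        return 1
    fi
}

# Usage on cdh-dn01 (the HDFS directory is a placeholder):
# upload_file /home/akhanolk/data/employees_data /user/akhanolk/sshActionOutput
```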
3.0.11. HDFS load commands:
3.0.12. Oozie SMTP configuration:
3.0.13. Oozie SSH setup:
3.0.14. Oozie commands:
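A minimal sketch of the CLI calls typically used to submit and inspect a workflow job. The server URL assumes Oozie's default port 11000 on cdh-dev01, and the function names and the OOZIE_CMD override are illustrative; the job, -oozie, -config, -run, -info and -log options are standard Oozie CLI options.

```shell
# Sketch only: the URL assumes the default Oozie port; adjust as needed.
OOZIE_CMD=${OOZIE_CMD:-oozie}
OOZIE_URL=${OOZIE_URL:-http://cdh-dev01:11000/oozie}

# Submit and start the workflow described by a job.properties file.
submit_workflow() {
    "$OOZIE_CMD" job -oozie "$OOZIE_URL" -config "$1" -run
}

# Show the status of a job and its actions, given the job ID printed by -run.
job_info() {
    "$OOZIE_CMD" job -oozie "$OOZIE_URL" -info "$1"
}

# Fetch the log for a job, useful when the SSH action fails.
job_log() {
    "$OOZIE_CMD" job -oozie "$OOZIE_URL" -log "$1"
}

# Usage:
# submit_workflow job.properties
# job_info <job-id printed by the previous command>
```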
3.0.15. Output in HDFS:
3.0.16. Output email:
3.0.17. Issues encountered:
3.0.18. Oozie web console - screenshots:
Any additional insights are greatly appreciated.
Cheers!
I've upgraded my cluster from CDH 4.7 to 5.0, which ships Oozie 4.0.0, and now capture-output doesn't seem to work for the SSH action. It is driving me crazy: there is no way to read the output of an SSH action. Does anyone have a solution?
I've opened a topic on Stack Overflow about it: http://stackoverflow.com/questions/24492574/capture-output-is-empty-for-ssh-actions
Tried several things already but always ending up with the same error. The workflow errors out with:
2016-06-08 16:24:23,890 WARN ActionStartXCommand:523 - SERVER[hadwkr3-dg.ie.tslabs.hpecorp.net] USER[hdfs] GROUP[-] TOKEN[] APP[ssh-wf-test2] JOB[0000974-160503085957938-oozie-oozi-W] ACTION[0000974-160503085957938-oozie-oozi-W@ssh] Error starting action [ssh]. ErrorType [TRANSIENT], ErrorCode [FNF], Message [FNF: Required Local file /var/tmp/oozie/oozie-oozi1017005846702963868.dir/ssh/ssh-base.sh not present.]
org.apache.oozie.action.ActionExecutorException: FNF: Required Local file /var/tmp/oozie/oozie-oozi1017005846702963868.dir/ssh/ssh-base.sh not present.
at org.apache.oozie.action.ssh.SshActionExecutor.execute(SshActionExecutor.java:572)
at org.apache.oozie.action.ssh.SshActionExecutor.start(SshActionExecutor.java:206)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:250)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:64)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:321)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:250)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Required Local file /var/tmp/oozie/oozie-oozi1017005846702963868.dir/ssh/ssh-base.sh not present.
at org.apache.oozie.action.ssh.SshActionExecutor.setupRemote(SshActionExecutor.java:367)
at org.apache.oozie.action.ssh.SshActionExecutor$1.call(SshActionExecutor.java:208)
at org.apache.oozie.action.ssh.SshActionExecutor$1.call(SshActionExecutor.java:206)
at org.apache.oozie.action.ssh.SshActionExecutor.execute(SshActionExecutor.java:550)
... 10 more
Have you encountered this, or do you have a clue on how to work around it? I am using Oozie 4.2.0.
Hi Indy, were you able to resolve that error? We are getting the same error. Could you please let us know if it is resolved? Here is the error: "FNF: Required Local file /var/tmp/oozie/oozie-oozi7179698092277115649.dir/ssh/ssh-base.sh not present."
Hi Amara, Indy,
I am also facing the same issue. It would be great if you could share any solution you find.
Issue: FNF: Required Local file /var/tmp/oozie/oozie-oozi7179698092277115649.dir/ssh/ssh-base.sh not present.
This issue was solved by restarting the Oozie server. Someone might have removed that folder and its files from the master server. Once the Oozie server is restarted, it recreates this directory structure and copies the ssh-wrapper.sh and ssh-base.sh files into it; after that it started processing jobs again. This directory structure is created only on the Oozie server machine.
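Before restarting, a quick check along these lines can confirm whether the helper scripts are really missing. This is a minimal sketch: the function name is hypothetical, and the directory to pass is the one named in your own FNF message (the parent of the ssh/ folder), since that path differs per installation.

```shell
# Sketch only: check_oozie_ssh_scripts is a hypothetical helper.
# Argument: the Oozie runtime directory from the FNF message, i.e. the
# parent of "ssh/". Prints each missing script; returns 1 if any is missing.
check_oozie_ssh_scripts() {
    runtime_dir=$1
    missing=0
    for script in ssh-base.sh ssh-wrapper.sh; do
        if [ ! -f "$runtime_dir/ssh/$script" ]; then
            echo "missing: $runtime_dir/ssh/$script"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the Oozie server (substitute the directory from your error message):
# check_oozie_ssh_scripts /var/tmp/oozie/<runtime-dir-from-error>
```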