Wednesday, October 30, 2013

Apache Oozie - Part 13: Oozie SSH Action


1.0. What's covered in the blog?

1. Documentation on the Oozie SSH action
2. Sample oozie workflow application that demonstrates the SSH action - SSH to a specific node, as a specified user, and executes a local shell script that loads a local file to HDFS.

It was tricky getting this action working - and the solution is not something covered in the Apache documentation.  Issues and resolution are documented below.  

Version:
Oozie 3.3.0

Related blogs:
Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action 
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered 
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with oozie workflows
Blog 12: Oozie workflow - shell action +passing output from one action to another
Blog 13: Oozie workflow - SSH action


2.0. Documentation on the Oozie SSH Action


Apache documentation is available at - http://oozie.apache.org/docs/3.3.0/WorkflowFunctionalSpec.html#a3.2.5_Ssh_Action

Note: The functionality was going to be eventually removed but later decided that it would remain.
So, disregard any mention of deprecation.


3.0. Sample workflow application


3.0.1. Highlights:
Oozie server is running on node cdh-dev01 in my environment.
With the sample workflow application, I am going to submit an Oozie job while logged in as myself (akhanolk), on this machine (Oozie server - cdh-dev01) from the CLI.
The workflow executes a shell script on cdh-dn01 as user akhanolk.  The shell script loads a local file to HDFS.  If the file load completes successfully, the workflow sends an email to me.

3.0.2. Pictorial overview:













3.0.3. SSH setup:
1. Passphrase-less SSH for akhanolk from cdh-dev01 (Oozie server) to cdh-dn01 (remote node) and vice versa
2.  Passphrase-less SSH for oozie user ID (oozie in my case) on cdh-dev01 to cdh-dn01 as akhanolk
[Running ps -ef | grep oozie on Oozie server will give you the configured Oozie user ID]

3.0.4. Workflow application components:
workflow definition (workflow.xml - in HDFS)
job properties file (job.properties from node submitting job)
Shell script (uploadFile.sh) on remote node (cdh-dn01; At /home/akhanolk/scripts)
Data file (employees_data) on remote node (cdh-dn01; At /home/akhanolk/data)

3.0.5. Desired result:
Upon execution of the workflow, the employees_data on cdh-dn01 should get moved to a specified directory in HDFS

3.0.6. Subsequent sections cover-

  1. Data and script download
  2. Oozie job properties file        
  3. Oozie workflow  file
  4. Shell script - uploadFile.sh
  5. Data load commands                
  6. Oozie SMTP configuration
  7. SSH setup          
  8. Oozie commands                    
  9. Output in HDFS
  10. Output email                      
  11. Oozie web console - screenshots
  12. Issues encountered and resolution



3.0.7. Data and script download:



3.0.8. Oozie job.properties file:



3.0.9. Oozie workflow.xml:



3.0.10. Shell script (fileUpload.sh):



3.0.11. HDFS load commands:



3.0.12. Oozie SMTP configuration:



3.0.13. Oozie SSH setup:



3.0.14. Oozie commands:



3.0.15. Output in HDFS:



3.0.16. Output email:



3.0.17. Issues encountered:



3.0.18. Oozie web console - screenshots:



































Any additional insights are greatly appreciated.
Cheers!

New Impala e-Book from O’Reilly Media - Free

Folks,
Check this out...
http://blog.cloudera.com/blog/2013/10/download-the-new-impala-e-book-from-oreilly-media/

Download location:
http://www.cloudera.com/content/cloudera/en/resources/library/aboutcloudera/cloudera-impala-ebook.html

Thanks to Manish Verma, for emailing me the link.

Cheers!