Wednesday, October 30, 2013

Apache Oozie - Part 13: Oozie SSH Action


1.0. What's covered in the blog?

1. Documentation on the Oozie SSH action
2. Sample Oozie workflow application that demonstrates the SSH action: it connects over SSH to a specific node, as a specified user, and executes a shell script local to that node that loads a local file to HDFS.

It was tricky getting this action working, and the solution is not covered in the Apache documentation. Issues and resolution are documented below.

Version:
Oozie 3.3.0

Related blogs:
Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action 
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered 
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with oozie workflows
Blog 12: Oozie workflow - shell action +passing output from one action to another
Blog 13: Oozie workflow - SSH action


2.0. Documentation on the Oozie SSH Action


Apache documentation is available at - http://oozie.apache.org/docs/3.3.0/WorkflowFunctionalSpec.html#a3.2.5_Ssh_Action

Note: The SSH action was at one point slated for removal, but it was later decided that it would remain.
So, disregard any mention of deprecation.


3.0. Sample workflow application


3.0.1. Highlights:
Oozie server is running on node cdh-dev01 in my environment.
With the sample workflow application, I am going to submit an Oozie job while logged in as myself (akhanolk), on this machine (Oozie server - cdh-dev01) from the CLI.
The workflow executes a shell script on cdh-dn01 as user akhanolk.  The shell script loads a local file to HDFS.  If the file load completes successfully, the workflow sends an email to me.

3.0.2. Pictorial overview:













3.0.3. SSH setup:
1. Passphrase-less SSH for akhanolk from cdh-dev01 (Oozie server) to cdh-dn01 (remote node) and vice versa
2. Passphrase-less SSH for the Oozie service user (oozie in my case) from cdh-dev01 to cdh-dn01 as akhanolk
[Running ps -ef | grep oozie on Oozie server will give you the configured Oozie user ID]
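
The second requirement is the one that is easy to miss, and it can be checked before submitting anything. Here is a minimal sketch (not part of my original setup) that prints an ssh probe to run as the Oozie service user. The options mirror the ones in the Oozie error log quoted under "Issues encountered"; the function name ssh_probe_cmd, the OOZIE_USER variable and the trailing "true" command are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the ssh probe to run as the Oozie service user.
# OOZIE_USER is an assumption (defaults to "oozie"); adjust to your install.
OOZIE_USER="${OOZIE_USER:-oozie}"

ssh_probe_cmd() {
    login="$1"   # user@host, e.g. akhanolk@cdh-dn01
    # Same -o options that Oozie passes to ssh, with a no-op remote command
    echo "sudo -u ${OOZIE_USER} ssh -o PasswordAuthentication=no" \
         "-o KbdInteractiveDevices=no -o StrictHostKeyChecking=no" \
         "-o ConnectTimeout=20 ${login} true"
}

# Print the command; copy and run it by hand on the Oozie server.
ssh_probe_cmd "akhanolk@cdh-dn01"
```

If the printed command, run on the Oozie server, returns without a "Permission denied" message, the key setup for the Oozie user is probably in place.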

3.0.4. Workflow application components:
workflow definition (workflow.xml - in HDFS)
job properties file (job.properties from node submitting job)
Shell script (uploadFile.sh) on remote node (cdh-dn01; At /home/akhanolk/scripts)
Data file (employees_data) on remote node (cdh-dn01; At /home/akhanolk/data)

3.0.5. Desired result:
Upon execution of the workflow, the employees_data file on cdh-dn01 should get copied to a specified directory in HDFS

3.0.6. Subsequent sections cover-

  1. Data and script download
  2. Oozie job properties file        
  3. Oozie workflow  file
  4. Shell script - uploadFile.sh
  5. Data load commands                
  6. Oozie SMTP configuration
  7. SSH setup          
  8. Oozie commands                    
  9. Output in HDFS
  10. Output email                      
  11. Oozie web console - screenshots
  12. Issues encountered and resolution



3.0.7. Data and script download:
************************************
*Data and code/application download
************************************
Data and code:
--------------
Github:
https://github.com/airawat/OozieSamples
Email me at airawat.blog@gmail.com if you encounter any issues
Directory structure of application download
--------------------------------------------
oozieProject
workflowSshAction
job.properties
workflow.xml
scripts
uploadFile.sh
data
employees_data



3.0.8. Oozie job.properties file:
#*************************************************
# job.properties
#*************************************************
nameNode=hdfs://cdh-nn01.chuntikhadoop.com:8020
jobTracker=cdh-jt01:8021
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
appPath=${oozieProjectRoot}/workflowSshAction
oozie.wf.application.path=${appPath}
inputDir=${oozieProjectRoot}/data
focusNodeLogin=akhanolk@cdh-dn01
shellScriptPath=~/scripts/uploadFile.sh
emailToAddress=akhanolk@cdh-dev01



3.0.9. Oozie workflow.xml:
<!--******************************************-->
<!--workflow.xml -->
<!--******************************************-->
<workflow-app name="WorkFlowForSshAction" xmlns="uri:oozie:workflow:0.1">
    <start to="sshAction"/>
    <action name="sshAction">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>${focusNodeLogin}</host>
            <command>${shellScriptPath}</command>
            <capture-output/>
        </ssh>
        <ok to="sendEmail"/>
        <error to="killAction"/>
    </action>
    <action name="sendEmail">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Output of workflow ${wf:id()}</subject>
            <body>Status of the file move: ${wf:actionData('sshAction')['STATUS']}</body>
        </email>
        <ok to="end"/>
        <error to="end"/>
    </action>
    <kill name="killAction">
        <message>"Killed job due to error"</message>
    </kill>
    <end name="end"/>
</workflow-app>



3.0.10. Shell script (uploadFile.sh):
#!/bin/bash
#################################
# Name: uploadFile.sh
# Location: remote node where we
# want to run an
# operation
#################################
# Clear any previous output, then load the local data files into HDFS
hadoop fs -rm -R oozieProject/results-sshAction/*
hadoop fs -put ~/data/* oozieProject/results-sshAction/
status=$?
# Print the result as key=value so that <capture-output/> can pick it up
if [ "$status" -eq 0 ]; then
    echo "STATUS=SUCCESS"
else
    echo "STATUS=FAIL"
fi
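
The script talks back to the workflow only through what it prints: with capture-output in the SSH action, Oozie reads the command's STDOUT as key=value pairs (Java properties format), and the email action picks up the STATUS key through wf:actionData. Here is a small sketch for dry-running that contract locally, without Hadoop or Oozie. The function names status_line and prop_value are hypothetical, and prop_value only approximates the lookup Oozie performs.

```shell
#!/bin/bash
# Hypothetical helpers for dry-running the script's output contract.

# Map an exit code to the STATUS line that uploadFile.sh prints.
status_line() {
    if [ "$1" -eq 0 ]; then
        echo "STATUS=SUCCESS"
    else
        echo "STATUS=FAIL"
    fi
}

# Read key=value lines on stdin and print the value of the given key,
# roughly what wf:actionData('sshAction')['STATUS'] would resolve to.
prop_value() {
    sed -n "s/^$1=//p" | head -n 1
}

status_line 0 | prop_value STATUS    # prints SUCCESS
status_line 1 | prop_value STATUS    # prints FAIL
```

Keeping the status on a single key=value line makes it easy to check what the email body will contain before involving the cluster.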



3.0.11. HDFS load commands:
*****************************************
Location of files/scripts & commands
*****************************************
I have pasted information specific to my environment; Modify as required.
1) Node (cdh-dev01) where the Oozie CLI will be used to submit/run Oozie workflow:
Structure/Path:
~/oozieProject/workflowSshAction/job.properties
2) HDFS:
Workflow directory structure:
/user/akhanolk/oozieProject/workflowSshAction/workflow.xml
Commands to load:
hadoop fs -mkdir oozieProject
hadoop fs -mkdir oozieProject/workflowSshAction
hadoop fs -put ~/oozieProject/workflowSshAction/workflow.xml oozieProject/workflowSshAction
Output directory structure:
/user/akhanolk/oozieProject/results-sshAction
Command:
hadoop fs -mkdir oozieProject/results-sshAction
3) Remote node (cdh-dn01) where we want to run a shell script:
Directory structure/Path:
~/scripts/uploadFile.sh
~/data/employees_data



3.0.12. Oozie SMTP configuration:
Oozie SMTP configuration
------------------------
Add the following to the oozie-site.xml, and restart oozie.
Replace the values with those specific to your environment.
<!-- SMTP params-->
<property>
    <name>oozie.email.smtp.host</name>
    <value>cdh-dev01</value>
</property>
<property>
    <name>oozie.email.smtp.port</name>
    <value>25</value>
</property>
<property>
    <name>oozie.email.from.address</name>
    <value>oozie@cdh-dev01</value>
</property>
<property>
    <name>oozie.email.smtp.auth</name>
    <value>false</value>
</property>
<property>
    <name>oozie.email.smtp.username</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.password</name>
    <value></value>
</property>



3.0.13. Oozie SSH setup:
************************
SSH setup
************************
Issues:
Review my section on issues encountered to see all the issues and fixes I had to make
to get the workflow application to work.
------------------------------------------------------------------------------------------------------
Oozie documentation:
To run SSH Testcases and for easier Hadoop start/stop configure SSH to localhost to be passphrase-less.
Create your SSH keys without a passphrase and add the public key to the authorized file:
$ ssh-keygen -t dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys2
Test that you can ssh without password:
$ ssh localhost
------------------------------------------------------------------------------------------------------
SSH tutorial:
Setup ssh - https://www.digitalocean.com/community/articles/how-to-set-up-ssh-keys--2



3.0.14. Oozie commands:
Oozie commands
---------------
Note: Replace the Oozie server and port with your cluster-specific values.
1) Submit job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowSshAction/job.properties -submit
job: 0000012-130712212133144-oozie-oozi-W
2) Run job (use the job ID returned by the submit command):
$ oozie job -oozie http://cdh-dev01:11000/oozie -start 0000014-130712212133144-oozie-oozi-W
3) Check the status:
$ oozie job -oozie http://cdh-dev01:11000/oozie -info 0000014-130712212133144-oozie-oozi-W
4) Suspend workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -suspend 0000014-130712212133144-oozie-oozi-W
5) Resume workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -resume 0000014-130712212133144-oozie-oozi-W
6) Re-run workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowSshAction/job.properties -rerun 0000014-130712212133144-oozie-oozi-W
7) Should you need to kill the job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -kill 0000014-130712212133144-oozie-oozi-W
8) View server logs:
$ oozie job -oozie http://cdh-dev01:11000/oozie -logs 0000014-130712212133144-oozie-oozi-W
Logs are available at:
/var/log/oozie on the Oozie server.
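
Since every command after the submit needs the job ID, it can help to capture it rather than copy it by hand. Here is a minimal sketch, assuming the submit command prints a line of the form "job: <id>" as shown in step 1. The function name extract_job_id and the OOZIE_URL variable are hypothetical, and the cluster commands are left as comments because they need a live Oozie server.

```shell
#!/bin/bash
# Hypothetical helper: pull the workflow ID out of the submit output,
# which prints a line of the form "job: <id>".
extract_job_id() {
    sed -n 's/^job: *//p' | head -n 1
}

# Usage on the cluster (not run here; URL and paths are the ones from this post):
#   OOZIE_URL=http://cdh-dev01:11000/oozie
#   job_id=$(oozie job -oozie "$OOZIE_URL" \
#            -config oozieProject/workflowSshAction/job.properties -submit \
#            | extract_job_id)
#   oozie job -oozie "$OOZIE_URL" -start "$job_id"
#   oozie job -oozie "$OOZIE_URL" -info "$job_id"

# Local demonstration with the sample submit output from step 1
echo "job: 0000012-130712212133144-oozie-oozi-W" | extract_job_id
```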



3.0.15. Output in HDFS:
************************
Output
************************
[akhanolk@cdh-dev01 ~]$ hadoop fs -ls oozieProject/res*
Found 1 items
-rw-r--r-- 3 akhanolk akhanolk 13821993 2013-10-30 20:59 oozieProject/results-sshAction/employees_data



3.0.16. Output email:
********************
Output email
********************
From akhanolk@cdh-dev01.localdomain Wed Oct 30 22:59:16 2013
Return-Path: <akhanolk@cdh-dev01.localdomain>
X-Original-To: akhanolk@cdh-dev01
Delivered-To: akhanolk@cdh-dev01.localdomain
From: akhanolk@cdh-dev01.localdomain
To: akhanolk@cdh-dev01.localdomain
Subject: Output of workflow 0000003-131029234028597-oozie-oozi-W
Content-Type: text/plain; charset=us-ascii
Date: Wed, 30 Oct 2013 22:59:16 -0500 (CDT)
Status: R
Status of the file move: SUCCESS



3.0.17. Issues encountered:
*************************
Issues encountered
*************************
Permission denied error:
-------------------------
....
2013-10-29 16:13:25,949 WARN org.apache.oozie.command.wf.ActionStartXCommand:
USER[akhanolk] GROUP[-] TOKEN[] APP[WorkFlowForSshAction] JOB[0000002-
131029144918199-oozie-oozi-W] ACTION[0000002-131029144918199-oozie-oozi-
W@sshAction] Error starting action [sshAction]. ErrorType [NON_TRANSIENT],
ErrorCode [AUTH_FAILED], Message [AUTH_FAILED: Not able to perform operation
[ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o
StrictHostKeyChecking=no -o ConnectTimeout=20 akhanolk@cdh-dn01
mkdir -p oozie-oozi/0000002-131029144918199-oozie-oozi-W/sshAction--ssh/ ]
| ErrorStream: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Steps taken to resolve:
-----------------------
a)
Tried running the command in square brackets, above, manually from cdh-dev01 (Oozie server),
when logged in as akhanolk. It worked! But the workflow in Oozie didn't.
b)
Tried running as oozie:
sudo -u oozie ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o
StrictHostKeyChecking=no -o ConnectTimeout=20 akhanolk@cdh-dn01 mkdir
-p oozie-oozi/0000001-1310081859355-oozie-oozi-W/action1--ssh/
Got the error:
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
c)
Googled, and chanced upon this:
http://stackoverflow.com/questions/19272430/oozie-ssh-action
So, performed the steps below to allow oozie to ssh to cdh-dn01 as akhanolk:
- On cdh-dev01 (my Oozie server), located the oozie user's home directory and ran ssh-keygen
- Appended the public key to the authorized_keys file /home/akhanolk/.ssh/authorized_keys on cdh-dev01
- Appended the same public key to the authorized_keys file on cdh-dn01 (remote node) at
/home/akhanolk/.ssh/authorized_keys
Issue resolved!!
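
For anyone repeating the fix, here is a rough sketch of the "append the public key" step as a small helper that avoids adding the same key twice. This is not the exact sequence I typed; the function name append_key_once is hypothetical, and the key type and file names in the comments are assumptions.

```shell
#!/bin/bash
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not already present.
append_key_once() {
    pub_file="$1"      # e.g. the oozie user's ~/.ssh/id_rsa.pub (assumed name)
    auth_file="$2"     # e.g. /home/akhanolk/.ssh/authorized_keys
    # Create the file if needed; sshd typically expects it not to be group/world writable
    touch "$auth_file" && chmod 600 "$auth_file" || return 1
    # -x whole line, -F fixed string, -f read the pattern (the key) from pub_file
    if ! grep -qxF -f "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
}

# Usage outline (not run here; key type is an assumption):
#   On cdh-dev01, as the oozie user:   ssh-keygen -t rsa
#   Copy the resulting .pub file to cdh-dn01, then as akhanolk on cdh-dn01:
#   append_key_once /path/to/oozie_id_rsa.pub /home/akhanolk/.ssh/authorized_keys
```

Running the helper a second time with the same key leaves authorized_keys unchanged, which keeps the file tidy if the setup is retried.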



3.0.18. Oozie web console - screenshots:



































Any additional insights are greatly appreciated.
Cheers!
