What's covered in the blog?
1. Documentation on the Oozie hive action
2. A sample workflow that includes fs action, email action, and hive action (query against some syslog-generated log files)
Version:
Oozie 3.3.0
My other blogs on Oozie:
Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with oozie workflows
Blog 12: Oozie workflow - shell action +passing output from one action to another
Blog 13: Oozie workflow - SSH action
Your thoughts/updates:
If you want to share your thoughts/updates, email me at airawat.blog@gmail.com.
About the Hive action
http://archive.cloudera.com/cdh4/cdh/4/oozie/DG_HiveActionExtension.html

Salient features of the hive action:
- Runs the specified hive job synchronously (the workflow job will wait until the Hive job completes before continuing to the next action).
- Can be configured to create or delete HDFS directories before starting the Hive job.
- Supports Hive scripts with parameter variables; their syntax is ${VARIABLES}.
- Hive configuration needs to be specified as part of the job submission.
- Oozie EL expressions can be used in the inline configuration. Property values specified in the configuration element override values specified in the job-xml file.
- Note that Hadoop mapred.job.tracker and fs.default.name properties must not be present in the inline configuration.
- As with Hadoop map-reduce jobs, it is possible to add files and archives in order to make them available to the Hive job.
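As an illustration, a hive action invoking a parameterized script might look like the sketch below. The script name (myScript.hql) and the TARGET_DB parameter are hypothetical; ${jobTracker}, ${nameNode} and ${appPath} come from job.properties as in the sample program further down.

```xml
<action name="hiveAction">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>${appPath}/hive-site.xml</job-xml>
        <script>${appPath}/myScript.hql</script>
        <!-- Substituted for ${TARGET_DB} inside myScript.hql -->
        <param>TARGET_DB=default</param>
    </hive>
    <ok to="sendEmailSuccess"/>
    <error to="sendEmailKillHive"/>
</action>
```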
Components of a workflow with hive action:
For a workflow with (just a) hive action, the following are required:
1. workflow.xml
2. job.properties
3. Any files, archives, jars you want to add
4. hive-site.xml
5. Hive query scripts
Refer to the sample program below.
Sample program
Highlights:
The workflow application runs a report on data in Hive. The input is syslog-generated log data in a Hive table; the output is a Hive table containing the report results.

Pictorial overview of application:
Application:
This gist includes components of a simple workflow application (oozie 3.3.0) that generates
a report off of data in a Hive table; emails are sent out to notify designated users of
success/failure of the workflow. The data is syslog-generated log files; the Hive table row
format serde is regex serde.

The sample application includes:
--------------------------------
1. Oozie actions: hive action and email action
2. Oozie workflow controls: start, end, and kill
3. Workflow components: job.properties and workflow.xml
4. Sample data
5. Commands to deploy, submit and run the workflow
6. Oozie web console - screenshots from sample program execution

Pictorial overview of workflow:
-------------------------------
Available at:
http://hadooped.blogspot.com/2013/06/apache-oozie-part-2-workflow-hive-action.html
01. Workflow Components:
------------------------
1. job.properties
   File containing:
   a) parameter and value declarations that are referenced in the workflows, and
   b) environment information referenced by Oozie to run the workflow, including name node, job tracker, workflow application path, etc.
2. workflow.xml
   Workflow definition file
3. hive-site.xml
   Hive configuration file
4. Report query file in Hive query language
02a.1. Download location:
-------------------------
Github:
https://github.com/airawat/OozieSamples
Email me at airawat.blog@gmail.com if you have access issues.

02a.2. Directory structure applicable for this gist/blog/post:
--------------------------------------------------------------
oozieProject
  data
    airawat-syslog
      <<node>>
        <<year>>
          <<month>>
            messages
  workflowHiveAction
    job.properties
    workflow.xml
    hive-site.xml
    runHiveLogReport.hql
02b.1. Sample data
------------------
May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal
May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1
May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray
May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max)
May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns
May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org

02b.2. Structure
----------------
Month   = May
Day     = 3
Time    = 11:52:54
Node    = cdh-dn03
Process = init:
Log msg = tty (/dev/tty6) main process (1208) killed by TERM signal
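The regex serde pattern used in the table definition (section 06b) can be sanity-checked against a sample line outside Hive. A minimal Python sketch, with the HQL double-backslash escaping collapsed to single backslashes:

```python
import re

# Same pattern as in the Hive RegexSerDe table definition, with the
# HQL double-backslash escaping collapsed to single backslashes.
pattern = re.compile(
    r"(\w+)\s+(\d+)\s+(\d+:\d+:\d+)\s+(\w+\W*\w*)\s+(.*?:)\s+(.*$)")

line = ("May  3 11:52:54 cdh-dn03 init: "
        "tty (/dev/tty6) main process (1208) killed by TERM signal")

m = pattern.match(line)
# Groups line up with month_name, day, time, host, event, log
print(m.groups())
```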
03. Oozie SMTP configuration
----------------------------
Add the following to oozie-site.xml, and restart Oozie.
Replace the values with ones specific to your environment.

<!-- SMTP params -->
<property>
  <name>oozie.email.smtp.host</name>
  <value>cdh-dev01</value>
</property>
<property>
  <name>oozie.email.smtp.port</name>
  <value>25</value>
</property>
<property>
  <name>oozie.email.from.address</name>
  <value>oozie@cdh-dev01</value>
</property>
<property>
  <name>oozie.email.smtp.auth</name>
  <value>false</value>
</property>
<property>
  <name>oozie.email.smtp.username</name>
  <value></value>
</property>
<property>
  <name>oozie.email.smtp.password</name>
  <value></value>
</property>
#*****************************
# 04. job.properties
#*****************************
nameNode=hdfs://cdh-nn01.chuntikhadoop.com:8020
jobTracker=cdh-jt01:8021
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
appPath=${oozieProjectRoot}/workflowHiveAction
oozie.wf.application.path=${appPath}
emailToAddress=akhanolk@cdh-dev01
#*******End************************

Note: The line "oozie.wf.rerun.failnodes=true" is needed if you want to re-run the workflow. There is another property that can be used instead, which specifies which failed nodes to skip; review the Apache Oozie documentation for details.
<!--******************************************-->
<!--05. workflow.xml                          -->
<!--******************************************-->
<workflow-app name="WorkflowWithHiveAction" xmlns="uri:oozie:workflow:0.1">
    <start to="hiveAction"/>
    <action name="hiveAction">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>${appPath}/hive-site.xml</job-xml>
            <script>${appPath}/runHiveLogReport.hql</script>
        </hive>
        <ok to="sendEmailSuccess"/>
        <error to="sendEmailKillHive"/>
    </action>
    <action name="sendEmailSuccess">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Status of workflow ${wf:id()}</subject>
            <body>The workflow ${wf:id()} completed successfully</body>
        </email>
        <ok to="end"/>
        <error to="end"/>
    </action>
    <action name="sendEmailKillHive">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Status of workflow ${wf:id()}</subject>
            <body>The workflow ${wf:id()} had issues with the hive action and was killed. The error logged is: ${wf:errorMessage(wf:lastErrorNode())}</body>
        </email>
        <ok to="killJobAction"/>
        <error to="killJobAction"/>
    </action>
    <kill name="killJobAction">
        <message>Killed job due to error in Hive action</message>
    </kill>
    <end name="end"/>
</workflow-app>
06a. HDFS commands
------------------
1) Load the data and workflow application to HDFS
$ hadoop fs -mkdir oozieProject
$ hadoop fs -put oozieProject/* oozieProject/
2) Validate the load
$ hadoop fs -ls -R oozieProject | awk '{print $8}'
...should match the directory listing in section 02a.2 above.
************************
**06b. Hive setup tasks
************************
a) Create table:
hive>
CREATE EXTERNAL TABLE SysLogEvents(
month_name STRING,
day STRING,
time STRING,
host STRING,
event STRING,
log STRING)
PARTITIONED BY(node string,year int, month int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)"
)
stored as textfile;

b) Create and load partitions:
Note: Replace my user ID "akhanolk" with yours
hive>
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dev01",year=2013, month=04)
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dev01/2013/04/';
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dev01",year=2013, month=05)
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dev01/2013/05/';
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dn01",year=2013, month=05)
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dn01/2013/05/';
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dn02",year=2013, month=04)
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dn02/2013/04/';
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dn02",year=2013, month=05)
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dn02/2013/05/';
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dn03",year=2013, month=04)
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dn03/2013/04/';
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dn03",year=2013, month=05)
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dn03/2013/05/';
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-jt01",year=2013, month=04)
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-jt01/2013/04/';
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-jt01",year=2013, month=05)
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-jt01/2013/05/';
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-nn01",year=2013, month=05)
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-nn01/2013/05/';
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-vms",year=2013, month=05)
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-vms/2013/05/';

c) Hive QL to test the data loaded:
hive>
--Print headers
set hive.cli.print.header=true;
--Need to add this jar for MR to work; your env may not need it
add jar hadoop-lib/hive-contrib-0.10.0-cdh4.2.0.jar;
--Sample query
hive> select * from SysLogEvents limit 2;
OK
month_name day time host event log node year month
Apr 23 16:14:10 cdh-dev01 spice-vdagent[5657]: Missing virtio device '/dev/virtio-ports/com.redhat.spice.0': No such file or directory cdh-dev01 2013 04
Apr 23 16:14:12 cdh-dev01 pulseaudio[5705]: pid.c: Daemon already running. cdh-dev01 2013 04
Time taken: 13.241 seconds
************************************************
**06c. Hive QL - runHiveLogReport.hql
************************************************
use default;

drop table if exists eventsgranularreport;

CREATE TABLE IF NOT EXISTS eventsgranularreport(
year int,
month int,
day int,
event STRING,
occurrence int)
ROW FORMAT DELIMITED
FIELDS TERMINATED by ','
LINES TERMINATED by '\n';

INSERT OVERWRITE TABLE eventsgranularreport
select Year,Month,Day,Event,Count(*) Occurrence from SysLogEvents group by year,month,day,event order by event asc,year,month,day desc;
07. Oozie commands
------------------
Note: Replace the Oozie server and port with values specific to your cluster.
1) Submit job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowHiveAction/job.properties -submit
job: 0000007-130712212133144-oozie-oozi-W
2) Run job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -start 0000007-130712212133144-oozie-oozi-W
3) Check the status:
$ oozie job -oozie http://cdh-dev01:11000/oozie -info 0000007-130712212133144-oozie-oozi-W
4) Suspend workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -suspend 0000007-130712212133144-oozie-oozi-W
5) Resume workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -resume 0000007-130712212133144-oozie-oozi-W
6) Re-run workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowHiveAction/job.properties -rerun 0000007-130712212133144-oozie-oozi-W
7) Should you need to kill the job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -kill 0000007-130712212133144-oozie-oozi-W
8) View server logs:
$ oozie job -oozie http://cdh-dev01:11000/oozie -logs 0000007-130712212133144-oozie-oozi-W
Logs are available at /var/log/oozie on the Oozie server.
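Besides the CLI, the status check in (3) can also be done over Oozie's web services API. Below is a minimal Python sketch, assuming the default endpoint layout (http://<server>:11000/oozie/v1/job/<job-id>?show=info); the server address used here is the one from the commands above.

```python
import json
from urllib.request import urlopen

def job_info_url(server, job_id):
    """Build the Oozie v1 web-services URL that returns job info as JSON."""
    return "http://%s/oozie/v1/job/%s?show=info" % (server, job_id)

def get_job_status(server, job_id):
    """Fetch the job info and return its 'status' field
    (e.g. RUNNING, SUCCEEDED, KILLED)."""
    with urlopen(job_info_url(server, job_id)) as response:
        return json.load(response)["status"]

# Example (needs a reachable Oozie server):
# get_job_status("cdh-dev01:11000", "0000007-130712212133144-oozie-oozi-W")
```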
08. Output
----------
hive> select * from eventsgranularreport;
OK
year month day event occurrence
2013 5 NULL NULL 25
2013 5 3 NetworkManager[1232]: 1
2013 5 7 NetworkManager[1243]: 1
2013 5 7 NetworkManager[1284]: 1
2013 5 7 NetworkManager[1292]: 1
2013 5 7 NetworkManager[1300]: 1
2013 5 3 NetworkManager[1342]: 1
2013 5 7 NetworkManager[1422]: 1
2013 5 3 console-kit-daemon[1580]: 1
2013 5 3 console-kit-daemon[1779]: 4
2013 5 7 console-kit-daemon[1861]: 1
2013 5 7 console-kit-daemon[1964]: 1
2013 5 7 gnome-session[1902]: 1
2013 5 7 gnome-session[1980]: 3
2013 5 7 gnome-session[2009]: 3
2013 5 7 gnome-session[2010]: 3
2013 5 7 gnome-session[2033]: 1
2013 5 7 init: 104
2013 5 6 init: 12
2013 5 3 init: 50
2013 5 7 kernel: 653
2013 5 6
...
09. Sample email
----------------
From akhanolk@cdh-dev01.localdomain Mon Jul 15 00:34:39 2013
Return-Path: <akhanolk@cdh-dev01.localdomain>
X-Original-To: akhanolk@cdh-dev01
Delivered-To: akhanolk@cdh-dev01.localdomain
From: akhanolk@cdh-dev01.localdomain
To: akhanolk@cdh-dev01.localdomain
Subject: Status of workflow 0000011-130712212133144-oozie-oozi-W
Content-Type: text/plain; charset=us-ascii
Date: Mon, 15 Jul 2013 00:34:39 -0500 (CDT)
Status: R

The workflow 0000011-130712212133144-oozie-oozi-W completed successfully
10. Available at:
-----------------
http://hadooped.blogspot.com/2013/06/apache-oozie-part-2-workflow-hive-action.html
Hi,
Can you please show me how to move files from a local directory to HDFS every hour using an Oozie workflow?
I have tried the Oozie fs action, but it only works if I'm moving a file from one folder to another within HDFS.
I want to move a file from a local directory to HDFS. Please help!
Hi Khathutshelo,
Check out my blog 13 on the Oozie SSH action.
Cheers,
Anagha
Hi Anagha, what is the difference between the SSH action and the shell action? When should I go for the SSH action over the shell action? Could you please give some idea on the same?
I also have the same problem; could you please post the process?
There are several avenues to ingesting data into HDFS.
Here is a good discussion on this topic to help arrive at the best solution based on your requirements:
http://www.linkedin.com/groups/Data-Ingestion-Into-Hadoop-3638279.S.199779955
Hive + Oozie workflow problem.
My task is to create an Oozie workflow to load data into Hive tables every hour.
I am using Hue 2.3.0.
When I run the command LOAD DATA INPATH '/user/username1/data/data3.txt' INTO TABLE raw_data; it works perfectly; the data gets loaded into the Hive table.
When I run the same command in an Oozie workflow, the job gets killed at 66% and the error message is: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
But when I replace "LOAD DATA INPATH '/user/username1/data/data3.txt' INTO TABLE raw_data;" with "create external table table_data (json string) LOCATION '/user/username_k/20130925_data';" the Oozie workflow works properly.
May you please help.
Khathutshelo, can you send me your scripts/code, directory structure, sample files, etc. so I can try to replicate? Thanks.
Hi Anagha,
I am using Hue 2.3.0.
-----------------------------------
What I did is:
create an external table1 where the location is the HDFS directory with data; it has 4 columns
create another external table2 with 3 columns and a partition column
write a script to insert into table2 with partition from table1
The problem is the insert script works properly if I put it in the query editor, but throws a 'Table not found' error.
table1 and table2 are both in the same database (adhoc)
hive-script.q
---------------------
INSERT OVERWRITE TABLE production1 PARTITION(dt, hour)
SELECT a, b, datastring, hour FROM staging2;
--------------------------------------------------------------------------------------------------------------------
Since the partition name will be created automatically, I decided not to use the following method:
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-vms",year=2013, month=05)
However, I make sure that the first table has one more column compared to the second table that has the partition. The first column of the first table becomes the partition name in the second table.
E.g., table1 has 3 columns and table2 has 2 columns and a partition column.
Hi Anagha,
Thanks for the blog. Would you mind giving an example of creating a workflow wherein you can pass parameters, and also how to run it?
Thanks for the wonderful blog and helping us out!
Hi,
I'm planning to pass the Oozie workflow ID as a parameter and input value to a Hive script; how do I do this? Please advise.
create table wfl_tbl(date string,wf_id string);
insert into wfl_tbl select '${wf:id()}', '2014-015-15' from dummy ;
I'm submitting the above from the Hue Oozie editor and it is taking ${wf:id()}. Please advise.
Regards,
R
Hi Anagha,
I am trying to create a directory based on the job_id created after running the Oozie job, using a shell script. Can you help me? Can we pass JOB_ID as a parameter while executing the script? If so, how to do it?