Monday, June 17, 2013

Apache Oozie - Part 4: Oozie workflow with Java MapReduce action


What's covered in the blog?

1. Documentation on the Oozie map reduce action
2. A sample workflow that includes an Oozie map-reduce action to process syslog-generated log files. Instructions on loading the sample data and running the workflow are provided, along with notes on what I learned along the way.

Versions covered:
Oozie 3.3.0; MapReduce new API

Related blogs:

Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action 
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered 
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with oozie workflows
Blog 12: Oozie workflow - shell action + passing output from one action to another
Blog 13: Oozie workflow - SSH action

Your thoughts/updates:
If you want to share your thoughts/updates, email me at airawat.blog@gmail.com.

About the Oozie MapReduce action
Excerpt from Apache Oozie documentation...

The map-reduce action starts a Hadoop map/reduce job from a workflow. Hadoop jobs can be Java Map/Reduce jobs or streaming jobs.

A map-reduce action can be configured to perform file system cleanup and directory creation before starting the map reduce job. This capability enables Oozie to retry a Hadoop job in the situation of a transient failure (Hadoop checks the non-existence of the job output directory and then creates it when the Hadoop job is starting, thus a retry without cleanup of the job output directory would fail).
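
As a rough illustration of why cleanup matters for retries (this sketch is not from the Oozie documentation), the following plain Java program mimics the behaviour on the local filesystem. In a real workflow the cleanup is declared in the action's prepare element (for example with a delete entry) and runs against HDFS; the class and method names below are hypothetical, and the "job" is a stand-in that only checks for and creates an output directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem analogy only: no Hadoop or Oozie classes are used.
class PrepareCleanupSketch {

    // Stand-in for a Hadoop job: refuses to start if the output directory exists.
    static void runJob(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(outputDir.toString());
        }
        Files.createDirectory(outputDir);
        Files.write(outputDir.resolve("part-r-00000"),
                "dummy output\n".getBytes(StandardCharsets.UTF_8));
    }

    // Stand-in for the cleanup step: recursively delete the directory if present.
    static void prepare(Path outputDir) throws IOException {
        if (!Files.exists(outputDir)) {
            return;
        }
        List<Path> deepestFirst;
        try (Stream<Path> paths = Files.walk(outputDir)) {
            // Reverse order puts children before their parent directories.
            deepestFirst = paths.sorted(Comparator.reverseOrder())
                                .collect(Collectors.toList());
        }
        for (Path p : deepestFirst) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("oozie-prepare-sketch");
        Path out = base.resolve("output");

        runJob(out); // first attempt creates the output directory

        boolean retryFailed = false;
        try {
            runJob(out); // retry without cleanup
        } catch (FileAlreadyExistsException e) {
            retryFailed = true;
        }
        System.out.println("Retry without cleanup failed: " + retryFailed);

        prepare(out); // cleanup first, then retry
        runJob(out);
        System.out.println("Retry after cleanup wrote output: "
                + Files.exists(out.resolve("part-r-00000")));

        prepare(base); // remove the temporary directory
    }
}
```

Running the cleanup before every attempt, rather than only after a failure, keeps the first run and any retry on the same code path.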

The workflow job will wait until the Hadoop map/reduce job completes before continuing to the next action in the workflow execution path.

The counters of the Hadoop job and job exit status (FAILED, KILLED or SUCCEEDED) must be available to the workflow job after the Hadoop job ends. This information can be used from within decision nodes and other actions configurations.

The map-reduce action has to be configured with all the necessary Hadoop JobConf properties to run the Hadoop map/reduce job.

Hadoop JobConf properties can be specified in a JobConf XML file bundled with the workflow application or they can be indicated inline in the map-reduce action configuration.

The configuration properties are loaded in the following order: streaming, job-xml and configuration. Later values override earlier values.

Streaming and inline property values can be parameterized (templatized) using EL expressions.

The Hadoop mapred.job.tracker and fs.default.name properties must not be present in the job-xml and inline configuration.
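
The loading order and the restriction on mapred.job.tracker and fs.default.name described in the excerpt can be pictured with a small plain Java sketch. This is not Oozie's implementation: the class name, the method, and the use of simple maps for the three sources are assumptions made for illustration, and the property values are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: each configuration source is reduced to a key/value map.
class ActionConfigSketch {

    // Properties the excerpt says must not appear in job-xml or inline configuration.
    static final List<String> RESERVED =
            Arrays.asList("mapred.job.tracker", "fs.default.name");

    // Merge in loading order: streaming, then job-xml, then inline configuration.
    // putAll overwrites existing keys, so later sources win.
    static Map<String, String> merge(Map<String, String> streaming,
                                     Map<String, String> jobXml,
                                     Map<String, String> inline) {
        for (String key : RESERVED) {
            if (jobXml.containsKey(key) || inline.containsKey(key)) {
                throw new IllegalArgumentException("Property not allowed here: " + key);
            }
        }
        Map<String, String> merged = new LinkedHashMap<>();
        merged.putAll(streaming);
        merged.putAll(jobXml);
        merged.putAll(inline);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> streaming = new LinkedHashMap<>();
        Map<String, String> jobXml = new LinkedHashMap<>();
        Map<String, String> inline = new LinkedHashMap<>();

        // Made-up values for illustration.
        jobXml.put("mapred.reduce.tasks", "1");
        jobXml.put("mapred.job.queue.name", "default");
        inline.put("mapred.reduce.tasks", "4"); // overrides the job-xml value

        System.out.println(merge(streaming, jobXml, inline));
    }
}
```

In this sketch the inline value for mapred.reduce.tasks replaces the one from the job-xml map, while mapred.job.queue.name passes through unchanged.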


Apache Oozie documentation:
http://oozie.apache.org/docs/3.3.0/WorkflowFunctionalSpec.html#a3.2.2_Map-Reduce_Action


Components of a workflow with java map reduce action:



Sample workflow

Highlights

The sample workflow application runs a java map reduce program that parses log files (syslog generated) in HDFS and generates a report on the same.
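
To give a feel for the kind of parsing involved, here is a minimal plain Java sketch with no Hadoop dependencies. It is not the program used in the workflow. It assumes the traditional syslog line layout "Mon d HH:mm:ss host process[pid]: message", and counting lines per process is an assumed stand-in for the report; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses syslog-style lines and counts them per process name.
class SyslogParseSketch {

    // Assumed layout: "Mon d HH:mm:ss host process[pid]: message" (pid is optional).
    static final Pattern LINE = Pattern.compile(
            "^(\\w{3})\\s+(\\d{1,2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+"
            + "([^\\[:\\s]+)(?:\\[(\\d+)\\])?:\\s*(.*)$");

    // Returns the process name, or null when the line does not match the layout.
    static String processOf(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(5) : null;
    }

    // The "map" step extracts a key per line; the "reduce" step sums per key.
    static Map<String, Integer> countByProcess(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String process = processOf(line);
            if (process != null) { // skip lines that do not parse
                counts.merge(process, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines.
        List<String> lines = Arrays.asList(
                "Jun 17 13:45:01 node01 sshd[1234]: Accepted password for user",
                "Jun 17 13:46:00 node01 CRON[222]: (root) CMD (run-parts)",
                "Jun 17 13:47:12 node01 sshd[1240]: Connection closed",
                "not a syslog line");
        System.out.println(countByProcess(lines));
    }
}
```

In a MapReduce version, the key extraction would sit in the mapper and the summing in the reducer.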

The following is a pictorial representation of the workflow.


Workflow application details


Oozie web GUI - screenshots

http://YourOozieServer:TypicallyPort11000/oozie/






Do share if you have any additional insights that can be added to the blog.

References


Map reduce cookbook
https://cwiki.apache.org/OOZIE/map-reduce-cookbook.html

How to use a sharelib in Oozie
http://blog.cloudera.com/blog/2012/12/how-to-use-the-sharelib-in-apache-oozie/

Everything you wanted to know but were afraid to ask about Oozie
http://www.slideshare.net/ChicagoHUG/everything-you-wanted-to-know-but-were-afraid-to-ask-about-oozie

Oozie workflow use cases
https://github.com/yahoo/oozie/wiki/Oozie-WF-use-cases






51 comments:

  1. Hi,

    I am getting the following error after running Oozie in the Cloudera VM.
    Error: AUTHENTICATION : Could not authenticate, Authentication failed, status: -1, message: null

    Please help me solve this issue. Thanks in advance.

  6. When I try to run this command, I get this error:

    COMMAND: oozie job -oozie http://localhost:11000/oozie -config /home/hduser/oozie/oozie-4.1.0/oozie-bin/examples/apps/map-reduce/job.properties -run

    ERROR: Error: E0501 : E0501: Could not perform authorization operation, Call From ubuntu/127.0.1.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

  14. Awesome. This blog worked perfectly for me. Thanks!
    Regards,
    Kevin Costner
