Monday, June 17, 2013

Apache Oozie - Part 4: Oozie workflow with Java MapReduce action


What's covered in the blog?

1. Documentation on the Oozie map reduce action
2. A sample workflow that includes an Oozie map-reduce action to process syslog-generated log files. Instructions on loading sample data and running the workflow are provided, along with notes based on what I learned.

Versions covered:
Oozie 3.3.0; Map reduce new API

Related blogs:

Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action 
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered 
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with oozie workflows
Blog 12: Oozie workflow - shell action + passing output from one action to another
Blog 13: Oozie workflow - SSH action

Your thoughts/updates:
If you want to share your thoughts/updates, email me at airawat.blog@gmail.com.

About the Oozie MapReduce action
Excerpt from Apache Oozie documentation...

The map-reduce action starts a Hadoop map/reduce job from a workflow. Hadoop jobs can be Java Map/Reduce jobs or streaming jobs.

A map-reduce action can be configured to perform file system cleanup and directory creation before starting the map reduce job. This capability enables Oozie to retry a Hadoop job in the situation of a transient failure (Hadoop checks the non-existence of the job output directory and then creates it when the Hadoop job is starting, thus a retry without cleanup of the job output directory would fail).
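To make the retry argument above concrete, here is a minimal sketch in plain Java. It uses a local temporary directory as a stand-in for HDFS, and `runJob`, `prepareDelete` and `newWorkDir` are hypothetical helpers, not Oozie or Hadoop APIs. In a real workflow the cleanup would be declared in the action's prepare element rather than coded by hand.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PrepareRetrySketch {

    // Hypothetical stand-in for a Hadoop job: it refuses to start when the
    // output directory already exists, then creates it and writes one file.
    static void runJob(Path outputDir) {
        if (Files.exists(outputDir)) {
            throw new IllegalStateException("Output directory already exists: " + outputDir);
        }
        try {
            Files.createDirectory(outputDir);
            Files.write(outputDir.resolve("part-r-00000"),
                    "sample output\n".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stand-in for the prepare step: delete the directory tree if present.
    static void prepareDelete(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directory.
            List<Path> paths = walk.sorted(Comparator.reverseOrder())
                    .collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a scratch directory for the demo.
    static Path newWorkDir() {
        try {
            return Files.createTempDirectory("oozie-prepare-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path out = newWorkDir().resolve("output");
        runJob(out); // first attempt leaves its output directory behind
        try {
            runJob(out); // retry without cleanup
        } catch (IllegalStateException e) {
            System.out.println("Retry without cleanup failed: " + e.getMessage());
        }
        prepareDelete(out); // cleanup, as a prepare step would do
        runJob(out);
        System.out.println("Retry after cleanup wrote: " + out.resolve("part-r-00000"));
    }
}
```

The point of the sketch is only the ordering: cleanup first, then the job, so that a second attempt starts from the same state as the first.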

The workflow job will wait until the Hadoop map/reduce job completes before continuing to the next action in the workflow execution path.

The counters of the Hadoop job and job exit status (FAILED, KILLED or SUCCEEDED) must be available to the workflow job after the Hadoop job ends. This information can be used from within decision nodes and other actions' configurations.

The map-reduce action has to be configured with all the necessary Hadoop JobConf properties to run the Hadoop map/reduce job.

Hadoop JobConf properties can be specified in a JobConf XML file bundled with the workflow application or they can be indicated inline in the map-reduce action configuration.

The configuration properties are loaded in the following order: streaming, job-xml and configuration. Later values override earlier values.
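The override order can be pictured as a layered merge. The sketch below is illustrative only: `merge` is a hypothetical helper, not an Oozie class, and the property values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConfigOverrideSketch {

    // Merge the layers in the order given; a key in a later layer
    // overwrites the same key from an earlier layer.
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... layers) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            merged.putAll(layer);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical values for the three sources named in the documentation.
        Map<String, String> streaming = Map.of("mapred.reduce.tasks", "1");
        Map<String, String> jobXml = Map.of(
                "mapred.reduce.tasks", "2",
                "mapred.output.dir", "/tmp/from-job-xml");
        Map<String, String> inline = Map.of("mapred.output.dir", "/tmp/from-inline");

        // Order matters: streaming, then job-xml, then inline configuration.
        Map<String, String> effective = merge(streaming, jobXml, inline);
        effective.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

With these made-up inputs, the reducer count comes from the job-xml layer and the output directory from the inline configuration, since each is the last layer to set that key.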

Streaming and inline property values can be parameterized (templatized) using EL expressions.

The Hadoop mapred.job.tracker and fs.default.name properties must not be present in the job-xml and inline configuration.
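A quick pre-submission check for this rule might look like the sketch below. It is an assumption about how you could lint your own configuration, not how Oozie validates it; `violations` is a hypothetical helper. In the workflow itself, these two values are supplied through the action's job-tracker and name-node elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class ForbiddenPropertyCheck {

    // The two properties the documentation says must not appear
    // in the job-xml file or the inline configuration.
    static final List<String> FORBIDDEN =
            Arrays.asList("mapred.job.tracker", "fs.default.name");

    // Return the forbidden keys that are present in the given configuration.
    static List<String> violations(Map<String, String> conf) {
        List<String> found = new ArrayList<>();
        for (String key : FORBIDDEN) {
            if (conf.containsKey(key)) {
                found.add(key);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical inline configuration containing one forbidden key.
        Map<String, String> inline = Map.of(
                "mapred.job.tracker", "localhost:8021",
                "mapred.reduce.tasks", "1");
        System.out.println("Forbidden properties found: " + violations(inline));
    }
}
```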


Apache Oozie documentation:
http://oozie.apache.org/docs/3.3.0/WorkflowFunctionalSpec.html#a3.2.2_Map-Reduce_Action


Components of a workflow with java map reduce action:



Sample workflow

Highlights

The sample workflow application runs a Java MapReduce program that parses syslog-generated log files in HDFS and generates a report from them.
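The program's source is not reproduced in this section, so the following is only a rough sketch of the parse-and-count idea in plain Java, without the Hadoop classes. The line format (a BSD-style syslog line), the "month-process" key and the class name are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SyslogCountSketch {

    // Assumed line shape: "Mon DD HH:MM:SS host process[pid]: message"
    private static final Pattern LINE = Pattern.compile(
            "^(\\w{3})\\s+(\\d{1,2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+"
            + "([^\\[:\\s]+)(?:\\[(\\d+)\\])?:\\s*(.*)$");

    // "Map" step: derive a key such as "Jun-sshd", or empty for unparseable lines.
    static Optional<String> keyFor(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches()
                ? Optional.of(m.group(1) + "-" + m.group(5))
                : Optional.empty();
    }

    // "Reduce" step: count how many lines produced each key.
    static Map<String, Integer> countByKey(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            keyFor(line).ifPresent(k -> counts.merge(k, 1, Integer::sum));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines; the last one is deliberately malformed.
        List<String> lines = List.of(
                "Jun 17 10:15:01 node1 sshd[1234]: Accepted password for user",
                "Jun 17 10:15:09 node1 sshd[1240]: Connection closed",
                "Jun 17 10:16:00 node1 kernel: eth0 link up",
                "not a syslog line");
        System.out.println(countByKey(lines));
    }
}
```

In an actual MapReduce job, `keyFor` would correspond to what the mapper emits and `countByKey` to the summing done in the reducer.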

The following is a pictorial representation of the workflow.


Workflow application details


Oozie web GUI - screenshots

http://YourOozieServer:TypicallyPort11000/oozie/






Do share any additional insights that can be added to the blog.

References


Map reduce cookbook
https://cwiki.apache.org/OOZIE/map-reduce-cookbook.html

How to use a sharelib in Oozie
http://blog.cloudera.com/blog/2012/12/how-to-use-the-sharelib-in-apache-oozie/

Everything you wanted to know but were afraid to ask about Oozie
http://www.slideshare.net/ChicagoHUG/everything-you-wanted-to-know-but-were-afraid-to-ask-about-oozie

Oozie workflow use cases
https://github.com/yahoo/oozie/wiki/Oozie-WF-use-cases






73 comments:

  1. Hi,

    I am getting the following error after running Oozie in the Cloudera VM:
    Error: AUTHENTICATION : Could not authenticate, Authentication failed, status: -1, message: null

    Please help me solve this issue. Thanks in advance.

  20. When I try to run this command, I get this error:

    COMMAND: oozie job -oozie http://localhost:11000/oozie -config /home/hduser/oozie/oozie-4.1.0/oozie-bin/examples/apps/map-reduce/job.properties -run

    ERROR: Error: E0501 : E0501: Could not perform authorization operation, Call From ubuntu/127.0.1.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused


  46. Awesome. This blog worked perfectly for me. Thanks!
    Regards,
    Kevin Costner
