Wednesday, July 3, 2013

Running native mapreduce jobs inside Pig

There may be situations where you need to reuse Java MapReduce programs within a Pig program. This post includes a sample Pig script, with the associated jars and sample data. The input is Syslog-generated log files, and the output is a count of occurrences of each logged process, inception to date.

Apache Pig documentation:

Part 1 of my blog on log parsing in Hadoop (link) covers the Java code. This post uses the jar from that post in a Pig script.

Details on running native MapReduce jobs in Pig scripts:
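
As a rough illustration, a native job can be invoked with Pig's MAPREDUCE operator, which stores a relation to a location the job reads from, runs the jar, and then loads the job's output back into a relation. The sketch below is not the script from this post: the jar name, driver class, HDFS paths, and output schema are all hypothetical placeholders, and it assumes the Java job writes tab-separated process/count pairs.

```pig
-- Hypothetical paths, jar name, and driver class; substitute your own.
-- Load the raw Syslog lines, one line per record.
rawLogs = LOAD '/user/hypothetical/syslog/input' USING TextLoader() AS (line:chararray);

-- Hand the data to the native Java MapReduce job:
--   STORE ... INTO  writes rawLogs where the job expects its input
--   LOAD ... AS     reads the job's output back with an assumed schema
--   the backquoted string holds the driver class and its arguments
processCounts = MAPREDUCE 'hypothetical-logparser.jar'
    STORE rawLogs INTO '/user/hypothetical/syslog/mr-input' USING PigStorage()
    LOAD '/user/hypothetical/syslog/mr-output' USING PigStorage('\t')
        AS (process_name:chararray, occurrences:long)
    `com.example.hypothetical.LogParserDriver /user/hypothetical/syslog/mr-input /user/hypothetical/syslog/mr-output`;

-- Continue with ordinary Pig operators on the job's output.
byCount = ORDER processCounts BY occurrences DESC;
DUMP byCount;
```

The relation returned by MAPREDUCE behaves like any other Pig relation, so existing Java logic can sit in the middle of a larger Pig data flow without being rewritten as a UDF.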


