Hooked on Hadoop: Running native mapreduce jobs inside Pig

There may be situations where you need to reuse Java MapReduce programs within a Pig program. This blog includes a sample Pig script, with associated jars and sample data. The input is Syslog-generated log files, and the output is a count of occurrences of each logged process, inception to date.

Apache Pig documentation:
http://pig.apache.org/docs/r0.10.0/basic.html#mapreduce

My blog 1 on log parsing in Hadoop (link) covers the Java code. This blog uses the jar from that blog in a Pig script.

Details on running a native MapReduce job in Pig scripts:
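As a rough sketch of the approach, the script below uses Pig's MAPREDUCE operator, which is described in the Apache Pig documentation linked above. The jar name, driver class, field names, and HDFS paths are all placeholders I made up for illustration, not the actual ones from the earlier blog. It also assumes the driver class takes the input and output directories as its two arguments and writes tab-separated process/count pairs.

```pig
-- All paths, the jar name, and the driver class below are hypothetical.
-- Load the raw Syslog lines, one line per record.
rawLogs = LOAD '/user/demo/syslog/input' USING TextLoader() AS (line:chararray);

-- Pig first stores rawLogs at the STORE path, then runs the jar with the
-- backticked arguments, then loads whatever the job wrote to the LOAD path.
processCounts = MAPREDUCE 'logparser.jar'
    STORE rawLogs INTO '/user/demo/syslog/mr_input'
    LOAD '/user/demo/syslog/mr_output' AS (process:chararray, cnt:int)
    `com.example.LogEventCount /user/demo/syslog/mr_input /user/demo/syslog/mr_output`;

-- From here on, the job output is an ordinary Pig relation.
byCount = ORDER processCounts BY cnt DESC;
DUMP byCount;
```

The STORE and LOAD paths have to match the directories passed to the job in the backticked parameter string, since Pig only hands data over through those two locations.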

Wednesday, July 3, 2013
