Apache Pig documentation:
http://pig.apache.org/docs/r0.10.0/basic.html#mapreduce
My first blog post on log parsing in Hadoop (link) covers the Java code. This post uses the jar from that post in a Pig script.
Details on running a native MapReduce job from a Pig script:
This gist includes a Pig Latin script that parses Syslog-generated log files through a
Java MapReduce program that uses regex.
Use case: Count the number of occurrences of processes that got logged, by month,
day and process.
Related gist that covers the Java code - https://gist.github.com/airawat/5915374
Pig version: 0.10.0
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data and script download: 02-DataAndScriptDownload
Data load commands: 03-HdfsLoadCommands
Pig script: 04-PigLatinScript
Pig script execution command: 05-PigLatinScriptExecution
Output: 06-Output
Sample data
------------
May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal
May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1
May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray
May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max)
May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns
May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org
Structure
----------
Month = May
Day = 3
Time = 11:52:54
Node = cdh-dn03
Process = init:
Log msg = tty (/dev/tty6) main process (1208) killed by TERM signal
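The field layout above can be checked locally before running anything on the cluster. The following is a minimal sketch, not the regex used by the Java MapReduce program: it assumes the first five whitespace-separated tokens are month, day, time, node and process, and treats the rest as the log message. The function name parse_syslog_line is hypothetical.

```shell
# Hypothetical helper (not part of the original gist): split one Syslog line
# into the fields listed above, using default awk whitespace splitting.
parse_syslog_line() {
  awk '{
    # Fields 1-5 are month, day, time, node and process.
    # Everything from field 6 onward is rejoined as the log message
    # (runs of spaces inside the message collapse to a single space).
    msg = $6
    for (i = 7; i <= NF; i++) msg = msg " " $i
    print "Month = " $1
    print "Day = " $2
    print "Time = " $3
    print "Node = " $4
    print "Process = " $5
    print "Log msg = " msg
  }'
}

# Usage: feed one sample line on stdin.
echo 'May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal' | parse_syslog_line
```

Lines that do not follow this layout would produce misleading fields here, which is one reason a stricter regex may be preferable in a real job.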
Data download
-------------
https://groups.google.com/forum/?hl=en#!topic/hadooped/DMQVIwBUQOo
Directory structure
-------------------
LogParserSamplePigMR
    Data
        airawat-syslog
            2013
                04
                    messages
                05
                    messages
    lib
        LogEventCount.jar
    SysLog-PigMR-Report.pig
Commands to load to HDFS [03-HdfsLoadCommands]
----------------------------------------------
$ hadoop fs -put LogParserSamplePigMR
$ hadoop fs -ls -R LogParserSamplePigMR | awk '{print $8}'
LogParserSamplePigMR/Data
LogParserSamplePigMR/Data/airawat-syslog
LogParserSamplePigMR/Data/airawat-syslog/2013
LogParserSamplePigMR/Data/airawat-syslog/2013/04
LogParserSamplePigMR/Data/airawat-syslog/2013/04/messages
LogParserSamplePigMR/Data/airawat-syslog/2013/05
LogParserSamplePigMR/Data/airawat-syslog/2013/05/messages
LogParserSamplePigMR/SysLog-PigMR-Report.pig
LogParserSamplePigMR/lib
LogParserSamplePigMR/lib/LogEventCount.jar
ParserSamplePigMR/reportDir/_logs/history/job_201306261042_0054_1372873417824_akhanolk_PigLatin%3ASysLog-PigMR-Report.pig
LogParserSamplePigMR/reportDir/part-m-00000
/*----------------------------------------*/
/*PigLatinScript - SysLog-PigMR-Report.pig*/
/*----------------------------------------*/
-- Remove directories left over from a previous run
rmf LogParserSamplePigMR/outputDir
rmf LogParserSamplePigMR/inputDir
rmf LogParserSamplePigMR/reportDir
-- Load every file under airawat-syslog/<year>/<month>/ into one field named line
raw_log_DS =
LOAD 'LogParserSamplePigMR/Data/airawat-syslog/*/*/*' as line;
-- Store raw_log_DS to inputDir, run the native MapReduce job in the jar on it,
-- then load the job output from outputDir
report_DS = MAPREDUCE 'lib/LogEventCount.jar'
    STORE raw_log_DS INTO 'LogParserSamplePigMR/inputDir'
    LOAD 'LogParserSamplePigMR/outputDir' AS (process:chararray, count:int)
    `Airawat.Oozie.Samples.LogEventCount LogParserSamplePigMR/inputDir LogParserSamplePigMR/outputDir`;
store report_DS INTO 'LogParserSamplePigMR/reportDir';
Command to run the pig script
------------------------------
Run these after the data, script and jar are loaded to HDFS (covered in section 03-HdfsLoadCommands).
$ cd LogParserSamplePigMR
$ pig SysLog-PigMR-Report.pig
Command to view output
-----------------------
$ hadoop fs -cat LogParserSamplePigMR/reportDir/part*
Output
-------
init: 23
kernel: 58
ntpd_initres[1705]: 792
sudo: 2
udevd[361]: 1
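A quick way to sanity-check per-process counts like these is to count a local copy of the log files outside Hadoop. The sketch below is an assumption-laden cross-check, not the logic of the MapReduce job: it takes the process name to be the fifth whitespace-separated field and skips shorter lines, whereas the job's regex may accept or reject different lines, so the numbers need not match exactly. The function name count_by_process is hypothetical.

```shell
# Hypothetical local cross-check (not part of the original gist): count log
# lines per process name, taken as the 5th whitespace-separated field.
count_by_process() {
  awk 'NF >= 5 { counts[$5]++ }
       END { for (p in counts) print p "\t" counts[p] }' "$@" | sort
}

# Usage, assuming a local copy of the sample data in the current directory:
# count_by_process Data/airawat-syslog/2013/*/messages
```

With no file arguments the function reads from standard input, so it can also be fed from a pipe.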