Related blogs:
Log parsing in Hadoop -Part 1: Java - using regex
Log parsing in Hadoop -Part 2: Hive - using regex
Log parsing in Hadoop -Part 3: Pig - using regex
Log parsing in Hadoop -Part 4: Python - using regex
This gist includes a Pig Latin script that parses syslog-generated log files using regex.
Use case: count the number of occurrences of each logged process, by month
and process.
Includes:
---------
Sample data and structure:     01-SampleDataAndStructure
Data and script download:      02-DataAndScriptDownload
Data load commands:            03-HdfsLoadCommands
Pig script:                    04-PigLatinScript
Pig script execution command:  05-PigLatinScriptExecution
Output:                        06-Output
Sample data
-----------
May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal
May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1
May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray
May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max)
May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns
May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org

Structure
---------
Month   = May
Day     = 3
Time    = 11:52:54
Node    = cdh-dn03
Process = init:
Log msg = tty (/dev/tty6) main process (1208) killed by TERM signal
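
To make the field structure above concrete, here is a minimal Python sketch that splits the first sample line into the same six fields. It assumes the regex from the Pig script in this post, rewritten with single backslashes for Python; the names `SYSLOG_PATTERN`, `FIELDS`, and `parse_line` are hypothetical helpers for illustration and are not part of the original gist.

```python
import re

# Assumption: same pattern as the Pig script's REGEX_EXTRACT_ALL call,
# written with single backslashes because this is a Python raw string.
SYSLOG_PATTERN = re.compile(
    r"(\w+)\s+(\d+)\s+(\d+:\d+:\d+)\s+(\w+\W*\w*)\s+(.*?:)\s+(.*$)"
)

# Field names follow the AS (...) schema used in the Pig script.
FIELDS = ("month_name", "day", "time", "host", "process", "log")


def parse_line(line):
    """Return a dict of the six fields, or None if the line does not match."""
    match = SYSLOG_PATTERN.fullmatch(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


sample = (
    "May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) "
    "killed by TERM signal"
)
record = parse_line(sample)
for name in FIELDS:
    print(f"{name:10} = {record[name]}")
```

The process field keeps its trailing colon (`init:`) because the fifth capture group ends at the first colon that is followed by whitespace.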
Data and script download
------------------------
https://groups.google.com/forum/?hl=en#!topic/hadooped/Wix8ZznQGJU

Directory structure
-------------------
LogParserSamplePig
    Data
        airawat-syslog
            2013
                04
                    messages
                05
                    messages
    SysLog-Pig-Report.pig
HDFS load commands
------------------
$ hadoop fs -put LogParserSamplePig/

Validate load
-------------
$ hadoop fs -ls -R LogParserSamplePig | awk '{print $8}'

Expected directory structure
----------------------------
LogParserSamplePig/Data
LogParserSamplePig/Data/airawat-syslog
LogParserSamplePig/Data/airawat-syslog/2013
LogParserSamplePig/Data/airawat-syslog/2013/04
LogParserSamplePig/Data/airawat-syslog/2013/04/messages
LogParserSamplePig/Data/airawat-syslog/2013/05
LogParserSamplePig/Data/airawat-syslog/2013/05/messages
LogParserSamplePig/SysLog-Pig-Report.pig
-- Pig Latin script - SysLog-Pig-Report.pig

-- Remove output left over from a previous run
rmf LogParserSamplePig/output

raw_log_DS =
    -- Load the logs into a sequence of one-element tuples
    LOAD 'LogParserSamplePig/Data/airawat-syslog/*/*/*' AS (line:chararray);

parsed_log_DS =
    -- Parse each log line into a structure with named fields
    FOREACH raw_log_DS
    GENERATE
        FLATTEN (
            REGEX_EXTRACT_ALL(
                line,
                '(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)'
            )
        )
        AS (
            month_name: chararray,
            day:        chararray,
            time:       chararray,
            host:       chararray,
            process:    chararray,
            log:        chararray
        );

report_draft_DS =
    -- Keep just the fields needed for the report
    FOREACH parsed_log_DS GENERATE month_name, process;

grouped_report_DS =
    -- Group the dataset by month and process
    GROUP report_draft_DS BY (month_name, process);

aggregate_report_DS =
    -- Count the records in each group
    FOREACH grouped_report_DS {
        GENERATE group.month_name, group.process, COUNT(report_draft_DS) AS frequency;
    }

sorted_DS =
    -- Sort by month, then by process
    ORDER aggregate_report_DS BY $0, $1;

STORE sorted_DS INTO 'LogParserSamplePig/output/SortedResults';
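
For readers who want to check the grouping logic without a cluster, the GROUP / COUNT / ORDER steps of the script can be approximated locally. The following is a rough Python sketch, not the author's method: it assumes the same regex (with single backslashes), and the function name `month_process_counts` is hypothetical. Lines that do not match the pattern are simply skipped in this sketch, which may differ from how Pig treats them.

```python
import re
from collections import Counter

# Assumption: same pattern as the REGEX_EXTRACT_ALL call in the Pig script.
SYSLOG_PATTERN = re.compile(
    r"(\w+)\s+(\d+)\s+(\d+:\d+:\d+)\s+(\w+\W*\w*)\s+(.*?:)\s+(.*$)"
)


def month_process_counts(lines):
    """Count log lines per (month, process), sorted by month then process."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_PATTERN.fullmatch(line.rstrip("\n"))
        if match is None:
            continue  # skipped here; a simplification of this sketch
        month_name, _day, _time, _host, process, _log = match.groups()
        counts[(month_name, process)] += 1
    # sorted() on the (month, process) keys mirrors ORDER ... BY $0, $1
    return [(month, proc, n) for (month, proc), n in sorted(counts.items())]


sample_lines = [
    "May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal",
    "May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1",
    "May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns",
    "May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org",
]

for month, process, frequency in month_process_counts(sample_lines):
    print(month, process, frequency, sep="\t")
```

Because the process name is captured together with any PID suffix, entries such as `ntpd_initres[1705]:` are counted per PID, which matches the shape of the report produced by the Pig script.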
Execute the Pig script on the cluster
-------------------------------------
$ pig SysLog-Pig-Report.pig

View output
-----------
$ hadoop fs -cat LogParserSamplePig/output/SortedResults/part*
Output
------
Apr  sudo:                 1
May  init:                 23
May  kernel:               58
May  ntpd_initres[1705]:   792
May  sudo:                 1
May  udevd[361]:           1