Related blogs:
Log parsing in Hadoop - Part 1: Java
Log parsing in Hadoop - Part 2: Hive
Log parsing in Hadoop - Part 3: Pig
Log parsing in Hadoop - Part 4: Python
Log parsing in Hadoop - Part 5: Cascading
Log parsing in Hadoop - Part 6: Morphlines
This gist includes Hive QL scripts to create an external partitioned table for Syslog-generated log files using the regex SerDe.

Use case: Count the number of occurrences of processes that got logged, by year, month, day and process.

Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data download: 02-DataDownload
Data load commands: 03-DataLoadCommands
Hive commands: 04-HiveCommands
Sample output: 05-SampleOutput
Sample data
-----------
May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal
May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1
May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray
May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max)
May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns
May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org

Structure
---------
Month   = May
Day     = 3
Time    = 11:52:54
Node    = cdh-dn03
Process = init:
Log msg = tty (/dev/tty6) main process (1208) killed by TERM signal
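
To sanity-check this field breakdown, the same regex that appears in the table DDL further below can be exercised with Hive's regexp_extract() against the first sample line. This is only a sketch, shown script-style so the comments are unambiguous, and it assumes a Hive version that supports SELECT without a FROM clause (the CDH 4.2 / Hive 0.10 setup used in this post may not):

-- Capture group 1 is the month name, group 5 the process ("event") field.
-- The pattern below is identical to input.regex in the CREATE TABLE further down.
SELECT
  regexp_extract('May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal',
                 '(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)', 1) AS month_name,
  regexp_extract('May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal',
                 '(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)', 5) AS event;
-- Expected: month_name = 'May', event = 'init:'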
Data download
-------------
https://groups.google.com/forum/?hl=en#!topic/hadooped/_tj8w_E-MGY

Directory structure
-------------------
LogParserSampleHive
    logs
        airawat-syslog
            2013
                04
                    messages
            2013
                05
                    messages
Data load commands
------------------
a) Load the data

$ hadoop fs -mkdir LogParserSampleHive
$ hadoop fs -mkdir LogParserSampleHive/logs
$ hadoop fs -put LogParserSampleHive/logs/* LogParserSampleHive/logs/

$ hadoop fs -ls -R LogParserSampleHive/ | awk '{print $8}'
LogParserSampleHive/logs
LogParserSampleHive/logs/airawat-syslog
LogParserSampleHive/logs/airawat-syslog/2013
LogParserSampleHive/logs/airawat-syslog/2013/04
LogParserSampleHive/logs/airawat-syslog/2013/04/messages
LogParserSampleHive/logs/airawat-syslog/2013/05
LogParserSampleHive/logs/airawat-syslog/2013/05/messages
Hive commands
-------------
a) Create external table:

hive> CREATE EXTERNAL TABLE LogParserSample(
        month_name STRING,
        day STRING,
        time STRING,
        host STRING,
        event STRING,
        log STRING)
      PARTITIONED BY(year int, month int)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
      WITH SERDEPROPERTIES (
        "input.regex" = "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)"
      )
      STORED AS TEXTFILE;

b) Create partitions and load data:
Note: Replace '/user/airawat' with '/user/<your userID>'

hive> ALTER TABLE LogParserSample ADD IF NOT EXISTS PARTITION(year=2013, month=04)
      LOCATION '/user/airawat/LogParserSampleHive/logs/airawat-syslog/2013/04/';

hive> ALTER TABLE LogParserSample ADD IF NOT EXISTS PARTITION(year=2013, month=05)
      LOCATION '/user/airawat/LogParserSampleHive/logs/airawat-syslog/2013/05/';
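
As a quick sanity check after the partitions are added, something along these lines can be run. This is a minimal sketch assuming the table and partition names above (partition values are ints, so month=04 is stored as month=4), and the SELECT additionally needs the hive-contrib jar on the classpath, as shown in the query section below:

-- List the partitions registered for the table
SHOW PARTITIONS LogParserSample;

-- Peek at a few parsed rows from the May 2013 partition
SELECT month_name, day, time, host, event
FROM LogParserSample
WHERE year = 2013 AND month = 5
LIMIT 5;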
Hive query
----------
hive> set hive.cli.print.header=true;

hive> add jar hadoop-lib/hive-contrib-0.10.0-cdh4.2.0.jar; -- needed here because hive-contrib is not on the classpath in my environment

hive> SELECT year, month, day, event, COUNT(*) AS occurrence
      FROM LogParserSample
      GROUP BY year, month, day, event
      ORDER BY event DESC, year, month, day;
Query output
------------
year    month   day     event                   occurrence
2013    05      7       udevd[361]:             1
2013    04      23      sudo:                   1
2013    05      3       sudo:                   1
2013    05      3       ntpd_initres[1705]:     144
2013    05      4       ntpd_initres[1705]:     261
2013    05      5       ntpd_initres[1705]:     264
2013    05      6       ntpd_initres[1705]:     123
2013    05      3       kernel:                 5
2013    05      6       kernel:                 1
2013    05      7       kernel:                 52
2013    05      3       init:                   5
2013    05      7       init:                   18
Good work. Connect on Google+.

Thanks, Prashant.

Good work! Can you please share any tutorial on regex in Hive?
Hi Anagha,
When I query the table, it doesn't show any data. Please help me. These are the steps I followed.

Step 1: Create the table
CREATE EXTERNAL TABLE reg_serde(
month_name STRING,
day STRING,
time STRING,
host STRING,
event STRING,
log STRING)
PARTITIONED BY(year int, month int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)"
)
stored as textfile;
-----------------------------------------------------------------------------------
Step 2: Load data into the table
hive> load data local inpath '/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages' into table reg_serde;
Copying data from file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages
Copying file: file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages
Loading data to table hive_joins.reg_serde
OK
Time taken: 0.814 seconds
hive> load data local inpath '/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages' into table reg_serde;
Copying data from file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages
Copying file: file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages
Loading data to table hive_joins.reg_serde
OK
Time taken: 3.193 seconds
------------------------------------------------------------------------------------
Step 3: Select statement
hive> select * from reg_serde;
OK
Time taken: 0.13 seconds
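
For reference, the post above does not use LOAD DATA at all; it registers each month's HDFS directory as a partition with ALTER TABLE ... ADD PARTITION. A sketch of that same approach for the reg_serde table follows, where the HDFS location is only a placeholder and must point at the directories that actually hold the messages files:

-- Register each month's directory as a partition, mirroring the post's approach.
-- '/user/<your userID>/...' is a placeholder path, not a real location.
ALTER TABLE reg_serde ADD IF NOT EXISTS PARTITION(year=2013, month=04)
LOCATION '/user/<your userID>/LogParserSampleHive/logs/airawat-syslog/2013/04/';

ALTER TABLE reg_serde ADD IF NOT EXISTS PARTITION(year=2013, month=05)
LOCATION '/user/<your userID>/LogParserSampleHive/logs/airawat-syslog/2013/05/';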