This post includes sample scripts, data, and commands for parsing a log file in Hive using the RegexSerDe.
Related blogs:
Log parsing in Hadoop -Part 1: Java
Log parsing in Hadoop -Part 2: Hive
Log parsing in Hadoop -Part 3: Pig
Log parsing in Hadoop -Part 4: Python
Log parsing in Hadoop -Part 5: Cascading
Log parsing in Hadoop -Part 6: Morphlines
Good work. Connect on Google+.
Thanks, Prashant.
Good work!
Can you please share a tutorial on regex in Hive?
Hi Anagha,
When I query the table, it doesn't show any data. Please help me.
I followed these steps.
Step 1: create the table
CREATE EXTERNAL TABLE reg_serde(
month_name STRING,
day STRING,
time STRING,
host STRING,
event STRING,
log STRING)
PARTITIONED BY(year int, month int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)"
)
stored as textfile;
-----------------------------------------------------------------------------------
Step 2: load data into the table
hive> load data local inpath '/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages' into table reg_serde;
Copying data from file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages
Copying file: file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages
Loading data to table hive_joins.reg_serde
OK
Time taken: 0.814 seconds
hive> load data local inpath '/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages' into table reg_serde;
Copying data from file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages
Copying file: file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages
Loading data to table hive_joins.reg_serde
OK
Time taken: 3.193 seconds
------------------------------------------------------------------------------------
Step 3: select statement
hive> select * from reg_serde;
OK
Time taken: 0.13 seconds
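One possible cause, offered as a hedged sketch rather than a confirmed diagnosis: `reg_serde` is declared with `PARTITIONED BY(year int, month int)`, but the `load data` commands in step 2 do not name a partition. Hive normally expects a `PARTITION` clause when loading into a partitioned table. The statements below assume that the `2013/04` and `2013/05` directories correspond to `year=2013, month=4` and `year=2013, month=5`; that mapping is inferred from the paths and is an assumption.

```sql
-- Sketch only: partition values are assumed from the directory names (2013/04, 2013/05).
LOAD DATA LOCAL INPATH '/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages'
INTO TABLE reg_serde PARTITION (year=2013, month=4);

LOAD DATA LOCAL INPATH '/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages'
INTO TABLE reg_serde PARTITION (year=2013, month=5);

-- List the partitions Hive knows about, then query one of them.
SHOW PARTITIONS reg_serde;
SELECT * FROM reg_serde WHERE year = 2013 AND month = 4 LIMIT 10;
```

If rows come back but every column is NULL, the likelier culprit is the `input.regex` pattern, since the contrib RegexSerDe returns NULL columns for lines that do not match the regex.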
Thank you, it's a very nice blog for beginners.
Good post! Thank you so much for sharing it. It was good to read and useful for updating my knowledge. Keep blogging.
Thanks