Tuesday, July 2, 2013

Log parsing in Hadoop -Part 2: Hive

This post includes sample scripts, data, and commands for parsing a log file in Hive using the RegexSerDe.
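
To give a feel for what a RegexSerDe pattern does to one log line, here is a minimal sketch in Python. It uses the `input.regex` value quoted in the reader comments below, with the HiveQL `\\` escaping reduced to single backslashes. The sample line, the `parse_line` helper and the `COLUMNS` tuple are assumptions made for illustration and are not taken from the post's data set. Hive evaluates the pattern with Java's regex engine, so treat this only as an approximation of how the capture groups map to columns.

```python
import re

# Pattern from the "input.regex" property quoted in the comments,
# with HiveQL string escaping (\\) reduced to single backslashes.
SYSLOG_PATTERN = re.compile(
    r"(\w+)\s+(\d+)\s+(\d+:\d+:\d+)\s+(\w+\W*\w*)\s+(.*?\:)\s+(.*$)"
)

# Column names in the same order as the capture groups (from the CREATE TABLE in the comments).
COLUMNS = ("month_name", "day", "time", "host", "event", "log")


def parse_line(line):
    """Return a dict of column -> captured text, or None if the whole line does not match."""
    match = SYSLOG_PATTERN.fullmatch(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


# Hypothetical syslog line, made up for this sketch.
sample = "Apr 23 16:13:10 cdh-dn03 init: tty (/dev/tty6) main process ended, respawning"
row = parse_line(sample)
for column in COLUMNS:
    print(column, "=", row[column])
```

Each capture group fills one column in order, so the number of groups has to equal the number of non-partition columns in the table definition.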


Related blogs:

Log parsing in Hadoop -Part 1: Java 
Log parsing in Hadoop -Part 2: Hive 
Log parsing in Hadoop -Part 3: Pig 
Log parsing in Hadoop -Part 4: Python
Log parsing in Hadoop -Part 5: Cascading
Log parsing in Hadoop -Part 6: Morphlines 



8 comments:

  1. Good work!
    Can you please share a tutorial on regex in Hive?
  2. This comment has been removed by the author.
  3. Hi Anagha,
    When I query the table, it doesn't show any data. Please help me.
    These are the steps I followed.
    step:1
    CREATE EXTERNAL TABLE reg_serde(
    month_name STRING,
    day STRING,
    time STRING,
    host STRING,
    event STRING,
    log STRING)
    PARTITIONED BY(year int, month int)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
    "input.regex" = "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)"
    )
    stored as textfile;
    -----------------------------------------------------------------------------------
    step:2 load data into table

    hive> load data local inpath '/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages' into table reg_serde;
    Copying data from file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages
    Copying file: file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages
    Loading data to table hive_joins.reg_serde
    OK
    Time taken: 0.814 seconds

    hive> load data local inpath '/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages' into table reg_serde;
    Copying data from file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages
    Copying file: file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages
    Loading data to table hive_joins.reg_serde
    OK
    Time taken: 3.193 seconds
    ------------------------------------------------------------------------------------
    step:3 select statement
    hive> select * from reg_serde;
    OK
    Time taken: 0.13 seconds
  4. Thank you, it's a very nice blog for beginners.
    https://www.emexotechnologies.com/courses/big-data-analytics-training/big-data-hadoop-training/
  5. Good post! Thank you so much for sharing it. It was good to read and helped me bring my knowledge up to date. Keep blogging.

    https://www.emexotechnologies.com/online-courses/big-data-hadoop-training-in-electronic-city/