Tuesday, July 2, 2013

Log parsing in Hadoop - Part 2: Hive

This post includes sample scripts, data, and commands for parsing a log file in Hive using the RegexSerDe.


Related blogs:

Log parsing in Hadoop - Part 1: Java
Log parsing in Hadoop - Part 2: Hive
Log parsing in Hadoop - Part 3: Pig
Log parsing in Hadoop - Part 4: Python
Log parsing in Hadoop - Part 5: Cascading
Log parsing in Hadoop - Part 6: Morphlines



6 comments:

  1. Good work!
    Could you please share a tutorial on using regex in Hive?
  2. This comment has been removed by the author.
  3. Hi Anagha,
    When I query the table, it doesn't show any data. Please help me.
    These are the steps I followed.
    Step 1: create the table
    CREATE EXTERNAL TABLE reg_serde(
    month_name STRING,
    day STRING,
    time STRING,
    host STRING,
    event STRING,
    log STRING)
    PARTITIONED BY(year int, month int)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
    "input.regex" = "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)"
    )
    stored as textfile;
    -----------------------------------------------------------------------------------
    Step 2: load data into the table

    hive> load data local inpath '/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages' into table reg_serde;
    Copying data from file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages
    Copying file: file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/04/messages
    Loading data to table hive_joins.reg_serde
    OK
    Time taken: 0.814 seconds

    hive> load data local inpath '/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages' into table reg_serde;
    Copying data from file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages
    Copying file: file:/home/training/data/hive/regserde/LogParserSampleHive/logs/airawat-syslog/2013/05/messages
    Loading data to table hive_joins.reg_serde
    OK
    Time taken: 3.193 seconds
    ------------------------------------------------------------------------------------
    Step 3: select statement
    hive> select * from reg_serde;
    OK
    Time taken: 0.13 seconds
  4. Thank you, this is a very nice blog for beginners.
    https://www.emexotechnologies.com/courses/big-data-analytics-training/big-data-hadoop-training/
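
One way to narrow down an empty result like the one reported in comment 3 is to try the "input.regex" pattern outside Hive. The contrib RegexSerDe relies on java.util.regex and, as far as I know, requires the pattern to match the whole line, so the sketch below does the same with Matcher.matches(). The class name LogRegexCheck and the sample log line are made up for illustration; they are not part of the post's scripts or data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the original post) for checking the
// "input.regex" pattern from comment 3 outside Hive.
class LogRegexCheck {
    // Same pattern as the "input.regex" SerDe property in the DDL of comment 3.
    static final Pattern LOG_PATTERN = Pattern.compile(
        "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)");

    // Column names in the order declared in the CREATE TABLE statement.
    static final String[] COLUMNS = {"month_name", "day", "time", "host", "event", "log"};

    // Returns the six captured fields, or null when the whole line does not match.
    static String[] parse(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // capture groups are 1-based
        }
        return fields;
    }

    public static void main(String[] args) {
        // Assumption: a made-up syslog-style line, not taken from the sample data.
        String sample = "Apr 23 16:13:10 cdh-dn01 sshd: session opened for user root";
        String[] fields = parse(sample);
        if (fields == null) {
            System.out.println("no match");
            return;
        }
        for (int i = 0; i < fields.length; i++) {
            System.out.println(COLUMNS[i] + " = " + fields[i]);
        }
    }
}
```

If real lines from the messages files parse cleanly here, the pattern is probably not the cause. In that case the partitioning is worth a look: reg_serde is declared with PARTITIONED BY(year int, month int), while the load commands in step 2 name no partition.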