Tuesday, December 31, 2013

Log parsing in Hadoop - Part 5: Cascading

1.0. What's in this post?

This post is part of a series focused on log parsing in Java MapReduce, Pig, Hive, Python, and more. This one covers a simple log parser in Cascading and includes a sample program, data, and commands.

Documentation on Cascading:
http://www.cascading.org/documentation/

Other posts in this series:
Log parsing in Hadoop - Part 1: Java
Log parsing in Hadoop - Part 2: Hive
Log parsing in Hadoop - Part 3: Pig
Log parsing in Hadoop - Part 4: Python
Log parsing in Hadoop - Part 5: Cascading
Log parsing in Hadoop - Part 6: Morphlines


2.0. Sample program


2.0.1. What the program does
a) It reads syslog-generated logs stored in HDFS
b) It parses them with a regular expression
c) It writes successfully parsed records to files in HDFS
d) It writes records that don't match the pattern to HDFS
e) It writes a report to HDFS containing the count of distinct processes logged
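The per-record logic behind steps a) to e) can be sketched in plain Java, without Hadoop or Cascading, to show what the regex parse and the process report involve. This is a minimal sketch under stated assumptions, not the Cascading program itself: it assumes input lines follow the classic BSD syslog layout "Mmm dd hh:mm:ss host process[pid]: message", and the class name SyslogParseSketch, the regex, and the sample lines are all hypothetical. It also assumes the report is a count of records per process; the number of distinct processes is then the size of that map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java sketch (NOT Cascading code) of steps a) to e).
// ASSUMPTION: lines use the BSD syslog layout
//   "Mmm dd hh:mm:ss host process[pid]: message"
class SyslogParseSketch {

    // Hypothetical pattern. Groups: 1=timestamp, 2=host, 3=process,
    // 4=pid (optional), 5=message
    static final Pattern SYSLOG = Pattern.compile(
        "^(\\w{3}\\s+\\d{1,2}\\s+\\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+([^\\s:\\[]+)(?:\\[(\\d+)\\])?:\\s+(.*)$");

    final List<String[]> parsed = new ArrayList<>();         // step c): parsed records
    final List<String> unparsed = new ArrayList<>();          // step d): lines that don't match
    final Map<String, Long> processCounts = new TreeMap<>();  // step e): records per process, sorted by name

    // Step b): regex-parse one line and route it to the matching output
    void accept(String line) {
        Matcher m = SYSLOG.matcher(line);
        if (!m.matches()) {
            unparsed.add(line);
            return;
        }
        // The pid group is null when the process logged no [pid]
        String pid = m.group(4) == null ? "" : m.group(4);
        parsed.add(new String[] {m.group(1), m.group(2), m.group(3), pid, m.group(5)});
        processCounts.merge(m.group(3), 1L, Long::sum);
    }

    public static void main(String[] args) {
        SyslogParseSketch p = new SyslogParseSketch();
        // Hypothetical sample lines, not the post's data set
        p.accept("May  3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal");
        p.accept("May  7 00:40:47 cdh-dn03 dhclient[1056]: DHCPREQUEST on eth0");
        p.accept("not a syslog line");
        System.out.println(p.parsed.size() + " parsed, " + p.unparsed.size() + " unparsed");
        System.out.println("distinct processes: " + p.processCounts.size() + " " + p.processCounts);
    }
}
```

Running main prints "2 parsed, 1 unparsed" followed by "distinct processes: 2 {dhclient=1, init=1}". The three outputs are in-memory collections here only to keep the sketch self-contained; in the Hadoop job they correspond to the separate HDFS outputs listed above.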

2.0.2. Sample log data


2.0.3. Directory structure of log files


2.0.4. Log parser in Cascading


2.0.5. build.gradle file
Gradle documentation is available at http://www.gradle.org
Here is the build.gradle:

2.0.6. Data and code download 




2.0.7. Commands (load data, execute program)


2.0.8. Results 





