Tuesday, December 31, 2013

Log parsing in Hadoop - Part 5: Cascading

1.0. What's in this post?

This post is part of a series on log parsing in Hadoop with Java MapReduce, Pig, Hive, Python, and more. This one covers a simple log parser in Cascading and includes a sample program, data, and commands.

Documentation on Cascading:

Other related blogs:
Log parsing in Hadoop - Part 1: Java
Log parsing in Hadoop - Part 2: Hive
Log parsing in Hadoop - Part 3: Pig
Log parsing in Hadoop - Part 4: Python
Log parsing in Hadoop - Part 5: Cascading
Log parsing in Hadoop - Part 6: Morphlines

2.0. Sample program

2.0.1. What the program does
a) Reads syslog-generated logs stored in HDFS
b) Parses them with a regular expression
c) Writes successfully parsed records to files in HDFS
d) Writes records that don't match the pattern to HDFS
e) Writes a report to HDFS that contains the count of distinct processes logged
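
The steps above can be sketched in plain Java, without Cascading or HDFS, to show the parse/reject/count logic on its own. This is a minimal illustration and not the program from this post: the class name SyslogParseSketch, the regular expression, and the assumed line layout ("month day time host process[pid]: message") are all hypothetical, and the report here is read as a count of records per process (the number of distinct processes is then the size of that map).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: in-memory stand-in for the parse / trap / count steps.
class SyslogParseSketch {

    // Assumed layout: "<month> <day> <HH:mm:ss> <host> <process>[pid]: <message>"
    // The "[pid]" part is optional and is not captured.
    static final Pattern SYSLOG = Pattern.compile(
        "^(\\w{3})\\s+(\\d{1,2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+"
        + "([^\\s:\\[]+)(?:\\[\\d+\\])?:\\s+(.*)$");

    // Step c: successfully parsed records (month, day, time, host, process, message)
    final List<String[]> parsed = new ArrayList<>();
    // Step d: lines that did not match the pattern
    final List<String> rejected = new ArrayList<>();
    // Step e: number of parsed records per process name
    final Map<String, Integer> processCounts = new TreeMap<>();

    // Step b: apply the regex to one line and route it to the right output.
    void accept(String line) {
        Matcher m = SYSLOG.matcher(line);
        if (!m.matches()) {
            rejected.add(line);
            return;
        }
        String process = m.group(5);
        parsed.add(new String[] {
            m.group(1), m.group(2), m.group(3), m.group(4), process, m.group(6)
        });
        processCounts.merge(process, 1, Integer::sum);
    }

    public static void main(String[] args) {
        SyslogParseSketch sketch = new SyslogParseSketch();
        // Made-up input lines, for illustration only.
        sketch.accept("May  3 11:52:54 host01 init: tty main process killed");
        sketch.accept("May  3 11:53:31 host01 NetworkManager[1459]: state changed");
        sketch.accept("not a syslog line");

        System.out.println("parsed=" + sketch.parsed.size()
            + " rejected=" + sketch.rejected.size());
        sketch.processCounts.forEach((p, n) -> System.out.println(p + "\t" + n));
    }
}
```

In a Cascading flow the same three outputs would typically be separate sinks fed from one source; the sketch keeps them as in-memory collections so the logic can be tried locally before wiring it into a job.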

2.0.2. Sample log data

2.0.3. Directory structure of log files

2.0.4. Log parser in Cascading

2.0.5. build.gradle file
Gradle documentation is available at http://www.gradle.org
Here is the build.gradle...

2.0.6. Data and code download 

2.0.7. Commands (load data, execute program)

2.0.8. Results 

