Tuesday, December 31, 2013

Log parsing in Hadoop - Part 5: Cascading

1.0. What's in this post?

This post is part of a series focused on log parsing in Hadoop with Java MapReduce, Pig, Hive, Python and more. This installment covers a simple log parser written in Cascading, and includes a sample program, sample data, and the commands to run it.

Documentation on Cascading:
http://www.cascading.org/documentation/

Other related blogs:
Log parsing in Hadoop - Part 1: Java
Log parsing in Hadoop - Part 2: Hive
Log parsing in Hadoop - Part 3: Pig
Log parsing in Hadoop - Part 4: Python
Log parsing in Hadoop - Part 5: Cascading
Log parsing in Hadoop - Part 6: Morphlines


2.0. Sample program


2.0.1. What the program does
a) Reads syslog-generated logs stored in HDFS
b) Parses them with a regular expression
c) Writes successfully parsed records to files in HDFS
d) Writes records that don't match the pattern to a trap location in HDFS
e) Writes a report to HDFS with the count of log events per distinct process

2.0.2. Sample log data
Sample log data
---------------
May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal
May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1
May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray
May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max)
May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns
May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org
Data structure
--------------
Month = May
Day = 3
Time = 11:52:54
Node = cdh-dn03
Process = init:
Log msg = tty (/dev/tty6) main process (1208) killed by TERM signal
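The regex used by the parser in section 2.0.4 maps each log line into the six fields above. Below is a minimal standalone check of that pattern against the first sample line, using plain java.util.regex; the RegexCheck class is a hypothetical helper for illustration only, not part of the Cascading job.

package logparser;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: verifies the syslog regex from section 2.0.4 against one sample line
public class RegexCheck {
  public static void main(String[] args) {
    String sysLogRegex = "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)";
    String sample = "May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal";
    Matcher m = Pattern.compile(sysLogRegex).matcher(sample);
    if (m.matches()) {
      // Groups 1..6 map to month, day, time, node, process, message
      System.out.println("Month   = " + m.group(1));
      System.out.println("Day     = " + m.group(2));
      System.out.println("Time    = " + m.group(3));
      System.out.println("Node    = " + m.group(4));
      System.out.println("Process = " + m.group(5));
      System.out.println("Log msg = " + m.group(6));
    } else {
      System.out.println("No match - in the Cascading job such a record would be routed to the trap tap");
    }
  }
}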


2.0.3. Directory structure of log files
Directory structure of logs
---------------------------
cascadingSamples
  data
    syslogs
      <<Node-Name>>
        <<Year>>
          <<Month>>
            messages
Specifically...
LogParser/data/syslogs/cdh-dev01/2013/04/messages
LogParser/data/syslogs/cdh-dev01/2013/05/messages
LogParser/data/syslogs/cdh-dn01/2013/05/messages
LogParser/data/syslogs/cdh-dn02/2013/04/messages
LogParser/data/syslogs/cdh-dn02/2013/05/messages
LogParser/data/syslogs/cdh-dn03/2013/04/messages
LogParser/data/syslogs/cdh-dn03/2013/05/messages
LogParser/data/syslogs/cdh-jt01/2013/04/messages
LogParser/data/syslogs/cdh-jt01/2013/05/messages
LogParser/data/syslogs/cdh-nn01/2013/05/messages
LogParser/data/syslogs/cdh-vms/2013/05/messages


2.0.4. Log parser in Cascading
package logparser;

import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.expression.ExpressionFunction;
import cascading.operation.regex.RegexParser;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.MultiSourceTap;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.GlobHfs;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class LogParser {

  public static void main(String[] args) {

    // {{
    // INSTANTIATE/INITIALIZE

    // Set the current job jar
    Properties properties = new Properties();
    AppProps.setApplicationJarClass( properties, LogParser.class );
    HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );

    // Arguments
    String inputPath = args[ 0 ];
    String outputPath = args[ 1 ];
    String errorPath = args[ 2 ];
    String reportPath = args[ 3 ];

    // Scheme for sinks
    TextLine sinkTextLineScheme = new TextLine();

    // Define what the input file looks like; "offset" is bytes from the beginning of the file
    TextLine sourceTextLineScheme = new TextLine( new Fields( "offset", "line" ) );

    // The inputPath is a file glob, so GlobHfs is used below
    GlobHfs sourceFilesGlob = new GlobHfs( sourceTextLineScheme, inputPath );

    // Create a SOURCE tap to read the resources matched by the HDFS glob
    Tap sourceSyslogTap = new MultiSourceTap( sourceFilesGlob );

    // Create a SINK tap to write parsed logs to HDFS
    sinkTextLineScheme.setNumSinkParts( 2 );
    Tap sinkParsedLogTap = new Hfs( sinkTextLineScheme, outputPath, SinkMode.REPLACE );

    // Create a SINK tap to write reports to HDFS
    sinkTextLineScheme.setNumSinkParts( 1 );
    Tap sinkReportTap = new Hfs( sinkTextLineScheme, reportPath, SinkMode.REPLACE );

    // Create a TRAP tap to write records that failed parsing
    sinkTextLineScheme.setNumSinkParts( 1 );
    Tap sinkTrapTap = new Hfs( sinkTextLineScheme, errorPath, SinkMode.REPLACE );
    // }}

    // {{
    // EXTRACT/PARSE

    // Declare the field names
    Fields sysLogFields = new Fields( "month", "day", "time", "node", "process", "message" );

    // Define the regex pattern to parse the log file with
    String sysLogRegex = "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)";

    // Declare the regex groups to keep; each group is assigned the corresponding
    // field name from 'sysLogFields', above
    int[] keepParsedGroups = { 1, 2, 3, 4, 5, 6 };

    // Create the parser
    RegexParser parser = new RegexParser( sysLogFields, sysLogRegex, keepParsedGroups );

    // Import & parse pipe
    // Create the import pipe element, named 'import', with the input argument named "line"
    // Replace the incoming tuple with the parser results:
    // "line" -> parser -> month, day, time, node, process, message
    Pipe importAndParsePipe = new Each( "import", new Fields( "line" ), parser, Fields.RESULTS );
    // }}

    // {{
    // TRANSFORM

    // Transform the process field - remove the process ID if found, for better reporting on logs
    // Also, convert to lowercase
    // E.g. change "ntpd[1302]" to "ntpd"
    String expression = "process.substring(0, (process.indexOf('[') == -1 ? process.length()-1 : process.indexOf('[') )).toLowerCase()";
    Fields fieldProcess = new Fields( "process" );
    ExpressionFunction expFunc = new ExpressionFunction( fieldProcess, expression, String.class );

    // Pipe for transformed data
    Pipe scrubbedDataPipe = new Each( importAndParsePipe, fieldProcess, expFunc, Fields.REPLACE );
    // }}

    // {{
    // REPORT/ANALYZE

    // Capture counts by process, as a report
    // --------------------------------------
    // process count()
    // E.g. sshd 4
    Pipe reportPipe = new Pipe( "reportByProcess", scrubbedDataPipe );
    Fields keyFields = new Fields( "process" );
    Fields groupByFields = new Fields( "process" );
    Fields countField = new Fields( "countOfEvents" );

    reportPipe = new GroupBy( reportPipe, groupByFields );
    reportPipe = new Every( reportPipe, keyFields, new Count( countField ), Fields.ALL );
    reportPipe = new GroupBy( reportPipe, keyFields, countField, false ); // secondary sort on the count field; pass true for descending order
    // End of reports
    // }}

    // {{
    // EXECUTE

    // Connect the taps, pipes, etc., into a flow & execute
    FlowDef flowDef = FlowDef.flowDef()
        .setName( "Log parser" )
        .addSource( importAndParsePipe, sourceSyslogTap )
        .addTailSink( scrubbedDataPipe, sinkParsedLogTap )
        .addTailSink( reportPipe, sinkReportTap )
        .addTrap( importAndParsePipe, sinkTrapTap );

    Flow flow = flowConnector.connect( flowDef );
    flow.complete();
    // }}
  }
}
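
The process-scrubbing expression in the TRANSFORM block is plain Java, so its behaviour is easy to check outside Cascading. Below is a minimal standalone sketch of the same logic; the ScrubCheck class and its scrub method are hypothetical helpers for illustration, not part of the job.

package logparser;

// Hypothetical helper: mirrors the ExpressionFunction expression used in LogParser
public class ScrubCheck {

  // Same logic as:
  // process.substring(0, (process.indexOf('[') == -1 ? process.length()-1 : process.indexOf('['))).toLowerCase()
  static String scrub(String process) {
    int bracket = process.indexOf('[');
    int end = (bracket == -1) ? process.length() - 1 : bracket; // trims the trailing ':' or the '[pid]:' suffix
    return process.substring(0, end).toLowerCase();
  }

  public static void main(String[] args) {
    System.out.println(scrub("ntpd_initres[1705]:")); // ntpd_initres
    System.out.println(scrub("init:"));               // init
    System.out.println(scrub("NetworkManager:"));     // networkmanager
  }
}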


2.0.5. build.gradle file
Gradle documentation is available at http://www.gradle.org
Here is the build.gradle:
apply plugin: 'java'
apply plugin: 'idea'
apply plugin: 'eclipse'

archivesBaseName = 'logparser-cascading'

repositories {
  mavenLocal()
  mavenCentral()
  mavenRepo name: 'conjars', url: 'http://conjars.org/repo/'
}

ext.cascadingVersion = '2.5.1'

dependencies {
  compile( group: 'cascading', name: 'cascading-core', version: cascadingVersion )
  compile( group: 'cascading', name: 'cascading-hadoop', version: cascadingVersion )
}

jar {
  description = "Assembles a Hadoop ready jar file"
  doFirst {
    into( 'lib' ) {
      from configurations.compile
    }
  }
  manifest {
    attributes( "Main-Class": "logparser/LogParser" )
  }
}

2.0.6. Data and code download 

Code and data download
=======================
Github:
https://github.com/airawat/LogParser/tree/master/cascadingSamples
Directories relevant to this post:
===================================
$ tree -if --noreport LogParser
LogParser
LogParser/cascadingSamples
LogParser/cascadingSamples/jars
LogParser/cascadingSamples/jars/logparser-cascading.jar
LogParser/cascadingSamples/src
LogParser/cascadingSamples/src/main
LogParser/cascadingSamples/src/main/java
LogParser/cascadingSamples/src/main/java/logparser
LogParser/cascadingSamples/src/main/java/logparser/LogParser.java
LogParser/data
LogParser/data/syslogs
LogParser/data/syslogs/cdh-dev01
LogParser/data/syslogs/cdh-dev01/2013
LogParser/data/syslogs/cdh-dev01/2013/04
LogParser/data/syslogs/cdh-dev01/2013/04/messages
LogParser/data/syslogs/cdh-dev01/2013/05
LogParser/data/syslogs/cdh-dev01/2013/05/messages
LogParser/data/syslogs/cdh-dn01
LogParser/data/syslogs/cdh-dn01/2013
LogParser/data/syslogs/cdh-dn01/2013/05
LogParser/data/syslogs/cdh-dn01/2013/05/messages
LogParser/data/syslogs/cdh-dn02
LogParser/data/syslogs/cdh-dn02/2013
LogParser/data/syslogs/cdh-dn02/2013/04
LogParser/data/syslogs/cdh-dn02/2013/04/messages
LogParser/data/syslogs/cdh-dn02/2013/05
LogParser/data/syslogs/cdh-dn02/2013/05/messages
LogParser/data/syslogs/cdh-dn03
LogParser/data/syslogs/cdh-dn03/2013
LogParser/data/syslogs/cdh-dn03/2013/04
LogParser/data/syslogs/cdh-dn03/2013/04/messages
LogParser/data/syslogs/cdh-dn03/2013/05
LogParser/data/syslogs/cdh-dn03/2013/05/messages
LogParser/data/syslogs/cdh-jt01
LogParser/data/syslogs/cdh-jt01/2013
LogParser/data/syslogs/cdh-jt01/2013/04
LogParser/data/syslogs/cdh-jt01/2013/04/messages
LogParser/data/syslogs/cdh-jt01/2013/05
LogParser/data/syslogs/cdh-jt01/2013/05/messages
LogParser/data/syslogs/cdh-nn01
LogParser/data/syslogs/cdh-nn01/2013
LogParser/data/syslogs/cdh-nn01/2013/05
LogParser/data/syslogs/cdh-nn01/2013/05/messages
LogParser/data/syslogs/cdh-vms
LogParser/data/syslogs/cdh-vms/2013
LogParser/data/syslogs/cdh-vms/2013/05
LogParser/data/syslogs/cdh-vms/2013/05/messages



2.0.7. Commands (load data, execute program)
Gradle
=================
$ gradle clean jar
This builds a Hadoop-ready jar with the dependencies bundled under lib/.
Load data to HDFS
==================
$ hadoop fs -mkdir cascadingSamples
$ cd ~
$ hadoop fs -put cascadingSamples/data cascadingSamples
$ hadoop fs -put cascadingSamples/jars cascadingSamples
Run program
============
The four arguments, in order, are: the input glob, the output path for parsed logs, the trap path for records that fail parsing, and the report path.
$ hadoop jar cascadingSamples/jars/logparser-cascading.jar "cascadingSamples/data/syslogs/*/*/*/" "cascadingSamples/Output-LogParser" "cascadingSamples/Output-LogParser/traps" "cascadingSamples/Output-LogParser/reports"


2.0.8. Results 
Output files
=============
$ hadoop fs -ls -R cascadingSamples |grep 'part*' | awk '{print $8}'
cascadingSamples/Output-LogParser/part-00000
cascadingSamples/Output-LogParser/part-00001
cascadingSamples/Output-LogParser/part-00002
cascadingSamples/Output-LogParser/part-00003
cascadingSamples/Output-LogParser/part-00004
cascadingSamples/Output-LogParser/part-00005
cascadingSamples/Output-LogParser/part-00006
cascadingSamples/Output-LogParser/part-00007
cascadingSamples/Output-LogParser/part-00008
cascadingSamples/Output-LogParser/part-00009
cascadingSamples/Output-LogParser/part-00010
cascadingSamples/Output-LogParser/reports/part-00000
cascadingSamples/Output-LogParser/traps/part-m-00001-00006
cascadingSamples/Output-LogParser/traps/part-m-00002-00006
Parsed log
===========
$ hadoop fs -cat cascadingSamples/Output-LogParser/part-00003 | less
May 3 11:51:50 cdh-dn02 init tty (/dev/tty6) main process (1208) killed by TERM signal
May 3 11:52:26 cdh-dn02 kernel nf_conntrack version 0.5.0 (7972 buckets, 31888 max)
May 3 11:52:51 cdh-dn02 kernel hrtimer: interrupt took 6222750 ns
May 3 11:52:53 cdh-dn02 ntpd_initres host name not found: 0.rhel.pool.ntp.org
Report
===========
$ hadoop fs -cat cascadingSamples/Output-LogParser/reports/part-00000 | less
console-kit-daemon 7
gnome-session 11
init 166
kernel 810
login 2
networkmanager 7
nm-dispatcher.action 4
ntpd_initres 4133
polkit-agent-helper-1 8
pulseaudio 18
spice-vdagent 15
sshd 6
sudo 8
udevd 6
Records that failed parsing
============================
$ hadoop fs -cat cascadingSamples/Output-LogParser/traps/part*
May 7 00:40:53 cdh-vms /etc/sysconfig/network-scripts/i Device eth0 does not seem to be present, delaying initialization.
May 7 00:42:13 cdh-vms /etc/sysconfig/network-scripts/i Device eth0 does not seem to be present, delaying initialization.
May 7 00:43:38 cdh-vms /etc/sysconfig/network-scripts/i Device eth0 does not seem to be present, delaying initialization.
May 7 00:45:01 cdh-vms /etc/sysconfig/network-scripts/i Device eth0 does not seem to be present, delaying initialization.
May 7 00:47:18 cdh-vms /etc/sysconfig/network-scripts/i Device eth0 does not seem to be present, delaying initialization.
May 7 00:47:41 cdh-vms /etc/sysconfig/network-scripts/i Device eth0 does not seem to be present, delaying initialization.
Record count of input [syslogs]
===============================
$ hadoop fs -cat cascadingSamples/data/syslogs/*/*/*/messages | wc -l
5207
Record count of output [parsed logs + records that failed parsing]
===================================================================
$ echo $((`hadoop fs -cat cascadingSamples/Output-LogParser/part* | wc -l`+`hadoop fs -cat cascadingSamples/Output-LogParser/traps/part* | wc -l`))
5207





