What's in this blog?
A sample map-reduce program in Java that joins two datasets, on the map-side - an employee dataset and a department dataset, with the department number as join key. The department dataset is a very small dataset in MapFile format, is in HDFS, and is added to the distributed cache. The MapFile is referenced in the map method of the mapper to look up the department name, and emit the employee dataset with department name included.
Apache documentation on DistributedCache:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
Related blogs:
1. Map-side join sample in Java using reference data (text file) from distributed cache - Part 1
2. Map-side join sample in Java using reference data (MapFile) from distributed cache - Part 2
Data used in this blog:
http://dev.mysql.com/doc/employee/en.index.html
Pig and Hive for joins:
Pig and Hive have join capabilities built-in, and are optimized for the same. Programs with joins written in java are more performant, but time-consuming to code, test and support - and in some companies considered an anti-pattern for joins.
Related blogs:
1. Map-side join sample in Java using reference data (text file) from distributed cache - Part 1
2. Map-side join sample in Java using reference data (MapFile) from distributed cache - Part 2
Data used in this blog:
http://dev.mysql.com/doc/employee/en.index.html
Pig and Hive for joins:
Pig and Hive have join capabilities built-in, and are optimized for the same. Programs with joins written in java are more performant, but time-consuming to code, test and support - and in some companies considered an anti-pattern for joins.
thakyou it vry nice blog for beginners
ReplyDeletehttps://www.emexotechnologies.com/courses/big-data-analytics-training/big-data-hadoop-training/
Good Post! Thank you so much for sharing this pretty post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.
ReplyDeletehttps://www.emexotechnologies.com/online-courses/big-data-hadoop-training-in-electronic-city/
Pretty blog, so many ideas in a single site, thanks for the informative article, keep updating more article.
ReplyDeleteivanka hot