Monday, November 4, 2013

UDFs Part 1: Custom simple eval UDFs in Pig and Hive (NVL2)


1.0. What's in this blog?


A demonstration of creating a custom simple eval UDF, in both Pig and Hive, that mimics the NVL2 function from the DBMS world. It includes sample data, Java code for the UDF, expected results, the commands to execute, and the output.

About NVL2:
NVL2 takes three parameters, which we will refer to as expr1, expr2 and expr3.
NVL2 lets you determine the value returned by a query based on whether a specified expression is null or not. If expr1 is not null, NVL2 returns expr2; if expr1 is null, it returns expr3.
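
As a quick illustration of the rule above, here is a minimal plain-Java sketch of the NVL2 decision. It is not the Hive or Pig UDF from this post; the class name Nvl2Sketch and the method name nvl2 are made up for the example, and it only shows the null check that any such UDF would need to perform.

```java
// Plain-Java sketch of NVL2 semantics.
// Nvl2Sketch and nvl2 are illustrative names, not part of Hive or Pig.
final class Nvl2Sketch {

    // Returns expr2 when expr1 is not null, otherwise expr3.
    static <T> T nvl2(Object expr1, T expr2, T expr3) {
        return (expr1 != null) ? expr2 : expr3;
    }

    public static void main(String[] args) {
        // expr1 is not null, so expr2 is returned
        System.out.println(nvl2("x", "has value", "is null"));
        // expr1 is null, so expr3 is returned
        System.out.println(nvl2(null, "has value", "is null"));
    }
}
```

Note that expr1 is only tested for null; its value never appears in the result, which is why it is typed as Object here while expr2 and expr3 share a type.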

2.0. NVL2 UDF in Hive


1: Create the test data file for a Hive external table

2: Create the Hive table

3: Create the UDF in Java

4: Expected results

5: Test the UDF

3.0. NVL2 UDF in Pig


We will reuse the data from section 2.0.
1: Create the UDF in Java

2: Create the Pig script

3: Test the UDF
[Modify the path of the data file in the Pig script to switch between local and HDFS locations; better still, make it a parameter.]

4: Results



Do share any additional insights/comments.
Cheers!
