Hooked on Hadoop: UDF's Part 2: Custom GenericUDF in Hive (NVL2)

Wednesday, November 13, 2013

UDF's Part 2: Custom GenericUDF in Hive (NVL2)

1.0. What's in this blog?

In my previous blog on creating custom UDFs in Hive, I covered a basic sample UDF. This blog covers creating a generic UDF that mimics the same NVL2 functionality covered in the previous blog. It includes sample data, Java code for creating the UDF, expected results, the commands to execute, and the output.
[Hive 0.10]

About UDFs:

UDF stands for User Defined Function. In Hive, there are (a) reusable functions available as part of core Hive (out of the box) that can be used in Hive queries; they are called UDFs even though they are not user-defined. Then there are (b) functions that one can create in Java, also called UDFs, and use in Hive queries. The focus of this blog is custom UDFs (b), specifically generic UDFs.

About generic UDF:

UDFs in Hive are extensions of either the UDF class or the GenericUDF class. GenericUDFs perform better because they support short-circuit and lazy evaluation, whereas plain UDFs are invoked through reflection. GenericUDFs also support non-primitive Hive types such as arrays, structs and maps in addition to primitive types, unlike UDFs, which support only primitive types.
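To make the lazy evaluation point concrete, here is a minimal plain-Java sketch. It is an analogy only, not Hive code: in Hive, GenericUDF.evaluate receives its arguments as DeferredObject instances whose get() method computes the value on demand, and this sketch models that with java.util.function.Supplier. The class name LazyArgsSketch and the helpers deferred and firstNonNull are hypothetical names made up for the illustration.

```java
import java.util.function.Supplier;

// Plain-Java analogy only. Supplier stands in for Hive's DeferredObject;
// all names in this class are hypothetical.
final class LazyArgsSketch {
    // Counts how many argument expressions were actually computed.
    static int evaluations = 0;

    // Wraps a value in a Supplier that records when it is evaluated.
    static <T> Supplier<T> deferred(T value) {
        return () -> {
            evaluations++;
            return value;
        };
    }

    // Returns the first non-null argument, evaluating arguments
    // one at a time and stopping as soon as one is non-null.
    @SafeVarargs
    static <T> T firstNonNull(Supplier<T>... args) {
        for (Supplier<T> arg : args) {
            T value = arg.get();
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String result = firstNonNull(
                LazyArgsSketch.<String>deferred(null),
                deferred("finance"),
                deferred("unused"));
        // The third argument is never evaluated.
        System.out.println(result + " after " + evaluations + " evaluations");
    }
}
```

Because the arguments are only computed when get() is called, the function can stop as soon as it has its answer, which is the short-circuit behaviour described above.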

About NVL2:

NVL2 takes three parameters, which we will refer to as expr1, expr2 and expr3.

NVL2 lets you determine the value returned by a query based on whether a specified expression is null or not null. If expr1 is not null, then NVL2 returns expr2. If expr1 is null, then NVL2 returns expr3.
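The rule above can be sketched in a few lines of plain Java. This is not the Hive UDF itself, just the NVL2 decision logic in isolation; the class name Nvl2Sketch and the sample strings are assumptions made up for the illustration.

```java
// Plain-Java sketch of the NVL2 rule; not a Hive UDF.
// The class name and sample values are hypothetical.
final class Nvl2Sketch {
    // Returns expr2 when expr1 is not null, otherwise expr3.
    static <T> T nvl2(Object expr1, T expr2, T expr3) {
        return expr1 != null ? expr2 : expr3;
    }

    public static void main(String[] args) {
        // expr1 is not null, so expr2 is returned.
        System.out.println(nvl2("marketing", "has name", "no name"));
        // expr1 is null, so expr3 is returned.
        System.out.println(nvl2(null, "has name", "no name"));
    }
}
```

Note that expr1 is only tested for null; its value never appears in the result unless it is also passed as expr2 or expr3.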

2.0. NVL2 generic UDF in Hive

1: Create the test data file for a Hive external table

2: Create the Hive table

3: Create the UDF in Java

4: Expected results

5: Try out the UDF

3.0. Making the UDF permanently available when you launch the Hive shell

There are several ways to make a custom UDF available when you launch the Hive shell, bypassing the need to execute the "add jar..." statement before using a custom UDF. I have listed a couple of them.

Option 1:
From "Programming Hive"

Your function may also be added permanently to Hive, however this requires a small modification to a Hive Java file and then rebuilding Hive.
Inside the Hive source code, a one-line change is required to the FunctionRegistry class found at ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java. Then you rebuild Hive following the instructions that come with the source distribution.
While it is recommended that you redeploy the entire new build, only the hive-exec-*.jar, where * is the version number, needs to be replaced.

Option 2:

Add it to the .hiverc file on each node from which Hive queries will be run.

Check out my blog - http://hadooped.blogspot.com/2013/08/hive-hiverc-file.html

4.0. References

Apache documentation:
http://hive.apache.org/docs/r0.10.0/api/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.html
https://cwiki.apache.org/confluence/display/Hive/OperatorsAndFunctions

A good article on creating a UDF that involves non-primitive types - link

Programming Hive - from O'Reilly

That's it for this blog. Do share any additional insights with me.

Cheers!

11 comments:

darkcloud, January 18, 2014 at 12:15 AM
Great work. Thanks for sharing your work.
Anagha Khanolkar, January 18, 2014 at 8:10 AM
Thanks.
Joseph, March 22, 2014 at 2:26 PM
Saved me a full week's work. Where on earth are you based?
Anagha Khanolkar, March 22, 2014 at 5:45 PM
Glad the post helped you, Joseph.
I am based out of Chicago, IL.

Natesan, March 28, 2014 at 7:09 AM
Thank you so much for the post!
Natesan, March 29, 2014 at 9:11 AM
Hi,

I am trying to execute the generic UDF above and am getting a NullPointerException.

hive> select * from departments_udftest;
OK
d001 marketing
d002 finance
d003 hr
d004
d005
d006 testing
Time taken: 0.238 seconds, Fetched: 6 row(s)

hive> select deptno,nvl2(deptname,deptname,'test') from departments_udftest;
FAILED: NullPointerException null

Can you please let me know if I am missing something here?
Unknown, March 16, 2015 at 7:51 AM
I guess you have already figured this out yourself, but for all others who struggle with this as I did today: line 52 in the Java code (returnOIResolver = new GenericUDF...) should be moved before the third check. Otherwise the returnOIResolver object is used before it is initialized.