1.0. What's in this blog?
A demonstration of creating a custom simple eval UDF to mimic NVL2 functionality from the DBMS world, in Pig and Hive. It includes sample data, java code for creating the UDF, expected results, commands to execute and the output.
About NVL2:
NVL2 takes three parameters, we will refer to as expr1, expr2 and expr3.
NVL2 lets you determine the value returned by a query based on whether a specified expression is null or not null. If expr1 is not null, then NVL2 returns expr2. If expr1 is null, then NVL2 returns expr3.
2.0. NVL2 UDF in Hive
1: Create the test data file for a Hive external table
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
************************** | |
Input data | |
************************** | |
1. Execute locally on the node you are running Hive client from | |
----------------------------------------------------------------- | |
Create input file/data to use for the demo. | |
Since this gist is merely for instructional purpose, the dataset is small. | |
cd ~ | |
mkdir hiveProject | |
cd hiveProject | |
vi Departments_UDFTest | |
Paste this..ensuring the fields are delimited by tabs and record with new line. | |
d001 Marketing | |
d002 Finance | |
d003 Human Resources | |
d004 Production | |
d005 Development | |
d006 Quality Management | |
d007 Sales | |
d008 | |
d009 Customer Service | |
2. Hadoop commands | |
-------------------- | |
hadoop fs -mkdir hiveProject | |
hadoop fs -put hiveProject/Departments_UDFTest hiveProject | |
2: Create the Hive table
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
********************************************* | |
Setting up the Hive table | |
********************************************* | |
In hive shell.... | |
a) Create table: | |
CREATE EXTERNAL TABLE IF NOT EXISTS departments_UDFTest | |
( | |
deptNo String, | |
deptName String | |
) | |
Row format delimited | |
fields terminated by '\t' | |
LOCATION '/user/akhanolk/hiveProject'; | |
b) Quick test: | |
Select * from departments_UDFTest; | |
d001 Marketing | |
d002 Finance | |
d003 Human Resources | |
d004 Production | |
d005 Development | |
d006 Quality Management | |
d007 Sales | |
d008 NULL | |
d009 Customer Service | |
3: Create the UDF in Java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
//------------------------------------------------------------------ | |
// Filename: NVL2.java | |
//------------------------------------------------------------------ | |
package khanolkar.HiveUDFs; | |
import java.io.IOException; | |
import org.apache.hadoop.hive.ql.exec.UDF; | |
public class NVL2 extends UDF { | |
String expr1, expr2, expr3; | |
public NVL2() { | |
} | |
public String evaluate(String pExpr1, String pExpr2, String pExpr3) | |
throws IOException { | |
try { | |
expr1 = (String) pExpr1; | |
expr2 = (String) pExpr2; | |
expr3 = (String) pExpr3; | |
return (expr1 != null ? expr2 : expr3); | |
} catch (Exception e) { | |
// Cause task failure | |
throw new IOException("Error with Hive UDF, NVL2!", e); | |
} | |
} | |
} |
4: Expected results
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
**************************** | |
Expected results | |
**************************** | |
Query: | |
select deptNo,NVL2(deptName,deptName,'Procrastrination') from departments_UDFTest; | |
The null in the department name for department d008, should be returned as "Procrastrination". | |
For the rest of the records, the query should return the data in Hive, as is. | |
5: Test the UDF
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
******************************* | |
Testing the UDF | |
******************************* | |
hive> add jar hiveProject/jars/NVL2.jar; | |
hive> CREATE TEMPORARY FUNCTION NVL2 | |
AS 'khanolkar.HiveUDFs.NVL2'; | |
hive> select deptNo,NVL2(deptName,deptName) from departments_UDFTest; | |
FAILED: SemanticException [Error 10014]: Line 1:14 Wrong arguments 'deptName': No matching method for class khanolkar.HiveUDFs.NVL2 with (string, string). Possible choices: _FUNC_(string, string, string) | |
hive> select deptNo,NVL2(deptName,deptName,'Procrastrination') from departments_UDFTest; | |
OK | |
d001 Marketing | |
d002 Finance | |
d003 Human Resources | |
d004 Production | |
d005 Development | |
d006 Quality Management | |
d007 Sales | |
d008 Procrastrination | |
d009 Customer Service |
3.0. NVL2 UDF in Pig
We will reuse data from section 2.
1: Create the UDF in Java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
package khanolkar.pigUDFs; | |
import java.io.IOException; | |
import org.apache.pig.EvalFunc; | |
import org.apache.pig.data.Tuple; | |
// Custom UDF | |
// Name: NVL2 | |
// Parameters: Tuple with three Strings | |
// Purpose: Facilitates handling nulls + replacing non-null values | |
// If the first parameter is null, returns the third parameter, | |
// otherwise returns the second parameter | |
// E.g. NVL2(null,"Busy bee","Sloth") = "Sloth" | |
// E.g. NVL2("Anagha","Busy bee","Sloth") = "Busy bee" | |
// Returns: Null if tuple is empty | |
// Null if the three input parameters are not in the tuple | |
// Otherwise, Result of applying NVL2 logic | |
public class NVL2 extends EvalFunc<String> { | |
public String exec(Tuple input) throws IOException { | |
if (input == null || input.size() == 0) | |
return null; | |
try { | |
if (input.size() == 3) { | |
String expr1 = (String) input.get(0); | |
String expr2 = (String) input.get(1); | |
String expr3 = (String) input.get(2); | |
return (expr1 != null ? expr2 : expr3); | |
} else { | |
return null; | |
} | |
} catch (Exception e) { | |
// Cause task failure | |
throw new IOException("Error with UDF, NVL2!", e); | |
} | |
} | |
} |
2: Create the pig script
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#-------------------------------------------------------------------------------------- | |
# Pig Script | |
# NVL2UDFDemo.pig | |
#-------------------------------------------------------------------------------------- | |
register NVL2.jar; | |
define NVL2 khanolkar.pigUDFs.NVL2; | |
rawDS = load 'departments' using PigStorage() as (deptNo:chararray, deptName:chararray); | |
transformedDS = foreach rawDS generate $0, NVL2($1,$1,'Procrastination'); | |
dump transformedDS; |
3: Test the UDF
[Modify path of the data file between local and HDFS locations in the pig script - better - make it parameterized]
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#--------------------------- | |
# Command to test | |
#--------------------------- | |
On the cluster | |
$ pig pigProject/evalFunc/NVL2/NVL2UDFDemo.pig | |
Locally | |
$ pig -x local pigProject/evalFunc/NVL2/NVL2UDFDemo.pig |
4: Results
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#--------------------------- | |
# Output data | |
#--------------------------- | |
(d001,Marketing) | |
(d002,Finance) | |
(d003,Human Resources) | |
(d004,Production) | |
(d005,Development) | |
(d006,Quality Management) | |
(d007,Sales) | |
(d008,Procrastination) | |
(d009,Customer Service) |
Do share any additional insights/comments.
Cheers!
Follow me on Twitter:
Cheers!
Follow me on Twitter:
Really a good piece of knowledge on Big Data and Hadoop. Thanks for such a good post. I would like to recommend one more resource NPN Training which helps in getting more knowledge on Hadoop. The best part of NPN Training is they provide complete Hands-on classes.
ReplyDeletehttp://npntraining.com/courses/big-data-and-hadoop.php
thakyou it vry nice blog for beginners
ReplyDeletehttps://www.emexotechnologies.com/courses/big-data-analytics-training/big-data-hadoop-training/
Good Post! Thank you so much for sharing this pretty post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.
ReplyDeletePython Training in electronic city
DataScience with Python Training in electronic city
AWS Training in electronic city
Big Data Hadoop Training in electronic city
Devops Training in electronic city
list of play school in noida up
ReplyDeleteList of wardrobe shop in gurgaon haryana
ReplyDeleteGood Post..
ReplyDeleteSAP Training in Chennai
Java Training in Chennai
CCNA Training in Chennai
Pearson Vue Exam Center in Chennai
QTP Training in Chennai
Selenium Training in Chennai
Hardware and Networking Training in Chennai
SAP ABAP Training in Chennai
SAP FICO Training in Chennai
AWS Training in Chennai