Consider a sample file, file.txt, with the format "id";"place";"number" and the following data:


"123" ; "india, kerala,calicut"; 26

"456" ; "india, tamilnadu,chennai"; 27

"789" ; "USA, Virginia,DC"; 28


Requirement: Extract the state name from each record as an individual field.


Solution: The main fields are delimited by semicolons, whereas the place field holds 3 values separated by commas.

We will treat place as a tuple in this case: STRSPLIT splits the place field into country, state and city, and FLATTEN turns the resulting tuple into separate fields.

Follow the steps below:


1) Load the input with ; as the delimiter and 3 main fields

A = load 'file.txt' using PigStorage(';') as (field1:chararray, field2:chararray, field3:int);


2) Split the second field using comma (,) as the delimiter

B = foreach A generate field1, FLATTEN(STRSPLIT(field2, ',')), field3; 
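
As an optional variation on step 2, the split values can be given names so that later statements refer to them by name rather than by position. This is only a sketch: the relation name B_named and the field names country, state and city are illustrative and do not appear in the original script.

```pig
-- Same split as step 2, but the three flattened values are named.
-- B_named, country, state and city are hypothetical names chosen for this sketch.
B_named = foreach A generate
              field1,
              FLATTEN(STRSPLIT(field2, ',')) as (country:chararray, state:chararray, city:chararray),
              field3;

-- The state can now be projected by name instead of by $2.
C_named = foreach B_named generate state;
```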


3) After the split, B has five fields: $0 is field1, $1 is the country, $2 is the state, $3 is the city and $4 is field3. To display field1, use $0.

To display the state field, use $2, and so on.

C = foreach B generate $2;

dump C; generates the state names, as shown below:

kerala

tamilnadu

Virginia
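
Note that PigStorage(';') and STRSPLIT do not remove the double quotes or the spaces present in the sample data, so the split values can carry them along (for example, a leading space before kerala). A minimal sketch of one way to clean them up, assuming the built-in REPLACE and TRIM functions are acceptable here; the relation names A2, B2 and C2 and the field names are illustrative only, and field3 is kept as chararray purely to keep the sketch simple:

```pig
-- Load as in step 1; all fields are read as chararray in this sketch.
A2 = load 'file.txt' using PigStorage(';') as (field1:chararray, field2:chararray, field3:chararray);

-- Remove the double quotes from the place field, then split it on commas.
B2 = foreach A2 generate
         field1,
         FLATTEN(STRSPLIT(REPLACE(field2, '"', ''), ',')) as (country:chararray, state:chararray, city:chararray),
         field3;

-- TRIM strips the leading and trailing spaces from the state value.
C2 = foreach B2 generate TRIM(state) as state;

dump C2;
```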