Consider a sample file, file.txt, with the format "id";"place";"number" and the following data:
"123" ; "india, kerala,calicut"; 26
"456" ; "india, taminadu,chennai"; 27
"789" ; "USA, Virginia,DC"; 28
Requirement: Extract the state name from each record as an individual field.
Solution: The main fields are separated by semicolons, whereas the place field holds 3 values separated by commas.
We will treat place as a tuple and apply STRSPLIT to it to separate country, state and city.
Follow the steps below:
1) Load the input with ; as the delimiter, declaring the 3 main fields.
A = load 'file.txt' using PigStorage(';') as (field1:chararray, field2:chararray, field3:int);
2) Split the second field using a comma (,) as the delimiter. STRSPLIT returns a tuple, and FLATTEN turns its elements into separate top-level fields.
B = foreach A generate field1, FLATTEN(STRSPLIT(field2, ',')), field3;
3) Refer to the flattened fields by position: $0 is field1, $1 is the country, $2 is the state, $3 is the city and $4 is field3.
To keep only the state field, use $2:
C = foreach B generate $2;
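As a variation on the steps above, the flattened fields can be referred to by name instead of by position. The sketch below is only an illustration: the aliases A2, B2 and C2 and the field names country, state and city are assumptions made here, not part of the original script. It also assumes that the spaces in the sample data (for example the one before kerala) survive the split, so it applies the built-in TRIM to the state.

```pig
-- Load as in step 1 (alias A2 is illustrative).
A2 = load 'file.txt' using PigStorage(';') as (field1:chararray, field2:chararray, field3:int);
-- Split place into at most 3 parts and name them (the names are assumptions).
B2 = foreach A2 generate field1, FLATTEN(STRSPLIT(field2, ',', 3)) as (country:chararray, state:chararray, city:chararray), field3;
-- Select the state by name and strip the surrounding whitespace.
C2 = foreach B2 generate TRIM(state) as state;
```

Named fields keep the script readable if the column order changes later. The double quotes around the place value stay attached to country and city after the split and would need separate handling, for example with REPLACE.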
dump C; will generate output as shown below: