Dear Learner,

Hope you are doing well.

Please see the details below for your reference:

Converting a text file into Avro format:

1) A pipe-delimited text file located at /user/hduser/pig_input/abc.dat:

1|8|123|985|659856|10000000002546
1|8|123|985|659856|10000000002546
1|8|123|985|659856|10000000002546
1|8|123|985|659856|10000000002546
1|8|123|985|659856|10000000002546
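Before converting, it can help to confirm how many fields each row really has. The sample rows above contain six pipe-separated values, while the schema in step 2 declares five fields. The following is a minimal Pig sketch that reports the distinct field counts found in the input. The relation names raw, counts and field_counts are illustrative only.

```pig
-- Read each line as a single chararray, without splitting it yet
raw = LOAD '/user/hduser/pig_input/abc.dat' USING TextLoader() AS (line:chararray);

-- Split on the literal pipe character ('\\|' escapes it in the regex)
-- and count how many fields each line produces
counts = FOREACH raw GENERATE SIZE(STRSPLIT(line, '\\|')) AS n;

-- Keep one row per distinct field count and print it
field_counts = DISTINCT counts;
DUMP field_counts;
```

If the output shows a count other than the number of fields in the schema, the data or the schema should be aligned before running the conversion.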

2) The Avro schema file is located in HDFS at /user/hduser/pig_schema_files/abc.avsc:

{
  "type" : "record",
  "name" : "import_dummy",
  "doc" : "import_123dummy",
  "fields" : [ {
    "name" : "ID",
    "type" : [ "string", "null" ],
    "columnName" : "ID",
    "sqlType" : "3"
  }, {
    "name" : "TRANS_O",
    "type" : [ "string", "null" ],
    "columnName" : "TRANS_O",
    "sqlType" : "3"
  }, {
    "name" : "CARD_O",
    "type" : [ "string", "null" ],
    "columnName" : "CARD_O",
    "sqlType" : "3"
  }, {
    "name" : "SEQ_O",
    "type" : [ "string", "null" ],
    "columnName" : "SEQ_O",
    "sqlType" : "1"
  }, {
    "name" : "DATE_O",
    "type" : [ "string", "null" ],
    "columnName" : "DATE_O",
    "sqlType" : "3"
  } ],
  "tableName" : "123dummy"
}
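If you want the Avro output to follow this schema file (record name import_dummy, nullable string fields) instead of a schema derived from the Pig relation, the schema file can be passed to AvroStorage when storing. This is a sketch under two assumptions: that your piggybank build of AvroStorage accepts the 'schema_file' store option (please verify this against the AvroStorage documentation for your Pig version), and that the REGISTER statements from step 3 have already been run in the same script. The output directory /user/hduser/pig_output_with_schema/ is hypothetical.

```pig
-- Assumes the REGISTER statements from step 3 were run earlier in this script.
-- Assumption: this piggybank AvroStorage version supports the 'schema_file' option.
textfile = LOAD '/user/hduser/pig_input/abc.dat' USING PigStorage('|')
    AS (ID:chararray, TRANS_O:chararray, CARD_O:chararray, SEQ_O:chararray, DATE_O:chararray);

-- Hypothetical output directory, chosen so it does not clash with /user/hduser/pig_output/
STORE textfile INTO '/user/hduser/pig_output_with_schema/'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('schema_file', '/user/hduser/pig_schema_files/abc.avsc');
```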

3) Pig script:

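-- Register piggybank (for AvroStorage) and the libraries it depends on
REGISTER /app/cloudera/parcels/CDH/lib/pig/piggybank.jar;
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/avro-1.3.7.jar;
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/jackson-core-asl.jar;
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/jackson-mapper-asl.jar;
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/json-simple.jar;
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/snappy-java.jar;

-- Load the pipe-delimited file. PigStorage is case-sensitive, and the HDFS path needs a leading '/'.
-- Declaring the fields as chararray makes AvroStorage write them as Avro strings, like the types in abc.avsc.
textfile = LOAD '/user/hduser/pig_input/abc.dat' USING PigStorage('|')
    AS (ID:chararray, TRANS_O:chararray, CARD_O:chararray, SEQ_O:chararray, DATE_O:chararray);

-- Write the relation in Avro format; the output directory must not already exist.
STORE textfile INTO '/user/hduser/pig_output/' USING org.apache.pig.piggybank.storage.avro.AvroStorage();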
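To check the result, the output directory can be read back with the same AvroStorage class, which takes the schema from the Avro files themselves. The following is a small sketch, intended to run after the REGISTER statements above. The relation names converted and sample_rows are illustrative.

```pig
-- Load the Avro files written by the STORE statement; no schema argument is needed,
-- because AvroStorage reads the schema embedded in the files
converted = LOAD '/user/hduser/pig_output/' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

-- Show the Pig schema derived from the Avro schema
DESCRIBE converted;

-- Print a few records to confirm the values came through
sample_rows = LIMIT converted 5;
DUMP sample_rows;
```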

You can also refer to the links below for more details.



Hope this resolves your query.

If you have any further queries, please let us know.