Hello Preethi,

We hope you are doing well.

Pig runs on the user's machine and executes its jobs on the cluster. The Pig LOAD statement only reads data from HDFS; it does not copy or store that data on the user's machine.

A Pig Latin statement is an operator that takes a relation as input and produces another relation as output. (This definition applies to all Pig Latin operators except LOAD and STORE which read data from and write data to the file system.) Pig Latin statements can span multiple lines and must end with a semi-colon ( ; ). Pig Latin statements are generally organized in the following manner:

    A LOAD statement reads data from the file system.

    A series of "transformation" statements process the data.

    A STORE statement writes output to the file system; or, a DUMP statement displays output to the screen.
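
As a minimal sketch of that three-step layout, the script below loads a file, filters it, and stores the result. The paths, field names, and aliases (input/scores.txt, id, score, raw, high, output/high_scores) are hypothetical and only there for illustration.

```pig
-- 1. LOAD: read comma-separated data from the file system (hypothetical path).
raw = LOAD 'input/scores.txt' USING PigStorage(',') AS (id:int, score:int);

-- 2. Transformation: keep only the rows with a score above 50.
high = FILTER raw BY score > 50;

-- 3. STORE: write the result back to the file system (hypothetical path).
STORE high INTO 'output/high_scores' USING PigStorage(',');

-- While testing, DUMP can be used instead of STORE to print to the screen:
-- DUMP high;
```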


UNION: Computing the union of multiple relations

We can vertically glue together the contents of multiple aliases into a single alias with the UNION operator. For example,

A = LOAD 'data' AS (a1:int,a2:int,a3:int);

DUMP A;
(1,2,3)
(4,2,1)

B = LOAD 'data' AS (b1:int,b2:int);

DUMP B;
(2,4)
(8,9)
(1,3)

X = UNION A, B;

DUMP X;
(1,2,3)
(4,2,1)
(2,4)
(8,9)
(1,3)


Notes:

UNION is not order-preserving. The inputs are interpreted as unordered bags of tuples, and the output of the union is also an unordered bag.

UNION does not ensure (as a database would) that all tuples adhere to the same schema, or even that they have the same number of fields, as in the example above. In the typical case they should, and it is the user's responsibility to

        either ensure that all aliases being unioned contain the same kind of tuples, or
        handle the different kinds of tuples while processing the result of the union.

UNION does not eliminate duplicate tuples.
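
One possible way to deal with the last two notes is sketched below, assuming a Pig version that supports UNION ONSCHEMA and that every input has a schema with field names. The file names (data_a, data_b) and the aliases are hypothetical.

```pig
-- Two inputs that share field names but differ in the number of fields.
A = LOAD 'data_a' AS (a1:int, a2:int, a3:int);
B = LOAD 'data_b' AS (a1:int, a2:int);

-- ONSCHEMA aligns fields by name; a3 is null for tuples coming from B.
X = UNION ONSCHEMA A, B;

-- UNION keeps duplicates, so remove them explicitly if they are not wanted.
Y = DISTINCT X;

DUMP Y;
```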


Fields in a relation can be referenced in two ways: by positional notation or by name (alias).

Positional notation is generated by the system. Positional notation is indicated with the dollar sign ($) and begins with zero (0); for example, $0, $1, $2.

Names are assigned by the user through a schema (or, in the case of the GROUP operator and some functions, by the system). We can use any name that is not a Pig keyword.
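
As a short illustration, the sketch below reads the same fields first by position and then by name, using relation A from the UNION example above. The aliases by_pos, by_name, G, and counts are hypothetical.

```pig
A = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Positional notation: $0 is the first field, $2 is the third.
by_pos = FOREACH A GENERATE $0, $2;

-- The same projection using the names assigned in the schema.
by_name = FOREACH A GENERATE a1, a3;

-- Names assigned by the system: after GROUP, the key field is called
-- "group" and the bag of grouped tuples takes the name of the input alias (A).
G = GROUP A BY a2;
counts = FOREACH G GENERATE group, COUNT(A);
```

With the two tuples shown earlier for A, (1,2,3) and (4,2,1), both by_pos and by_name contain (1,3) and (4,1), and counts contains (2,2).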

If you have any further questions, feel free to reply.

We will be glad to assist you.