Monday, March 7, 2016

What is PIG? - Hadoop


Apache Pig is meant for processing data, i.e. for data summarization, querying, and advanced querying.

Pig has its own scripting language, called Pig Latin. Whenever we execute Pig actions, the framework internally triggers MapReduce jobs.

Yahoo introduced Pig, and it is now one of the great Apache projects. Pig is built on top of MapReduce.

Pig's data model is built around the BAG, a table-like collection of tuples that is not guaranteed to be ordered. Each tuple is a row of fields, and a field that holds a single value is called an atom.
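
To make the bag / tuple / atom terminology concrete, here is a minimal Pig Latin sketch. The file name students.txt, its comma-separated layout, and the field names are assumptions made up for illustration.

```pig
-- Hypothetical input: students.txt with lines such as  alice,21
-- Each loaded line becomes one tuple; name and age are atoms (single values).
students = LOAD 'students.txt' USING PigStorage(',') AS (name:chararray, age:int);

-- GROUP collects the tuples that share an age into a bag, one bag per distinct age.
by_age = GROUP students BY age;

-- DESCRIBE prints the schema of a relation.
DESCRIBE by_age;
```

In Pig's notation, tuples are written in parentheses and bags in curly braces, so a bag of two tuples looks like {(alice,21),(bob,21)}.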


NOTE: Pig never has a metadata store or a warehouse. Pig can work in 3 modes: LOCAL, HDFS (also known as MapReduce mode), and EMBEDDED.

LOCAL mode: Input data is taken from an LFS (local file system) path, not from HDFS, and once processing is complete the generated output is also written to an LFS path. In other words, HDFS is not involved at all when working with Pig in local mode.
The framework runs the job in a single local process, reading the input directly from the LFS path and writing the processed data back to an LFS path; nothing is copied into HDFS along the way.
$> pig -x local
If there is a script, use $> pig -x local <<script-name>>

HDFS mode: Input is taken from HDFS, and output is written to HDFS.
$> pig
If there is a script, use $> pig <<script-name>>
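
The local and HDFS modes differ only in where file paths are resolved, so one script can serve both. Below is a minimal word-count sketch; the input file input.txt, the output directory wordcount_out, and the script name wordcount.pig are all hypothetical.

```pig
-- wordcount.pig (hypothetical script name)
-- The input path is resolved on the LFS in local mode and on HDFS in HDFS mode.
lines = LOAD 'input.txt' AS (line:chararray);

-- TOKENIZE splits each line into a bag of words;
-- FLATTEN turns that bag into one row per word.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Group identical words together and count the tuples in each group.
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;

-- The output directory must not already exist.
STORE counts INTO 'wordcount_out';
```

Run with $> pig -x local wordcount.pig, the paths refer to the LFS. Run with $> pig wordcount.pig, the same relative paths are resolved against HDFS, typically under the user's HDFS home directory.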

EMBEDDED mode: If we cannot achieve the desired functionality using the existing commands, operations, or actions, we can choose embedded mode to develop a customized application, i.e. UDFs (User Defined Functions).
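
On the Pig Latin side, using a UDF could look roughly like the sketch below. The jar myudfs.jar and the class myudfs.UPPER are hypothetical: the sketch assumes you have written a Java class that extends Pig's EvalFunc and packaged it into that jar. The input file and field names are likewise made up.

```pig
-- Make the classes in the (hypothetical) jar visible to Pig.
REGISTER myudfs.jar;

names = LOAD 'students.txt' USING PigStorage(',') AS (name:chararray, age:int);

-- Call the UDF by its fully qualified class name, like a built-in function.
upper_names = FOREACH names GENERATE myudfs.UPPER(name);

DUMP upper_names;
```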

The following are the default transformations or operators in Pig:

1) LOAD               2) FOREACH          3) GENERATE
4) FILTER             5) DUMP             6) STORE
7) DESCRIBE           8) SPLIT            9) ORDER BY
10) GROUP BY          11) JOIN            12) UNION
13) CROSS             14) LIMIT           15) TOKENIZE
16) EXPLAIN           17) ILLUSTRATE      18) FLATTEN
19) AGG FUNCTIONS     20) DISTINCT        21) COGROUP
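
As a rough illustration of how several of these operators chain together, here is a minimal sketch. The input files emp.txt and dept.txt, their comma-separated layout, and all field names are assumptions for illustration only.

```pig
-- Hypothetical inputs: emp.txt (id,name,dept_id,salary) and dept.txt (dept_id,dept_name)
emp  = LOAD 'emp.txt'  USING PigStorage(',') AS (id:int, name:chararray, dept_id:int, salary:float);
dept = LOAD 'dept.txt' USING PigStorage(',') AS (dept_id:int, dept_name:chararray);

-- FILTER keeps only the rows that satisfy a condition.
well_paid = FILTER emp BY salary > 50000.0;

-- JOIN matches rows from both relations on the key.
joined = JOIN well_paid BY dept_id, dept BY dept_id;

-- FOREACH ... GENERATE projects the columns we want;
-- the relation::field prefix disambiguates fields after a join.
slim = FOREACH joined GENERATE well_paid::name AS name,
                               dept::dept_name AS dept_name,
                               well_paid::salary AS salary;

-- ORDER BY sorts the rows; LIMIT keeps the first five.
sorted = ORDER slim BY salary DESC;
top5 = LIMIT sorted 5;

-- DUMP prints the result to the console; STORE would write it to a file instead.
DUMP top5;
```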

The following datatypes are used in Pig:

DATA TYPES      PIG LATIN DATATYPES
int             int
String          chararray
float           float
long            long
boolean         boolean
byte            bytearray (default)
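
These types usually appear in the AS clause of a LOAD statement. A minimal sketch, assuming a hypothetical comma-separated file records.txt and made-up field names:

```pig
-- 'notes' has no declared type, so it falls back to the default, bytearray.
records = LOAD 'records.txt' USING PigStorage(',')
          AS (id:int, name:chararray, score:float, visits:long, active:boolean, notes);
DESCRIBE records;

-- An explicit cast converts the bytearray field to a chararray.
typed = FOREACH records GENERATE id, name, (chararray)notes AS notes;
DESCRIBE typed;
```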

TIP: Pig can be executed in 2 flavors: the Grunt shell (i.e. line by line) and script mode (i.e. a group of commands).
