What is PIG? - Hadoop ~ SsaiK

Monday, March 7, 2016

What is PIG? - Hadoop

Apache Pig is meant for processing data, i.e., for data summarization, querying, and advanced querying.

Pig has its own scripting language, called Pig Latin. Whenever we execute Pig actions, the framework internally triggers MapReduce jobs.

Yahoo introduced Pig, and it is now an Apache project. Pig is built on top of MapReduce.

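Pig's data model is built around the BAG, which is comparable to a table. A bag is a collection of tuples (rows), unordered unless it has been sorted, and each tuple is made up of atoms (single field values).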
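As a rough illustration of bags, tuples, and atoms, the Pig Latin sketch below loads a small file and groups it. The file name `students.txt` and its two tab-separated fields are assumptions made up for this example.

```pig
-- Hypothetical input file 'students.txt': tab-separated lines such as "1<TAB>alice".
students = LOAD 'students.txt' AS (id:int, name:chararray);

-- Each row of 'students' is a tuple, e.g. (1,alice); the values 1 and alice are atoms.
-- The relation 'students' as a whole is a bag of such tuples.

-- GROUP nests the data: each output tuple holds the group key
-- plus a bag containing all the tuples that share that key.
by_name = GROUP students BY name;

DESCRIBE by_name;  -- prints the nested schema of the grouped relation
DUMP by_name;      -- prints the grouped tuples to the console
```

Running `DESCRIBE` before `DUMP` is a cheap way to check how the bag is nested, because it only inspects the schema and does not process the data.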

NOTE: Pig never has metadata or a warehouse.

Pig can work in 3 modes: LOCAL, HDFS and EMBEDDED.

LOCAL mode: Input data is taken from a local file system (LFS) path, not from HDFS, and once processing is complete the generated output is also written to an LFS path. In other words, HDFS is not involved when Pig runs in local mode.

Here the framework runs everything on a single machine, using the local host and its file system: it reads the input from the LFS path, processes it, and writes the results back to an LFS path.

$> pig -x local
If there is a script, use $> pig -x local <<script-name>>
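A minimal sketch of a script that could be run this way is shown below. The script name `filter_local.pig`, the input file `sales.csv`, and its two columns are hypothetical and only serve the example.

```pig
-- filter_local.pig (hypothetical name); run with: pig -x local filter_local.pig
-- Assumes a comma-separated file 'sales.csv' (item,amount) in the current local directory.
sales = LOAD 'sales.csv' USING PigStorage(',') AS (item:chararray, amount:int);

-- Keep only the rows whose amount is above 100.
big_sales = FILTER sales BY amount > 100;

-- Writes a directory 'big_sales_out' on the local file system;
-- the directory must not exist before the script runs.
STORE big_sales INTO 'big_sales_out' USING PigStorage(',');
```

Because the script is started with `-x local`, both the `LOAD` and the `STORE` paths are resolved against the local file system rather than HDFS.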

HDFS mode (known as MapReduce mode in the Pig documentation): Input is taken from HDFS, and the output is written to HDFS.

$> pig
If there is a script, use $> pig <<script-name>>

EMBEDDED mode: If we cannot achieve the desired functionality with the existing commands, operations, or actions, we can choose embedded mode to develop a customized application.

i.e., UDFs (User Defined Functions)
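On the Pig Latin side, using a UDF usually looks something like the sketch below. The jar `myudfs.jar` and the function `myudfs.UPPER` are hypothetical; they stand for a custom function that has been written and packaged separately.

```pig
-- 'myudfs.jar' and 'myudfs.UPPER' are hypothetical names for a custom, separately built UDF.
REGISTER myudfs.jar;

-- Hypothetical tab-separated input file with an id and a name per line.
students = LOAD 'students.txt' AS (id:int, name:chararray);

-- Call the UDF by its fully qualified name inside FOREACH ... GENERATE.
upper_names = FOREACH students GENERATE id, myudfs.UPPER(name);

DUMP upper_names;
```

`REGISTER` makes the jar available to the script, after which the function can be called like any built-in function.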

The following are the default transformations or operators in Pig:

1) LOAD 2) FOREACH 3) GENERATE

4) FILTER 5) DUMP 6) STORE

7) DESCRIBE 8) SPLIT 9) ORDER BY

10) GROUP BY 11) JOIN 12) UNION

13) CROSS 14) LIMIT 15) TOKENIZE

16) EXPLAIN 17) ILLUSTRATE 18) FLATTEN

19) AGGFUNCTIONS (aggregate functions such as COUNT, SUM, AVG) 20) DISTINCT 21) COGROUP
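To show how several of these operators fit together, here is a small word-count sketch. The input file `input.txt` is an assumption; any plain text file would do.

```pig
-- Hypothetical input: a plain text file; TextLoader reads each line as one chararray field.
lines = LOAD 'input.txt' USING TextLoader() AS (line:chararray);

-- TOKENIZE splits a line into a bag of words; FLATTEN turns that bag into one row per word.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- GROUP collects identical words; COUNT is an aggregate function applied to each group.
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;

-- Sort by frequency, keep the 10 most frequent words, and print them.
sorted = ORDER counts BY total DESC;
top10 = LIMIT sorted 10;
DUMP top10;
```

Replacing `DUMP top10;` with `EXPLAIN top10;` or `ILLUSTRATE top10;` shows the execution plan or a sample run of the same pipeline instead of the full result.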

The following datatypes are used in Pig:

DATA TYPE          PIG LATIN DATATYPE
int                int
String             chararray
float              float
long               long
boolean            boolean
byte (default)     bytearray
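These types typically appear in the schema given to LOAD. The sketch below assumes a hypothetical tab-separated file `employees.txt`; the field names are invented for illustration.

```pig
-- Hypothetical file 'employees.txt' with six tab-separated fields per line.
employees = LOAD 'employees.txt' AS (
    id:int,
    name:chararray,
    salary:float,
    phone:long,
    active:boolean,
    notes            -- no type given, so this field defaults to bytearray
);

-- Prints the schema, including the type assigned to each field.
DESCRIBE employees;
```

Leaving the type off a field, as with `notes` here, is what makes bytearray the default.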

TIP: Pig can be executed in 2 ways: the Grunt shell (i.e., line by line) and script mode (i.e., a group of commands).

Next Lesson: Transformations or Operators in PIG
