What is PIG? - Hadoop
Apache PIG is meant for processing data, i.e., for data
summarization, querying and advanced querying.
Pig has its own scripting language, called Pig Latin.
Whenever we execute PIG actions, the framework internally triggers
MapReduce jobs.
PIG was introduced by Yahoo and is now an Apache project.
Pig is built on top of MapReduce.
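The basic structure in PIG is the BAG, a table-like collection of tuples whose
order is not guaranteed unless it has been sorted. Each tuple is an ordered set
of fields, and each field holds an atom, i.e. a single value such as a number or a string.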
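The nesting of bags, tuples and atoms can be pictured with a small sketch in plain Python. It only models the idea and is not Pig code; the sample values and the names `atom`, `row` and `bag` are made up for illustration.

```python
# Plain-Python model of Pig's nesting: atoms inside tuples inside a bag.
# This is an illustration only; Pig itself stores these structures differently.

# An atom is a single value, such as a string or a number.
atom = "alice"

# A tuple is an ordered set of fields; each field here holds one atom.
row = ("alice", 25, "chennai")

# A bag is a collection of tuples; duplicates are allowed.
bag = [
    ("alice", 25, "chennai"),
    ("bob", 31, "delhi"),
    ("alice", 25, "chennai"),
]

# Fields inside a tuple are addressed by position, starting at 0
# (Pig Latin writes the same positions as $0, $1, $2).
names = [t[0] for t in bag]
print(names)
```

A Python list keeps insertion order, which a Pig bag does not promise, so the list here is only a convenient stand-in.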
NOTE: PIG never maintains its own Metadata or Warehouse.
PIG can work in 3 modes: LOCAL, HDFS (also known as MapReduce mode) and EMBEDDED.
LOCAL mode: Input data is taken from an LFS (local file system) path, not from
HDFS, and once processing is completed the generated output is also written to
an LFS path. In other words, HDFS is not involved at all when PIG runs in local mode.
In this mode the framework runs the job on the local machine and reads and
writes the LFS paths directly, so nothing is copied into HDFS.
$> pig -x local
If there is a script, use $> pig -x local <<script-name>>
HDFS mode: Input is taken from HDFS, and output is written to HDFS.
$> pig
If there is a script, use $> pig <<script-name>>
EMBEDDED mode: If we cannot achieve the desired functionality with the existing
commands, operations or actions, we can choose embedded mode to develop a
customized application, i.e. UDFs (User Defined Functions).
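As a rough illustration of what a UDF can look like, here is a minimal sketch in Python. It assumes the function would be registered with Pig through its Jython UDF support. The function name `to_upper`, the file name `myudfs.py` in the comments and the schema string are hypothetical, and the stand-in `outputSchema` decorator is defined only so that the file also runs outside Pig.

```python
# Hypothetical UDF sketch. When Pig runs a Jython UDF file it supplies an
# outputSchema decorator; the fallback below is a stand-in so that the same
# file can be run and tested with a normal Python interpreter.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.output_schema = schema  # keep the schema string for inspection
            return func
        return wrap


@outputSchema("upper_name:chararray")
def to_upper(name):
    # Null fields arrive as None, so pass them through unchanged.
    if name is None:
        return None
    return name.upper()


# Assumed usage from a Pig Latin script (names are illustrative):
#   REGISTER 'myudfs.py' USING jython AS myudfs;
#   result = FOREACH users GENERATE myudfs.to_upper(name);
print(to_upper("alice"))
```

UDFs are more commonly written in Java. Python is used here only to keep the sketch short.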
Following are the default transformations or operators in PIG:
 1) LOAD            2) FOREACH       3) GENERATE
 4) FILTER          5) DUMP          6) STORE
 7) DESCRIBE        8) SPLIT         9) ORDER BY
10) GROUP BY       11) JOIN         12) UNION
13) CROSS          14) LIMIT        15) TOKENIZE
16) EXPLAIN        17) ILLUSTRATE   18) FLATTEN
19) AGGFUNCTIONS   20) DISTINCT     21) COGROUP
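To give a feel for what a few of these operators do, the sketch below imitates LOAD, FILTER, GROUP BY, FOREACH ... GENERATE, ORDER BY, LIMIT and DUMP in plain Python. It is a model of the semantics only, not Pig Latin, and it runs in memory rather than as MapReduce jobs; the sample rows and variable names are made up.

```python
from collections import defaultdict

# Stand-in for LOAD: in Pig this would read a file; here the rows are inline.
users = [
    ("alice", 25, "chennai"),
    ("bob", 17, "delhi"),
    ("carol", 31, "chennai"),
    ("dave", 40, "delhi"),
    ("erin", 22, "chennai"),
]

# FILTER: keep only the tuples that satisfy a condition (age >= 18).
adults = [t for t in users if t[1] >= 18]

# GROUP BY: collect the tuples that share the same key (city) into one bag.
by_city = defaultdict(list)
for t in adults:
    by_city[t[2]].append(t)

# FOREACH ... GENERATE with an aggregate: one output tuple per group.
counts = [(city, len(group)) for city, group in by_city.items()]

# ORDER BY: sort by the count, largest first.
ordered = sorted(counts, key=lambda t: t[1], reverse=True)

# LIMIT: keep only the first N tuples.
top = ordered[:1]

# Stand-in for DUMP: print the result to the screen.
print(top)
```

Each step produces a new collection from the previous one, which mirrors how a Pig script names one relation per statement.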
Following datatypes are used in PIG:
DATA TYPE         PIG LATIN DATATYPE
int               int
String            chararray
float             float
long              long
boolean           boolean
byte (default)    bytearray
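The role of bytearray as the default type can be sketched in plain Python: a field with a declared type is converted, while a field without one stays as raw bytes. This is a simplified model, not Pig's actual loader; the `CASTS` table, the `apply_schema` helper and the sample row are hypothetical.

```python
# Simplified model of how a declared schema turns raw fields into typed values.
# Pig Latin type name -> Python conversion used for this illustration.
CASTS = {
    "int": int,
    "long": int,
    "float": float,
    "chararray": lambda raw: raw.decode("utf-8"),
    "boolean": lambda raw: raw.decode("utf-8").lower() == "true",
    "bytearray": lambda raw: raw,  # default: leave the bytes untouched
}


def apply_schema(raw_fields, types):
    """Convert each raw field using its declared type, or bytearray if None."""
    row = []
    for raw, type_name in zip(raw_fields, types):
        cast = CASTS[type_name or "bytearray"]
        row.append(cast(raw))
    return tuple(row)


# The third field has no declared type, so it stays a bytearray-style value.
raw_row = [b"alice", b"25", b"chennai"]
print(apply_schema(raw_row, ["chararray", "int", None]))
```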
TIP: PIG can be executed in 2 flavors: the grunt shell (i.e. line by line)
and script mode (i.e. a group of commands).
Next Lesson: Transformations or Operators in PIG