Advanced scripts - PIG Part 3
FILTER:
filtered = FILTER climate BY country MATCHES 'C.*a'; -> China
filtered = FILTER climate BY country MATCHES '.*(ndi|hin).*'; -> India, China
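A minimal sketch of the filters above in a complete script. The file name 'climate.txt', the comma delimiter and the schema (country, year, temp) are assumptions for illustration, not taken from the original data. MATCHES uses Java regular expressions and has to match the whole field, so a substring test needs .* on both sides. The result aliases avoid names such as filter, which is a reserved word in Pig.

```pig
-- Assumed input: comma-separated lines such as  China,1992,14.5
climate = LOAD 'climate.txt' USING PigStorage(',')
          AS (country:chararray, year:int, temp:double);

-- Whole-field match: starts with 'C' and ends with 'a' (China, Canada, ...)
starts_c_ends_a = FILTER climate BY country MATCHES 'C.*a';

-- Substring match: the pattern is wrapped in .* on both sides
contains_hin = FILTER climate BY country MATCHES '.*hin.*';

-- MATCHES can be combined with other conditions
c_a_1992 = FILTER climate BY (country MATCHES 'C.*a') AND (year == 1992);

DUMP c_a_1992;
```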
SPLIT:
SPLIT climate into B1992 if $1 == 1992, B2002 if $1 == 2002;
It splits the climate relation into two relations, B1992 and B2002, which hold the tuples whose second field ($1, the year) is 1992 and 2002 respectively.
DUMP B1992;
DUMP B2002;
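The same split can be written with field names instead of positions. This is a sketch under the assumption that climate is loaded from a hypothetical 'climate.txt' with the schema (country, year, temp). Tuples that match no condition are dropped unless an OTHERWISE branch is given, and a tuple that matches several conditions lands in every matching relation.

```pig
-- Assumed input file and schema, for illustration only
climate = LOAD 'climate.txt' USING PigStorage(',')
          AS (country:chararray, year:int, temp:double);

-- Named field instead of $1; OTHERWISE catches every remaining tuple
SPLIT climate INTO B1992 IF year == 1992,
                   B2002 IF year == 2002,
                   other_years OTHERWISE;

DUMP B1992;
DUMP B2002;
DUMP other_years;
```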
SAMPLE:
It draws a random sample of tuples from a relation.
sampled = SAMPLE climate 0.1; -> returns roughly 10% of the tuples in climate, chosen at random.
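To see that the sample size is only approximate, the row counts before and after sampling can be compared. The sketch below assumes the hypothetical 'climate.txt' file and schema (country, year, temp). The second count changes from run to run.

```pig
-- Assumed input file and schema, for illustration only
climate = LOAD 'climate.txt' USING PigStorage(',')
          AS (country:chararray, year:int, temp:double);

-- Each tuple is kept with probability 0.1
sampled = SAMPLE climate 0.1;

-- Count all rows of the full relation
climate_all = GROUP climate ALL;
total_rows  = FOREACH climate_all GENERATE COUNT(climate) AS n;

-- Count the rows that survived sampling
sampled_all  = GROUP sampled ALL;
sampled_rows = FOREACH sampled_all GENERATE COUNT(sampled) AS n;

DUMP total_rows;
DUMP sampled_rows;
```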
ORDER:
ordered = ORDER climate BY year ASC;
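ORDER also accepts several sort keys with their own direction, and it is often paired with LIMIT to get a top-N list. A short sketch, assuming the hypothetical 'climate.txt' file and schema (country, year, temp):

```pig
-- Assumed input file and schema, for illustration only
climate = LOAD 'climate.txt' USING PigStorage(',')
          AS (country:chararray, year:int, temp:double);

-- Oldest year first; within a year, hottest record first
by_year_temp = ORDER climate BY year ASC, temp DESC;

-- Top 5 hottest records overall
hottest_first = ORDER climate BY temp DESC;
top5 = LIMIT hottest_first 5;

DUMP top5;
```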
GROUP BY:
grouped = GROUP climate BY year;
grouped = GROUP climate BY (year, temp);
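GROUP on its own only collects tuples into bags; a FOREACH over the result is what usually computes something per group. The sketch below assumes the hypothetical 'climate.txt' file and schema (country, year, temp). In the output of GROUP, the key is in a field named group and the bag carries the name of the input relation.

```pig
-- Assumed input file and schema, for illustration only
climate = LOAD 'climate.txt' USING PigStorage(',')
          AS (country:chararray, year:int, temp:double);

by_year = GROUP climate BY year;
DESCRIBE by_year;  -- shows the field 'group' plus a bag named 'climate'

-- One output row per year: the key, the average temperature, the row count
year_stats = FOREACH by_year GENERATE
                 group             AS year,
                 AVG(climate.temp) AS avg_temp,
                 COUNT(climate)    AS n;

-- With a composite key, 'group' is a tuple; FLATTEN turns it back into fields
by_year_temp = GROUP climate BY (year, temp);
pair_counts  = FOREACH by_year_temp GENERATE
                   FLATTEN(group) AS (year, temp),
                   COUNT(climate) AS n;

DUMP year_stats;
```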
COGROUP:
It groups the tuples of two or more relations by a common key. Each output tuple holds the key followed by one bag per relation, containing that relation's tuples for the key.
cogrouped = COGROUP a BY a1, b BY b1;
DUMP cogrouped;
Input data:
a (a1, a2, a3)    b (b1, b2)
(1,2,3)           (4,5)
(3,5,3)           (1,3)
(2,2,7)           (5,5)
(1,4,3)           (2,5)
(2,2,3)           (1,8)
Grouping keys from both relations: (1,3,2,1,2,4,1,5,2,1) --> distinct keys: (1,3,2,4,5)
Result:
(1,{(1,2,3),(1,4,3)},{(1,3),(1,8)})
(3,{(3,5,3)},{})
(2,{(2,2,7),(2,2,3)},{(2,5)})
(4,{},{(4,5)})
(5,{},{(5,5)})
cogroup_a_inner = COGROUP a BY a1 INNER, b BY b1; --> keeps only the keys that have at least one tuple in a.
(1,{(1,2,3),(1,4,3)},{(1,3),(1,8)})
(3,{(3,5,3)},{})
(2,{(2,2,7),(2,2,3)},{(2,5)})
cogroup_b_inner = COGROUP a BY a1, b BY b1 INNER; --> keeps only the keys that have at least one tuple in b.
(1,{(1,2,3),(1,4,3)},{(1,3),(1,8)})
(2,{(2,2,7),(2,2,3)},{(2,5)})
(4,{},{(4,5)})
(5,{},{(5,5)})
cogroup_a_b_inner = COGROUP a BY a1 INNER, b BY b1 INNER; --> keeps only the keys that have at least one tuple in both a and b.
(1,{(1,2,3),(1,4,3)},{(1,3),(1,8)})
(2,{(2,2,7),(2,2,3)},{(2,5)})
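The COGROUP variants above can be tried end to end with the sketch below. The file names 'a.txt' and 'b.txt' and the comma delimiter are assumptions; the files are expected to hold the five rows of a and b shown earlier, one tuple per line without spaces. The last two statements illustrate that a COGROUP with INNER on both sides, followed by FLATTEN of both bags, yields the same tuples as an inner JOIN.

```pig
-- Assumed input files, e.g. a.txt contains lines like 1,2,3 and b.txt lines like 4,5
a = LOAD 'a.txt' USING PigStorage(',') AS (a1:int, a2:int, a3:int);
b = LOAD 'b.txt' USING PigStorage(',') AS (b1:int, b2:int);

-- Plain COGROUP: one row per distinct key, with one bag per input relation
cogrouped = COGROUP a BY a1, b BY b1;

-- How many tuples each relation contributes per key
key_sizes = FOREACH cogrouped GENERATE group, COUNT(a) AS in_a, COUNT(b) AS in_b;
DUMP key_sizes;

-- INNER on both sides, then FLATTEN both bags: same tuples as JOIN a BY a1, b BY b1
inner_both = COGROUP a BY a1 INNER, b BY b1 INNER;
joined     = FOREACH inner_both GENERATE FLATTEN(a), FLATTEN(b);
DUMP joined;
```

For the sample data, joined contains (1,2,3,1,3), (1,2,3,1,8), (1,4,3,1,3), (1,4,3,1,8), (2,2,7,2,5) and (2,2,3,2,5); the order of the rows is not guaranteed.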
TUPLE:
Load the following data as tuples:
(1,2,3) (4,5,6)
(2,3,4) (5,6,7)
tuples = LOAD 'abc.txt' USING PigStorage(' ') AS (t1:tuple(t1a:int,t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));
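Once the data is loaded with a tuple schema, the inner fields can be reached with the dot operator, either by name or by position. A small sketch using the same 'abc.txt' file; the alias names pairs, picked and flat are chosen here for illustration.

```pig
pairs = LOAD 'abc.txt' USING PigStorage(' ')
        AS (t1:tuple(t1a:int,t1b:int,t1c:int),
            t2:tuple(t2a:int,t2b:int,t2c:int));

-- First field of t1 by name, third field of t2 by position
picked = FOREACH pairs GENERATE t1.t1a, t2.$2;
DUMP picked;

-- FLATTEN removes the tuple nesting and yields six plain fields per row
flat = FOREACH pairs GENERATE FLATTEN(t1), FLATTEN(t2);
DUMP flat;
```

For the two input rows above, picked holds (1,6) and (2,7), and flat holds (1,2,3,4,5,6) and (2,3,4,5,6,7).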
BAG:
Load the following data as bags:
{(1,2,3),(1,4,3)} {(1,3),(1,8)}
{(3,5,3)} {(3,5)}
{(2,2,7),(2,2,3)} {(2,5)}
bag1 = LOAD 'data.txt' USING PigStorage(' ') AS (B1:bag{t1:tuple(t1a:int,t1b:int,t1c:int)}, B2:bag{t2:tuple(t2a:int,t2b:int)});
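Bag fields can be passed to aggregate functions or flattened back into rows. The sketch below assumes that every line of 'data.txt' holds two well-formed bags separated by a single space, for example {(1,2,3),(1,4,3)} {(1,3),(1,8)}. The alias names are chosen here for illustration.

```pig
bags = LOAD 'data.txt' USING PigStorage(' ')
       AS (B1:bag{t1:tuple(t1a:int,t1b:int,t1c:int)},
           B2:bag{t2:tuple(t2a:int,t2b:int)});

-- Number of tuples in each bag, per input line
bag_sizes = FOREACH bags GENERATE COUNT(B1) AS n1, COUNT(B2) AS n2;
DUMP bag_sizes;

-- Sum of the second field over all tuples of B1, per input line
b1_sums = FOREACH bags GENERATE SUM(B1.t1b) AS sum_t1b;
DUMP b1_sums;

-- One output row per tuple in B1
b1_rows = FOREACH bags GENERATE FLATTEN(B1);
DUMP b1_rows;
```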