Friday, 23 August 2013

Pig Multi-Query Optimization issue

We are running into an issue where Pig's multi-query optimizer does not
work as expected.
As I understand it, the script below should run as one MR job, but it runs
as two jobs on our cluster. I believe multi-query optimization is on by
default, so am I missing something here? If I replace the GROUP BY
statements with FILTER statements, it runs as a single MR job.

data = LOAD 'input' AS (a:chararray, b:int, c:int);
A = GROUP data BY b;
B = GROUP data BY c;
STORE A INTO 'output1';
STORE B INTO 'output2';

I'm using the CDH-packaged Pig 0.1.0 with Hadoop 2.0.0.
