
Yo, the author here. Thanks for the feedback. I totally meant idempotency, drat. (On Hadoop, thanks to speculative execution of reduce tasks, you also have to worry a bit about reentrancy, but what I was talking about was indeed idempotency.)
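To illustrate the idempotency point: if a speculatively executed reduce task can run twice, its output write needs to be an upsert keyed on the reduce key, so a duplicate run leaves the datastore unchanged. A minimal sketch, with a plain dict standing in for the datastore (the names `store` and `save_result` are illustrative, not a real Hadoop API):

```python
def save_result(store: dict, key: str, value: int) -> None:
    # Upsert keyed on the reduce key: running this twice with the same
    # inputs leaves the store in exactly the same state as running it once.
    store[key] = value

store = {}
save_result(store, "user:42:clicks", 7)
save_result(store, "user:42:clicks", 7)  # speculative duplicate: harmless
```

An append (`store.setdefault(key, []).append(value)`) would not have this property, which is exactly how speculative re-execution ends up double-counting.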

Shutting down the pipeline: I hear you on prod/non-prod. In our setup, the pipeline writes to a datastore, so if we kill the pipeline, the datastore stays up; it just stops updating. That's working so far. We may end up flagging suspect data as you suggest instead of doing a full stop (or only doing a full stop if more than a very small percentage of the data is suspect).
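That "flag unless it's more than a small percentage" policy is simple to state as code. A hedged sketch (the function name and the 1% threshold are illustrative, not from the actual pipeline):

```python
def handle_suspect(total: int, suspect: int, max_fraction: float = 0.01) -> str:
    """Decide whether to halt the pipeline or just flag suspect records.

    Halt only when the suspect fraction exceeds max_fraction;
    otherwise keep running and flag the bad records.
    """
    if total and suspect / total > max_fraction:
        return "halt"
    return "flag"
```

So 5 suspect records out of 1000 would be flagged and the pipeline keeps updating the datastore, while 50 out of 1000 would trigger the full stop.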



No problem. I am not too familiar with Hadoop, but those speculative reduce tasks sound like a real blast to debug.

I can see why the approach in your blog would have a lot of appeal in that environment. It sounds like some sort of error flagging, in combination with a set of heuristics around what failed, how often, what time of day, etc., would be the way to go.
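Those heuristics (what failed, how often, what time of day) could be sketched as a small triage pass over failure records. This is purely illustrative; the function name, the repeat threshold of 3, and the off-hours window are assumptions, not anything from the thread:

```python
from collections import Counter
from datetime import datetime

def triage(failures: list[tuple[str, datetime]]) -> dict:
    """Summarize failure records as (task_name, timestamp) pairs.

    Flags tasks that failed repeatedly (3+ times) and counts failures
    that clustered in off-hours (before 06:00 or after 22:00).
    """
    counts = Counter(name for name, _ in failures)
    repeated = {name for name, n in counts.items() if n >= 3}
    off_hours = sum(1 for _, ts in failures if ts.hour < 6 or ts.hour >= 22)
    return {"repeated": repeated, "off_hours_count": off_hours}
```

A real monitoring system would feed this kind of summary into alerting rather than a dict, but the shape of the heuristics is the same.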

I find that intelligent monitoring systems like that are ultimately necessary in systems like this anyway; you just usually end up discovering that the hard way (I know I have, several times; it's one of those lessons you're tempted to unlearn in the interests of expediency). Does Hadoop help you out with that sort of thing?





