Isn't the MapReduce sort N log N in the number of entries? That would make (1e13 log 1e13) / (1e10 log 1e10) * 68s / 4 = 6.1h. So they were actually faster than the possible scaling? Doesn't sound right...
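Spelling that estimate out as a quick sketch (the entry counts and the 4x hardware factor are just the numbers from the comment above):

```python
import math

# N log N scaling from the 68 s / 1 TB run to the 1 PB run,
# divided by 4 for the extra hardware.
n_1tb, n_1pb = 1e10, 1e13
t_1tb = 68.0           # seconds for the 1 TB sort
hardware_ratio = 4.0

ratio = (n_1pb * math.log(n_1pb)) / (n_1tb * math.log(n_1tb))   # ~1300
print(f"{t_1tb * ratio / hardware_ratio / 3600:.1f} h")          # ~6.1 h
```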
I was assuming that the sort was actually a partial sort - returning the top "x" (admittedly, even then it isn't strictly linear). However, I was also assuming x << n, which would almost certainly be the case with a large data set. It's a fairly common approach, but I'm not sure why I automatically assumed it.
If the sort is complete, then yeah, that changes things.
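For what it's worth, the usual top-x trick is a bounded min-heap, which is O(N log x) and so effectively linear in N when x << N. A minimal sketch (the function name is mine):

```python
import heapq

def top_x(stream, x):
    """Return the x largest items from an iterable.

    Keeps a min-heap of size x, so the cost is O(N log x) rather than
    the O(N log N) of a full sort -- effectively linear when x << N.
    """
    heap = []
    for item in stream:
        if len(heap) < x:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)

# Equivalent to heapq.nlargest(x, stream).
```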
Sounds dead on to me. The typical sort is O(N log N). The constant factor can be slightly different between the 1TB runs and the 1PB runs, e.g. slightly different latency or a different pattern of disk failures. Every sort is sensitive to the data distribution. There is probably an interesting distribution of results across the runs.
We were writing it to 48,000 hard drives (we did not use the full capacity of these disks, though), and every time we ran our sort, at least one of our disks managed to break (this is not surprising at all given the duration of the test, the number of disks involved, and the expected lifetime of hard disks).
I'm a bit at a loss here. Does that mean that one in ~50K hard disks fails within six hours of usage?
1 failure per 50K in 6 hours is the same as 1 per 34 in a year, or a 3% annual failure rate. I think this is the same number Google reported in their disk failure paper.
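The conversion, spelled out (rounding the 48,000 drives to 50K and using ~8,766 hours per year):

```python
# 1 failure per ~50,000 drives in a 6-hour run, converted to an
# annual failure rate (AFR).
drives = 50_000
run_hours = 6
hours_per_year = 8_766  # 365.25 days

runs_per_year = hours_per_year / run_hours          # ~1,461
drive_years_per_failure = drives / runs_per_year    # ~34 drive-years
afr = 1 / drive_years_per_failure                   # ~2.9%, i.e. ~3%
print(f"1 failure per ~{drive_years_per_failure:.0f} drive-years "
      f"=> AFR ~{afr:.1%}")
```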
Actually, I did this math before I posted my comment.
What I don't understand is this:
With an x% annual failure rate, are the failures uniformly distributed over time? I would expect not - more likely a distribution where there are few to no failures in the beginning, and the failures keep increasing as the drives age.
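To put a rough number on why the shape matters: calibrate a constant-hazard (exponential) model and an increasing-hazard Weibull (shape k=2, an arbitrary choice of mine, purely illustrative) to the same ~3% first-year failure probability, and a 6-hour window over 48,000 drives looks very different depending on which one applies:

```python
import math

drives = 48_000
window_years = 6 / 8_766          # a 6-hour run, in years
first_year_fail_prob = 0.03       # ~3% AFR from the thread

# Exponential (constant hazard): drive age doesn't matter.
lam = -math.log(1 - first_year_fail_prob)            # per year
p_exp = 1 - math.exp(-lam * window_years)

# Weibull with shape k=2 (increasing hazard), calibrated to the same
# 3% first-year probability -- brand-new drives almost never fail early.
k = 2.0
scale = (1 / lam) ** (1 / k)      # so that (1 year / scale)^k = lam
p_wbl = 1 - math.exp(-(window_years / scale) ** k)

print(f"expected failures, constant hazard:        {drives * p_exp:.2f}")  # ~1
print(f"expected failures, all-new drives (k=2):   {drives * p_wbl:.4f}")  # ~0
```

Under a constant hazard you'd expect roughly one failure per 6-hour run, which matches what the post describes; 48,000 brand-new drives with an increasing hazard would almost never fail that early.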
It's probable that they didn't use 48,000 brand new hard drives, but reused whatever they already had in their datacenters. Which would have a mixture of different drive ages, I guess.
But the fact that the article specifically mentions drive failures, and that they use some sort of RAID/replication to cope with them, makes me wonder if they were using old disks.
It's probably easier to get all new drives and use them, isn't it?
On second thought though, it might be much easier to tell GFS to make an extra mirror than to make sure that all the disks are brand new.
So, I buy your line of thinking :)
Speaking of electricity, at one point Google claimed that during a Web search your PC uses more energy sending the request, waiting, and displaying the results than Google uses to perform the search. So the next time you need to sort 1PB of data it may be cheaper to send it to Google. :-)
(68 sec * 1024) / 60 / 60 / 4 --- 68 sec for 1TB, x1024 to scale up to 1PB, /60/60 to get hours, and /4 as there are 4x as many computers....
So ideally, taking the 1TB sort to 1PB on 4x the hardware would take about 4.8 hrs.
Google's 1PB sort took 6 hours - so that's fairly close to linear. Impressive.
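The same arithmetic as a snippet, with the observed time alongside (1024 TB/PB is the binary convention; use 1000 if you prefer decimal):

```python
t_1tb = 68.0           # seconds for the 1 TB sort
scale = 1024           # TB per PB (binary)
hardware_ratio = 4.0   # the 1 PB run used ~4x as many machines

t_linear = t_1tb * scale / hardware_ratio
print(f"linear extrapolation: {t_linear / 3600:.1f} h")   # ~4.8 h
print("observed:             ~6 h")
```

The gap between ~4.8 h and ~6 h is roughly the extra log factor from the N log N estimate earlier in the thread.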