> Users, written to the database, did not distribute evenly over the two shards. One shard grew to 67GB, larger than RAM, and the other to 50GB
Main pro-tip: think very hard about how you're sharding your data: which key you shard on, whether the sharding is hash-based or range-based, and how you intend to rebalance or get more granular should the need arise.
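To make the hash-based vs range-based trade-off concrete, here's a minimal sketch (plain Python, hypothetical function names, not any real sharding API): hashing spreads keys roughly evenly, while a range split with a poorly chosen boundary can dump every key onto one shard, which is essentially the skew described in the linked post.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 2

def shard_for_hash(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash-based: keys spread ~evenly, but range scans hit every shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

def shard_for_range(key: str, boundaries=("m",)) -> int:
    """Range-based: contiguous key ranges stay together, but a skewed
    keyspace (here, every key starts with 'u') piles onto one shard."""
    for i, bound in enumerate(boundaries):
        if key < bound:
            return i
    return len(boundaries)

keys = [f"user{n:04d}" for n in range(1000)]

print(Counter(shard_for_hash(k) for k in keys))   # roughly even split
print(Counter(shard_for_range(k) for k in keys))  # everything lands on shard 1
```

The boundary value "m" here is an illustrative bad choice; the point is that rebalancing a range-based scheme means picking new boundaries and physically moving data, whereas hash-based schemes avoid hot ranges but give up cheap range queries.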
Why not? Using commodity servers to run long-running compute tasks doesn't seem like a trend that's fading away anytime soon.
There are more solutions than M/R now, though (Spark, Presto, etc.), but they require high-performance servers (Presto's suggested RAM amount is 128GB).
Can you name a single type of job that MapReduce can do better (faster, or using fewer resources) than Spark?
In my experience, even for the simplest tasks, like when you just need to read the data, change it slightly without shuffling, and write it back, Spark is faster than MapReduce under the same resource limits, and it's much more efficient for heavy jobs with joins etc.
And of course there are other things too, like the API, which in MapReduce is a nightmare to deal with compared to Spark.
so I opened the link
http://highscalability.com/blog/2010/10/15/troubles-with-sha...
> What Happened? The problem went something like: Foursquare uses MongoDB to store user data on two EC2 nodes, each of which has 66GB of RAM
There's your problem. Using Mongo sharding in production in 2010.