tl;dr: Doing a bunch of synchronous writes and full table scans on extremely slow, high-latency storage (for a dataset that easily fits in RAM) is slow.


My tl;dr is: Don't do huge updates with a WHERE clause on fields _not_ covered by an index; otherwise it will take days to finish and you'll get meaningless numbers in benchmarks...
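
To make the failure mode concrete, here is a minimal sketch assuming a hypothetical users table with an unindexed external_id column (none of these names are from the thread). Running EXPLAIN first shows the sequential scan before you pay for it:

    -- external_id has no index, so the planner has nothing
    -- better than reading every row in the table:
    EXPLAIN
    UPDATE users
       SET status = 'migrated'
     WHERE external_id = 'abc123';

    -- The plan will contain something like:
    --   Update on users
    --     ->  Seq Scan on users
    --           Filter: (external_id = 'abc123'::text)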


I learned that the hard way. It was midnight and I had to make some urgent changes to the user table in the production database. We had about 4 million users and needed to update 70k of them (due to a change in a third party).

After 40 minutes I figured out that we were not making any progress: for each of the 70k users, we were probably scanning all 4 million rows. After I added an index on the relevant field, it was done quickly.
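
A sketch of the fix, continuing the hypothetical schema above: build an index on the column the WHERE clause filters on. CONCURRENTLY avoids locking out writes on a live table, at the cost of a slower build (and it cannot run inside a transaction block):

    CREATE INDEX CONCURRENTLY users_external_id_idx
        ON users (external_id);

    -- The same UPDATE should now show an Index Scan
    -- instead of a Seq Scan:
    EXPLAIN
    UPDATE users
       SET status = 'migrated'
     WHERE external_id = 'abc123';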


Oddly enough, for a big enough update, Postgres will sometimes do the math and decide that scanning the whole table is faster than scanning the index and then fetching a large fraction of the table anyway (considering also that it has to rewrite an entire page if any row in that page is updated).
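
You can watch the planner make that trade-off with EXPLAIN; a rough sketch using the same hypothetical table as above (the exact cutover point depends on table statistics and cost settings):

    -- Selective predicate: the index wins.
    EXPLAIN UPDATE users SET status = 'migrated'
     WHERE external_id = 'abc123';
    --   ->  Index Scan using users_external_id_idx on users

    -- Predicate matching most rows: the planner falls back to
    -- a sequential scan, since it would have to touch nearly
    -- every page through the index anyway.
    EXPLAIN UPDATE users SET status = 'migrated'
     WHERE external_id > '';
    --   ->  Seq Scan on users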



