
When I started my dev career, NoSQL was all the rage and I remember reading about BigTable, Cassandra, Dynamo, and most importantly LSMs. They made a big deal about how the data stored on disk was sorted. I never knew why this was such a big deal, and although I always kept it in mind, I never bothered to understand how it was done previously.
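For anyone else who never dug into the "sorted on disk" part, here's a toy sketch of the LSM idea in Python. It is not how BigTable or Cassandra are actually implemented (real engines add a write-ahead log, compaction, bloom filters, etc.), and the class and method names are made up for illustration. The point it shows: writes are buffered in memory, and each flush writes an immutable run sorted by key. That sorting is what makes point lookups a binary search and range scans a contiguous slice of each run.

```python
import bisect


class TinyLSM:
    """Hypothetical toy LSM store: writes land in an in-memory table, which is
    flushed as an immutable run sorted by key (a stand-in for an on-disk SSTable)."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}                # recent writes, in no particular order
        self.runs = []                    # flushed runs as (keys, values), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        # Sorting happens once, at flush time, so every run is ordered by key.
        items = sorted(self.memtable.items())
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Check the newest run first, so later writes shadow older ones.
        for keys, values in reversed(self.runs):
            i = bisect.bisect_left(keys, key)   # binary search works because runs are sorted
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def range_scan(self, lo, hi):
        # In a sorted run, a key range is one contiguous slice: no full scan needed.
        merged = {}
        for keys, values in self.runs:          # oldest first, so newer runs overwrite
            start = bisect.bisect_left(keys, lo)
            end = bisect.bisect_right(keys, hi)
            merged.update(zip(keys[start:end], values[start:end]))
        merged.update((k, v) for k, v in self.memtable.items() if lo <= k <= hi)
        return sorted(merged.items())


db = TinyLSM(memtable_limit=3)
for k in [5, 1, 9, 3, 7, 2]:
    db.put(k, str(k))
print(db.runs[0][0])        # [1, 5, 9]
print(db.range_scan(2, 7))  # [(2, '2'), (3, '3'), (5, '5'), (7, '7')]
```

Keys arrive in any order, but each flushed run comes out sorted, which is the property those papers kept emphasizing.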

>Something really important about tables which isn’t obvious at first is that, even though they might have sequential primary keys, tables are not ordered.

This was very surprising to read.
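A minimal sketch of why a heap table loses its order, assuming a greatly simplified model where a new row simply takes the first free slot. In Postgres the real mechanism goes through the free space map, deleted rows only free up space after VACUUM, and an UPDATE writes a new row version too; none of that is modeled here, and `ToyHeap` is a hypothetical name.

```python
class ToyHeap:
    """Hypothetical heap: rows live in slots, and a new row goes into the first
    free slot rather than a position determined by its key."""

    def __init__(self):
        self.slots = []   # list index = physical position on "disk"

    def insert(self, row_id):
        for pos, occupant in enumerate(self.slots):
            if occupant is None:        # reuse space freed by an earlier delete
                self.slots[pos] = row_id
                return pos
        self.slots.append(row_id)       # otherwise append at the end
        return len(self.slots) - 1

    def delete(self, row_id):
        self.slots[self.slots.index(row_id)] = None

    def seq_scan(self):
        # A scan without ORDER BY returns rows in physical order.
        return [r for r in self.slots if r is not None]


heap = ToyHeap()
for i in range(1, 6):
    heap.insert(i)      # sequential keys 1..5 start out in order
heap.delete(2)
heap.insert(6)          # key 6 reuses the slot freed by key 2
print(heap.seq_scan())  # [1, 6, 3, 4, 5]
```

Even with a sequential primary key, one delete plus one insert is enough to break the physical order.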



I'mma pop up again with this, since it's not mentioned: there's a CLUSTER command that lets you reorder the table data to match an index. It's a one-off (rows written afterwards aren't kept in that order), so you'll need to re-run it regularly from a crontab or something. It's important to be aware of because postgres keeps a "correlation" statistic between each column's values and the physical order of the rows on disk, and the query planner uses it: when correlation is low, an index scan means lots of random disk access, so the planner is biased towards sequential access instead. That's one of the possible reasons postgres might not use an index that otherwise makes sense: the estimated random disk access penalty is too high.



