*to get pgsql to force a particular index, and forgo destroying the site for ano...

ehsanul · on Feb 4, 2011

The PostgreSQL planner has always done a flawless job for me in terms of choosing between indexes and tablescans - on reasonable queries.

While that has been my general experience as well, things can break down at the edges sometimes. I sometimes have to join 200m-row tables that don't fit into RAM (though individual partitions of the table do), against 1-10m-row tables of new data. This task is probably something more suited to Hadoop, but we're using PostgreSQL for it as that's where the data is.

With a plain vanilla join using indexed columns for the join, the smaller table is scanned while an index is used for larger table on the joined column. This is reasonable except for the fact that this results in a lot of random IO, so if you don't have an SSD or something, this will be slow. Of course, you can just increase the random_io_cost or whatever it's called, to adjust costs estimated by the query planner. I believe that results in a sorting operation for the tables before the join. Sorting can be better compared to a thrashing disk, but still slow.

The core issue is random IO here, so how about just warming up the cache for that particular partition's index/table with a big sequential read of the entire index/table, and letting it use the index-based query plan as usual? The sequential read takes a minute in the worst case for a single partition, but then when everything's cached.. BAM. The join query runs in 2-20 seconds (for a single partition, one of 26). If Postgres were psychic, it could have done this itself instead of making me force caching with an otherwise pointless extra query. But I suppose that behavior would usually not be desired, as it would destroy the existing cache. In this case, destroying the older cache was the right thing to do, but Postgres can't know that.

The whole point of this comment being: DBAs matter and you can't rely fully on the query planner to do what you may expect. Even a scrappy one like me (I'm mostly a programmer, the DBA stuff just comes with the territory for me). This is partly in response to a comment elsewhere on this page, mentioning how the query-planner is better trusted than a DBA - I wish it were always so.

rbranson · on Feb 4, 2011

That doesn't sound right. In default conditions, PostgreSQL will shy away from doing index scans unless you're operating on <0.1% of the table. The planner builds a selectivity figure for the table based on table dimensions, index statistics, the operator used (= is more selective than <), and of course it's estimation of cost for the various operations needed to run the query. The decision to perform an index or sequential scan usually has little to do with the table size ratio during a JOIN. It's likely your index statistics were poor and/or the server configuration was messed up.

ehsanul · on Feb 4, 2011

You're probably right, I'm an amateur at this. My current solution works pretty well, but I should really drill into what may be wrong here.

If you don't mind me asking, what in your estimation would be an ideal query plan for a join of the type I've described? A hash join? Also, I've left statistics to their defaults and run ANALYZE after bulk data uploads (the only time data is written), but I'll try bumping statistics collection up and running VACUUM ANALYZE again. I don't know what server configuration could be messed up to cause something like this; I have my memory settings (shared_buffers, effective_cache_size) set up fine, and cost parameters have been left alone. Other configuration settings I've changed shouldn't be affecting read queries.

mcfunley · on Feb 4, 2011

Right, what I am describing there is battles in the long war to denormalize, once a normalized site gets popular. We have since denormalized many things into dirt-stupid yet very functional mysql shards.