Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

it has significant impact because first case is "take 100 rows from index" and the second is "take 100 rows from index _and_ for each row go to the row in the table - do the random IO and with 1M rows it is probably 1 IO/row - and check for the value of 'b'"

Such 100 random IOs will cost 0.5 sec on 1 iron platter HDD for example. So the query performance will degrade significantly until either the table is already preloaded into memory or you use SSD drives.



> Such 100 random IOs will cost 0.5 sec on 1 iron platter HDD for example.

That's an incredibly, incredibly, iffy and mostly wrong statement which depends on arguably a corner case which doesn't often reflect reality (factors include which DBMS, row ordering, table size, cache size, block size, page size, RAM size, Hard Disk seek time, HDD throughput.

The only case where that's likely is performing a very cold query on a very large randomly distributed table once (and probably only once).

Even a table of 1 million rows with ~30B per row could easily be read into memory in about 300ms (100MB read time + ~5ms seek time, or ~= 5+(1e6*rowsize/ (100e3)) )

Query Optimizers do exactly this.


>> Such 100 random IOs will cost 0.5 sec on 1 iron platter HDD for example.

>That's an incredibly, incredibly, iffy and mostly wrong statement which depends on arguably a corner case which doesn't often reflect reality (factors include which DBMS, row ordering, table size, cache size, block size, page size, RAM size, Hard Disk seek time, HDD throughput.

you're welcome to specify iron platter HDDs which would do 100 random IOs in significant different time than 0.5 sec.

>Even a table of 1 million rows with ~30B per row could easily be read into memory in about 300ms (100MB read time + ~5ms seek time, or ~= 5+(1e6*rowsize/ (100e3)) )

>Query Optimizers do exactly this.

Not always. If it has enough info, it may take a full table scan path in such a case. Still it will be about the same time - 300ms vs. 0.5 sec. i mentioned.

Of course it is for cold queries. You forgot to mention, i guess, that full table scan you propose may by-pass DB cache (default depends on DB, modern tendency is by-pass by default), and thus second query will take the same time, while table blocks brought in by random IO would frequently be stored in DB cache thus making second query run somewhat faster - depends on how much data was brought in.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: