Large bloom filters have exactly the same problem. And since a bloom filter is fixed in size, you'd need a potentially huge one to keep the false-positive rate down as the data grows; more likely you'd have to periodically regenerate it from the original data.
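To put rough numbers on the sizing problem, here is a minimal sketch using the standard textbook approximations for bloom filter size and false-positive rate. The function names are made up for illustration, and the formulas assume ideal, independent hash functions.

```python
import math

def bloom_bits(n_items, target_fp):
    """Bits needed to hold n_items at the target false-positive rate
    (standard approximation: m = -n * ln(p) / (ln 2)^2)."""
    return math.ceil(-n_items * math.log(target_fp) / (math.log(2) ** 2))

def optimal_hashes(m_bits, n_items):
    """Hash count that minimizes false positives: k = (m / n) * ln 2."""
    return max(1, round(m_bits / n_items * math.log(2)))

def false_positive_rate(m_bits, k_hashes, n_items):
    """Approximate false-positive rate after n_items insertions."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

# Size the filter for 1M items at 1% false positives...
m = bloom_bits(1_000_000, 0.01)
k = optimal_hashes(m, 1_000_000)

# ...then see what happens when the data outgrows the fixed-size filter.
for n in (1_000_000, 2_000_000, 4_000_000):
    print(n, round(false_positive_rate(m, k, n), 3))
```

At 1% you pay a bit under 10 bits per item, and the false-positive rate climbs quickly once the item count passes what the filter was sized for, which is why regenerating from the original data ends up being the realistic option.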
This is a really tricky optimization because on a positive hit you've introduced more random I/O! After all, you now do the bloom filter lookup and then the hash table lookup. False positives cost just as much, so you only save anything on true negatives. Is it worth it? Only if you get the tuning just right.
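A back-of-the-envelope cost model makes the trade-off concrete. This is only a sketch: it assumes every lookup probes the filter first, and the cost parameters (filter_cost, table_cost, measured in random reads) are hypothetical knobs, not measurements.

```python
def expected_reads(hit_fraction, fp_rate, filter_cost=1.0, table_cost=1.0):
    """Expected random reads per lookup, without and with a bloom filter.

    hit_fraction: share of lookups whose key really is in the table
    fp_rate:      bloom filter false-positive rate
    filter_cost:  reads per filter probe (near 0 if the filter is in RAM,
                  about 1 if it is too large and lives on disk)
    table_cost:   reads per hash table lookup
    """
    without_filter = table_cost
    # True hits and false positives both fall through to the hash table.
    reach_table = hit_fraction + (1.0 - hit_fraction) * fp_rate
    with_filter = filter_cost + reach_table * table_cost
    return without_filter, with_filter

# Mostly hits, filter on disk: the filter only adds I/O.
print(expected_reads(hit_fraction=0.9, fp_rate=0.01))

# Mostly misses, filter mostly cached: the filter pays for itself.
print(expected_reads(hit_fraction=0.1, fp_rate=0.01, filter_cost=0.1))
```

Under this model the filter only wins when filter_cost is less than (1 - hit_fraction) * (1 - fp_rate) * table_cost, i.e. when true negatives are common and a filter probe is much cheaper than a table lookup.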