For something like link coloring over a long history, a Bloom filter would seem ...

emn13 · on March 5, 2013

Large Bloom filters have exactly the same problem. And since they're fixed-size, you'd need a potentially huge filter to keep the false-positive rate down as the history grows; more likely you'd need to periodically regenerate it from the original data.
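To put rough numbers on the fixed-size point, here is a minimal sketch using the standard textbook approximation for a Bloom filter's false-positive rate. The function names and the example figures (1 million URLs, a 1% target) are my own assumptions for illustration, not anything from a real browser.

```python
import math


def bloom_false_positive_rate(m_bits: int, n_items: int, k_hashes: int) -> float:
    """Standard approximation: p ~ (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def bloom_bits_needed(n_items: int, target_p: float) -> int:
    """Bits needed to hit target_p with an optimal hash count: m = -n ln p / (ln 2)^2."""
    return math.ceil(-n_items * math.log(target_p) / (math.log(2) ** 2))


def optimal_hash_count(m_bits: int, n_items: int) -> int:
    """Optimal number of hash functions: k = (m / n) ln 2, rounded, at least 1."""
    return max(1, round(m_bits / n_items * math.log(2)))


if __name__ == "__main__":
    # Hypothetical numbers: size the filter for 1M visited URLs at ~1% false positives.
    n = 1_000_000
    m = bloom_bits_needed(n, 0.01)
    k = optimal_hash_count(m, n)
    print(f"bits={m} (~{m / 8 / 1e6:.1f} MB), hashes={k}")

    # Keep the filter the same size and let the history grow.
    for growth in (1, 2, 5, 10):
        p = bloom_false_positive_rate(m, n * growth, k)
        print(f"{growth:>2}x items -> false-positive rate {p:.3f}")
```

Under these assumptions the filter comes out at about 1.2 MB for the original million entries, but the false-positive rate climbs quickly once the history outgrows the size it was tuned for. Hence either oversizing it up front or rebuilding it from the original data.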

This is a really tricky optimization, because on a positive hit you've introduced more random I/O: you probe the Bloom filter and then still do the hash table lookup. False positives cost you the same, so you only save something on true negatives. Is it worth it? Only if you get the tuning just right.
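A back-of-the-envelope cost model makes the trade-off concrete. This is only a sketch: the cost units are arbitrary, and the parameter names and example values are assumptions of mine, not measurements.

```python
def expected_lookup_cost(hit_fraction: float,
                         false_positive_rate: float,
                         filter_cost: float,
                         table_cost: float) -> float:
    """Expected cost per lookup when a Bloom filter sits in front of the hash table.

    Every lookup pays filter_cost. Lookups that pass the filter (true hits
    plus false positives) also pay table_cost. Only true negatives skip it.
    """
    pass_fraction = hit_fraction + (1.0 - hit_fraction) * false_positive_rate
    return filter_cost + pass_fraction * table_cost


def filter_is_worth_it(hit_fraction: float,
                       false_positive_rate: float,
                       filter_cost: float,
                       table_cost: float) -> bool:
    """Compare against the baseline of always doing the table lookup."""
    with_filter = expected_lookup_cost(
        hit_fraction, false_positive_rate, filter_cost, table_cost
    )
    return with_filter < table_cost


if __name__ == "__main__":
    # Hypothetical costs: a filter probe is 0.2 units, a table lookup is 1.0 unit.
    for hits in (0.0, 0.5, 0.9, 1.0):
        cost = expected_lookup_cost(hits, 0.01, 0.2, 1.0)
        verdict = filter_is_worth_it(hits, 0.01, 0.2, 1.0)
        print(f"hit fraction {hits:.1f}: cost {cost:.3f}, worth it: {verdict}")
```

In this model the filter only pays off when most lookups are true negatives and the filter probe is much cheaper than the table lookup. If most links on a page have been visited, or the filter itself needs random I/O, it comes out as a net loss.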