For something like link coloring over a long history, a Bloom filter would seem to be ideal for reducing the number of true hash table lookups you'd need per page.
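To make the idea concrete, here is a minimal sketch of a Bloom filter sitting in front of a history lookup. Everything in it is an assumption for illustration: the names `BloomFilter` and `VisitedLinks`, the default sizes, and the in-memory `set` standing in for the real hash table are all hypothetical, not how any particular browser does it.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array probed at num_hashes positions per item."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive the probe positions from two 64-bit halves of one digest
        # (double hashing), rather than running num_hashes separate hashes.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class VisitedLinks:
    """Hypothetical history store: the filter screens out most misses."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.filter = BloomFilter(num_bits, num_hashes)
        self.history = set()      # stand-in for the real hash table
        self.table_lookups = 0    # counts how often we hit the table

    def visit(self, url):
        self.filter.add(url)
        self.history.add(url)

    def is_visited(self, url):
        if not self.filter.might_contain(url):
            return False          # true negative: no table lookup needed
        self.table_lookups += 1   # hit or false positive: pay for the table
        return url in self.history
```

The table remains the source of truth, so a false positive in the filter costs an extra lookup but never a wrong answer.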
Large Bloom filters have exactly the same problem. And since they're fixed-size, you'd need a potentially huge Bloom filter to keep the number of false positives down as the history grows; more likely you'd need to periodically regenerate it from the original data.
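For a rough feel of the sizes involved, the standard Bloom filter approximations can be turned into a small calculator. This is a sketch under the usual assumption of independent, uniform hash functions; the function names and the example numbers are made up for illustration.

```python
import math


def bloom_size(num_items, target_fp_rate):
    """Return (bits, hashes) from the standard sizing approximations:
    m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2."""
    bits = math.ceil(-num_items * math.log(target_fp_rate) / math.log(2) ** 2)
    hashes = max(1, round(bits / num_items * math.log(2)))
    return bits, hashes


def false_positive_rate(num_items, bits, hashes):
    """Approximate false positive rate: (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-hashes * num_items / bits)) ** hashes


# Size the filter for 1 million URLs at a 1% false positive rate,
# then see what happens if the history grows without a rebuild.
bits, hashes = bloom_size(1_000_000, 0.01)
for n in (1_000_000, 2_000_000, 4_000_000):
    print(n, round(false_positive_rate(n, bits, hashes), 3))
```

At a 1% target this works out to a little under 10 bits per item, and the false positive rate climbs quickly once the item count outgrows what the filter was sized for, which is the case for regenerating it.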
This is a really tricky optimization, because on a positive hit you've introduced more random I/O! After all, you now pay for the Bloom filter probe and then the hash table lookup. False positives are just as bad, so you only save something on true negatives. Is it worth it? Only if you get the tuning just right.
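The trade-off can be put into a back-of-the-envelope cost model. This is only a sketch: it assumes every filter probe costs the same, every table lookup costs the same, and costs are in arbitrary units; the function and parameter names are hypothetical.

```python
def expected_cost_with_filter(hit_fraction, fp_rate, filter_cost, table_cost):
    """Expected cost per query when a Bloom filter guards the table.

    Every query pays for the filter probe. The table lookup is paid
    on true hits and on false positives among the misses.
    """
    table_probability = hit_fraction + (1 - hit_fraction) * fp_rate
    return filter_cost + table_probability * table_cost


def filter_is_worth_it(hit_fraction, fp_rate, filter_cost, table_cost):
    """True if the guarded path is cheaper than always hitting the table."""
    guarded = expected_cost_with_filter(
        hit_fraction, fp_rate, filter_cost, table_cost
    )
    return guarded < table_cost


# Mostly-unvisited links, cheap filter: the filter pays off.
print(filter_is_worth_it(0.05, 0.01, 0.2, 1.0))
# Mostly-visited links: nearly every query pays for both steps.
print(filter_is_worth_it(0.90, 0.01, 0.2, 1.0))
```

Under this model the filter only wins when its probe cost is less than the table cost times the fraction of queries that are true negatives, that is, `filter_cost < (1 - hit_fraction) * (1 - fp_rate) * table_cost`.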