Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

// The InverseBloomFilter may report a false negative but can never report a false positive.

This is actually not true and not possible. There is a good explanation here: http://cstheory.stackexchange.com/a/14455/43

There will be a low probability of false positives ~ 2^32*k. (Also it's not an inverse-bloom filter, it's a cache).



Well, I'm looking at the code and it looks like it doesn't report false positives. I understand that you expect it to hash the content down and store only the hash, but it doesn't do that.

EDIT: I suppose my point was just that there is a difference between "I think you chose a bad name for your class" and "This is actually not true and not possible". It seems both true and possible (and arguably inappropriately named).


Then it doesn't do it in constant space. Constant space is the property that makes it not a true inverse bloom filter, and instead is just a cache with an interesting hashed based eviction method.

That said, this is just a silly definition issue. You can provide a data structure with no false positives, and some false negatives, but you can't do it in constant space. I'm not aware of if you can absolutely bound the rate of false negatives.


I guess it's storing the pointer to the data? That sounds pretty bad.

At any-rate it's one of two things, either you store the whole item or you store a hash. If you're storing the whole item it's not a filter and you don't get constant space guarantees as you mentioned. If you store a hash - it's possible to have false positives.

You can always get a bound for false-negatives (or none at all) same as it with any hash-table. If you wanted to, you could Cuckoo hash before eviction and you should be able to get a pretty-good theoretical bound for false-negatives (http://infoweekly.blogspot.com/2010/02/cuckoo-hashing.html).


His answer hints at an interesting solution: For a small element store it. For a big element do nothing.


I have a hash thing I'm working on that spills well over the Ln caches, so hitting the alternative buckets (its a linear prober) is a non-starter. Turns out the probabilities for an item ending up in an alternative bucket is in the 0.0n% range.

3 bits end up doing what would require a multiword bloom filter with 30 k hashes.



If it's a cache, there is no chance of false positives.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: