
Note that estimating intersections from HLLs (via inclusion-exclusion, |A ∩ B| = |A| + |B| - |A ∪ B|) doesn't work in general. The error becomes gigantic for sets that have small intersections, or whose cardinalities differ greatly.
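A back-of-the-envelope illustration of why, assuming the usual HLL relative standard error of about 1.04/sqrt(m) and purely hypothetical set sizes (one million vs. one thousand elements, overlap of 10). Real HLL estimates of |A| and |A ∪ B| are correlated, so treat this as a rough scale argument rather than an exact error model:

```python
import math

def hll_std_error(m: int) -> float:
    # Typical relative standard error of a HyperLogLog estimate with m registers.
    return 1.04 / math.sqrt(m)

# Hypothetical sizes: a large set, a small set, and a tiny overlap.
size_a, size_b, true_inter = 1_000_000, 1_000, 10
true_union = size_a + size_b - true_inter   # 1,000,990

rel_err = hll_std_error(16384)              # 1.04 / 128 = 0.008125
union_sigma = rel_err * true_union          # about 8,133 elements

# Inclusion-exclusion: |A ∩ B| ~ |A| + |B| - |A ∪ B|.
# Even if |A| and |B| were known exactly, a one-sigma miss on the union
# estimate alone swamps the true intersection of 10 (and goes negative).
est_inter = size_a + size_b - (true_union + union_sigma)
print(f"union sigma ~ {union_sigma:.0f}, intersection estimate ~ {est_inter:.0f}")
```

The noise on the large estimates is in the thousands, while the quantity being recovered is 10.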

A general solution for estimating the cardinality of intersections is to use HLLs only for unions, and to also keep a MinHash sketch for each key to perform intersections.
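A minimal from-scratch sketch of the combination, assuming the common estimate |A ∩ B| ≈ J(A, B) · |A ∪ B|, where the Jaccard similarity J comes from MinHash and the union size from merged HLLs. This is not necessarily exactly what the linked post does; the class names, the BLAKE2b base hash, and the parameters (p=12, k=256) are illustrative assumptions:

```python
import hashlib
import math
import random

_P61 = (1 << 61) - 1  # Mersenne prime used for the MinHash hash family

def _hash64(item: str) -> int:
    # 64-bit base hash (BLAKE2b chosen arbitrarily for this sketch).
    return int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")

class HyperLogLog:
    def __init__(self, p: int = 12):
        self.p, self.m = p, 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        x = _hash64(item)
        idx = x >> (64 - self.p)                   # top p bits select a register
        rest = x & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Union of two HLLs (same p assumed): register-wise maximum.
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small n
        return raw

class MinHash:
    def __init__(self, k: int = 256, seed: int = 1):
        # k hash functions h(x) = (a*x + b) mod P; sketches to be compared
        # must share the same k and seed.
        rng = random.Random(seed)
        self.params = [(rng.randrange(1, _P61), rng.randrange(0, _P61)) for _ in range(k)]
        self.mins = [_P61] * k

    def add(self, item: str) -> None:
        x = _hash64(item) % _P61
        self.mins = [min(cur, (a * x + b) % _P61)
                     for cur, (a, b) in zip(self.mins, self.params)]

    def jaccard(self, other: "MinHash") -> float:
        # Fraction of hash functions whose minimum agrees estimates J(A, B).
        same = sum(1 for x, y in zip(self.mins, other.mins) if x == y)
        return same / len(self.mins)

def sketch(items):
    # Build both sketches for one key.
    hll, mh = HyperLogLog(), MinHash()
    for it in items:
        hll.add(it)
        mh.add(it)
    return hll, mh

def intersection_estimate(hll_a, hll_b, mh_a, mh_b) -> float:
    # |A ∩ B| ~ J(A, B) * |A ∪ B|
    return mh_a.jaccard(mh_b) * hll_a.merge(hll_b).count()

# Hypothetical example: true |A ∩ B| = 2000, true |A ∪ B| = 4000.
users_a = [f"user{i}" for i in range(0, 3000)]
users_b = [f"user{i}" for i in range(1000, 4000)]
hll_a, mh_a = sketch(users_a)
hll_b, mh_b = sketch(users_b)
print(round(intersection_estimate(hll_a, hll_b, mh_a, mh_b)))
```

The relative error of the MinHash part still grows as J shrinks (roughly as sqrt((1 - J) / (k·J))), so k has to be chosen with the expected overlap in mind.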

There is a very good analysis of this technique over at http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html



HN is great. I was looking for something similar to HLL to do intersections, but couldn't find it because I didn't know what it was called. MinHash, apparently!



