If anyone needs that to go fast for fixed-length bit strings (up to ~4K bits), try out my code, chemfp, at http://chemfp.com/. It's designed for cheminformatics, but it works with any data set that can be described by a fixed-length byte string and an identifier.
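To show what that data model looks like, here is a minimal pure-Python sketch of a Tanimoto (Jaccard) threshold search over (identifier, byte string) records. This is the kind of similarity search chemfp is built to speed up. It is NOT chemfp's API. The function names `tanimoto` and `threshold_search` are hypothetical, and returning 0.0 for two all-zero fingerprints is a convention chosen only for this sketch.

```python
# Hypothetical illustration only; not chemfp's API.
# Each record is an (identifier, fingerprint) pair, where the fingerprint
# is a fixed-length byte string.

def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity: popcount(a & b) / popcount(a | b)."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return bin(a & b).count("1") / union


def threshold_search(query: bytes, targets, threshold: float):
    """Return (identifier, score) pairs with score >= threshold, best first."""
    hits = []
    for ident, fp in targets:
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((ident, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


targets = [
    ("mol1", bytes([0b11110000, 0])),
    ("mol2", bytes([0b11000000, 0])),
    ("mol3", bytes([0, 0b00001111])),
]
query = bytes([0b11110000, 0])
print(threshold_search(query, targets, 0.5))  # [('mol1', 1.0), ('mol2', 0.5)]
```

A linear scan like this is easy to follow but slow for large data sets. Getting it to go fast, for example with hardware popcount and ways to prune the search, is the job a dedicated tool handles.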