
Except in Google's paper the hashing does not directly reduce memory usage in any way. It's a lossless operation on the original vectors, unlike VW's lossy operation. Google's representation allows for memory reduction down the line, but those mechanisms have nothing to do with hashing.


Let's think through this clearly.

Locality sensitive hashing is a way to put similar vectors into the same buckets. It does this by hashing, but the intent is to approximate nearest neighbours.
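
For concreteness, a minimal sketch of one common LSH scheme (random hyperplanes); the dimensions, bit count and names are illustrative, not taken from either system:

    import numpy as np
    from collections import defaultdict

    # Random-hyperplane LSH: similar vectors tend to land in the same bucket,
    # which is what makes approximate nearest-neighbour lookup cheap.
    rng = np.random.default_rng(0)
    dim, n_bits = 64, 16
    planes = rng.normal(size=(n_bits, dim))   # one random hyperplane per hash bit

    def lsh_bucket(v):
        # the sign of the projection onto each hyperplane gives one bit of the key
        return ((planes @ v) > 0).tobytes()

    buckets = defaultdict(list)
    vectors = rng.normal(size=(1000, dim))
    for i, v in enumerate(vectors):
        buckets[lsh_bucket(v)].append(i)

    # query only against vectors that hashed into the same bucket
    query = vectors[0] + 0.01 * rng.normal(size=dim)   # a near-duplicate of vector 0
    candidates = buckets[lsh_bucket(query)]
    print(0 in candidates)   # usually True: near-duplicates share a bucket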

Skipgrams/ngrams turn features into other features by omission and the like, and so make similar things look the same. The hashing trick then reduces memory usage.
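
Roughly, the hashing trick looks like this (a sketch with a made-up bucket count; a real system like VW would use a stable hash such as murmurhash rather than Python's hash()):

    import numpy as np

    N_BUCKETS = 2 ** 18              # fixed memory budget, independent of vocabulary size
    weights = np.zeros(N_BUCKETS)    # collisions make this lossy, but memory stays bounded

    def char_ngrams(text, n=3):
        # similar strings share most of their n-grams
        return [text[i:i + n] for i in range(len(text) - n + 1)]

    def hashed_indices(features):
        # built-in hash() stands in here for a stable hash like murmurhash
        return [hash(f) % N_BUCKETS for f in features]

    def score(text):
        return sum(weights[i] for i in hashed_indices(char_ngrams(text)))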

So yes, you're right that the hashing in locality sensitive hashing is different in intent, but my point is that both of these approaches are designed to be more memory and compute efficient.

And vowpal's feature interactions give you transformer layers.

Add all of these together and they have about the same net effect.
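
For reference, hashed quadratic feature interactions (in the spirit of VW's -q option; the names and bucket count are made up) just cross two feature groups and hash each pair into the same weight space:

    N_BUCKETS = 2 ** 18

    def interact(namespace_a, namespace_b):
        # every (a, b) pair becomes a new feature, hashed like any other
        return [hash((a, b)) % N_BUCKETS for a in namespace_a for b in namespace_b]

    user_feats = ["age:30", "country:nz"]
    item_feats = ["genre:scifi", "year:1999"]
    crossed = interact(user_feats, item_feats)   # 4 new hashed feature indices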


You keep insisting that they're the same when they're not, and then you try to subtly expand your original claim of "using less memory by hashing" to "to be more memory and compute efficient" (emphasis mine), just to force them into the same bucket.

Yes, obviously locality sensitive hashing is a form of hashing. The fact that it's locality sensitive is important for this application, but you'd rather ignore that and insist on labeling them as the same thing just because they're both hashing.


http://matpalm.com/resemblance/simhash/

The simhash algorithm, the one I knew about (and which I mistakenly thought was LSH), works exactly like VW: it is ngrams + hash.
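
For anyone unfamiliar, a rough simhash sketch (64-bit fingerprints and md5 as the per-ngram hash are just illustrative choices):

    import hashlib

    def ngrams(text, n=3):
        return [text[i:i + n] for i in range(len(text) - n + 1)]

    def simhash(text, bits=64):
        # each ngram votes +1/-1 on every bit of the fingerprint
        counts = [0] * bits
        for g in ngrams(text):
            h = int(hashlib.md5(g.encode()).hexdigest(), 16)
            for b in range(bits):
                counts[b] += 1 if (h >> b) & 1 else -1
        return sum(1 << b for b in range(bits) if counts[b] > 0)

    def hamming(a, b):
        return bin(a ^ b).count("1")

    # near-duplicate texts get fingerprints that are close in Hamming distance
    print(hamming(simhash("the quick brown fox"), simhash("the quick brown foxes")))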



