
This is what made it a funny thought experiment to me. I was doing some space-related stuff a while back, and when you're dealing with something as small as 100 Mb, with the added constraint of up to a light-year or more of latency (which makes error correction immensely costly), having a quantum-processor receiver find the collision of a unique hash could be faster than transmitting the data with errors.

There's probably a distance at which the latency in light-years is longer than the time it takes a quantum processor with some shortcuts to exhaust the search space for the collision on the hash - and potentially hashing algorithms that contain information and hints about a hyperplane that generates the original data over the field of possibilities. To a quantum computer with sufficient qubits, at a long enough latency distance away, the 32-bit hash of a 2048-bit RSA key may "arrive" faster than transmitting the whole key.
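As a rough sketch of the scales involved (Python; the 1 GH/s classical hash rate is just an assumed figure, and the quantum line only illustrates the ~sqrt(N) query scaling of Grover-style search):

    import math

    LIGHT_YEAR_SECONDS = 365.25 * 24 * 3600  # one-way latency over 1 light-year, ~3.16e7 s
    HASHES_PER_SECOND = 1e9                  # assumed classical rate (1 GH/s)

    search_space = 2 ** 32                   # every possible 32-bit hash value
    classical_seconds = search_space / HASHES_PER_SECOND
    grover_queries = math.isqrt(search_space)  # quantum search needs ~sqrt(N) queries

    print(f"one-way latency for 1 ly: {LIGHT_YEAR_SECONDS:.2e} s")
    print(f"classical exhaust of 2^32: {classical_seconds:.2f} s")
    print(f"Grover-style queries for 2^32: {grover_queries}")

Even classically, enumerating a 32-bit space is trivial next to a year of latency - the hard part, as the replies point out, is that enumerating it doesn't tell you which preimage was meant.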

It feels like a fundamental misunderstanding of something in communication theory, but this was the thinking that prompted it.



Using the pigeonhole principle, consider how many 100-megabit values must, on average, hash to the same 32-bit value. There are 2^100,000,000 possible inputs and only 2^32 possible digests, so on average 2^(100,000,000 - 32) inputs share each digest - a huge number. This is a bit of a paradox in cryptographic hashing: finding a collision is difficult, but the number of collisions is enormous.
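A quick Python sanity check on the size of that exponent (the figures follow directly from the bit counts above):

    import math

    input_bits = 100_000_000  # a 100-megabit message
    hash_bits = 32

    # On average, 2^(input_bits - hash_bits) inputs share any one digest.
    preimages_log2 = input_bits - hash_bits
    decimal_digits = int(preimages_log2 * math.log10(2)) + 1
    print(f"~2^{preimages_log2:,} preimages per digest")
    print(f"(a number with about {decimal_digits:,} decimal digits)")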


Yeah, you're fundamentally missing what I wrote, as well as the pigeonhole principle and the basics of information theory.

Let's say you replace the quantum computer with an oracle. You have an infinitely long dictionary; you can turn to any page, find your hash, and see every possible input that could produce that hash. There are billions and billions of options that all match your hash - how could you pick which one is definitely the "right" one?

You're no closer to knowing what the transmitted message was, because of the astronomical number of inputs that you now have to sift through.

[ Technically, you're (length of hash in bits) closer. Say you have a 10-bit hash, i.e. 1024 values. There are, on average, two 11-bit inputs that produce that hash, four 12-bit inputs, eight 13-bit inputs, and so on, following the powers of 2. At an input of N = 30 bits, you have 2^20 =~ 1 million possible inputs with that exact same hash. A 1,000-bit input? 2^990 of them - a number with roughly 300 decimal digits. ]
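You can watch that doubling directly with a toy 10-bit hash (a sketch: SHA-256 truncated to 10 bits stands in for the oracle's dictionary):

    import hashlib
    from collections import Counter

    HASH_BITS = 10  # toy digest: 1024 possible values

    def tiny_hash(data: bytes) -> int:
        # Truncate SHA-256 to HASH_BITS bits so exhaustive search is feasible.
        full = int.from_bytes(hashlib.sha256(data).digest(), "big")
        return full & ((1 << HASH_BITS) - 1)

    for n_bits in range(11, 19):
        # Hash every n_bits-bit input and count how many land on digest 0.
        counts = Counter(tiny_hash(i.to_bytes((n_bits + 7) // 8, "big"))
                         for i in range(2 ** n_bits))
        print(f"{n_bits}-bit inputs sharing one digest: {counts[0]} "
              f"(expected ~{2 ** (n_bits - HASH_BITS)})")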


This part - that a cryptographic hash has a massive number of possible collisions for a given plaintext, just spread over such a large field, and incrementing with the input size, so that it's merely unlikely you'll ever find one - is the counterintuitive part. I haven't implemented a secure hashing algorithm from scratch, so this aspect would have come out in that exercise.

Even working in security and crypto, the presumption about a cryptographic hash from a standard algorithm is that it's universally unique for the data it hashes - barring someone finding collisions, at which point the algorithm gets deprecated. Most people deal with hashes at a higher level of abstraction.

Hash collisions have been used in a few high-profile attacks in the last decade - most notably when some SSL certificates were still using MD5, and more recently a smart contract issue where the contract validated only the last 4 bytes of a cryptographic hash and someone produced a key that exploited that with a collision. Unfortunately, the people who don't make those mistakes are both very rare and often difficult to work with.
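For a sense of how weak a 4-byte check is, here's a sketch of a birthday search in Python for two inputs whose SHA-256 digests share the same last 4 bytes (about 2^16 attempts on average):

    import hashlib

    def last4(data: bytes) -> bytes:
        # The mistake in question: comparing only the digest's last 4 bytes.
        return hashlib.sha256(data).digest()[-4:]

    seen = {}  # 4-byte suffix -> first input that produced it
    i = 0
    while True:
        msg = i.to_bytes(8, "big")
        suffix = last4(msg)
        if suffix in seen:
            print(f"collision after {i + 1:,} tries:")
            print(f"  {seen[suffix].hex()} and {msg.hex()} share suffix {suffix.hex()}")
            break
        seen[suffix] = msg
        i += 1

Hitting one specific target's 4 bytes (a second preimage rather than any colliding pair) takes ~2^32 attempts, which is still entirely practical offline.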

The idea that there are many, many potential collisions for a given input to a cryptographic hash doesn't come up much unless someone manages to find a single collision in one. I'd even posit that nobody outside academic cryptography circles is including SHA-256 collisions in their threat scenarios right now.

However, the absurd/imaginary/hypothetical/counterfactual I was raising is that a future quantum computer could produce all inputs of that length in a reasonable amount of time. The question was more about what is sufficient to reconstruct the data, and it was funny to know something was specifically wrong but not know why.

In any case, I appreciate the time taken to illustrate it.




