AES-based Synthetic IDs: deterministic AE for 64-bit integers (github.com/iqlusioninc)
64 points by beefhash on April 11, 2020 | 29 comments


Similar ideas go back quite a ways. Here are notes from a Perl library based on the Skipjack cipher; it wound up inspiring some of my own work, and I ended up porting it to node.js.

> One example where Crypt::Skip32 has been useful: You have numeric database record ids which increment sequentially. You would like to use them in URLs, but you don't want to make it obvious how many X's you have in the database by putting the ids directly in the URLs.

> You can use Crypt::Skip32 to scramble ids and put the resulting 32-bit value in URLs (perhaps as 8 hex digits or some other shorter encoding). When a user requests a URL, you can unscramble the id to retrieve the object from the database.

https://metacpan.org/pod/release/ESH/Crypt-Skip32-0.17/lib/C...
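
For anyone curious what that pattern looks like in practice, here's a runnable sketch of the scramble/unscramble workflow. The Feistel round function is a toy stand-in (HMAC-SHA256 over 16-bit halves), not the real Skip32/Skipjack cipher, and the key is a made-up application secret; it only illustrates turning a 32-bit id into 8 hex digits for a URL and back.

    import hmac, hashlib

    KEY = b"app-secret"  # hypothetical application secret, NOT a real Skip32 key

    def _round(half, i):
        # toy round function, NOT Skipjack: 16-bit output from HMAC-SHA256
        mac = hmac.new(KEY, bytes([i]) + half.to_bytes(2, "big"), hashlib.sha256).digest()
        return int.from_bytes(mac[:2], "big")

    def scramble(id32):
        l, r = id32 >> 16, id32 & 0xFFFF
        for i in range(4):                       # 4 Feistel rounds
            l, r = r, l ^ _round(r, i)
        return f"{(l << 16) | r:08x}"            # 8 hex digits for the URL

    def unscramble(token):
        v = int(token, 16)
        l, r = v >> 16, v & 0xFFFF
        for i in reversed(range(4)):             # undo the rounds in reverse order
            l, r = r ^ _round(l, i), l
        return (l << 16) | r

    assert unscramble(scramble(123456)) == 123456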


I’m curious what properties this has that aes(0||id) doesn’t.


I always just have a uuid column with an index (populated with a random uuid) and use that externally for URLs and such. What is the benefit of this beyond a "slightly smaller database" and avoiding that nagging feeling of violating DRY from each row having two unique constraints/identifiers?

UUIDs aren’t big when stored properly (ie not as strings); I just got over it. Am I missing something?


Haha I came here to ask the same question.

This "SIV" mode is silly and breaks down completely when encrypting more than 2^32 IDs.

Your proposal is not only faster, but also safer. AES is a strong pseudorandom permutation, the 00000000 padding is a perfectly fine integrity check.


Author here.

You're right (if you add a constant-time check upon decryption that the bits are zero).

I suggested as much here yesterday, and may revise the scheme to do so:

https://www.reddit.com/r/crypto/comments/fyn8cs/aesbased_syn...


Does the check even need to be constant time?

The usual use case for a constant time comparison is to avoid allowing an attacker to guess a value one byte at a time. For example, a 16 byte MAC value on a packet could be brute forced byte-by-byte if the system reveals to the attacker which byte the MAC check failed on.

But in the zero padded AES case, the check is being done after an AES decryption operation, so the attacker can't attempt to generate each zero value a byte at a time. After 256 guesses, an attacker able to see timing information could figure out when their 128 bit input guess generated a decrypted value with a valid zero in the first padding byte. But that does them no good trying to generate a zero in the next padding byte because any change they make to their input guess changes all the decrypted output bytes, including that first zero. Each successive zero byte means they have to start over, so generating 8 zero bytes requires 2^64 guesses as far as I can tell.


yer. attacker can't generate longer collisions from shorter collisions so constant time comparison is not necessary. assuming AES in ECB mode and you are not doing something weird like using an IV mode where attacker can tweak the IV to generate different plaintext.


I also don’t understand how a padding oracle attack applies to this if all IDs are under 128 bits.


While AES(0||id) is subject to a padding oracle, it's not immediately obvious why this would be a useful capability to an attacker, since you can't tweak your input based on the oracle's output (unlike e.g. AES-CBC).


How is AES(0||id) subject to a padding oracle? Am I misunderstanding the notation?


Yeah, that's not a padding oracle, but it's similar in concept, because the prefix check after decryption will likely leak whether the app considers the ciphertext valid, ala:

    pk = decrypt(params.id)                     # raw 16-byte AES block
    if pk[0:8] != EIGHT_ZEROS:                  # early-exit prefix check leaks validity
        return Http404
    id = int.from_bytes(pk[8:16], "big")
    object = db.query(id)
Also stuff like this isn't really specific to using this particular construction. Even if systems are designed to return "does not exist" instead of "forbidden", it's hard to make authorization checks constant time and I've never seen code to even try that.


Sure, but you can't adaptively choose a new ciphertext to iterate with. Which is the core of the concept.


Yeah, exactly. I don't think we disagree, I just abused the name a little bit - not a good idea in this field!


Slowness and complexity, it looks like! Unlike a prefix of zero bytes, the MAC has to be compared in constant time; there are more moving parts; and it seems like there might be a birthday-problem issue where two ids with colliding IVs (50% chance of such a pair existing among 5 billion ids, and noticeable without needing the key) would be XORed against the same encrypted counter?


It's 'misuse resistant', detecting malleability. So... what properties does this have that AES(H(id, k1) || id, k2), with a 64-bit id and keyed hash function H, does not have?

EDIT: actually I don't believe that my above scheme is any better than your original as long as you verify the zero padding is correct.


SIV mode is misuse resistant with regard to repeated nonce values, but that property doesn't seem to apply to the SID use-case. And if you assume the user isn't validating the decrypted value, SID has a far more catastrophic malleability problem than the zero padded AES approach.

Because SID encrypts the 64-bit ID using counter mode, not validating the SIV value allows an attacker to make specific changes to the decrypted ID (e.g. flipping individual bits) or to insert chosen ID values if the attacker ever learns the mapping between a 128-bit encrypted ID and the decrypted 64-bit value.

Even if you don't validate the zero padding in the other approach, an attacker is only able to get the system to accept random 64-bit ID values as far as I can tell. Still not great, but less catastrophic and it's not a malleability problem as I understand the term because the attacker is unable to make specific changes in the decrypted output.
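
To make the counter-mode point concrete, here's a small sketch of the generic malleability property using Python's cryptography package (raw AES-CTR, not the actual AES-SID code): flipping a bit in the ciphertext flips the same bit in the decrypted id.

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, nonce = os.urandom(16), os.urandom(16)

    def ctr(data):
        # CTR encryption and decryption are the same keystream XOR
        c = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        return c.update(data) + c.finalize()

    ct = ctr((42).to_bytes(8, "big"))              # "encrypted" 64-bit id 42
    tampered = bytearray(ct)
    tampered[-1] ^= 0x01                           # flip the low bit of the last byte
    recovered = ctr(bytes(tampered))               # decrypts without any error...
    assert int.from_bytes(recovered, "big") == 43  # ...to id 43 instead of 42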


What does “detecting malleability” mean? Misuse by forgetting to check the zero?


Essentially, yes.


It's reversible. That's the whole point. You don't need to keep a separate table or column for the hashed id.


Isn't AES encryption also reversible? The use-case for AES-SID appears to be mapping 64-bit ID values to 128-bit ciphertexts and reversibly producing an authenticated 64-bit ID value from the 128-bit ciphertext. Zero padded AES does both of these things as far as I can tell.


That by itself is an old problem; see for instance Black and Rogaway's paper on restricted domain ciphers:

https://web.cs.ucdavis.edu/~rogaway/papers/subset.pdf


It's also reversible:

    newid = AES-EncryptBlock(0||id)
    0||id = AES-DecryptBlock(newid)

Check for 8 zero bytes for authentication. Bonus: can use a constant instead of zeros for domain separation (eg different DB columns or tables).
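
A minimal sketch of that construction, assuming Python's cryptography package (a single raw AES block, hence ECB over exactly 16 bytes) and a constant-time prefix check as discussed upthread:

    import hmac, struct
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    PREFIX = bytes(8)  # eight zero bytes; swap in a per-table constant for domain separation

    def encrypt_id(key, id64):
        enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
        return enc.update(PREFIX + struct.pack(">Q", id64)) + enc.finalize()

    def decrypt_id(key, token):
        dec = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
        block = dec.update(token) + dec.finalize()
        if not hmac.compare_digest(block[:8], PREFIX):   # constant-time prefix check
            raise ValueError("invalid id token")
        return struct.unpack(">Q", block[8:])[0]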


i see tptacek is stalking this thread so i'm going to post a semi-off topic question.

with AES-GCM is it safe to generate nonces randomly? i see some recommendations that after a certain number of encryption operations you should rotate keys.

i haven't really looked at AES GCM SIV but looking at the description of the algorithm on this github page it doesn't look like it fixes 'misuse' if your misuse is generating random nonces and generating huge numbers of ciphertexts.

i see AWS has a KMS solution that encrypts stuff server side using AES-GCM; the docs don't mention anything about IV generation and only support rotating keys once a year. is it safe to assume this scheme is fine if you are encrypting lots of ciphertexts with a single key?


Search for the birthday paradox. Basically, if you assume an attacker that can store every message, then as used nonces accumulate, the chance of a collision increases. It does not remain constant, as naive intuition would suggest.

The maximum suggested number of uses per key can be calculated based on the desired security bound (chance of a collision with past messages) and the size of the nonce/IV.

If you want a rule of thumb, 2^((N/2)-1) messages, where N is the number of bits in the nonce, is not too bad. So if you have 64-bit nonces, you should send no more than 2^31 messages.
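
Back-of-the-envelope, assuming uniformly random nonces, the collision probability for m messages with b-bit nonces is roughly m*(m-1)/2^(b+1):

    def collision_prob(m, bits):
        # approximate birthday bound: P(collision) ~= m*(m-1) / 2^(bits+1)
        return m * (m - 1) / 2 ** (bits + 1)

    print(collision_prob(2**31, 64))   # ~0.125 for the 64-bit rule of thumb above
    print(collision_prob(2**32, 96))   # ~1.2e-10, i.e. about 2^-33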

This only applies to random nonces. If you are incrementing the nonce sequentially, you get as many nonces as the range of your integer. But here be dragons: if you lose track, you will definitely repeat nonces! This is why schemes that save and restore the nonce value are dangerous. If you lose power and restore an old value, all the nonces from then until the latest real position will repeat.

Usually it's safe to use simple incrementing nonces with ephemeral keys that live only as long as a session and a specific instance of the app/OS/whatever. You can restart the nonce with a new key no problem. Just be sure your increment is protected by a mutex or something like std::atomic<> if there are multiple threads or two threads could send the same nonce.


yer. i've always used N/2 as a rule of thumb and checked if that is a big enough number. i saw this quote from NIST on Crypto Stack Exchange in regards to AES-GCM. i would usually assume 96 bit -> 48 bit would be enough, but they have what I think is some pessimistic advice:

https://crypto.stackexchange.com/questions/42982/safety-of-r...

>> AES-GCM takes a 96-bit nonce and NIST says that you can only encrypt 2^32 messages under a single key if using random nonces. This is because if you throw 2^32 balls at 2^96 buckets then you have roughly a 2^-33 chance of getting two in the same bucket and NIST drew the line there. That probability might seem either high or low to you. It's pretty small in absolute terms and, unlike a work factor, an attacker can't spend resources against it, but it's a very long way from the safety margins that we usually use in cryptography.

i notice libsodium uses xchacha20, which has a larger nonce of 192 bits. and in their docs they say:

>> Unlike plain ChaCha20, the nonce is 192 bits long, so that generating a random nonce for every message is safe.

Not sure if they are referring to both libsodium ChaCha20 implementations. One of them is obviously very unsafe with only a 64-bit nonce, but the other is 96-bit, the same as AES-GCM.


> i would usually assume 96 bit -> 48 bit would be enough but they have what I think is some pessimistic advice:

The "You have n Bits, then you need 2^n/2 tries for a birthday attack, so your real 'security level' is n/2 bits" rule captures the expected value of tries to get a collision. That's the 50 % point. You don't want to design your crypto system in a way that results in a 50 % chance of at least one nonce repetition.


This is an interesting approach for primary key obfuscation.

I recently stumbled upon something that I feel might be even more powerful: hashing complex types into 256-bit keys. E.g., if you had some type representing the composite keys of a Customer in your system (email address, phone number, etc.), you could serialize an instance, run it through SHA-256, and as long as the same process is used for lookups, you can get everything back out as expected.

Essentially, you can compress your entire scope of composite-key data into a single 256-bit value. Just like with GUID keys, this can be pre-computed on each client (whereas autoincrement cannot). This approach is very clever IMO because it can be used directly on top of any universal byte[]/byte[] key-value store. Your keys are all 256-bit values corresponding to the SHA-256 of a serialized complex-key instance. The type information can be encoded into the key itself (e.g. hash the fully-qualified type name as well).
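
A minimal sketch of that idea, assuming JSON as the canonical serialization and a made-up "app.Customer" type name:

    import hashlib, json

    def composite_key(type_name, **fields):
        # sort_keys + compact separators so the same fields always hash identically
        payload = json.dumps({"_type": type_name, **fields},
                             sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).digest()

    k = composite_key("app.Customer", email="a@example.com", phone="+15550100")
    # k is a 32-byte value usable directly as the key in any bytes->bytes KV store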


If you use an unkeyed hash (as opposed to a PRF) on low-entropy inputs, they can be preimaged by an attacker.

This is especially problematic in the case of PII like email addresses and phone numbers.


Don't use things that can change for IDs. Any individual thing that describes a user can change, so just use synthetic IDs.



