You need this complexity anyway - if you had surrogate keys, the primary key wou...

shkkmo · on June 5, 2024

> but with the added benefit that for the common case, you don't need any additional lookups. You only need to lookup CPR changes if you don't find the CPR that the user gave you

Doing the CPR lookup ONLY when you don't find a record is really, really stupid. All you need is one bad cache, one database update that go rolled back and suddenly you've lost all recent updates the the user and will only see the bad record with no indication you've failed.

Trying to use force changeable data into being a 'natural key' means that the number of edge cases you have to predict, write around and test for is going to rise significantly for no real benefit.

lolinder · on June 5, 2024

> The first CPR in this case becomes identical to your surrogate key - it's an opaque ID that you use to reference other tables in the DB - but with the added benefit that for the common case, you don't need any additional lookups. You only need to lookup CPR changes if you don't find the CPR that the user gave you.

I'm not buying that this is a meaningful gain in either performance or code complexity.

In the world of synthetic keys I look up the CPR in the CPR table and join it to the user table using the synthetic ID. If I find a record for the CPR+user join then I'm set, if I don't then the customer doesn't exist.

In the world of natural keys, you're advocating that I first query the user table directly by CPR. Then if I turn nothing up I run a separate query with a join on the original CPR. Then if I still don't turn something up the customer doesn't exist.

The code in the second instance is obviously more complicated than the code in the first instance. It has increased risk of someone writing a bug because now there's a very tempting CPR field that will be right most of the time but wrong in some edge cases. Depending on the database and usage patterns, indexing on the CPR may be much less efficient than indexing on an autoincrementing integer.

The only thing it has going for it is that I might be able to avoid a single join on a very specific path where a user is looking up a customer by typing in the CPR. That seems like the wrong thing to optimize for in the face of all the downsides.