Different compiler versions, target architectures, or optimization levels can generate substantially different assembly from the same high-level program. Determinism is therefore scoped to a fixed toolchain and configuration, not absolute.
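A small analogy you can run yourself, using Python's own bytecode compiler as a stand-in for a native compiler (so this is a sketch of the idea, not a statement about any particular C toolchain). The `SOURCE` snippet and the `bytecode_digest` helper are made up for illustration; `compile()` and its `optimize` argument are standard Python.

```python
import hashlib

# The same high-level source, compiled under different settings.
SOURCE = "def f(x):\n    assert x > 0\n    return x * 2\n"

def bytecode_digest(source: str, optimize: int) -> str:
    """Hash the bytecode of the first function defined in `source`."""
    module_code = compile(source, "<example>", "exec", optimize=optimize)
    # The function body is stored as a nested code object in the constants.
    func_code = next(c for c in module_code.co_consts if hasattr(c, "co_code"))
    return hashlib.sha256(func_code.co_code).hexdigest()

# Same source, same settings: identical output.
same_settings_match = bytecode_digest(SOURCE, 0) == bytecode_digest(SOURCE, 0)

# Same source, different optimization level: the assert is stripped,
# so the generated bytecode (and its hash) changes.
optimization_changes_output = bytecode_digest(SOURCE, 0) != bytecode_digest(SOURCE, 1)

print(same_settings_match, optimization_changes_output)
```

The output is reproducible only while every input to the compiler is held fixed, which is the sense in which the determinism is scoped.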

Also, almost every piece of software has known unknowns in its dependencies, which are updated constantly, and no one can read all of that code. So in real life, if you compile on a different system ("works on my machine") or again after some time has passed (updates to the compiler, OS libraries, packages), you will get a different checksum for your build even though the high-level code you wrote is unchanged. In theory, given perfect conditions, you are right, but in practice that is not the case.
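To make the checksum point concrete, here is a toy model, not a real build system: it treats a build as a hash over the source plus the versions of everything around it. The `build_fingerprint` function, the environment keys, and the version strings are all hypothetical.

```python
import hashlib
import json

def build_fingerprint(source: str, environment: dict[str, str]) -> str:
    """Toy stand-in for a build checksum: hash the source plus its environment."""
    payload = json.dumps({"source": source, "env": environment}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

my_code = "print('hello')"  # the code I wrote; it never changes below

# Hypothetical environments: only one dependency version differs.
env_last_month = {"compiler": "1.0.0", "libfoo": "2.3.1"}
env_today = {"compiler": "1.0.0", "libfoo": "2.3.2"}

fingerprint_then = build_fingerprint(my_code, env_last_month)
fingerprint_now = build_fingerprint(my_code, env_today)
fingerprint_repeat = build_fingerprint(my_code, env_last_month)

print(fingerprint_then == fingerprint_repeat)  # pinned environment: same result
print(fingerprint_then == fingerprint_now)     # drifted dependency: different result
```

Real builds have more sources of variation than this (timestamps, file ordering, paths), so the sketch understates the problem if anything.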

There are established benchmarks for code generation (such as HumanEval, MBPP, and CodeXGLUE). On these, LLMs demonstrate that given the same prompt, the vast majority of completions are consistent and pass unit tests. For many tasks, the same prompt will produce a passing solution over 99% of the time.
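For what "consistent" means here, a minimal sketch of how a pass rate over repeated completions could be measured. The completions below are hand-written stand-ins, not real model output, and `passes` and `pass_rate` are hypothetical helpers rather than part of any benchmark's actual harness.

```python
def passes(candidate_source: str, tests: list, func_name: str) -> bool:
    """Run one candidate against the unit tests; any error counts as a failure."""
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False

def pass_rate(candidates: list, tests: list, func_name: str) -> float:
    """Fraction of candidates that pass every test."""
    results = [passes(c, tests, func_name) for c in candidates]
    return sum(results) / len(results)

# Stand-ins for several completions sampled from the same prompt.
completions = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return sum((a, b))\n",  # different text, same behavior
    "def add(a, b):\n    return a - b\n",        # wrong result
    "def add(a, b)\n    return a + b\n",         # syntax error
]
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

rate = pass_rate(completions, tests, "add")
print(rate)  # 0.5
```

Note that the first two completions differ as text but both pass, so this measures consistency of behavior under the tests, not byte-for-byte identical output.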

I would say yes, there is a gap in determinism, but it's not as large as one might think, and it's narrowing as time goes on.