Mike's primary complaint is bad register allocation. It is very important to keep the most important state consistently in registers. In my experience, compilers still struggle to do good register allocation in big and branchy functions.
Even perfect branch prediction cannot solve the problem of unnecessary spills.
Very true. I imagine that grouping instructions that use the same registers into their own functions would help with that (arithmetic expressions tend to generate sequences like this). Then you loop within this function while the next instruction is in the same group, and only return to the outer global instruction loop otherwise. If you design the bytecode carefully, you can probably do group checks with a simple bitmask.
Nearly. You need register and to also pass them into (potentially no-op) inline asm. `register int v("eax")` iirc, but it's been years since I did this.
The 'register' is indeed largely ignored, but it has the additional somewhat documented meaning of 'when this variable goes into inline asm, it needs to be in that register'. In between asm blocks it can be elsewhere - stack or whatever - but it still gives the regalloc a really clear guide to work from.
It's `register int v asm("eax")`. However they are very easily elided, especially after higher optimization levels; compilers are very open about this [1].
Even perfect branch prediction cannot solve the problem of unnecessary spills.