> it'd lead to things like the cost of cache misses being assigned to instructio...

jsnell · on July 28, 2017

Sure, but that's still an in-flight instruction that is stalled waiting on a memory access (just indirectly). You don't see the miss being attributed to the instructions that preceded the load.

brucedawson · on July 28, 2017

Take a look at the "cmp ebx,eax" on line 24 of the spreadsheet linked from the article. ebx is only ever modified by pure register manipulations. eax was most recently modified by having a register added to it.

And yet the cmp instruction is quite expensive - much more so than the instructions that precede it, which are executed equally frequently.

So, somehow the cost of the r8d load four instructions earlier must end up being propagated through the add in the previous instruction, and then it gets charged to the "innocent bystander" of the compare. Crazy. I would have expected the "add eax,r8d" on line 23 to pay the price.
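
To make the dependency chain concrete, here is a toy sketch in Python. It is only an illustration: the load instruction's exact form, the register names in its source list, and all latencies are assumptions, not values from the spreadsheet, and it models data dependencies only, not how a profiler attributes samples.

```python
# Toy dataflow model: an instruction can start once all of its source
# registers are available. Names and latencies are illustrative
# assumptions, NOT taken from the spreadsheet.
def finish_times(instrs):
    ready = {}  # register -> cycle at which its value becomes available
    out = []
    for name, dst, srcs, latency in instrs:
        # Registers never written in this snippet are treated as ready at cycle 0.
        start = max((ready.get(r, 0) for r in srcs), default=0)
        done = start + latency
        if dst is not None:
            ready[dst] = done
        out.append((name, start, done))
    return out

chain = [
    # Hypothetical slow load that produces r8d (assumed 100-cycle latency).
    ("load r8d", "r8d", ["rcx"], 100),
    ("add eax,r8d", "eax", ["eax", "r8d"], 1),
    ("cmp ebx,eax", "flags", ["ebx", "eax"], 1),
]

for name, start, done in finish_times(chain):
    print(f"{name}: operands ready at cycle {start}, result at cycle {done}")
```

In this model both the add and the cmp sit waiting on the load, since neither can start until r8d arrives. It says nothing about why the samples would land on the cmp rather than the add.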

qb45 · on July 28, 2017

This looks funny: it seems that RDI is never modified by the loop (?), so lines 18-24 always load the same data, presumably from L1. It's mysterious why, of all those instructions, 20 and 24 get the most interrupt hits.

brucedawson · on July 28, 2017

Good observation about rdi and its implications for the load of rcx (line 18) and therefore the load through rcx (line 20).

I added a note to that effect and also enabled adding comments to the spreadsheet.