
> it'd lead to things like the cost of cache misses being assigned to instructions that have nothing to do with memory access.

But this happens all the time with perf. I often see cache misses attributed to a jump dependent on a data load.



Sure, but that's still an in-flight instruction that is stalled waiting on a memory access (just indirectly). You don't see the miss being attributed to the instructions that preceded the load.


Take a look at the "cmp ebx,eax" on line 24 of the spreadsheet linked from the article. ebx is only ever modified by pure register manipulations. eax was most recently modified by having a register added to it.

And yet the cmp instruction is quite expensive - much more so than those that precede it and which are executed equally frequently.

So, somehow the cost of the r8d load four instructions earlier must end up being propagated through the add in the previous instruction and then charged to the "innocent bystander" of the compare. Crazy. I would have expected the "add eax,r8d" on line 23 to pay the price.


This looks funny: it seems that RDI is never modified by the loop (?), so lines 18-24 always load the same data, presumably from L1. It's mysterious why, of all those instructions, 20 and 24 get the most interrupt hits.


Good observation about rdi and its implications for the load of rcx (line 18) and therefore the load through rcx (line 20).

I added a note to that effect and also enabled adding comments to the spreadsheet.



