In languages with GCs or managed in some other way, knowing the upper bound of l...

barrkel · on Feb 21, 2016

In Java specifically, I've gotten good results by following this technique:

- start with an empty routine for the main loop to be optimized

- add the simplest loop you can that iterates over your source data, written such that has a simple obvious bounds check

- test the loop in a microbenchmark harness and look at the generated assembly if necessary to verify that bounds checks get hoisted out, the loop is being unrolled, etc.

- slowly add more code and continuously verify that assumptions are met

- if the loop body gets too complex and the JVM stops optimizing it properly, try and split the loop into multiple seperate (possibly nested) loops that are in their own functions, and repeat the process

- when finished, experiment with inlining loops again

Coming at it from the other angle - a complex loop that you poke until the generated code is decent - is much more frustrating. E.g. manually unrolling loops is fairly pointless - you really really want the JVM to unroll the loops, because it will do so with fewer bounds checks that you can't eliminate yourself. Start with the JVM producing good fast code, and keep it that way as the complexity grows.

chrisseaton · on Feb 21, 2016

Yes I understand that the java.util.concurrent code includes a lot of loops that look unusual, and in fact they're carefully constructed in such a way as to just avoid the poll (the check-if-we-need-to-stop-the-world part) for inner-loop concurrency operations.

umanwizard · on Feb 22, 2016

Just checked on my system (fairly recent macbook pro) and a loop that does nothing but increment a memory location and runs 2^32 times takes 8 seconds. Doesn't Java stop the world more often than that?

chrisseaton · on Feb 22, 2016

I think something that accesses main memory goes beyond 'simple' instructions for this definition.

This is an example of a Java compiler phase doing it, although here it looks like it's only doing it for empty loops (or loops that end up empty due to hoisting), so it's not quite what was going on about https://github.com/graalvm/graal-core/blob/0a68438187e7e3196....