This gets said every single time "JIT" comes up, but I've never heard an argument for how it might actually be possible. As far as I can tell you have two problems: (1) your heuristic has to be so damn fast that it compensates for the runtime analysis cost, which is non-trivial for a JIT but trivial for AoT, since AoT's runtime analysis cost is zero; (2) you need to tell a story about how this heuristic can be so specialized that I couldn't just take my program, profile it on some training data (a bunch of program inputs), and optimize from that profile (gcc and llvm do this trivially). I've never seen code solving (1) or (2). Do we have any evidence of JIT being faster than AoT? No, I see no such evidence. Even the most cutting-edge JIT compilers like LuaJIT or the JVM are light-years behind gcc, llvm, rustc, etc. Do we have any evidence that we can perform more optimizations at runtime than at compile time? No, I don't see any. JIT was a nice, powerful, avant-garde idea that could have changed everything, but I don't think it turned out to be the superstar we all thought it'd be.
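For reference, the gcc workflow I mean is just a two-pass build. A minimal sketch, with the build commands in the header comment (the file name and the branchy loop are made up for illustration; -fprofile-generate and -fprofile-use are real gcc flags):

    /* pgo_demo.c -- stand-in program with a data-dependent branch.
     *
     * Typical gcc PGO build:
     *   gcc -O2 -fprofile-generate pgo_demo.c -o demo
     *   ./demo < training_input.txt    (run on representative inputs)
     *   gcc -O2 -fprofile-use pgo_demo.c -o demo
     */
    #include <stdio.h>

    int main(void) {
        long hot = 0, cold = 0;
        int c;
        /* The recorded profile tells gcc which side of this branch
         * dominates, so the second build can lay out the hot side
         * as the straight-line fall-through path. */
        while ((c = getchar()) != EOF) {
            if (c == '\n')
                cold++;   /* rare for typical text */
            else
                hot++;
        }
        printf("%ld %ld\n", hot, cold);
        return 0;
    }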
"Modern" CPUs achieve some of their speed by executing multiple instructions in parallel, using a pipeline: the first stage fetches an instruction from memory, hands it to the second stage that decodes it but while the decoding is taking place the first stage will have already started fetching the next instruction from memory.
Conditional jump instructions, of course, ruin everything: you need to know the result of the condition before you can decide which instruction is the "next" one.
"Modern" CPUs work around this by always assuming that the jump is never taken and then, if it turns out that the jump does get taken, rolling back the partial work that they did.
As it turns out, the vast majority of conditional jumps in a program always go the same way, i.e. any given conditional jump is either almost always taken or almost never taken. If the compiler knew which way each condition went, it could lay out the program so that jumps are almost never taken, for maximum performance.
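Concretely, "laying it out" means making the common case the fall-through path. Here's a sketch of the effect using gcc/clang's __builtin_expect as a stand-in for that branch knowledge (the builtin is real; the example function is invented):

    #include <stdlib.h>

    /* Telling the compiler the error path is cold lets it emit the
     * common path as straight-line fall-through code and move the
     * abort out of the hot instruction stream. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    int process(int *p) {
        if (unlikely(p == NULL))
            abort();          /* almost never taken */
        return *p + 1;        /* hot path: falls through, no jump */
    }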
A static compiler can't know this, but with a JIT you can run the program as bytecode a bunch of times and then use the information you gathered to lay it out in the best way possible.
All of this is 100% true and empirically verifiable. The reason it didn't work out is that in the early 2000s the x86 CPU manufacturers started building this specific optimization directly into the CPU: they keep branch counters and use them to guess which way each conditional jump is more likely to go.
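That hardware predictor is easy to observe, by the way. A classic micro-benchmark sketch (timings will vary, and you may need a low optimization level like -O1 so the compiler doesn't replace the branch with a conditional move; the point is that the data-dependent branch runs much faster once the data is sorted, because the predictor starts guessing right):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 20)

    static int cmp(const void *x, const void *y) {
        return *(const int *)x - *(const int *)y;
    }

    int main(void) {
        static int a[N];
        for (int i = 0; i < N; i++)
            a[i] = rand() % 256;
        /* Uncomment to sort: the branch below becomes a long run of
         * not-taken followed by a long run of taken, which the CPU's
         * branch predictor learns almost perfectly. */
        /* qsort(a, N, sizeof a[0], cmp); */
        long sum = 0;
        clock_t t0 = clock();
        for (int pass = 0; pass < 100; pass++)
            for (int i = 0; i < N; i++)
                if (a[i] >= 128)   /* ~50/50 on random data, so the
                                      predictor keeps mispredicting */
                    sum += a[i];
        printf("sum=%ld time=%.2fs\n", sum,
               (double)(clock() - t0) / CLOCKS_PER_SEC);
        return 0;
    }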
There are other optimizations that a JIT compiler can do and a static compiler can't, but the story is similar: modern x86 CPUs can do almost anything a JIT compiler can, and that's why the technology is essentially obsolete.
There's still some value in distributing a single binary that executes at near-native speed everywhere, but that's basically it.
A JIT can do all the optimisations that a static compiler can, and then on top of that it can do additional optimisations that a static compiler can't.
If you run your trading app once a day, all day, and warm it up for an hour first, and want max speed, why do you care if it takes a few seconds to compile?
If you're running code all day long and profiling it dynamically the whole time, then you're going to pay for it: there is overhead to JIT compilation, and there is overhead to profiling. A statically compiled program has no such runtime overhead, no matter how much profiling went into building it.
I still need evidence that my real-time trading app gets faster by running an entire gcc inside it. I have no problem with JIT in theory; everything you say makes sense. I've just never seen it play out in practice, though. Either the optimizations you're talking about apply too rarely, or they're so complex that most JITs don't implement them, or they're so expensive that they give no net gain.
Well, it doesn't actually make sense in theory either. While a JIT can theoretically profile the current run, it also adds JIT-related runtime overhead: it has to spend a lot of resources on both profiling and recompilation to make use of any new information. That's always going to be strictly slower than PGO techniques, with basically hand-picked optimizations for each application and zero runtime overhead.
Yes, that's strictly more information. But you'd better collect it quickly, since you're compiling your functions while they run.
Besides, people really don't like that warm-up period. But it's not the current bottleneck.
Anyway, JIT has great potential, just not for desktop software, or network services, or your trading app. It could be great for scientific computing, for example. But that's potential; currently it's not much good in practice.
Well, the Android team decided that the AOT compilation introduced in Android 5 was a mistake, and went back to a mix: a first-level interpreter handwritten in assembly => JIT + PGO => AOT + PGO (with profiles taken from the JIT) when the device is idle.
Also, a large majority of modern Windows software is actually written in .NET, with small C++ pieces.
It's true that AOT compilers can get most of the benefits of JIT compilers by leveraging profile-guided and link-time optimization. In principle, we could get rid of shared libraries, ship LLVM bitcode instead, and statically compile everything, enabling cross-library optimizations - but so far, we tend not to do that...
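For what it's worth, the toolchains already support a slice of this: with -flto (a real gcc/clang flag) the compiler emits intermediate representation into the object files, so the optimizer sees across translation units at link time. A minimal sketch, with two invented files and the build commands in comments:

    /* lib.c */
    int twice(int x) { return 2 * x; }

    /* main.c */
    int twice(int x);
    int main(void) { return twice(21); }

    /* Build with link-time optimization:
     *   clang -O2 -flto -c lib.c main.c
     *   clang -O2 -flto lib.o main.o -o demo
     * At link time the optimizer sees both translation units and can
     * inline twice() into main(), which a plain per-file compile of
     * main.c could not do. */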