This gets said every single time "JIT" comes up, but I've never heard an argument for how it might actually be possible. As far as I can tell you have two problems: (1) your heuristic has to be so damn fast that it compensates for the runtime analysis cost, which is non-trivial for a JIT but trivial for AoT, since AoT's runtime analysis cost is zero; (2) you need to tell a story about how this heuristic can be so specialized that I couldn't just take my program, profile it on some training data (a bunch of program inputs), and optimize from that profile (gcc and llvm do this trivially). I've never seen code solving (1) or (2). Do we have any evidence of JIT being faster than AoT? No, I see no such evidence. Even the most cutting-edge JIT compilers like LuaJIT or the JVM are light-years behind gcc, llvm, rustc, etc. Do we have any evidence that we can perform more optimizations at runtime than at compile time? No, I don't see any. JIT was a nice, powerful, avant-garde idea that could have changed everything, but I don't think it turned out to be the superstar we all thought it'd be.
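For reference, the gcc workflow I mean is just a two-pass build. A minimal sketch, with the build commands in the header comment (the file name and the branchy loop are made up for illustration; -fprofile-generate and -fprofile-use are real gcc flags):

    /* pgo_demo.c -- stand-in program with a data-dependent branch.
     *
     * Typical gcc PGO build:
     *   gcc -O2 -fprofile-generate pgo_demo.c -o demo
     *   ./demo < training_input.txt    (run on representative inputs)
     *   gcc -O2 -fprofile-use pgo_demo.c -o demo
     */
    #include <stdio.h>

    int main(void) {
        long hot = 0, cold = 0;
        int c;
        /* The recorded profile tells gcc which side of this branch
         * dominates, so the second build can lay out the hot side
         * as the straight-line fall-through path. */
        while ((c = getchar()) != EOF) {
            if (c == '\n')
                cold++;   /* rare for typical text */
            else
                hot++;
        }
        printf("%ld %ld\n", hot, cold);
        return 0;
    }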
"Modern" CPUs achieve some of their speed by executing multiple instructions in parallel, using a pipeline: the first stage fetches an instruction from memory, hands it to the second stage that decodes it but while the decoding is taking place the first stage will have already started fetching the next instruction from memory.
Conditional jump instructions, of course, ruin everything: you need to know the result of the condition before you can decide which instruction is the "next" one.
"Modern" CPUs work around this by always assuming that the jump is never taken and then, if it turns out that the jump does get taken, rolling back the partial work that they did.
As it turns out, the vast majority of conditional jumps in a program always go the same way, i.e. any given conditional jump is either almost always taken or almost never taken. If the compiler knew which way each condition went, it could lay out the program so that jumps are almost never taken, for maximum performance.
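Concretely, "laying it out" means making the common case the fall-through path. Here's a sketch of the effect using gcc/clang's __builtin_expect as a stand-in for that branch knowledge (the builtin is real; the example function is invented):

    #include <stdlib.h>

    /* Telling the compiler the error path is cold lets it emit the
     * common path as straight-line fall-through code and move the
     * abort out of the hot instruction stream. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    int process(int *p) {
        if (unlikely(p == NULL))
            abort();          /* almost never taken */
        return *p + 1;        /* hot path: falls through, no jump */
    }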
A static compiler can't know this, but with a JIT you can run the program as bytecode a bunch of times and then use the information you gathered to lay it out in the best way possible.
All of this is 100% true and empirically verifiable. The reason it didn't work out is that in the early 2000s the x86 CPU manufacturers started building this specific optimization directly into the CPU: they keep branch counters and use them to guess which way each conditional jump is more likely to go.
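That hardware predictor is easy to observe, by the way. A classic micro-benchmark sketch (timings will vary, and you may need a low optimization level like -O1 so the compiler doesn't replace the branch with a conditional move; the point is that the data-dependent branch runs much faster once the data is sorted, because the predictor starts guessing right):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 20)

    static int cmp(const void *x, const void *y) {
        return *(const int *)x - *(const int *)y;
    }

    int main(void) {
        static int a[N];
        for (int i = 0; i < N; i++)
            a[i] = rand() % 256;
        /* Uncomment to sort: the branch below becomes a long run of
         * not-taken followed by a long run of taken, which the CPU's
         * branch predictor learns almost perfectly. */
        /* qsort(a, N, sizeof a[0], cmp); */
        long sum = 0;
        clock_t t0 = clock();
        for (int pass = 0; pass < 100; pass++)
            for (int i = 0; i < N; i++)
                if (a[i] >= 128)   /* ~50/50 on random data, so the
                                      predictor keeps mispredicting */
                    sum += a[i];
        printf("sum=%ld time=%.2fs\n", sum,
               (double)(clock() - t0) / CLOCKS_PER_SEC);
        return 0;
    }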
There are other optimizations that a JIT compiler can do and a static compiler can't, but the story is similar: modern x86 CPUs can do almost anything a JIT compiler can, and that's why the technology is essentially obsolete.
There's still some value in distributing a single binary that executes at near-native speed everywhere, but that's basically it.
A JIT can do all the optimisations that a static compiler can, and then on top of that it can do additional optimisations that a static compiler can't.
If you run your trading app once a day, all day, and warm it up for an hour first, and want max speed, why do you care if it takes a few seconds to compile?
If you're running code all day long and profiling it dynamically the whole time, then you're going to pay for it: there is overhead to JIT compilation, and there is overhead to profiling. A statically compiled program has no such runtime overhead, no matter how much profiling went into building it.
I still need evidence that my real-time trading app gets faster by running an entire gcc inside it. I have no problem with JIT in theory; everything you say makes sense. I've just never seen it play out in practice, though. Either the optimizations you're talking about apply too rarely, or they're so complex that most JITs don't implement them, or they're so expensive that they give no net gain.
Well, it doesn't actually make sense in theory either. While a JIT can theoretically profile the current run, it also adds JIT-related runtime overhead: it has to spend a lot of resources on both profiling and recompilation to make use of any new information. That's always going to be strictly slower than PGO techniques, with basically hand-picked optimizations for each application and zero runtime overhead.
Yes, that's strictly more information. But you'd better collect it quickly, since you're compiling your functions while they run.
Besides, people really don't like that warm-up period. But it's not the current bottleneck.
Anyway, JIT has great potential, just not for desktop software, or network services, or your trading app. It could be great for scientific computing, for example. But that's potential; currently it's not much good in practice.
Well, the Android team decided that the AOT compilation introduced in Android 5 was a mistake, and went back to a mix: a first-level interpreter handwritten in assembly => JIT + PGO => AOT + PGO (with profiles taken from the JIT) when the device is idle.
Also, a large majority of modern Windows software is actually written in .NET, with small C++ pieces.
It's true that AOT compilers can get most of the benefits of JIT compilers by leveraging profile-guided and link-time optimization. In principle, we could get rid of shared libraries, ship LLVM bitcode instead, and statically compile everything, enabling cross-library optimizations - but so far, we tend not to do that...
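For what it's worth, the toolchains already support a slice of this: with -flto (a real gcc/clang flag) the compiler emits intermediate representation into the object files, so the optimizer sees across translation units at link time. A minimal sketch, with two invented files and the build commands in comments:

    /* lib.c */
    int twice(int x) { return 2 * x; }

    /* main.c */
    int twice(int x);
    int main(void) { return twice(21); }

    /* Build with link-time optimization:
     *   clang -O2 -flto -c lib.c main.c
     *   clang -O2 -flto lib.o main.o -o demo
     * At link time the optimizer sees both translation units and can
     * inline twice() into main(), which a plain per-file compile of
     * main.c could not do. */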