Look at Cranelift [1], in particular "Cranelift compared to LLVM" [2]. Cranelift is in some ways an effort to rearchitect LLVM for faster compilation speed, written by some longtime LLVM contributors. For example, Cranelift has only one IR, which lets it avoid rebuilding trees over and over, and the IR is stored tightly packed instead of scattered throughout memory.
Sometimes I feel many comparisons against LLVM's performance are somewhat unfair: you can easily reach LLVM's -O0/-O1 performance with much simpler infrastructure and only a limited number of cheap passes, which yields a considerable compile-speed boost. But many competitors like Cranelift that claim to be fast at compilation will never reach -O2 performance without major infrastructure changes.
These compilers are mainly designed for extreme performance. People complaining about LLVM's slow compilation must never have used ICC's -fast mode, where a hello world can take ~30s to compile. Developers still spend thousands on it because it squeezes out every drop of performance.
I am not sure what is unfair about it. LLVM is inferior if you want fast build and -O1 performance. Many people are looking for fast build and -O1 performance, so it makes sense to let them know that LLVM is not what they want.
I'd say most people, by far, are perfectly fine with -O1 performance. -O2 (and higher) is only needed for very small and specific parts of a codebase (e.g. the inner parts of encoding and rendering).
The problem is having both at the same time (I haven't seen any build configuration try to mix optimization levels) without compromising on compiler speed for -O1, so projects that require -O2 for 0.1% of their codebase apply it to 100% of it.
In theory depending on the language you could mix different compilers, but that is a big can of worms (and other bugs).
LuaJIT also does a lot less work for a given piece of code than LLVM. It generates relatively well optimized code for a dynamic language, but it doesn't do much of the low level optimizations that LLVM and GCC do.
Yes, that is how it usually goes: you exchange code performance for compile performance. But my point (a couple of messages above) is that most of the time this performance is perfectly fine, and it is only a tiny part of the codebase that may need the extra low-level optimizations that LLVM and GCC can do (if it needs them at all). Of course this is a generality and the specifics depend on the project: chances are, the rendering parts of a CPU-based raytracer for CGI movies will need these optimizations much more than most projects, whereas a file manager most likely won't need them at all.
A quick remark: general purpose compilers (including all the common ones) aren't really built for "extreme" performance. They always pick big tradeoffs and don't try everything they could.
In other words, it is not "hard" to build compilers with better performance. What is hard is getting performance without build times exploding.
Just curious: which compilers, according to your statement, are designed for extreme performance? I know that at the far right of the compile-time/performance axis lie SAT-based "superoptimizers", but what lies between superoptimizers and general-purpose compilers like ICC?
For example, Unison is an alternative code generator for LLVM, which uses constraint programming to do combined register allocation and instruction selection. It is very slow, but it does generate optimal (not just improved!) code, within its performance model.
The single most important topic in LLVM compilation time is missing: FastISel. As shown, all the time is spent in the IR-to-machine-code step. That's because LLVM's default instruction selector is the optimizing one; for debug builds, you want to put LLVM's codegen in fast mode.
I would be curious what the performance looks like on bigger, more realistic source files. And what happens if you disable all optimizations; will that influence the obj generation? What about link times?
The post made me look into string interning for my compiler. I wasn't convinced that it would be that useful. I thought that most unsuccessful string comparisons are fast anyway, because I store the length of each token. With a hash map, you still have to do one comparison for every lookup, and you also have to compute the hash. But hashing greatly increases the odds that it's the right comparison, and once you've interned the strings, you don't need to look them up at all.
I (very sloppily) implemented a hash map and integrated it into the lexer. Despite the poor implementation, and having to build the map in the lexer, it does speed up checking whether an identifier is a keyword, and it reduced the parse time to about 70% of what it was. I get similar gains in code generation, because it speeds up symbol lookup, though it's probably less useful there, since I still have a terrible O(n) lookup for globals. The absolute gains are still worth it, though.
So yeah. Thanks for encouraging me to look into it!
Hey - I am thrilled to hear about your positive experience with interning strings. I never actually did a performance test, so I am delighted to hear your gains were substantial.
As for your questions about performance on bigger source files and twiddling optimization options, I too am curious about that. I will likely revisit those questions at some point in the future. It will be easier to do once I have baked these diagnostics into the compiler.
[1]: https://github.com/CraneStation/cranelift
[2]: https://cranelift.readthedocs.io/en/latest/compare-llvm.html