Incremental Compilation (rust-lang.org)
221 points by dwaxe on Sept 8, 2016 | 72 comments


Off topic — I am surprised to find this statement [1] in the blog post, considering how negative my teachers in university were about the "trial and error" approach to solving a problem. They always said that you should design a solution before implementing it, and I agree that makes sense, but there are cases where you need to put in a print-and-exit to debug something; and if you think about it, that is what a debugger does.

Nowadays — and if you are doing interviews you will understand — people expect you to write flawless code from scratch using a basic code editor (looking at you, HackerRank) with no time for tests. Sixty minutes to write solutions for four challenges, without autocomplete, without tests. Even the genius co-worker who comes up with an optimized solution for every problem would need more than fifteen minutes to solve those crazy/random challenges, which in the end measure your ability to answer those specific questions rather than your problem-solving skills in general.

So is trial-and-error bad or not?

[1] Much of a programmer’s time is spent in an edit-compile-debug workflow.


Designing a solution is still far from having code that works. You have designed a solution, and now you need to test it. In significantly complex codebases, there are often all kinds of obscure issues that can crop up and make your (manual or automated) testing fail. These are not things you can anticipate in your design, because it is not possible to have known about all of these issues beforehand.

In fact, later in your comment you allude to this yourself -- "with no time for tests". You're worried about interviews not having time for tests. But tests are an edit-compile-debug feedback loop. If a test fails, you need to debug it, fix the issue, and recompile. Often you have to do this many times. This is precisely what the blog post is talking about.

You're generalizing the term "trial and error" to mean something more than what your teachers probably meant.

(I do agree that those interview practices are silly. But the blog post is not talking about stuff like that.)


I think "design a solution" can be as simple as having a rough idea of what the implementation looks like. This is solid advice.

But the journey is part of the magic. Anyone expecting candidates to produce perfect code is doing it wrong. HackerRank is garbage.

I want to understand how you think. I want you to communicate and ask questions. Yes, it can be challenging to be without your tools, with only a whiteboard, but that's okay.

If I'm interviewing you and you make a typo, great! I don't care. Maybe I'll point it out to you, but I probably won't, and I certainly won't dock you for it. And yeah, you can debug. I often encourage candidates to walk through their code with sample input, which I usually want them to come up with too.

This demonstrates an understanding of how your software works. I don't actually care if it compiles. I don't care if you're a wizard in emacs or can type a million words a minute.

I also don't care if your solution is optimal. In the real world they rarely are. But, I want to talk to you about ways we could optimize it, and maybe I'll ask you to show me what you mean.

Ultimately I want you to explain stuff to me, talk technical, and display your train of thought.

Trial-and-error is fine. It's a tool, and there are no silver bullets.


> So is trial-and-error bad or not?

It depends.

Efficient trial-and-error should help you acquire a good understanding of the solution. When your program finally works, are you really sure why? If not, you might be doing it wrong.

You might want to google "test driven development". Basically, it's a programming discipline which relies heavily on shortening your feedback loops (i.e. reducing the time between the moment you make an error and the moment you detect it). While it is certainly controversial, it can be seen as a serious formalization of "trial and error".
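For a concrete taste of it, here's a minimal sketch in Rust (the `word_count` function is made up for illustration): write the failing test first, then implement until `cargo test` passes.

    // The test is written first; each failing `cargo test` run is one
    // iteration of the shortened feedback loop.
    fn word_count(s: &str) -> usize {
        s.split_whitespace().count()
    }

    #[cfg(test)]
    mod tests {
        use super::word_count;

        #[test]
        fn counts_words() {
            assert_eq!(word_count("hello incremental world"), 3);
        }
    }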


Nice to see it. Incremental compilation is one of those things that production compilers have but research languages usually don't. You only need it when you have a lot of code, but then you really need it.


Production compilers often don't have it either.

(Having to split up your code manually into separate .cpp [or what have you] files doesn't qualify as incremental compilation—that's just separate compilation. Incremental compilation is when the compiler automatically figures out what needs to be recompiled, even when all you hand it is a big blob of code that hasn't been manually split up by a human in any way.)


The Delphi compiler has incremental compilation, but by your definition it wouldn't count.

Delphi uses the source to figure out dependencies. That is, the compiler has make logic, and traces through the 'uses' declarations to recursively discover all the source and object files. It compares timestamps to discover which object files are out of date and need to be recompiled. Thus, if you touch a single file and do a compile, only a couple of files will be compiled - the main program source and the modified file. This dependency logic is in the compiler, not a separate make program.
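Roughly, the check boils down to something like this (a simplified sketch in Rust; the `Unit` type and its fields are made up for illustration):

    use std::time::SystemTime;

    struct Unit {
        source_mtime: SystemTime,         // mtime of the source file
        object_mtime: Option<SystemTime>, // None if never compiled
        deps: Vec<usize>,                 // units named in `uses` clauses
    }

    fn needs_recompile(units: &[Unit], i: usize) -> bool {
        let u = &units[i];
        match u.object_mtime {
            None => true, // no object file yet
            Some(obj) => {
                // Stale if our own source, or any dependency's object
                // file, is newer than our object file.
                u.source_mtime > obj
                    || u.deps.iter().any(|&d| {
                        units[d].object_mtime.map_or(true, |m| m > obj)
                    })
            }
        }
    }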

If this isn't the compiler doing incremental compilation, I think we have a terminology problem. Because this is what is meant by incremental compilation as the term is applied to almost all production compilers, and if you persist in making a distinction, you may be confusing more than clarifying. ISTM that you're memoizing more parts of the compilation process than has historically been done in incremental compilers. It may be clearer to use a more qualified phrase to express this, rather than shift the generally accepted meaning of a term.


I don't see where in pcwalton's comment he suggests that the Delphi compiler's incremental compilation doesn't match his definition.

He explicitly says "Incremental compilation is when the compiler automatically figures out what needs to be recompiled".

(He does say that production compilers don't have incremental compilation, but that's not a matter of definition, and I read that to mean "usually don't")


>He explicitly says "Incremental compilation is when the compiler automatically figures out what needs to be recompiled".

He also says that merely using the separate files to deduce this (which the parent says is what Delphi does) is not incremental compilation.


It doesn't "merely use the separate files", it traverses the use statements.

There's no manual splitting of .cpp and .h files. No manual management of dependencies.


Rust is going down the path where recompilation is done on a finer granularity than files. I was objecting to setting that as the bar to meet for incremental compilation. It's technically impressive, but it will only be a win if the average compilation unit takes a noticeable amount of time to compile (this was rarely the case with Delphi). It may well be the case for Rust+LLVM.


Fair. I think Delphi counts, and I don't think that was the bar being set (but it's easy to see how it could be read that way). The C++ header file model is quite different in the amount of burden placed on the programmer over the Delphi use-statement model.


Lucid C++ was already doing incremental compilation at the function level by 1993, thanks to their pivot from Lisp machines.

If I recall correctly, they demo it in this video:

https://www.youtube.com/watch?v=pQQTScuApWk

Also, IBM had a C++ compiler for OS/2 based on their Visual Age for... stack, which used an image store for code, offering a Smalltalk-like editing experience for C++; but it was quite resource-heavy for early-'90s machines.

It had something to do with their CSet++ offering.


If you have a single source file that's big enough for the compiler to bog down on it, then either that compiler is beyond ridiculously slow, or you need to refactor.


The tricky part is the dependencies between files. This is what C++ uses header files for, but Rust doesn't have those.


Languages like Go and Java can do separate compilation by reading header information out of the object files of imported packages/classes.


Rust has been doing that since day one.

What makes this incremental compilation different from that of, say, Go, is that it works inside a package. Before incremental compilation, Rust and Go had the exact same model: package-at-a-time compilation. With incremental compilation, Rust is more fine-grained than that: Rust can compile specific parts of a package while leaving the other parts untouched.
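Concretely, a toy illustration (not real rustc behavior in every detail):

    // One crate, two items. Under package-at-a-time compilation, editing
    // either one recompiled everything; with incremental compilation,
    // editing only the body of `render` lets the compiler reuse its
    // cached work on `parse`.
    pub fn parse(input: &str) -> Vec<&str> {
        input.split(',').collect()
    }

    pub fn render(fields: &[&str]) -> String {
        fields.join(" | ")
    }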


Ironically, C++ often goes in the other direction: splitting up files causes all of them to be rebuilt as soon as you touch a header, but combining them into one huge file gives you a huge speedup in compilation.


Does anyone know if there has been any work in formalizing incremental compilation? It would be great to have some sort of theoretical framework to use as a guide when implementing various incremental compilation strategies.

There is some (ongoing?) work on low-cost incremental computation by differentiating lambda calculi [1] [2] [3], and it seems like an incremental compiler could maybe be a good use case for it.

[1] http://www.informatik.uni-marburg.de/~pgiarrusso/ILC/

[2] http://bentnib.org/posts/2015-04-23-incremental-lambda-calcu...

[3] http://ps.informatik.uni-tuebingen.de/research/ilc/


Great blog post! One thing I wonder as a compiler head though is, what is the granularity of this stuff? Is it per-file or per-function, or even per block? If it's finer than per file, how do you do the tree diff? Do you parse whole files and then do a node-by-node diff? Do you have an incremental parser?

If I were to implement incremental compilation, I'd start per-module, I guess, because it's the atomic unit for a compiler, so it would be a lot simpler. That's why I'm curious.


I can actually help you out on this one. I spent the summer hacking on the Rust compiler for my internship. If I'm understanding the code correctly, this is the enum of elements represented in the dependency graph.

https://manishearth.github.io/rust-internals-docs/rustc/dep_...

Note that as per:

https://manishearth.github.io/rust-internals-docs/rustc/dep_...

The DepGraph that we actually care about is specialized to talk about DefIds. In rustc, DefIds are attached to the following things:

https://manishearth.github.io/rust-internals-docs/rustc/hir/...

This means dependencies are analyzed down to local variables and uses. But even beyond that it tracks things such as borrow checks, linting and more!
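To give a flavor of the idea (a greatly simplified sketch, not the actual rustc definitions; see the links above for those):

    use std::collections::{HashMap, HashSet};

    #[derive(Clone, PartialEq, Eq, Hash)]
    enum DepNode {
        Hir(u32),            // the HIR of one item (DefId-like index)
        TypeckItemBody(u32), // type checking of one item's body
        BorrowCheck(u32),    // borrow checking of one item
    }

    struct DepGraph {
        // node -> the nodes whose results it read while being computed
        reads: HashMap<DepNode, HashSet<DepNode>>,
    }

    impl DepGraph {
        // Everything that transitively read a changed node is dirty and
        // must be recomputed; everything else can come from the cache.
        fn dirty(&self, changed: DepNode) -> HashSet<DepNode> {
            let mut dirty = HashSet::new();
            let mut stack = vec![changed];
            while let Some(n) = stack.pop() {
                for (node, read) in &self.reads {
                    if read.contains(&n) && dirty.insert(node.clone()) {
                        stack.push(node.clone());
                    }
                }
            }
            dirty
        }
    }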


From an uninformed distance this looks a bit underwhelming: initial builds are slower, incremental builds are not _that_ much faster, and build outputs are no longer deterministic functions of just the build inputs (which is useful for example for distributed build systems).


> incremental builds are not _that_ much faster

Right now, they aren't. This will change as things improve. Remember, this is just the groundwork to allow incremental compilation to work at all. There are lots of places where the compiler now needs to be updated to use that framework.

> build outputs are no longer deterministic functions of just the build inputs (which is useful for example for distributed build systems).

You don't have to use incremental compilation if you care about this problem. But you will want to, because it isn't a problem in practice :)

There is no reason why distributed build systems couldn't simply be made aware of incremental compilation.


The way Rust works today is that you compile a whole crate together (all the files). The side effect is that the compiler can optimize across the whole crate rather than just what's in one file. You get an effect analogous to LTO (as in C/C++). Conversely, if you now go to incremental builds, you lose the opportunity for LTO-style optimizations. Thus you end up with worse runtime performance under incremental compilation.

Caveats and fine details missing all over the place. More focused on the general point.


> Thus you end up with a worse runtime performance in incremental compilation.

Are you sure about this (in the current implementation)?


The article itself says "[...] This is because incremental compilation splits the code into smaller optimization units than a regular compilation session, resulting in less time optimizing, but also in less efficient runtime code."


Thanks, you're right. I missed that.


Why does it become non-deterministic?


I think the point is that the work done is not deterministic based on just the file inputs. Specifically, it has to pull in the external state of what the previous compile did.

I would hope that the actual output is deterministic based on the inputs. That is, changing function C in a file that has A, B, and C should be the same as a fresh compile of the same file. Even if the actual work that is done is different.


It is.


Since Rust compiles to native machine code (bytes), how does it calculate the starting address of the code? For example, if there is a JMP instruction, JMP most likely takes an address as an operand. Doesn't the kernel determine the starting address, or is the starting address the address returned by mmap?


Modern amd64 code is position-independent, so the JMP address is relative.

On other architectures, it's the linker's job to convert symbolic addresses, and it can choose all the addresses in one go as it's the final stage producing the executable.


That solves internal offsets. But code can also be loaded at different addresses on many platforms/architecture combos. That's solved by relocation tables: For every address, the linker will assume a starting address of 0, and then add the location of the address to a table. The runtime linker will then go through the relocation tables and fix up any entries with the real address.
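In sketch form (the relocation entry here is made up; real formats like ELF's Rela carry more fields):

    struct Relocation {
        offset: usize, // where in the image the patched address lives
    }

    fn apply_relocations(image: &mut [u8], base: u64, relocs: &[Relocation]) {
        use std::convert::TryInto;
        for r in relocs {
            // The linker stored each address as if the image were loaded
            // at 0; add the load address the OS actually chose.
            let stored = u64::from_le_bytes(
                image[r.offset..r.offset + 8].try_into().unwrap(),
            );
            image[r.offset..r.offset + 8]
                .copy_from_slice(&(stored + base).to_le_bytes());
        }
    }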


Relative to what, the start address? Does rust determine the start address from the return value of `mmap`?


Relative to the current instruction pointer aka program counter. You say things like "JMP -100 bytes".
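In toy arithmetic (not a real encoder):

    // The displacement stored in a rel32 jump is the target minus the
    // address of the *next* instruction.
    fn rel32(next_instruction: u64, target: u64) -> i32 {
        target.wrapping_sub(next_instruction) as i32
    }
    // e.g. rel32(ip, ip - 100) == -100 no matter where the code was
    // loaded, which is why no fix-up is needed.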


How does this differ from what Makefiles have been doing for decades?


You don't have to split up your code manually. You just write one Rust crate, splitting your code into files on whatever granularity you like (with mutual recursion fully supported), and the Rust compiler automatically does the rest. You don't have to write tedious header files; the compiler figures out all the dependencies automatically.
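For example (a self-contained toy; real projects put each module in its own file):

    // One crate, split into modules however you like; the `mod` and `use`
    // declarations are all the compiler needs to discover dependencies.
    mod ast {
        pub struct Node(pub i64);
    }

    mod eval {
        use super::ast::Node;

        pub fn eval(n: &Node) -> i64 {
            n.0
        }
    }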


I don't split up my code manually for the compiler, I do it for the humans, being compiler friendly is just a pleasant side effect. You can build java files the same way without header files, worst case scenario is you have some fat compilation units.


> I don't split up my code manually for the compiler, I do it for the humans, being compiler friendly is just a pleasant side effect.

There is no human benefit to having to copy and paste function signatures—and worse, the bodies of functions you want to be inlined, including all templates—into header files. There's also no benefit to having the compiler parse hundreds of KB of header files over and over and over again.

> You can build java files the same way without header files, worst case scenario is you have some fat compilation units.

Java (production implementations used in practice) performs whole program optimization via its JIT. It's nice for Java, but that isn't how Rust works.


I don't disagree on the header files, no sane language designed today would include that.

> Java (production implementations used in practice) performs whole program optimization via its JIT. It's nice for Java, but that isn't how Rust works.

Whole program optimization isn't generally something you want on dev builds, which is where incremental compilation is useful. For a production build why wouldn't you do a full rebuild anyway?


> Whole program optimization isn't generally something you want on dev builds, which is where incremental compilation is useful.

By "whole program optimization" I also include things like generic/template instantiation. Whenever you use, say, a HashMap, you have to recompile the implementation of the HashMap specialized to the size, alignment, destructor, etc. of types you're using with it.

> For a production build why wouldn't you do a full rebuild anyway?

When I profile apps written in Rust, it's important for me to be able to get good turnaround time on optimized builds.


Do Java generics work like that? I thought the JVM did not know about the generics as the types are erased at compile time.


The JVM doesn't know about generic types at runtime. There's no specialization of HashMaps; it's just arrays of pointers to objects all around.

With project Valhalla: https://en.wikipedia.org/wiki/Project_Valhalla_(Java_languag... there's been talk about specialization for value types.


It is already available for those that want to play with it.

https://adoptopenjdk.gitbooks.io/adoptopenjdk-getting-starte...


I too thought the same. That's why pcwalton's comment confused me.


Rust generics do. Java's don't.


Splitting up code into headers and sources also helps keep interfaces separate from implementations. It allows one to not only read just the header for the full interface, but it also allows one to distribute the implementation in binary form and the interface in source form, which is important for companies.


Except for inlineable functions and templates in C++—which become a greater and greater fraction of code the more "modern" your C++ gets. Not to mention that the size of your instance variables leaks into your public interface unless you manually heap allocate and use the pimpl idiom.

In any case, Rust stores the interfaces to libraries in serialized binary form (including documentation, following a standardized format understood by rustdoc), and they can be readily extracted from crates using tools that ship with the language. So what you describe isn't a benefit of headers anyhow.


"C++ did it right" seems a separate position from that put forward by the parent comment.

I definitely think there's benefit to being able to view interface as separate from implementation, although that might well be better supported by tooling (editor folding, documentation generation) than manual maintenance - which you seem to be doing well.

I don't know whether there is benefit to being able to edit interface separately from implementation. Probably not much of one, but I could certainly be convinced otherwise.


Usually if you edit the interface, you also have to edit the implementation. The only time I've ever wanted to edit an interface alone is when there wasn't any implementation written yet, which makes it a moot point anyway.


If you make substantive changes, for sure.

It's possible that you may want to make cosmetic changes (or maybe change how you're specializing something where the implementation is more polymorphic?).

Still, I certainly don't see much of an argument that being able to edit the two separately has much value. I'm just not entirely convinced of non-existence.


One usually does want the reverse though, editing the implementation without touching the interface.


Writing applications which have an API other people write plugins against is one: commercial apps ship the .h files, plugin writers have only those, and they build plugins for the host apps against them.

Yes you still obviously have to keep them in sync, but the point is you (as a commercial software vendor) can ship just the headers and keep the internal implementation details secret and change how things work completely. Which is useful for commercial vendors of software.


This is a common myth about .h files versus modules.

Most languages with native support for modules do have the necessary information stored in the binary file to enable this scenario.

The only problem is exposing those plugins across languages, but it isn't a big problem if the OS has a rich ABI instead of a C one. For example, COM on Windows, or WinRT, the improved version of it.


This isn't really an argument for .h files though.

.h files are an extra amount of work you have to put in. In the case of plugin APIs, this is work you'd have to put in anyway (for a fraction of the .h files -- the ones which are part of the API). But that work doesn't have to be put in elsewhere.

This is an argument for interface description files for plugins, but that could always have been done by a language with idl files or something. Indeed, apps in most other languages define interfaces via .h files because that's the de-facto plugin API description format, but they don't use .h files everywhere in the source. They only use them where necessary for the API.


You can do this in Rust just fine. The compiler can determine all the types and signatures from the tables stored in binary code. No separate .h file needed.


Two seconds of google found a .h file generator: http://www.radwin.org/michael/2011/05/10/stubgen/ . I'm sure there are others.


> Except for inlineable functions and templates in C++—which become a greater and greater fraction of code the more "modern" your C++ gets.

The C++ ABI is currently not portable anyway. So the concern about templates (or any C++ features) forcing you to put the implementation into the header file would not apply in the scenario dllthomas was referring to:

If you want to distribute your library as a binary object and a separate source-form interface/header today, you'll have to use a (wrapper) C API. Regardless of how "modern" the C++ code is.

Of course, it would be nice to have portable C++ objects some time in the future which would change things... :)


> If you want to distribute your library as a binary object and a separate source-form interface/header today, you'll have to use a (wrapper) C API.

This is only true if you want to target different compilers. If you ship your binaries targeting a specific compiler (which is what just about every company does that I've ever worked with), you don't have any ABI issues.


Actually, you can even have incompatibilities with the same compiler and different compile flags. So to reliably build a "semi-portable" C++ object, one would have to ensure that all objects are compiled with the exact same compiler (i.e. the same compiler version/source) and the exact same flags. I'm sure there are some compiler vendors that offer ABI backwards-compatibility, but it's not part of the C++ language per se.

See this article by Herb Sutter on the topic: https://isocpp.org/files/papers/n4028.pdf


Oh I'm well aware; as my comment indicated, I've been doing this a long time. I didn't want to get in to the full details in a HN comment, but yes, there's definitely a few things you have to coordinate on between vendors if you want to ship C++ libraries that work together.


> If you want to distribute your library as a binary object and a separate source-form interface/header today, you'll have to use a (wrapper) C API. Regardless of how "modern" the C++ code is.

Not really.

Those of us on Windows make use of COM, or, since Windows 8, UWP components (formerly known as WinRT).


Yes, but this is only a solution if your library is only targeting Windows.

If you want users of any standards-compliant C++ implementation to be able to use your library today, you'll still have to go with C-ABI symbols or ship the sources. All other workarounds are vendor-specific and not part of the standard.

[Of course even objects containing only C symbols are not portable across platforms either, but at least the C ABI/calling convention is more or less strictly defined for any given target platform. Assuming no other platform-specific stuff like glibc is used]


Makefiles work at the file level.

This works at the abstract syntax tree level.

If you change a file that many other files depend on, Makefiles will recompile all of those files. This approach will only recompile the parts affected by the change in the AST, which will likely be significantly less.


Seems to be finer grained. Make knows I only changed foo.c and only rebuilds what depends on that. This sees that foo.rs changed and reparses it, sees the AST didn't change (maybe we added comments or reformatted it), and stops.
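Conceptually something like this (a simplified sketch, not the compiler's actual data structures):

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Hash the parsed structure instead of the raw file: comments and
    // formatting never reach the AST, so they can't make anything dirty.
    #[derive(Hash)]
    struct Item {
        name: String,
        body: Vec<String>, // stand-in for real AST nodes
    }

    fn fingerprint(item: &Item) -> u64 {
        let mut h = DefaultHasher::new();
        item.hash(&mut h);
        h.finish()
    }

    fn is_dirty(cached: u64, item: &Item) -> bool {
        fingerprint(item) != cached
    }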


Does it track all changes, or only changes to the surface area? If I changed the implementation of a function but not the signature, would this cause a rebuild of the dependents?


No, changing the implementation of a function does not cause a rebuild. If support is added for optimizing builds via LLVM's ThinLTO, however, then it might if LLVM decided to inline the function.


Now I see the value, thanks.


Up to 15% faster or more?! =)


There's still a bunch of compiler phases they're not caching yet, so it's a very very early result.


If you click on the asterisk next to that heading, you'll see that the 15% figure is a joke reference to an XKCD. :P In practice, it wouldn't make sense for us to say "incremental compilation makes us X% faster" because the benefit will vary wildly by workload.


So, "any value except exactly 15%"...



