It's what allows Rust to have memory safety without garbage collection, which is...

ExpiredLink · on May 6, 2014

Rust requires the programmer to do manually what the compiler could do for him - of course without garbage collection.

scott_s · on May 6, 2014

I'm not sure what you're trying to say. Perhaps I can try to sum up my view (and what I think is the view of many other people on this thread):

If you start from the premise that garbage collection is not an option in some circumstances, and you want to maintain memory safety, then I think you inevitably end up in a situation where programmers must make their memory access patterns explicit. Rust achieves this through its type system.

My guess is that you're disagreeing with the above premise.

pcwalton · on May 6, 2014

How could the compiler achieve safe memory management without any annotations, runtime overhead, or garbage collection? I would be interested in answers to this question, but they are not obvious to me after years of thinking about this problem :)

jasode · on May 6, 2014

The short replies from ExpiredLink lead others to misinterpret his response as pro-GC and therefore the downvotes but now that I've read all his other comments in this thread, I think I've pieced together what his actual thesis is: without using any runtime GC, a sophisticated enough compiler should have enough knowledge from analyzing the source code to "automatically" know the type of memory management classification for any variables. The programmer shouldn't have to manually annotate them.

I'm not a compiler theory expert but my hunch is that ExpiredLink's premise is wrong. I'm guessing that any non-trivial program requires a priori knowledge from the programmer's mind to properly inform the compiler which variables do what and when.

To use an analogy in the spirit of ExpiredLink's thinking: we don't manually annotate statements such as "int [register_cx] bank_account = balance + deposit"

We don't have to "annotate" statements with micro-details such as specific x86 registers bx,cx,dx,etc. "It would be the wrong abstraction" to borrow his wording. The compiler assigns registers on our behalf. It's been doing that for decades. I'm guessing memory safety is a much more difficult problem. If perfectly optimal register allocation is an NP-complete problem, it seems safe to assume that compiler generated safety annotations for memory is NP-complete.

Therefore, the proof for ExpiredLink would be a short example source code that demonstrates how a priori knowledge is unavoidable to achieve extra memory safety and maintain C/C++ performance. The only other option is some mythical compiler that runs for 10 hours analyzing 1,000 lines of source code trying to trace all possible memory accesses only to come up with a suboptimal solution to an NP-complete problem. And on top of that, would the compiler have to be so sophisticated as to have to create a VMware/Virtualbox virtual machine to simulate all memory accesses? I have no idea.

pcwalton · on May 6, 2014

> a sophisticated enough compiler should have enough knowledge from analyzing the source code to "automatically" know the type of memory management classification for any variables. The programmer shouldn't have to manually annotate them.

This might be possible, but there are a few big problems with it:

1. We'd be throwing separate compilation out the window. What makes separate compilation work is that the types are fully deducible for each function without knowledge of the functions that call it. In our system, this requires some amount of manual annotation.

2. Systems programmers often need to know whether allocation is happening; they don't want a compiler doing it behind their back. For example, in interrupt or async signal handling context, you can't allocate. A language that implicitly allocates is not usable for such contexts.

3. When you get an error relating to memory management—for example, a borrow check error—it's important that the programmer be able to diagnose what happened and fix it. Without explicit annotation, it's notoriously harder to report type errors—the programmer has to try to reconstruct what the compiler was trying to do.

shmerl · on May 6, 2014

I think the problem with this is that even if compiler can somehow semi-magically deduce that, it can be still be not what the author really intended. So making memory ownership semantic explicit improves clarity and avoids any kind of ambiguity and unintended logic. Also, it helps communicating the intention to others who might read or maintain the code later. Obscuring this is not helpful at all.

jasode · on May 7, 2014

>even if compiler can somehow semi-magically deduce that, it can be still be not what the author really intended.

Right, but to entertain what I think ExpiredLink is proposing... you have to reframe it as "there's no intention for ownership/lifetime for the programmer to even express" therefore, there can't be a disconnect between what the compiler did and what the programmer thinks it did.

To use the x86 register analogy again, we don't explictly notation our intentions of which registers to use and therefore we don't type a bunch of extra verbage to annotate it. The compiler figures it out and it's opaque to us. There is never a conversation about how the compiler didn't match our intentions because we're not working "at that abstraction level" at all.

But as I mentioned before, I believe that memory lifetime/ownership is a totally different class of automated analysis (on behalf of the programmer) than register allocation.

I think the only way to raise the level of discourse on why annotation is unavoidable is to show a short source code example that proves that ExpiredLink's premise is impossible for any compiler analysis to fulfill.

shmerl · on May 7, 2014

Also, Rust avoids taking control and clarity away from the programmer. Such automated memory management (which will automatically deduce ownership and lifetime for example) will make the result less predictable to the author which is a downside. So even if feasible (which I'm not sure about), it doesn't fit to be a mandatory feature.

twic · on May 7, 2014

> I'm not a compiler theory expert but my hunch is that ExpiredLink's premise is wrong. I'm guessing that any non-trivial program requires a priori knowledge from the programmer's mind to properly inform the compiler which variables do what and when.

I also think the thesis is wrong, but i'd put it differently. I suspect that in many cases, a sufficiently smart compiler could analyse a program and assign plausible classifications to every variable. However, i suspect that there will be some cases which are not - there is no set of classifications which can be assigned which will conform to the rules. Inevitably, these would be exactly the cases you actually hit when writing real programs. In other words, i think this ends up looking a bit like the halting problem.

As a thought experiment, imagine Rust, but without any of the memory management classifications on variables: no tildes, ampersands, or single quotes. You could compile and execute this with fairly conventional garbage-collected memory management semantics. But you could also try to assign memory management classifications to every variable, and the compile it as Rust. The number of possible assignments over the whole program is clearly finite - there are a finite number of variables, a finite number of pointer types, a finite number of lifetimes, and so a finite number of arrangements of all those. Large, but finite. So you could, in principle, enumerate them by brute force.

If you took an existing, correct, Rust program, erased all the classifications, and fed it into this brute force re-classifying compiler, it would eventually find a legal classification, and compile it. It might be the original classification, or it might not. This means that there isn't a requirement for "a priori knowledge from the programmer's mind". Any program which could be given a plausible classification can be given one automatically.

However, i think there is no such certainty that it would find a legal classification for a arbitrary program. Indeed, i suspect it wouldn't be too hard to find a counterexample: you just need a Rust program which doesn't compile, and can't be fixed purely by changing memory management classifications. Anyone want to give that a go?

azth · on May 6, 2014

GC is not the answer to everything.