Lots of GC vs manual MM comparisons do a disservice to manual MM by comparing GC to a strawman that does lots of malloc/free.
IME, manual MM often embeds allocations within each other and/or places allocations on the stack. Both of these options make manual MM far cheaper than GC, rather than slightly cheaper as shown in these papers.
There is nothing that prevents a GCed language from allocating things on the stack. Also, I'm afraid using std::string in C++ causes more allocs/frees than typical use of Strings in Java/C#, because the former can't be safely shared and must be copied - on the heap.
A GCed language requires good escape analysis for that. Unless you expose stack allocation to the programmer for at least semi-manual MM, it is bound to have false negatives, where you pay for a heap allocation unnecessarily.
I totally agree about std::string.
I think C++ with the conventional libraries is actually an example of how to do manual MM badly.
Generally I agree that you don't want GC in all scenarios, but the paper you cite has several flaws which make its conclusions very far from the truth.
1. It is based on a non-production JVM and does not use the same algorithms that are used in production-level VMs. There is a huge performance difference between any GC from the 1990s and the modern generational GCs used in production-level VMs.
2. Part of the experiment involved simulation. You must be very careful with extrapolating simulation results onto the real world. Actually, someone on the Internet reran one of the experiments using a non-simulated VM and got totally different results, but now I can't find that blog post :(
3. It compares GC to "ideal" manual memory management, ignoring the fact that manual memory management is not free either. Things like heap fragmentation or the computational cost of running allocation/deallocation code do exist and may also cause real problems. The costs lie elsewhere, but that doesn't mean they don't exist.
My experience with large applications is completely different from the conclusion of the paper. Unless you're doing something completely crazy like generating several GBs of garbage per second, the overhead of modern generational GCs is typically very small (<5%), with only 20-50% more memory than the live set size, not the 3-5x the paper claims.
The biggest pain point not yet completely solved is pauses, but researchers are actively working on it and things are getting much better.
> 3. It compares GC to "ideal" manual memory management, ignoring the fact that manual memory management is not free either. Things like heap fragmentation or the computational cost of running allocation/deallocation code do exist and may also cause real problems. The costs lie elsewhere, but that doesn't mean they don't exist.
No, it doesn't; it uses malloc and free, counting the costs of allocation and deallocation using a traditional memory allocator. (In fact, if anything, that's unfair to traditional memory allocators, as high-performance code will often use things like bump-allocating arenas which significantly outperform malloc and free. It also uses dlmalloc, which is outpaced by jemalloc/tcmalloc these days, although if the benchmarks are not multithreaded the difference will be small.)
Heap fragmentation exists in GC systems as well, and fragmentation in modern allocators like jemalloc is very small.
Ok, point taken. However, their cost analysis is based on "simulated cycles", which is extremely simplified. With modern CPUs doing caching, prefetching, and out-of-order execution, I seriously doubt it's accurate. malloc/free typically tend to scatter objects around the whole heap, while compacting GCs allocate from a contiguous region - so a properly designed research experiment would take that into account. Hans Boehm did experiments on real programs and found that using compacting GCs actually sped up some programs because of better cache friendliness.
As for heap fragmentation - it does not exist in some GC systems, like G1 or C4, because they compact the heap. And fragmentation is also extremely workload dependent - it might be "very small" for most cases and as much as 5x for some (Firefox struggled a lot with this).
> the overhead of modern generational GCs is typically very small (<5%), with only 20-50% more memory than the live set size, not the 3-5x the paper claims.
Another set of GC tests mentioned in the famous Drew Crawford article [1] said 4x to 6x was the sweet spot. A follow-up commenter wanted to clarify that the "best GC algorithms" worked within 2.5x. Whether it's 2.5x or 4x, it's a counterpoint to the claims of only 50% more memory. Perhaps there are drastically different workloads skewing the tests. (I didn't thoroughly read both cited papers.)
This rant is about GCs in JS engines on mobile devices, which are nowhere near the state-of-the-art generational GCs used for the JVM or CLR.
This is all very much dependent on the garbage production rate. If the application is producing garbage like crazy, then it is possible to require even 2.5x more memory, but this is a rare edge case, just as needing 2.5x more memory due to fragmentation is. Any performance-aware programmer will avoid dynamic allocation in tight loops, regardless of the type of memory management.
The blog post was talking about Javascript GC, but the cited paper about GC behavior was measuring desktop Java/JVM. The 2.5x to 6x memory sweet spot referred to the state-of-the-art JVM, not Javascript. However, he's extrapolating that the difficulties unsolved by the best GC algorithms in Java are the same technical challenges for GC research and progress in Javascript.
Those GC challenges point back to why Rust's focus on memory management via different types of pointers is valid. The best GC algorithms haven't solved many of the performance problems that Rust's approach can address.
This is the same paper I mentioned above, not "another" research paper. It is based on simulation and a non-production JVM, doesn't use state-of-the-art GC algorithms, and rerunning the same benchmarks with the Oracle JVM gives completely different results.
Also, stating that GC CPU/memory overhead is X is generally incorrect, because it depends on far too many things: garbage production rate, eden space size, survivor rates, GC parallelism level, etc.
I have at least two computationally heavy applications running on the JVM (one doing lots of sparse LU decompositions, the other evolutionary optimization; both doing some allocations in a tight loop) and neither requires 2x memory to run at top speed; they require much, much less. Everything depends on how it is coded. If you keep garbage production rates sane, and most objects die in the first minor GC, the overhead of GC is extremely small, both CPU- and memory-wise. I don't know what they did to get 6x overhead - this looks extremely fishy. Even memory-heavy apps like databases don't reserve 6x more heap than the live set size.
Of course, there are some use cases where GC definitely sucks (pauses, huge heaps) and therefore Rust is a nice thing to have. I hope some of those techniques get adopted by GCed languages in the future in order to reduce memory pressure.
The GC argument is a straw man. You can have automatic resource management without GC (in fact, GCed languages like Java and C# are at odds with automatic resource management) and without manual ownership semantics a la C++ and Rust.