While it's great that you get improved safety (and often nicer, easier-to-reason-about code) by using something other than C, you can still have memory safety issues from cavalier or incorrect usage of unsafe APIs, since they undermine the guarantees the language provides with regard to correctness.
Also, unrelated: does anyone actually have the slides (you know, the presentation file with text in it, rather than the mp4 that CCC is offering me) for this presentation? It's really annoying to scrub through a video to find stuff on a slow internet connection :(
At the last Linux Kernel Summit, according to Google, 68% of Linux kernel security exploits are caused by C memory corruption issues due to lack of bounds checking.
The slides may not be up yet, but you could demux just the video stream out of https://mirrors.dotsrc.org/cdn.media.ccc.de//congress/2018/s..., or give me a yell in the next 30 minutes or so and I can do it. Don't have a nice script to de-dupe images though.
Another option is to use a memory safe subset of C++ [1]. It should be less work to migrate existing C drivers as (reasonable) C code maps directly into the safe C++ subset. And the migration can be done incrementally with corresponding incremental safety benefits.
Wouldn't you simply enforce this with automation if you were making a serious effort? It's already quite common for GitHub PRs to require myriad CI tests to pass before anything can be merged... those can incorporate static analysis and warnings-as-errors.
Try being that clever guy putting such gates into place without having the team on the same wavelength.
GitHub is a bubble; there are tons of software projects out there using a myriad of build infrastructures, or even just doing plain old IDE builds (yes, I know, but it is what it is).
Convincing your team to switch languages is infinitely more difficult than adding infrastructure to enforce good hygiene, so I don't really see your point; it's moot.
Well like I said, if you already have a driver written in C (or C++), translating it to the safe subset of C++ would be less work as most of the code would remain unchanged and the unsafe elements (like pointers, arrays, etc.) map to direct (safe) replacements. And your driver maintainers/authors may already be familiar with C++ (if not big fans of it :) .
While the OP may demonstrate that other languages aren't always that bad in practice, I think the consensus is that Rust and C/C++ are the appropriate languages when maximum efficiency and minimum overhead are desired.
While Rust is a good option, the (safe subset of the) language has an intrinsic shortcoming that doesn't seem to be generally acknowledged. The forthcoming C++ "lifetime checker" (static analyzer) has the same issue [1]. Essentially, if you have a list (or any dynamic container) of references to a given set of existing objects, you cannot (temporarily) insert a reference to a (newly created) object that may not outlive the container itself.
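A minimal Rust sketch of that situation (it intentionally fails to compile, which is the point; the names are made up for illustration):

    fn main() {
        let a = String::from("long-lived");
        // A container of references to existing objects.
        let mut refs: Vec<&String> = vec![&a];

        {
            let b = String::from("newly created, shorter-lived");
            refs.push(&b); // error[E0597]: `b` does not live long enough
            // ... use `refs`, including the reference to `b`, here ...
            refs.pop(); // we even remove it before `b` is dropped,
                        // but the borrow checker can't express "temporarily"
        }

        println!("{}", refs[0]); // `refs` outlives `b`, so the push above is rejected
    }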
In my view, this is a problem. The workarounds are intrusive, inconvenient and/or involve significant overhead. Which makes it tempting to just resort to unsafe Rust in those cases. (And of course this specific example is representative of a whole class of situations with a similar dilemma.) The safe C++ subset doesn't suffer the same issue. (Because in some sense, C++ is still a more "powerful" language.)
Great work! Do you have by any chance the C driver latency measurements? It would be nice to have them on the same graph as those for Rust and other languages.
It would look the same as Rust's; I haven't run the full measurement with all the packets, but sampling 1k packets/second yields the same results for C and Rust.
You can't really get a faster speed than Rust here; the only minor thing that could be improved is the worst-case latency by isolating the core that the driver is running on (isolcpus kernel option) to prevent other random interrupts or scheduling weirdness. But that optimization is the same for all languages and should get rid of the (tiny) long tail.
Objective-C was a great match for driver development; devices tended to have a very natural OO flavour to them and naturally sorted into classes.
Putting things in user-space seemed a natural extension, that sadly didn't happen at the time even though Mach had the hooks for it (we never got user-level pagers either, which would have rocked together with garbage collected languages). There certainly didn't seem to be a good reason why I had to reboot the machine and wait for fsck when there was a minor bug in the little driver I was writing to talk to an EISA printer controller that had nothing to do with the basic functioning of the system...
(Why would a printer controller be on an EISA controller, you ask? It directly drove a Canon Laser Copier, so, yes!)
Oh, and not surprised by the abysmal Swift performance. Alas, Apple's marketing of Swift as a "high-performance" language has been very successful despite all the crushing evidence to the contrary.
> Objective-C was a great match for driver development; devices tended to have a very natural OO flavour to them and naturally sorted into classes.
How do you feel about the current state of driver development on macOS, with Objective-C basically being replaced with Embedded C++ with partial reflection?
It was part appeasement of the "never Objective-C" crowd (hello CoreFoundation, hello CocoaJava, hello Swift) and part the exact sentiment discussed here, that you cannot possibly do kernel development in a higher level language.
What I heard (quite some time ago) is that this move is now seen as a mistake.
ObjC was definitely seen as a dead-end at the time: it would either be replaced with Java, or maybe Mac developers would just stick with Carbon and C/C++. Either way, all driver development (on classic Mac, Windows, Unix) was in C, and C++ would be much more familiar than the “weird obsolete square-brackets NeXT language”
I made a post a few years ago discussing the issue:
It is disappointing that Ada seems to be completely shunned from all discussions of safe high-level languages despite having both a track record (several successful safety critical systems) and a unique feature set (multiple compilers, provably correct subset, ranged subtypes).
Ada also suffered from bad roots and bad timing. We're coming off a decade of simple dynamic languages, and Ada seems like an old and immense ruin. Maybe there will be a Julia/Rust equivalent for Ada.
And it is very true -- Ada never gained a foothold in anything but situations where it truly delivered on a requirement.
If anything, I can take solace in the fact that it is very much alive despite the exaggerated rumors of its death. I'd encourage everyone to try it out and steal all the ideas.
"Multiple compilers" will do little good if the compilers aren't generally available. The Ada problem is all about availability of good implementations. (And no, GCC doesn't cut it. Not in 2018 at least.)
Paying for a compiler isn't really as crazy as it sounds. Companies pay for CI, static analyzers etc. etc. For non-commercial applications it is obviously different.
Yet, I'm genuinely curious, why isn't GCC up to snuff?
> Our drivers in Rust, C#, go, and Swift are completely finished, tuned for performance, evaluated, and benchmarked. And all of them except for Swift are about 80-90% as fast as our user space C driver and 6-10 times faster than the kernel C driver.
Interesting. What is the reason for higher performance of user space C driver (and the other user space drivers for that matter) when compared to the kernel C driver? Will this hold for all driver types or is this a rather uncommon property of this particular kind of driver?
> A main driver of performance for network drivers is sending/receiving packets in batches from/to the NIC. Ixy can already achieve a high performance with relatively low batch sizes of 32-64 because it is a full user space driver. Other user space packet processing frameworks like netmap that rely on a kernel driver need larger batch sizes of 512 and above to amortize the larger overhead of communicating with the driver in the kernel.
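To make the batching point concrete, here is a rough Rust sketch of a batched forwarding loop. `NicDevice`, `Packet`, `rx_batch`, and `tx_batch` are illustrative names only, not ixy's actual API:

    /// Illustrative packet type; a real driver hands out DMA-backed buffers.
    struct Packet {
        data: Vec<u8>,
    }

    /// Hypothetical device interface (not ixy's actual trait).
    trait NicDevice {
        /// Fill `bufs` with up to `n` received packets and return how many arrived.
        fn rx_batch(&mut self, queue: u16, bufs: &mut Vec<Packet>, n: usize) -> usize;
        /// Hand a batch of packets to the NIC for transmission.
        fn tx_batch(&mut self, queue: u16, bufs: &mut Vec<Packet>) -> usize;
    }

    /// Forward packets from RX queue 0 to TX queue 0 in batches.
    fn forward(dev: &mut dyn NicDevice, batch_size: usize) {
        let mut bufs = Vec::with_capacity(batch_size);
        loop {
            let received = dev.rx_batch(0, &mut bufs, batch_size);
            if received > 0 {
                // Fixed per-call costs (ring index updates, PCIe register access)
                // are paid once per batch, so 32-64 packets already amortize them
                // well in a pure user space driver; frameworks that bounce through
                // a kernel driver need much larger batches to hide that overhead.
                dev.tx_batch(0, &mut bufs);
                bufs.clear();
            }
        }
    }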
As our devices have gotten faster, the user-space/kernel boundary is becoming more and more of an issue. I was shocked when my supposedly super-fast MacBook Pro SSD (2+GB/s) was only giving me around 250MB/s.
Turned out mkfile(8), which I was using without thinking much about it, is only using 512 byte buffers...
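As a rough illustration (not the original mkfile test), here is a Rust sketch of why the buffer size dominates: every small write is a separate syscall, i.e. a separate user/kernel transition, and 512-byte buffers mean millions of them for a gigabyte of data.

    use std::fs::File;
    use std::io::Write;
    use std::time::Instant;

    /// Write `total` bytes to `path` in chunks of `buf_size` and return the elapsed seconds.
    /// Each `write_all` call is (at least) one write(2) syscall, i.e. one user/kernel transition.
    fn write_with_buffer(path: &str, total: usize, buf_size: usize) -> std::io::Result<f64> {
        let mut f = File::create(path)?;
        let buf = vec![0u8; buf_size];
        let start = Instant::now();
        let mut written = 0;
        while written < total {
            f.write_all(&buf)?;
            written += buf_size;
        }
        f.sync_all()?;
        Ok(start.elapsed().as_secs_f64())
    }

    fn main() -> std::io::Result<()> {
        let total = 1 << 30; // 1 GiB
        // ~2 million syscalls vs. ~1 thousand syscalls for the same amount of data.
        let small = write_with_buffer("/tmp/small_buf.bin", total, 512)?;
        let large = write_with_buffer("/tmp/large_buf.bin", total, 1 << 20)?;
        println!("512 B buffers: {small:.2}s, 1 MiB buffers: {large:.2}s");
        Ok(())
    }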
No, userspace/kernel transitions are and always will be slow. Every time one happens you have to do a context switch, which is super expensive and cache unfriendly. You also pay a penalty for keeping the kernel mapped at all times in terms of more pressure on the TLB, but due to the Spectre and Meltdown mitigations the kernel has actually been unmapped, hurting the performance of switching into the kernel even further, although this will eventually be undone.
Yeah, when I ran the same tests a little later and with the Spectre/Meltdown patches (+APFS, which also didn't help), the mkfile I/O rate had a further precipitous drop to ~80MB/s.
- Microkernel OSes - one driver failing is okay, because it runs mostly in user-space. The big gotcha in microkernel design is transactions that touch multiple components. Some sort of across-driver transaction API is needed (start, commit, rollback) in order to undo changes across several userspace subsystems (a rough sketch follows this list).
- A standard language (like the talk suggests) and shipped as portable bytecode to run on a VM or compile to native, so that drivers are portable and runnable without knowing the architecture.
- Devices themselves containing OS-signed drivers rather than each OS having a kitchen-sink installation of all drivers. Each bus would have an interrogation call to fetch the driver.
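Purely as an illustration of that transaction point, a cross-driver API might look roughly like the following Rust sketch; every name here is hypothetical and not taken from any real microkernel:

    /// Hypothetical error type for cross-driver transactions.
    #[derive(Debug)]
    pub enum TxnError {
        PrepareFailed,
        CommitFailed,
    }

    /// Hypothetical interface a user-space driver would implement.
    pub trait TransactionalDriver {
        /// Tentatively apply a change; nothing is visible to other components yet.
        fn prepare(&mut self, txn_id: u64) -> Result<(), TxnError>;
        /// Make the prepared change permanent.
        fn commit(&mut self, txn_id: u64) -> Result<(), TxnError>;
        /// Undo the prepared change, e.g. because another driver failed.
        fn rollback(&mut self, txn_id: u64);
    }

    /// Two-phase commit across several drivers: if any prepare step fails,
    /// roll back every driver that had already prepared.
    pub fn run_txn(
        drivers: &mut [&mut dyn TransactionalDriver],
        txn_id: u64,
    ) -> Result<(), TxnError> {
        for i in 0..drivers.len() {
            if drivers[i].prepare(txn_id).is_err() {
                for d in drivers[..i].iter_mut() {
                    d.rollback(txn_id);
                }
                return Err(TxnError::PrepareFailed);
            }
        }
        for d in drivers.iter_mut() {
            d.commit(txn_id)?;
        }
        Ok(())
    }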
Is there a plan for letting applications using the normal networking APIs see this, or is it currently just for "process raw ethernet packets in userspace" kind of apps?
(The latter is a great thing to have built, of course, just thinking aloud about how this might replace existing drivers)
Their notion of "better" may differ from yours. Do you actually know people who claim that reference counting is faster than a marking GC?
Reference counting can be better in terms of ease of implementation, cross-language interoperability, and by reclaiming memory immediately when the last reference to it disappears.
Reference counting is only better in terms of ease of implementation.
Hence it is usually one of the earliest chapters in any CS book about GC algorithms.
Reclaiming memory immediately only works for simple data structures. Naive reference counting implementations have similar stop-the-world effects when releasing relatively big data structures, which can even lead to stack overflows if the destructor calls happen to be nested.
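A minimal Rust illustration of that effect:

    use std::rc::Rc;

    /// A refcounted singly linked list node.
    struct Node {
        next: Option<Rc<Node>>,
    }

    fn main() {
        // Build a list that is one million nodes deep.
        let mut head = Rc::new(Node { next: None });
        for _ in 0..1_000_000 {
            head = Rc::new(Node { next: Some(head) });
        }
        // Releasing `head` tears down the whole structure at once: each node's
        // drop decrements the next node's count to zero, recursing a million
        // levels deep. The program pauses for the whole teardown and typically
        // overflows the stack unless the destructor is rewritten iteratively.
        drop(head);
    }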
In the case of Objective-C and Swift, the Objective-C tracing GC project failed due to the underlying C semantics, so they took the second-best approach of enforcing Cocoa retain/release patterns via the compiler, which only applies to a small set of Objective-C data types.
Swift naturally had to support the same memory management model as a means of keeping compatibility with the Objective-C runtime.
Eh. In practice reference counting solves 90% of the problem and keeps memory usage low at all times. It's part of why Java programs are so hard to tune for performance & end up eating so much RAM. If you don't believe it, compare Android & iOS: even though Android has enormous financial & competitive pressure to improve performance, it still ends up requiring 2x the amount of RAM that iOS does, which is partially driven by the choice of Java.
People keep referring to Java to badmouth tracing GCs.
The fact is that Java isn't the only game in town, and all GC enabled system programming languages do offer multiple ways to manage memory.
Value types, traced GC memory references, global memory allocation, stack values, RAII, untraced memory references in unsafe code blocks.
Not every OOP language is Java, nor every GC is Java's GC.
Additionally, not every Java GC is like OpenJDK's GCs; there are plenty to choose from, including soft real-time ones for embedded deployments.
As for Android, it is a fork still catching up with what Java toolchains like PTC/Aonix are capable of, all because Google decided it could do better while screwing Sun in the process.
> and all GC enabled system programming languages do offer multiple ways to manage memory.
Since "GC-enabled system programming languages" is an oxymoron, a claim about what such languages may or may not include is just not very useful. But it's definitely the case that properly combining, e.g. "traced GC memory references" and RAII including deterministic deallocation for resources is still a matter of ongoing research, e.g. https://arxiv.org/abs/1803.02796 That may or may not pan out in the future, as may other things such as pluggable, lightweight GC for a subset of memory objects, etc., but let's stop putting lipstick on the pig that is obligate tracing GC.
An oxymoron only in the minds of the anti-tracing-GC hate crowd.
- Mesa/Cedar at Xerox PARC
- Algol 68 at UK Navy computing center
- Modula-2+ at Olivetti DEC
- Modula-3 at Olivetti DEC/Compaq/HP and University of Washington
- Oberon, Oberon-2, Active Oberon, Oberon-07 at ETHZ
- Oberon-07 at Astrobe
- Component Pascal at Oberon microsystems AG
- Sing#, Dafny and System C# (M#) at Microsoft Research
- Java when running AOT compiled on bare metal embedded systems like PTC Perc and Aicas Jamaica
- D by Digital Mars
- Go at Google (Fuchsia) and MIT (Biscuit)
Let's stop pretending reference counting is the best of all GC algorithms, in spite of the fact that it is quite basic and does not scale on modern multi-core NUMA architectures.
No one is saying that reference counting is the best. What I am saying is that reference counting tends to offer a good set of advantages (predictable memory performance, no hogging of memory, no pauses) for a minimal cost (more frequent GC, more overhead to store reference counts).
The comment about "does not scale in multi-core NUMA" only applies if you have objects that are shared between threads, because otherwise there are no atomics going on. For example, Rust has a generic ref-count mechanism that automatically uses atomic operations for ref-counts when an object might be shared between threads, but otherwise does simple arithmetic. Non-atomic refcounts are most likely also going to be faster than any other global GC algorithm. Other languages require explicit differences but are still able to offer the same thing.
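A small Rust example of that distinction:

    use std::rc::Rc;
    use std::sync::Arc;
    use std::thread;

    fn main() {
        // Arc: atomic refcount, may be shared across threads.
        let shared = Arc::new(vec![1, 2, 3]);
        let handle = {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.len())
        };
        assert_eq!(handle.join().unwrap(), 3);

        // Rc: plain non-atomic refcount, cheaper, and the compiler statically
        // rejects sending it to another thread because `Rc` is not `Send`.
        let local = Rc::new(vec![1, 2, 3]);
        let local2 = Rc::clone(&local); // simple non-atomic increment
        assert_eq!(Rc::strong_count(&local), 2);
        drop(local2);
        // thread::spawn(move || local.len()); // does not compile: `Rc` is !Send
    }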
The fact of the matter is that the majority of objects do not require expensive GC of any kind & can live on the stack or have explicit ownership guarantees. Choosing defaults where everything might be shared is not a good default for systems languages as it pessimizes memory usage & CPU performance to a drastic degree.
That being said, GC does have its place in all manner of applications and has other advantages, like making developers more productive, which isn't a bad thing, but these are domain-specific decisions. There are plenty of techniques: reference counting, memory pools, static memory allocation, various GC algorithms, etc. Each has tradeoffs, and every single GC system I've encountered means variable latency/stop-the-world and greedy memory usage (optimized for the one application). That's valid in some domains but certainly isn't desirable. If there were an awesome GC system like you claim that could perform that well, it would have already been deployed to innumerable applications by all the Java vendors, Javascript VMs, C#, etc. It's an extremely complex problem.
Most of your links are niche commercial systems or even pure academic research systems. They're not proof of anything other than GC being possible to implement for various languages/machines which isn't a claim that's been disputed at all.
> Go at Google (Fuchsia)
AFAIK Fuchsia does not use Go for any systems-level portions. Those are written in C/C++/Rust last time I checked (with Rust being the official default going forward). Do you have any links to the contrary?
If you don't need to "reclaim memory immediately", you can often use arenas, a.k.a. regions - freeing an arena does not incur a "stop the world" pause. Similarly, most "big data structures" have little use for reference counting in their internals (albeit concurrent data structures may indeed use refcounting internally, and more obviously it comes up when implementing general graph structures). Overall, outside of the use of obligate GC as in Swift, I suspect that nested destructor calls are unlikely to be a significant problem in practice.
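A sketch of the arena pattern in Rust, assuming the third-party `typed_arena` crate (any bump allocator works similarly):

    use typed_arena::Arena;

    struct Node<'a> {
        value: u32,
        parent: Option<&'a Node<'a>>,
    }

    fn main() {
        let arena = Arena::new();
        // No per-object refcounts: objects borrow from the arena and may
        // freely reference each other, since they all share its lifetime.
        let root = arena.alloc(Node { value: 0, parent: None });
        let child = arena.alloc(Node { value: 1, parent: Some(root) });
        assert_eq!(child.parent.unwrap().value, 0);
        // Dropping the arena frees everything in one go -- no refcount
        // updates and no tracing pass.
    }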
Rust also uses refcounting, not tracing GC. The problem with Swift is its use of obligate GC for things that don't need it in other languages (C/C++, Rust).
Sure, but you don't need refcounting for cases that are covered by Rust's affine types, even in C/C++. You can use the patterns described in the C++ Core Guidelines, and end up with something quite rusty, only without fully-automated checking.
(Besides, I think std::rc has better performance than the refcounts found in Swift and C++, because it's used in cases that don't need atomic update, and yes this is statically checked too.)
Yea, I'm not sure what point was being made there. Rust isn't about being faster than C; it's about not letting you make the mistakes you can make in C/etc.
If you could somehow write perfect code in a timely manner, you'd have no need for Rust. You'd likely also be a unicorn.