While it's great that you get improved safety (and often nicer, easier-to-reason-about code) by using something other than C, you can still have memory safety issues from cavalier or incorrect usage of unsafe APIs, since they undermine the guarantees the language provides with regard to correctness.
Also, unrelated: does anyone actually have the slides (you know, the presentation file with text in it, rather than the mp4 that CCC is offering me) for this presentation? It's really annoying to scrub through a video to find stuff on a slow internet connection :(
At the last Linux Kernel Summit, according to Google, 68% of Linux kernel security exploits are caused by C memory corruption issues due to lack of bounds checking.
The slides may not be up yet, but you could demux just the video stream out of https://mirrors.dotsrc.org/cdn.media.ccc.de//congress/2018/s..., or give me a yell in the next 30 minutes or so and I can do it. Don't have a nice script to de-dupe images though.
Another option is to use a memory safe subset of C++ [1]. It should be less work to migrate existing C drivers as (reasonable) C code maps directly into the safe C++ subset. And the migration can be done incrementally with corresponding incremental safety benefits.
Wouldn't you simply enforce this with automation if you were making a serious effort? It's already quite common for GitHub PRs to require myriad CI tests to pass before anything can be merged... those can incorporate static analysis and warnings-as-errors.
Try being that clever guy putting such gates into place without having the team on the same wavelength.
GitHub is a bubble; there are tons of software projects out there using a myriad of build infrastructures, or even just doing plain old IDE builds (yes, I know, but it is what it is).
Convincing your team to switch languages is infinitely more difficult than adding infrastructure to enforce good hygiene, so I don't really see your point; it's moot.
Well like I said, if you already have a driver written in C (or C++), translating it to the safe subset of C++ would be less work as most of the code would remain unchanged and the unsafe elements (like pointers, arrays, etc.) map to direct (safe) replacements. And your driver maintainers/authors may already be familiar with C++ (if not big fans of it :) .
While the OP may demonstrate that other languages aren't always that bad in practice, I think the consensus is that Rust and C/C++ are the appropriate languages when maximum efficiency and minimum overhead are desired.
While Rust is a good option, the (safe subset of the) language has an intrinsic shortcoming that doesn't seem to be generally acknowledged. The forthcoming C++ "lifetime checker" (static analyzer) has the same issue [1]. Essentially, if you have a list (or any dynamic container) of references to a given set of existing objects, you cannot (temporarily) insert a reference to a (newly created) object that may not outlive the container itself.
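A minimal Rust sketch of that situation (it intentionally fails to compile, which is the point; the names are made up for illustration):

    fn main() {
        let a = String::from("long-lived");
        // A container of references to existing objects.
        let mut refs: Vec<&String> = vec![&a];

        {
            let b = String::from("newly created, shorter-lived");
            refs.push(&b); // error[E0597]: `b` does not live long enough
            // ... use `refs`, including the reference to `b`, here ...
            refs.pop(); // we even remove it before `b` is dropped,
                        // but the borrow checker can't express "temporarily"
        }

        println!("{}", refs[0]); // `refs` outlives `b`, so the push above is rejected
    }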
In my view, this is a problem. The workarounds are intrusive, inconvenient and/or involve significant overhead. Which makes it tempting to just resort to unsafe Rust in those cases. (And of course this specific example is representative of a whole class of situations with a similar dilemma.) The safe C++ subset doesn't suffer the same issue. (Because in some sense, C++ is still a more "powerful" language.)
Great work! Do you have by any chance the C driver latency measurements? It would be nice to have them on the same graph as those for Rust and other languages.
It would look the same as Rust's; I haven't run the full measurement with all the packets, but sampling 1k packets/second yields the same results for C and Rust.
You can't really get a faster speed than Rust here; the only minor thing that could be improved is the worst-case latency by isolating the core that the driver is running on (isolcpus kernel option) to prevent other random interrupts or scheduling weirdness. But that optimization is the same for all languages and should get rid of the (tiny) long tail.
Objective-C was a great match for driver development; devices tended to have a very natural OO flavour to them and naturally sorted into classes.
Putting things in user-space seemed a natural extension, that sadly didn't happen at the time even though Mach had the hooks for it (we never got user-level pagers either, which would have rocked together with garbage collected languages). There certainly didn't seem to be a good reason why I had to reboot the machine and wait for fsck when there was a minor bug in the little driver I was writing to talk to an EISA printer controller that had nothing to do with the basic functioning of the system...
(Why would a printer controller be on an EISA controller, you ask? It directly drove a Canon Laser Copier, so, yes!)
Oh, and not surprised by the abysmal Swift performance. Alas, Apple's marketing of Swift as a "high-performance" language has been very successful despite all the crushing evidence to the contrary.
> Objective-C was a great match for driver development; devices tended to have a very natural OO flavour to them and naturally sorted into classes.
How do you feel about the current state of driver development on macOS, with Objective-C basically being replaced with Embedded C++ with partial reflection?
It was part appeasement of the "never Objective-C" crowd (hello CoreFoundation, hello CocoaJava, hello Swift) and part the exact sentiment discussed here, that you cannot possibly do kernel development in a higher level language.
What I heard (quite some time ago) is that this move is now seen as a mistake.
ObjC was definitely seen as a dead-end at the time: it would either be replaced with Java, or maybe Mac developers would just stick with Carbon and C/C++. Either way, all driver development (on classic Mac, Windows, Unix) was in C, and C++ would be much more familiar than the “weird obsolete square-brackets NeXT language”
I made a post a few years ago discussing the issue:
It is disappointing that Ada seems to be completely shunned from all discussions of safe high-level languages despite having both a track record (several successful safety critical systems) and a unique feature set (multiple compilers, provably correct subset, ranged subtypes).
Ada also suffered from bad roots and bad timing. We're coming off a decade of simple dynamic languages, and Ada seems like an old and immense ruin. Maybe there will be a Julia/Rust equivalent for Ada.
And it is very true -- Ada never gained a foothold in anything but situations where it truly delivered on a requirement.
If anything, I can take solace in the fact that it is very much alive despite the exaggerated rumors of its death. I'd encourage everyone to try it out and steal all the ideas.
"Multiple compilers" will do little good if the compilers aren't generally available. The Ada problem is all about availability of good implementations. (And no, GCC doesn't cut it. Not in 2018 at least.)
Paying for a compiler isn't really as crazy as it sounds. Companies pay for CI, static analyzers etc. etc. For non-commercial applications it is obviously different.
Yet, I'm genuinely curious, why isn't GCC up to snuff?
> Our drivers in Rust, C#, go, and Swift are completely finished, tuned for performance, evaluated, and benchmarked. And all of them except for Swift are about 80-90% as fast as our user space C driver and 6-10 times faster than the kernel C driver.
Interesting. What is the reason for higher performance of user space C driver (and the other user space drivers for that matter) when compared to the kernel C driver? Will this hold for all driver types or is this a rather uncommon property of this particular kind of driver?
> A main driver of performance for network drivers is sending/receiving packets in batches from/to the NIC. Ixy can already achieve a high performance with relatively low batch sizes of 32-64 because it is a full user space driver. Other user space packet processing frameworks like netmap that rely on a kernel driver need larger batch sizes of 512 and above to amortize the larger overhead of communicating with the driver in the kernel.
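To make the batching point concrete, here is a rough Rust sketch of a batched forwarding loop. `NicDevice`, `Packet`, `rx_batch`, and `tx_batch` are illustrative names only, not ixy's actual API:

    /// Illustrative packet type; a real driver hands out DMA-backed buffers.
    struct Packet {
        data: Vec<u8>,
    }

    /// Hypothetical device interface (not ixy's actual trait).
    trait NicDevice {
        /// Fill `bufs` with up to `n` received packets and return how many arrived.
        fn rx_batch(&mut self, queue: u16, bufs: &mut Vec<Packet>, n: usize) -> usize;
        /// Hand a batch of packets to the NIC for transmission.
        fn tx_batch(&mut self, queue: u16, bufs: &mut Vec<Packet>) -> usize;
    }

    /// Forward packets from RX queue 0 to TX queue 0 in batches.
    fn forward(dev: &mut dyn NicDevice, batch_size: usize) {
        let mut bufs = Vec::with_capacity(batch_size);
        loop {
            let received = dev.rx_batch(0, &mut bufs, batch_size);
            if received > 0 {
                // Fixed per-call costs (ring index updates, PCIe register access)
                // are paid once per batch, so 32-64 packets already amortize them
                // well in a pure user space driver; frameworks that bounce through
                // a kernel driver need much larger batches to hide that overhead.
                dev.tx_batch(0, &mut bufs);
                bufs.clear();
            }
        }
    }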
As our devices have gotten faster, the user-space/kernel boundary is becoming more and more of an issue. I was shocked when my supposedly super-fast MacBook Pro SSD (2+GB/s) was only giving me around 250MB/s.
Turned out mkfile(8), which I was using without thinking much about it, is only using 512 byte buffers...
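As a rough illustration (not the original mkfile test), here is a Rust sketch of why the buffer size dominates: every small write is a separate syscall, i.e. a separate user/kernel transition, and 512-byte buffers mean millions of them for a gigabyte of data.

    use std::fs::File;
    use std::io::Write;
    use std::time::Instant;

    /// Write `total` bytes to `path` in chunks of `buf_size` and return the elapsed seconds.
    /// Each `write_all` call is (at least) one write(2) syscall, i.e. one user/kernel transition.
    fn write_with_buffer(path: &str, total: usize, buf_size: usize) -> std::io::Result<f64> {
        let mut f = File::create(path)?;
        let buf = vec![0u8; buf_size];
        let start = Instant::now();
        let mut written = 0;
        while written < total {
            f.write_all(&buf)?;
            written += buf_size;
        }
        f.sync_all()?;
        Ok(start.elapsed().as_secs_f64())
    }

    fn main() -> std::io::Result<()> {
        let total = 1 << 30; // 1 GiB
        // ~2 million syscalls vs. ~1 thousand syscalls for the same amount of data.
        let small = write_with_buffer("/tmp/small_buf.bin", total, 512)?;
        let large = write_with_buffer("/tmp/large_buf.bin", total, 1 << 20)?;
        println!("512 B buffers: {small:.2}s, 1 MiB buffers: {large:.2}s");
        Ok(())
    }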
No, userspace/kernel transitions are and always will be slow. Every time one happens you have to do a context switch, which is super expensive and cache unfriendly. You also pay a penalty for keeping the kernel mapped at all times in terms of more pressure on the TLB, but due to the Spectre and Meltdown mitigations the kernel has actually been unmapped, hurting the performance of switching into the kernel even further, although this will eventually be undone.
Yeah, when I ran the same tests a little later and with the Spectre/Meltdown patches (+APFS, which also didn't help), the mkfile I/O rate had a further precipitous drop to ~80MB/s.
- Microkernel OSes - one driver failing is okay, because it runs mostly in user-space. The big gotcha in microkernel design is transactions that touch multiple components. Some sort of across-driver transaction API is needed (start, commit, rollback) in order to undo changes across several userspace subsystems (a rough sketch follows this list).
- A standard language (like the talk suggests) and shipped as portable bytecode to run on a VM or compile to native, so that drivers are portable and runnable without knowing the architecture.
- Devices themselves containing OS-signed drivers rather than each OS having a kitchen-sink installation of all drivers. Each bus would have an interrogation call to fetch the driver.
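Purely as an illustration of that transaction point, a cross-driver API might look roughly like the following Rust sketch; every name here is hypothetical and not taken from any real microkernel:

    /// Hypothetical error type for cross-driver transactions.
    #[derive(Debug)]
    pub enum TxnError {
        PrepareFailed,
        CommitFailed,
    }

    /// Hypothetical interface a user-space driver would implement.
    pub trait TransactionalDriver {
        /// Tentatively apply a change; nothing is visible to other components yet.
        fn prepare(&mut self, txn_id: u64) -> Result<(), TxnError>;
        /// Make the prepared change permanent.
        fn commit(&mut self, txn_id: u64) -> Result<(), TxnError>;
        /// Undo the prepared change, e.g. because another driver failed.
        fn rollback(&mut self, txn_id: u64);
    }

    /// Two-phase commit across several drivers: if any prepare step fails,
    /// roll back every driver that had already prepared.
    pub fn run_txn(
        drivers: &mut [&mut dyn TransactionalDriver],
        txn_id: u64,
    ) -> Result<(), TxnError> {
        for i in 0..drivers.len() {
            if drivers[i].prepare(txn_id).is_err() {
                for d in drivers[..i].iter_mut() {
                    d.rollback(txn_id);
                }
                return Err(TxnError::PrepareFailed);
            }
        }
        for d in drivers.iter_mut() {
            d.commit(txn_id)?;
        }
        Ok(())
    }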
Is there a plan for letting applications using the normal networking APIs see this, or is it currently just for "process raw ethernet packets in userspace" kind of apps?
(The latter is a great thing to have built, of course, just thinking aloud about how this might replace existing drivers)
Their notion of "better" may differ from yours. Do you actually know people who claim that reference counting is faster than a marking GC?
Reference counting can be better in terms of ease of implementation, cross-language interoperability, and by reclaiming memory immediately when the last reference to it disappears.
Reference counting is only better in terms of ease of implementation.
Hence it is usually one of the earliest chapters in any CS book about GC algorithms.
Reclaiming memory immediately only works for simple data structures. Naive reference counting implementations have similar stop-the-world effects when releasing relatively big data structures, which can even lead to stack overflows if the destructor calls happen to be nested.
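A minimal Rust illustration of that effect:

    use std::rc::Rc;

    /// A refcounted singly linked list node.
    struct Node {
        next: Option<Rc<Node>>,
    }

    fn main() {
        // Build a list that is one million nodes deep.
        let mut head = Rc::new(Node { next: None });
        for _ in 0..1_000_000 {
            head = Rc::new(Node { next: Some(head) });
        }
        // Releasing `head` tears down the whole structure at once: each node's
        // drop decrements the next node's count to zero, recursing a million
        // levels deep. The program pauses for the whole teardown and typically
        // overflows the stack unless the destructor is rewritten iteratively.
        drop(head);
    }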
In the case of Objective-C and Swift, the Objective-C tracing GC project failed due to the underlying C semantics, so they took the second-best approach of enforcing Cocoa retain/release patterns via the compiler, which only applies to a small set of Objective-C data types.
Swift naturally had to support the same memory management model as a means of keeping compatibility with the Objective-C runtime.
Eh. In practice reference counting solves 90% of the problem and keeps memory usage low at all times. It's part of why Java programs are so hard to tune for performance & end up eating so much RAM. If you don't believe it, compare Android & iOS: even though Android has enormous financial & competitive pressure to improve performance, it still ends up requiring 2x the amount of RAM that iOS does, which is partially driven by the choice of Java.
People keep referring to Java to badmouth tracing GCs.
The fact is that Java isn't the only game in town, and all GC enabled system programming languages do offer multiple ways to manage memory.
Value types, traced GC memory references, global memory allocation, stack values, RAII, untraced memory references in unsafe code blocks.
Not every OOP language is Java, nor every GC is Java's GC.
Additionally, not every Java GC is like OpenJDK's GCs; there are plenty to choose from, including soft real-time ones for embedded deployments.
As for Android, it is a fork still catching up with what Java toolchains like PTC/Aonix are capable of, all because Google decided it could do better while screwing Sun in the process.
> and all GC enabled system programming languages do offer multiple ways to manage memory.
Since "GC-enabled system programming languages" is an oxymoron, a claim about what such languages may or may not include is just not very useful. But it's definitely the case that properly combining, e.g. "traced GC memory references" and RAII including deterministic deallocation for resources is still a matter of ongoing research, e.g. https://arxiv.org/abs/1803.02796 That may or may not pan out in the future, as may other things such as pluggable, lightweight GC for a subset of memory objects, etc., but let's stop putting lipstick on the pig that is obligate tracing GC.
An oxymoron only in the minds of the anti-tracing-GC hate crowd.
- Mesa/Cedar at Xerox PARC
- Algol 68 at UK Navy computing center
- Modula-2+ at Olivetti DEC
- Modula-3 at Olivetti DEC/Compaq/HP and University of Washington
- Oberon, Oberon-2, Active Oberon, Oberon-07 at ETHZ
- Oberon-07 at Astrobe
- Component Pascal at Oberon microsystems AG
- Sing#, Dafny and System C# (M#) at Microsoft Research
- Java when running AOT compiled on bare metal embedded systems like PTC Perc and Aicas Jamaica
- D by Digital Mars
- Go at Google (Fuchsia) and MIT (Biscuit)
Let's stop pretending reference counting is the best of all GC algorithms, in spite of the fact that it is quite basic and does not scale on modern multi-core NUMA architectures.
No one is saying that reference counting is the best. What I am saying is that reference counting tends to offer a good set of advantages (predictable memory performance, no hogging of memory, no pauses) for a minimal cost (more frequent GC, more overhead to store reference counts).
The comment about "does not scale in multi-core NUMA" only applies if you have objects that are shared between threads, because otherwise there are no atomics going on. For example, Rust has a generic ref-count mechanism that automatically uses atomic operations for ref-counts when an object might be shared between threads, but otherwise does simple arithmetic. Non-atomic refcounts are most likely also going to be faster than any other global GC algorithm. Other languages require explicit differences but are still able to offer the same thing.
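A small Rust example of that distinction:

    use std::rc::Rc;
    use std::sync::Arc;
    use std::thread;

    fn main() {
        // Arc: atomic refcount, may be shared across threads.
        let shared = Arc::new(vec![1, 2, 3]);
        let handle = {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.len())
        };
        assert_eq!(handle.join().unwrap(), 3);

        // Rc: plain non-atomic refcount, cheaper, and the compiler statically
        // rejects sending it to another thread because `Rc` is not `Send`.
        let local = Rc::new(vec![1, 2, 3]);
        let local2 = Rc::clone(&local); // simple non-atomic increment
        assert_eq!(Rc::strong_count(&local), 2);
        drop(local2);
        // thread::spawn(move || local.len()); // does not compile: `Rc` is !Send
    }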
The fact of the matter is that the majority of objects do not require expensive GC of any kind & can live on the stack or have explicit ownership guarantees. Choosing defaults where everything might be shared is not a good default for systems languages as it pessimizes memory usage & CPU performance to a drastic degree.
That being said, GC does have its place in all manner of applications and has other advantages, like making developers more productive, which isn't a bad thing, but these are domain-specific decisions. There are plenty of techniques: reference counting, memory pools, static memory allocation, various GC algorithms, etc. Each has tradeoffs, and every single GC system I've encountered means variable latency/stop-the-world and greedy memory usage (optimized for the one application). That's valid in some domains but certainly isn't desirable. If there were an awesome GC system like you claim that could perform that well, it would have already been deployed to innumerable applications by all the Java vendors, Javascript VMs, C#, etc. It's an extremely complex problem.
Most of your links are niche commercial systems or even pure academic research systems. They're not proof of anything other than GC being possible to implement for various languages/machines which isn't a claim that's been disputed at all.
> Go at Google (Fuchsia)
AFAIK Fuchsia does not use Go for any systems-level portions. Those are written in C/C++/Rust last time I checked (with Rust being the official default going forward). Do you have any links to the contrary?
If you don't need to "reclaim memory immediately", you can often use arenas, a.k.a. regions - freeing an arena does not incur a "stop the world" pause. Similarly, most "big data structures" have little use for reference counting in their internals (albeit concurrent data structures may indeed use refcounting internally, and more obviously it comes up when implementing general graph structures). Overall, outside of the use of obligate GC as in Swift, I suspect that nested destructor calls are unlikely to be a significant problem in practice.
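A sketch of the arena pattern in Rust, assuming the third-party `typed_arena` crate (any bump allocator works similarly):

    use typed_arena::Arena;

    struct Node<'a> {
        value: u32,
        parent: Option<&'a Node<'a>>,
    }

    fn main() {
        let arena = Arena::new();
        // No per-object refcounts: objects borrow from the arena and may
        // freely reference each other, since they all share its lifetime.
        let root = arena.alloc(Node { value: 0, parent: None });
        let child = arena.alloc(Node { value: 1, parent: Some(root) });
        assert_eq!(child.parent.unwrap().value, 0);
        // Dropping the arena frees everything in one go -- no refcount
        // updates and no tracing pass.
    }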
Rust also uses refcounting, not tracing GC. The problem with Swift is its use of obligate GC for things that don't need it in other languages (C/C++, Rust).
Sure, but you don't need refcounting for cases that are covered by Rust's affine types, even in C/C++. You can use the patterns described in the C++ Core Guidelines, and end up with something quite rusty, only without fully-automated checking.
(Besides, I think std::rc has better performance than the refcounts found in Swift and C++, because it's used in cases that don't need atomic update, and yes this is statically checked too.)
Yea, I'm not sure what point was being made there. Rust isn't about being faster than C; it's about not letting you make the mistakes you can make in C/etc.
If you could somehow write perfect code in a timely manner, you'd have no need for Rust. You'd likely also be a unicorn.