OpenBSD System-Call Pinning (lwn.net)
129 points by rwmj on Feb 2, 2024 | 64 comments


> "The direct-syscalls-inside-the-binary model used by go (and only go, noone else in the history of the unix software does this) provided the biggest resistance against this effort."

I know this annoys unix people. But I have to say I actually really like that Go shakes this up. I believe the C function monopoly just isn't healthy for the ecosystem. You should be able to build a completely new, unrelated language. The Go developers were the first in a long time to do this, not because they are stupid, but because they were ambitious.


I do like this about go. And on Linux it arguably makes sense (I say arguably because DNS without CGO is still a common cause of issues and incompatibility).

But Linux has, to the best of my understanding, said "yes, we are ok with users using syscalls". Linux doesn't think that glibc is the only project allowed to interface with the kernel.

But for other platforms like OpenBSD and Windows, they are quite simply relying on implementation details that the vendors consider to be a private and unsupported interface.

This whole thing is also separate from the question of whether making libc the only caller of the syscall instruction is a good and meaningful security improvement.


> I say arguable because DNS without CGO is still a common cause of issues and incompatibility

DNS without CGO works perfectly. The vendor specific ad hoc mechanisms for extending DNS in a site local context are not well supported. If they were implemented more sensibly, then Go, or any other language, would have no problem taking advantage of them even without the "C Library Resolver."

Speaking of which, that "C Library Resolver," in my opinion, has one of the worst library interfaces in all of unix. It's not at all a hill worth new projects dying on.


> The vendor specific ad hoc mechanisms for extending DNS in a site local context are not well supported

DNS is one of those things that OS vendors think should be extendable and configurable. It allows VPN apps to redirect DNS only for certain subdomains, for example, which enables proper split-horizon DNS. I think this is totally reasonable behavior, and it’s undeniably useful. If a particular programming language reimplements DNS on its own, you lose guarantees that the OS is striving to provide to the user.

You can make the case that OSes shouldn’t make these guarantees, and we’re free to disagree on that, but from a practical standpoint it is a very useful feature and it sucks that pure Go apps don’t work with it.


The right way to do this is to specify a resolver to use on localhost.


Or you make the case that the OS should ship a DNS server. It can handle DNS forwarding and iterative queries anyway (iterative queries because you often need them for DNSSEC), and the logic for serving DNS responses is ok'ish (DNS name dedup and length restrictions being the biggest complexity issues).

I am not super happy with systemd-resolved but it solves this particular issue. No requirement to use libc, but the same (OS-configurable) behavior for all users.


The difficulty and overhead of running those queries from everybody's phones is one of the reasons (a small reason, but one of them) DNSSEC isn't deployed.


> DNS without CGO works perfectly

It does not. I know this because it impacts my daily work and the work of others. Honestly, if you could make my day and go figure out exactly what's going wrong with the pure Go DNS implementation, it would make my life a lot simpler and I wouldn't have to maintain shell scripts that update /etc/hosts to hard-code ipv4 addresses for the APIs I access with terraform.

https://github.com/hashicorp/terraform-provider-google/issue...


It seems like the explanation might be right there in that issue. The server is occasionally not successfully sending the A record, you're getting bad fallback behavior due to the way the dial call is imprecisely constructed, and this is masking the underlying problem. The code, as written, is doing exactly what you would expect in this scenario.

Should it be your DNS resolver library that takes stock of your OS environment and only makes calls for A records and not AAAA records when it "detects" some configuration?

Shouldn't your application itself have an environment variable or command line option that allows you to specify that your dials should only be done using tcp4? Wouldn't this be immensely useful to have outside of "auto detection" in some library somewhere?


Why should every application be aware of whether it is running on a v4 or v6 network? If the application merely wants to connect to an external service, that is firmly the OS's job to decide.

As a user, if ipv6 is flaky today, I want one central place to configure for ipv4 only; I don't want to go and change every application's settings only to revert that tomorrow.


Oh.. right. So we need another configuration option for each and every application, for something that is almost always system-wide and can change dynamically (like wifi reconnecting)?


Most unixes allow for static binaries. And most unixes allow you to run a static binary compiled for an older OS on a newer OS. In order for that to work, the syscall interface needs to be at least mostly stable.

Yes, if you're writing netstat or lsof or ps or something, you need tight coupling between the binary and the kernel, and you can argue Linux does that better, but most people aren't writing netstat or lsof or ps.


Even those get most of their info from /proc and /sys rather than special syscalls


On the BSDs they sure don't: /proc (in the role referred to) and /sys are peculiar to Linux; /proc is not even mounted by default on FreeBSD, is deprecated, and doesn't exist on OpenBSD. On OpenBSD/FreeBSD most of the info comes out of sysctl(3) or from reading kernel memory directly.


> reading kernel memory directly

God I hope not


https://man.freebsd.org/cgi/man.cgi?query=kvm&sektion=3&n=1

The aforementioned tools use these interfaces.

Emphasis on "most": the sundry information for the live kernel now comes from sysctl; I note the (root-only) mem/kmem interface for completeness, and rare utilities (eg btsockstat) use it.

Going way back, this is how it all used to work; the more structured interfaces were a 90s thing. https://github.com/v7unix/v7unix/blob/master/v7/usr/src/cmd/... Even early Linux used kmem for ps, which was not ideal: https://cdn.kernel.org/pub/linux/kernel/Historic/old-version... That's also why the package is still called procps; for a while it coexisted with kmem-ps.


I do not know what the previous poster was talking about, but, while unrestricted access to kernel memory is of course unacceptable, a perfectly secure and useful means for the kernel to provide information to a user process is to map a page as read-only in the user address space, from where the user process could get the information without the overhead of a system call.

For information of general interest such a special kernel page could be mapped as read-only in the address space of all user processes.

Much of the information that is provided in the special file systems /proc and /sys could have been provided in appropriate data structures in such read-only shared memory, for faster access, avoiding the overhead of file-system calls and of parsing file text.


Windows also provides language-independent ways of calling low-level (and pretty high-level) OS interfaces.

The library you have to link to access system services is not going to pollute your language environment with a bad runtime.


Does Go use direct syscalls on Windows? Last time I looked, I could have sworn Go's source used the standard Windows system DLLs.


On Windows it never did, and on platforms other than Linux it has been forced to accept that direct syscalls aren't the way OS vendors play the game.


I don't think syscall-vs-C-function is at all meaningful for someone making a completely unrelated language. The system has to pick some ABI for communicating with the kernel, and syscalls don't change that. The ISA generally doesn't define how arguments are passed; it's really just in the business of providing a fancy jump instruction.

Arguably, using the same ABI for userspace C functions and for communicating with the kernel reduces the amount of work required of completely new languages, because they are likely to need C interop support anyway.


If you're designing a whole new language today, you might want to build something to fit better with the new io_uring-like interfaces to the kernel and make 'old school' syscalls a fallback or special case. Would be interesting to have this (queueing, batching, completion) as a basic layer.


Yes, I'd like the OpenBSD approach a lot more if the specially privileged syscall library was a language-agnostic library just for syscalls instead of libc.

Also, if you're gonna add new ELF sections anyway, why not do syscall "relocation" directly (with a similar randomization like ASLR) instead of going through stubs? This "relocation" doesn't actually need to change any memory or use any offset tables since the syscall numbers are a farce under the new system anyway. Just make it a map of the location of syscall instructions to implied syscall numbers. Once you've phased out the old syscall model, you can even repurpose RAX as an additional syscall parameter.


> I'd like the OpenBSD approach a lot more if the specially privileged syscall library was a language-agnostic library just for syscalls instead of libc.

I think they should split that part off, too. Such a split would better reflect the split of responsibilities between kernel maintainers and libc maintainers.

At the same time, the effective difference for people using the code is minute.

Any decent linker will strip out unused functions in libraries you link with, so if you currently only use the part of libc that would become “libsyscall”, the end effect would be the same as if “libsyscall” existed.


My hot take here is that this split exists because of the way that Linux development works differently than other Unices and other operating systems. Essentially, it's an example of Conway's Law.

As always, things are about API boundaries: what is internal to you, and what is exposed externally to your users.

With Linux, well, we all know the rms copypasta: "I'd just like to interject for a moment. What you're referring to as Linux, is in fact, GNU/Linux, or as I've recently taken to calling it, GNU plus Linux." This is a joke, but it also points to something real and serious: Linux, being just a kernel, means that they provide a stable kernel API. glibc, being written by a different organization, builds on top of that API and adds its own stable API.

By contrast, many other operating systems are developed as a full operating system. They don't produce a kernel as a standalone component. As such, they choose some other sort of boundary as the API for the OS. On unices, that's often the libc. On Windows, that's the Windows API, of which win32 is an example.

There are good reasons to not make the kernel your API boundary. Different systems make different choices, and that's a good thing, not a bad thing.


> There are good reasons to not make the kernel your API boundary.

Good reasons for the developers, not so much for the end users.


WinNT seems to do just fine with that. And it's only because of users insisting on using abandonware


Developers work against the Win32 API, not the kernel-side Nt* stuff.


Yes, that's my point. App developers (the actual users of the win32 ABI) seem just fine with not calling the kernel directly. App users also enjoy best-in-the-world backwards compatibility.


I don't agree that those two things are at odds.


This isn't about Go vs C. This is about indirection of system calls through stubs, and also in a way about static linking vs. dynamic linking. Something like this could be done for C and others, but the kernel would have to walk the C stack to ensure that a) the callers of the system calls are run-time library stubs listed in the ELF, and b) the callers of the stubs are listed in the ELFs (plural). Clearly this is easier to do if there are neither stubs nor dynamic linking, but that's not a reason to think Go is somehow superior to the rest.


Anything statically linked to the C library will also have direct syscalls in the binary.


From comments:

> avoiding [...] errno

Note that errno can be optimized by a sufficiently smart compiler, simply by treating it as a register, then annotating various functions with whether it is preserved, clobbered, or conditionally used for return.

The libc boundary is quite annoying though; in this case in particular, it doesn't expose the fact that errno is at a fixed offset from the TLS register. And generally, libc is vehemently opposed to the existence of smart compilers, since all libc calls are expected to be treated as opaque barriers.


well, yes, but you need whole-program compilation.


> And generally, libc is vehemently opposed to the existence of smart compilers

Yes, as far as I'm aware none of the major libcs on Linux support LTO


Library relinking, mimmutable(2), xonly, along with other recent developments such as the removal of indirect syscall(2), make syscall pinning all the more interesting, and raise the bar against attackers significantly.

One can only hope that more will eventually find its way into Linux, like with how paid employees at Google have been spending the past ~6 months cloning mimmutable (which HN characters decried as "useless") to make mseal() for defending Chrome.

https://marc.info/?l=linux-kernel&w=2&r=1&s=mseal&q=b


mimmutable, given a proper security model, doesn't seem useless. Lots of people have adopted something like this; Chrome is trying to bring it to Linux but Apple has been shipping a similar VM_FLAGS_PERMANENT for the last two years or so, which I believe even predates mimmutable making it to OpenBSD. In general, mitigations can raise the bar against attackers, but they don't have to. Determining whether it does can be pretty difficult, especially to those who come up with mitigations ;)


> Lots of people have adopted something like this;

Really? And how many systems have made any effort to use it? On OpenBSD most of a program's static address space is now automatically immutable (main program .text, ld.so .text, .bss, main stack, dynamically-loaded shared libraries, and dlopen()'d libraries mapped with RTLD_NODELETE).

Nobody else has done the work on a complete operating system.


Question for you: why do you do this? "We made most of the address space immutable" is, by itself, not a useful property security-wise. What analysis did you do to arrive at it being necessary? I mean this as a genuine question but pose it in the context of what everyone else is doing.

You're basically going "nobody else did this properly" because others did a different implementation. In other operating systems at least they go "oh we saw a chain that targeted xyz structure in this page and modified it so we are going to make sure it is really immutable". How did OpenBSD arrive at the conclusion that what other people are doing doesn't actually confer the full security benefit?


System call pinning has been a classic example of an OpenBSD approach to exploit mitigation that exploit developers dunk on; a thread just a couple months ago:

https://news.ycombinator.com/item?id=38579913

I think? this is the same work, just now merged into the kernel?


There is a paragraph in the article dedicated to this, starting with:

> Security researchers have expressed doubt about how useful this check is at preventing compromises.

Doesn't cite the HN thread, but two other cases.

> I think? this is the same work, just now merged into the kernel?

That is my understanding, yes. From the article:

> In December, De Raadt sent a patch to the OpenBSD mailing list expanding OpenBSD's restrictions on the locations from which a process can make system calls. ... Now that patch has been merged, finishing a process which De Raadt said has taken five years.


Has that "dunking" manifested in actual exploits?


I dunno. Has it?


Well since https://www.openbsd.org/ still says

> Only two remote holes in the default install, in a heck of a long time!

I'm assuming not, but I could always be mistaken.


This is a _very_ qualified statement. The default OpenBSD install enables an extremely small number of services, which is why they can claim that. I'm not saying that's wrong, or a bad idea, but obviously a platform that doesn't enable many network services is going to have a small number of remote holes.

This is on top of a lot of very careful programming and interesting security research, and this post isn't meant to take anything away from the OpenBSD devs.


> The default OpenBSD install enables an extremely small amount of services by default, which is why they can claim that.

Probably this issue has been hashed out many times over the decades, but arguably the security gain isn't a fortunate or incidental benefit of minimizing default-enabled services, nor a cheat like weighted dice; it's a very real benefit resulting from an effective, intentional technique. Maybe other OSes should do the same, and then everyone would have that benefit.

The other OSes have other priorities, and that's fine. Embrace that. Yes, most users (and developers) don't want to deal with the compatibility issues. But when you say OpenBSD has few default security holes because they have few default services, that's a compliment.


seL4 is proven correct. Formally verified. No security holes in the default install, of any sort, ever. I mean, it doesn't do anything. But it has no security holes, with almost mathematical certainty.


As a huge believer in formal methods, this statement should _also_ be tempered somewhat. Formal proof is a great technique, but it's incredibly dependent on getting your specs right, which is very hard to do.

As an example, CompCert is a formally verified C compiler, and it's had a couple of bugs as a result of their specification of the underlying hardware being wrong.


I know, I know, I'm mostly being silly about seL4. Also, obviously, OpenBSD does a lot more than seL4. :)


Yeah, I get it, and obviously the question is where people want a balance. OpenBSD does more than nothing, less than SELinux-secured Linux, which does less than some other things.

Also, doesn't seL4 have widespread, practical application? IIRC as the microkernel (maybe under Minix?) on the baseband hardware in cell phones? Maybe I'm confusing it with something else.


Sure, it wasn't meant as a slight in any way. For certain use cases, that's a great set of defaults! It's very good to have an OS that makes those choices. Needing to explicitly opt into things that raise your exposed attack surface is really really nice!


> a classic example of an OpenBSD approach to exploit mitigation that exploit developers dunk on

Part of why it is hard to take its reputation as a security-focused OS seriously.


In their defence, the OpenBSD developers do put their mon^H^H^Htime where their mouth is and release their work even with dubious features included. How much those features actually matter, instead of how much expert opinion says they matter, can then be judged in the wild.

Failed experiments are also a good part of research.


What can also be judged is ignoring expert advice in favor of rolling their own inferior solutions.


Whose and what expert advice did they ignore?


Mine (statistically at least; I am a random internet person, who does not contribute to any OS development)


Exploit developers telling them their mitigations won't do anything, for starters.


"Another benefit is that it requires unique methodology for OpenBSD, which requirements investment."

Secure in the absence of an attacker?

This part:

    xx:   b8 05 00 00 00          mov    $0x5,%eax
    xx:   0f 05                   syscall 

    This means "perform operation #5, which is open(2)"

    Inside the kernel, we know the system call # and the address of
    the syscall instruction.
Why can't the attacker just jump to an existing syscall instruction? Maybe one followed soon by a return.

  4) in libc.so's table, all the system call stub "syscall instructions"
     are registered.    
I don't know a lot about binary exploitation techniques. Is all this entirely reliant on layout randomization?


Yeah, but the memory shouldn't be readable, so it's harder to find the random location to jump to. There's a bit of circular discussion, where Theo/OpenBSD devs say they're introducing feature F to block an exploit technique, and people criticize it as irrelevant because you can do other exploits, but they've already implemented A, B, C, D and E to handle those; and for each of those individual features there's a list of exploits that get around them, but maybe not if all of them are implemented.

A more rigorous analysis of the security environment as a whole would be useful.


> Security researchers have expressed doubt about how useful this check is at preventing compromises.

Well... if they are able to craft a call to a syscall from some random piece of memory, you are already f-d, and this little hurdle is at most going to be an annoyance.


> Both [msyscall and pinsyscall] can only be invoked once by a given process, which is done by the dynamic linker.

How are/were static executables handled? I’m a little fuzzy on how execve notices that a given executable is dynamic (and needs ld.so to run it) or static (in which case.. it just jumps to the _start symbol?). If the dynamic linker isn’t involved, what calls msyscall/pinsyscall?


Ah, found a good discussion of how it works in Linux https://eli.thegreenplace.net/2012/08/13/how-statically-link...

Is it materially different in openbsd?


Doesn’t macOS do something like this.


It does not.



