It’s not a full kernel memory CVE: you have ±255 bytes of access to kernel memory from the cred pointer. I have no idea if that extends to userns or not. Also, I think you’re confusing Java threads with system threads; they are not the same.
I think you’re being overly alarmist. You have to trust someone else’s code at some point, otherwise you’ll be paralyzed by non-productivity.
> It’s not a full kernel memory CVE: you have ±255 bytes of access to kernel memory from the cred pointer. I have no idea if that extends to userns or not.
As I understand it, a kuid_t is the UID in the root namespace, so setting your cred->uid to 0 gets you treated as root on the container host.
Also, don't think that limited exposure to kernel memory saves you - take a look at the sudo "vudo" exploit from 2001, in which a single byte was erroneously overwritten with 0 and then put back, and that turned out to be exploitable. http://phrack.org/issues/57/8.html (And in general, don't confuse the absence of a public exploit with proof that a thing isn't exploitable in a certain way.)
> Also, I think you’re confusing Java threads with system threads; they are not the same.
Current versions of the HotSpot JVM (where by "current" I mean "since about 1.1") create one OS thread per Java thread: http://openjdk.java.net/groups/hotspot/docs/RuntimeOverview.... "The basic threading model in Hotspot is a 1:1 mapping between Java threads (an instance of java.lang.Thread) and native operating system threads. The native thread is created when the Java thread is started, and is reclaimed once it terminates." Plus there are some other OS threads for the runtime itself.
> I think you’re being overly alarmist. You have to trust someone else’s code at some point, otherwise you’ll be paralyzed by non-productivity.
Sure, but you can choose which code to trust, and how to structure your systems to take advantage of the code you trust and not the code you don't. Putting mutually-distrusted things on physically separate Linux machines on the same network is a pretty good architecture: I trust that the Linux kernel is relatively low on CVEs that let TCP packets from a remote machine overwrite kernel memory.
255 bytes is huge, though IIUC it's actually less than that. Nonetheless, it's much more than is typical. Sometimes these holes are limited to a single word, and only a single value for that word (like NULL), and attackers still come up with marvelously devious exploits.
The critical vulnerability is that the cred pointer address is entirely under your control, so you get to poke at whatever kernel memory you want. The constraints are 1) locating the address of what you want to poke, and 2) the smallish range of values you can write out.
Also, I'm not confusing Java threads with system threads. Most JVMs use a 1:1 threading model. And because on Linux a thread is just a process (which unfortunately still causes headaches with things like signals, setuid, etc), each thread has its own PID.
I'm not being alarmist, just realistic. Nobody is going to stop using Linux anytime soon. Nor am I. But the fact of the matter is that the Linux kernel is riddled with vulnerabilities. Something like the waitid vulnerability comes along at least 3 or 4 times a year, and that's just the published ones. (IMO, part of the reason is precisely because of complex features like user namespaces, which add tremendous complexity to the kernel. But that's a contentious point.)
At least for high-value assets (however you want to define that), people should just treat Linux as if it lacks secure process isolation entirely, absent a commitment to herculean efforts--extremely locked-down seccomp, PNaCl-like sandboxing, etc for all your code that juggles tainted data. Even then, vulnerabilities like recvmmsg come along and ruin your day, but those are rare enough that it would be unfair to single out Linux.
Not only is that pragmatic and reasonable, after 25 years of endless vulnerabilities of this sort I wouldn't trust the judgment of anyone who thought otherwise. And for what it's worth, I'd make much the same point about Windows, although I have much less experience on that platform.
Empirically, among the general purpose, commodity platforms OpenBSD has one of the best track records. Professionally I've had success placing OpenBSD in roles where Linux boxen were constantly hacked. But IT departments in particular, and application developers generally, dislike OpenBSD or would dislike it if it were forced upon them.
More importantly, while nowhere near as bad as Linux, macOS, or Windows, OpenBSD has at least one published severe local kernel vulnerability every year or two. In many cases those OpenBSD boxen I mentioned survived _despite_ being neglected by IT and not kept up-to-date; I know for a fact some were in a known, locally exploitable state for a not insignificant period of time. That makes me think a big part of their relative security is simply related to OpenBSD not being a common target of rootkits and worms. I have little doubt a sophisticated attacker could root an OpenBSD box from the shell, for example, if he was targeting that box. (My rule of thumb when isolating services is that anything running a web service using a non-trivial framework (PHP, NodeJS, etc) provides at least the equivalent to shell-level access to a targeted attacker. Among other things, that means even if I'm writing a privilege separated, locked-down, formally-verified backend daemon, I assume as a general rule that any data it's trying to protect isn't safe from that front-end web application unless it's running on separate hardware.)
While I don't think that security and convenience are necessarily mutually exclusive, as a practical matter they are largely mutually exclusive. Unless you're prepared to accept the burden and cost of using a specialized OS like seL4--and in particular use it in a way that preserves and leverages the stronger security guarantees--your best bet is simply to use separate servers when you want a significant degree of isolation assurance. Separate hardware is not sufficient (if all your boxes have Intel ME cards, or have firmware pushed from a puppet server, or share an engineer's account whose hacked desktop is logging SSH keys, passwords, and Yubikey PINs), but it's largely necessary. This is true whether you're concerned with targeted or opportunistic attacks, but _especially_ opportunistic attacks, which are by far the most common and, in many respects, an important element to targeted attacks.
Separate hardware is a uniquely simple, high-dividend solution. But the point is to be realistic about the actual robustness of your solutions, to be able to easily identify and mitigate risks, so you can make more informed security decisions. And it all depends on what you're protecting and what sort of investment you're capable of making. Just endeavor to predicate your decisions on accurate and informed assessments. Among other things, that means being honest about and understanding your own lack of knowledge and experience.
Similarly, continuity and long-term maintenance are critical to security, which means you need to be honest about institutional capabilities (or for a private server, what you're prepared to track and upgrade 3 years from now.)
Linux, OpenBSD, co-location, and cloud hosting can all be part of a perfectly robust strategy. And HSMs probably should be, too, which is basically just a way to attach a tiny, isolated, dedicated piece of hardware to a computer. But none of these options alone are likely to be reasonable choices, all things considered, especially in the organizational context.