OpenBSD switches the default compiler on amd64 and i386 to clang (marc.info)
195 points by cnst on July 27, 2017 | 97 comments


Now that OpenBSD has an optimizing compiler, maybe they will notice and fix the data races in their pthread_mutex_lock implementation. Currently they use a broken double-checked locking pattern when a mutex is lazily initialized with PTHREAD_MUTEX_INITIALIZER.


In case someone is interested, here is the relevant code [0]. Notice that mutexp is read without any synchronization, and that the _spinlock call further down uses a different lock than the one in the initialization code (the former is specific to this mutex, the latter is global). There is no happens-before relationship between mutex initialization and mutex locking. Thus, if a different thread reads a non-null value of *mutexp, there is no guarantee that the writes done to the mutex structure itself will be visible. Given the strong memory model of amd64 and i386, the compiler would have to reorder pointer publication and structure initialization in pthread_mutex_init for this to break, which is unlikely but certainly a valid transformation given the lack of synchronization.

[0] https://github.com/openbsd/src/blob/0ecb71b5ec9e4b22a484606f...
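
For readers who want the pattern spelled out, here is a minimal sketch of double-checked lazy initialization done with C11 acquire/release atomics. It is not the OpenBSD code and not the fix that was later committed; `struct lazy_mutex`, `init_lock`, and `lazy_get` are hypothetical names made up for illustration. The release store publishes the pointer only after the structure is filled in, and the acquire load on the fast path pairs with it, which gives the happens-before edge that a plain read lacks.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a library's internal mutex object. */
struct lazy_mutex {
    int type;
    int lock_count;
};

/* Assumed global lock that serializes first-time initialization. */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Return the object behind *slot, allocating it on first use.
 * Returns NULL only if allocation fails.
 */
static struct lazy_mutex *
lazy_get(struct lazy_mutex *_Atomic *slot)
{
    /* Fast path: the acquire load pairs with the release store below. */
    struct lazy_mutex *m = atomic_load_explicit(slot, memory_order_acquire);
    if (m != NULL)
        return m;

    /* Slow path: re-check under the lock so only one thread allocates. */
    pthread_mutex_lock(&init_lock);
    m = atomic_load_explicit(slot, memory_order_relaxed);
    if (m == NULL) {
        m = calloc(1, sizeof(*m));
        if (m != NULL) {
            m->type = 1; /* initialize fully before publishing */
            /* Publish: the writes above become visible to acquire loads. */
            atomic_store_explicit(slot, m, memory_order_release);
        }
    }
    pthread_mutex_unlock(&init_lock);
    return m;
}
```

The relaxed load inside the critical section is enough because the lock itself already orders it against the other initializing thread; only the lock-free fast path needs the acquire.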


Correct me if I'm wrong, but I think a compiler that doesn't do link-time-optimization has to assume that the _spinlock() function can write to mutexp (e.g. through a global variable), and therefore it cannot reorder anything around it.


I think you are looking at the side that does the reading. I was actually thinking about the other side, the one that does the writing (inside pthread_mutex_init), where those two things are not separated by any function calls and are more readily reordered. Not to mention that it would be problematic on architectures with a relaxed memory model either way.


Have you reported this bug upstream?


I just committed a potential fix[0]. Thank you for bringing this to our attention and please let us know if that takes care of the defect you were seeing!

Next time, if you want to be sure we read about your problems with OpenBSD, please use sendbug(8) and file a report[1] or come talk to us on tech@.

[0] -- https://marc.info/?l=openbsd-cvs&m=150131738906455&w=2

[1] -- https://www.openbsd.org/report.html


They used GCC before Clang, so they switched to a less optimizing compiler.


I would argue to the contrary, especially since we are comparing modern Clang/LLVM with a GCC that was released back in 2008. At that point the C programming language didn't even have a memory model.


Their GCC version was forked a while back because of quality issues in "stable" releases of GCC. Optimizing compilers have gotten a lot better since then. Upstream Clang still (in aggregate) generates worse code than upstream GCC, but the point is that Clang's stable releases tend to have fewer correctness regressions, and optimize more than OpenBSD's current version of GCC.


The commit sounds like the people using OpenBSD for these processors will have two compilers installed.

I am not a programmer but I am curious: how does one select GCC as a compiler if the default is Clang? Just change what CC links to?


Make and most other build systems respect the CC and CXX variables.

Switching compilers is very common among systems developers: bootstrapping OS builds, cross-compiling for a different arch, getting compiler feedback from another compiler, etc.


For projects that use configuration frameworks it's often just setting an environment variable like CC or CXX to point to the compiler(s) you want and they'll use those particular binaries when building.

If there is a symlink, it's probably ill-advised to change it, since the change would be system-wide.


There are env variables as others have mentioned, and additionally in NetBSD it's configurable in mk.conf[0]. I know, Net is not Open, but they look remarkably similar in this toolchain regard for system/package builds. Can't speak to Free or Dragon Fly, though. I wouldn't be surprised if they've diverged.

[0] https://wiki.netbsd.org/tutorials/pkgsrc/clang/


Where does the CLANG vs GCC debate stand in 2017?

I know CLANG is used in many research projects (which are often committed to the branch).

And I assume many, many legacy projects stick with GCC.


Worth keeping in mind they didn't do this because they think `clang` is better than the current `gcc` as far as compiling goes; they did it because they don't want to use `gcc` past version 4.3.0, since versions 4.3.0 and up are GPLv3. So this is more a comparison of the current `clang` vs. an old `gcc`, which is obviously not much of a fair comparison.


People choose CLANG mainly because of its license. But GCC still produces faster binaries.


(what's with the CAPSLOCK, it's lowercase clang)

Such blanket statements are not useful. Actually different compilers win on different benchmarks.

Look at any test e.g. https://www.phoronix.com/scan.php?page=article&item=gcc8-cla... — GCC is faster for FFTs, clang is faster for matrix factorization, … they trade blows basically.


Not really, this depends on the code being compiled/run, e.g.: http://www.phoronix.com/scan.php?page=article&item=gcc7-clan...

GCC however has many more platforms supported, so in certain use cases LLVM is out of the question entirely. That being said: for OS development (and many other projects) predictable and correct is more important than fast. Optimizations that lead to these performance gains are oftentimes harmful given that code is typically far from perfect.


Apple Xcode has been using clang as their 'production' compiler for years now.


GCC is not for "legacy projects"; it still generally generates slightly better code for general-purpose programs on x86/AMD64. As Clang has been catching up to GCC in codegen quality, it has also been getting just as slow, so the old argument that Clang is faster is no longer valid. Both GCC and Clang approach their most intensive tasks in roughly the same way, and get similar performance. Clang tends to have more instrumentation tooling. GCC tends to have more interesting codegen tooling. GCC's profile-guided optimization and link-time optimization are more mature, and so is its OpenMP implementation. OpenMP code compiles better with GCC to this day.

LLVM itself, on the other hand, has many benefits. It has been around longer than GCC's standalone codegen library, and the API is quite good. There are more mature GPU ISA backends in LLVM as well, though GCC supports more exotic general purpose chips and microcontrollers. This works out better right now because companies have an easier time statically linking the (more permissively licensed) LLVM libs than the GCC ones, and so investing in these backends makes more sense for GPU vendors.


Any idea if they are switching to using lld as the system linker as well?


Based on this mailing list thread, I would assume so. https://marc.info/?l=openbsd-tech&m=148961125932497&w=2


Excellent! I have a soft spot for openbsd as a workstation os. Will give it a try again once this hits a release.


What's the easiest way to view the actual commit diff? I'm curious about what version of clang they're using, as the message indicates that they were using gcc4 before, which is fairly old.


OpenBSD has had a GitHub mirror for a while now. The commit is here: https://github.com/openbsd/src/commit/bc04e837fd81a3001f68f1...

As for clang version, it looks to be 4.0.0 per https://github.com/openbsd/src/commit/53d771aafdbe5b919f264f...


If you want to dig through history, FreshBSD may be of use: https://v4.freshbsd.org/project/openbsd/src?q=clang


Side-note: In the changed file, the first line is “$OpenBSD: bsd.own.mk,v 1.186 2017/07/26 19:44:42 robert Exp $”. Does anyone know if this line is updated by CVS or some other tool?


That's a CVS keyword, it's updated when the file is checked out or after it's committed. 1.186 is the most recent version of that file.


Awesome, thanks a bunch!


I imagine gcc4 was used due to it being the last gcc available before gcc moved to GPLv3. At least that's why FreeBSD stuck with it as the base compiler.


Good point! I hadn't thought of that. That being said, I thought that GPL was compatible with BSD; is that not the case for GPLv3, or was this more of an ideological choice by the OpenBSD developers?

Edit: Not sure what the reason for downvoting an honest question is; from Googling, I can't find anything about incompatibility between GPL and BSD, and I didn't express any opinion about the merits of BSD vs. GPL.


BSD is compatible with the GPL. You can use BSD-licensed code in a GPL project. The overall project is GPL, and any enhancements to the BSD code may be licensed under the GPL.

The flip is not true.

A BSD-licensed project that uses GPL code may not remain a BSD-licensed project. The derivative work that results from mixing BSD and GPL code must be GPL.


Though, any enhancement to GPL code that one authors oneself can be independently licensed under any compatible license, such as BSD. The only code in a BSD + GPL project that can't be used under BSD is the GPL code that other people wrote.

So to take an example, someone could write a GCC patch that is licensed under BSD, and then later use the same patch in clang.


I mean, in the case of a C compiler your code is mere data so license compatibility isn't really the concern.

BSD projects don't care for the anti-tivoization clause in the GPLv3 because it restricts where the software can be deployed. The GPLv2 was acceptable because there was no better alternative at the time and, again, code was data in the case of a compiler.


"In the case of a C compiler your code is mere data "

A compiler tool chain typically _does_ insert code that you didn't write. For example, seen from the point of the loader, the entry point of an executable is not main. Similarly, one could argue that the function prologs and epilogs that gcc generates fall under the GPL.

Because of that, I think it is understandable that the BSD projects aren't willing to bet the existence of their projects on "your code is mere data".
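
One way to see the point that the loader's entry point is not main is a function that runs before main does. The following is a minimal sketch, assuming GCC or Clang, since `__attribute__((constructor))` is a compiler extension rather than standard C; the names `ran_before_main` and `before_main` are made up for illustration.

```c
/* Flag set by code that the toolchain's startup routine runs before main. */
static int ran_before_main = 0;

/*
 * GCC/Clang extension: functions marked as constructors are called by
 * the startup code that the toolchain links into every executable,
 * before control ever reaches main.
 */
__attribute__((constructor))
static void before_main(void)
{
    ran_before_main = 1;
}
```

Any main added to this file will find `ran_before_main` already set, because the startup code that calls the constructor (and then main) comes from the toolchain, not from the program's author.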


> Similarly, one could argue that the function prologs and epilogs that gcc generates fall under the GPL.

Can you really not compile code with gcc and have the result not be GPL licensed? This seems a bit dubious to me, although I honestly don't know enough to know if it's true or not.


It is unlikely, and would be 100% unintended (https://www.gnu.org/licenses/gcc-exception-3.1-faq.en.html), but the GPL hasn't been battle-tested in court, and GPLv3 even less so.

Judging by their behavior, both Apple's lawyers and those of some of the BSDs seem wary of betting the firm on GPLv3. Quite possibly, that's just caution.



To save others who might read this thread some time, the specifics of this issue are described by this: "For historical reasons, the OpenBSD base system still includes the following GPL-licensed components: the GNU compiler collection (GCC) with supporting binutils and libraries, GNU CVS, GNU texinfo, the mkhybrid file system creation tool, and the readline library. Replacement by equivalent, more freely licensed tools is a long-term desideratum."


More precisely, they switched to GPLv3 in GCC 4.3.0.



They also appear to be using CVS, which is even older. :-)


OpenBSD's been at it for a while.

Yes, CVS, on top of which they invented read-only anonymous source control: https://www.openbsd.org/papers/anoncvs-paper.pdf

Of course you're free to use the far more recent GitHub mirror: https://github.com/openbsd


Touche!


Good god. Seriously? There are so many much better options, and of course the obvious Git...

I remember working with CVS, it was horrendous; I'm surprised a project of this size is still on CVS.


It is one of the reasons, according to past contributors, why project participation stagnates. Old devs have their flags and config all set up, and don't want to change to a new source control system just to get new kids on the block. A good example of how BSD operating systems are being developed for the developers themselves, not for public use.


CVS works for them, so they keep using it. The devil you know, etc.


It's interesting to me to see this project be both so innovative in security domains, yet so backwards in usability.

I was trying to install openbsd on a new router this week, and I had a hell of a time trying to repartition the disk such that it didn't pre-allocate partitions for X11 (which I had no use for). I could delete partitions but then I got empty spaces and seemingly no good way to resize via the primitive CLI. I googled around and found this thread [1], where someone suggested a GUI installer to help novice users install it, which got this reply:

"Uh no. The OpenBSD installation routine is perfect as is. I would not want to see a GUI installer on my favourite BSD."

Really, perfect? Could not be possibly improved? And because you don't want a GUI, no one should have one? All the rest of the replies were similar to boot.

I went back to Linux as I can only imagine what the rest of running it is like. Sometimes old stubborn cranks can really get in the way of progress.

[1] http://daemonforums.org/showthread.php?t=8580


OpenBSD "partitioning" is actually a disklabel, and the TUI for it is not bad. I almost always tweak the default partitions a bit when I install, because I like a larger /usr.

You maybe missed option R:

   R [part]   Resize a partition in an automatically allocated label,
              compacting unused space between partitions with a higher
              offset.  The last partition will be shrunk if necessary.
              Works only for automatically allocated labels with no spoofed
              partitions.
In OpenBSD, for any question first consult the man pages before "googling around". Their man pages are really good.



I am a newer OpenBSD user, so I encounter roadblocks/papercuts quite frequently. What I like about the OS is that the problems are all easy in retrospect. I never struggle with the same issue twice. For instance, the first time I booted OpenBSD, I was met with the following prompt:

    boot>
What am I supposed to do here? Does it require some weird incantation in order to load the kernel? It turns out you just have to hit enter.

The OpenBSD install script really is quite simple. I'm not sure I would call it easy (simple ain't easy), but it's certainly straightforward and setup is usually just hitting enter a bunch of times. You only encountered trouble because you tried to fiddle with it and got frustrated.

Another huge benefit of OpenBSD compared to Linux is the incredible quality of documentation. Missing or outdated docs are considered a serious bug. To address your issue, let's look at the docs: https://man.openbsd.org/disklabel#AUTOMATIC_DISK_ALLOCATION

It looks as though if the installation target is smaller than 8GB, it won't create a separate partition for X11. That's useful to know. Additionally, you can specify your own partition template using the -T option.

It might be cool to see a graphical project like GhostBSD built on top of OpenBSD, but I have to agree with the curmudgeons that the text-based installer is pretty fantastic as-is.


> boot>

> What am I supposed to do here? Does it require some weird incantation in order to load the kernel? It turns out you just have to hit enter.

> [...]

> Another huge benefit of OpenBSD compared to Linux is the incredible quality of documentation.

Where is the documentation that explains a) why you have to click <Enter> to continue the boot process and, b) the reason why this isn't self-documenting at the "boot>" prompt?


I had not yet picked up the habit of reading the manual. https://man.openbsd.org/boot.8#EXAMPLES

Being able to reach the boot menu is critical when installing to a range of embedded devices such as routers. I use a null modem cable for input/output with my router, and the TTY settings need to be adjusted in order to see anything.

GUI bootloader alternatives often use the "press ESC really fast" strategy, which is arguably less intuitive if you're trying to figure out where the boot settings are.

Anyway, the boot menu is rather self-documenting. It even responds to `help`. I eventually figured out the solution, but I still remember it as my first experience problem-solving on OpenBSD.


At least on my laptop, the grub menu has a list of options plus a sentence saying you can click the arrow keys to choose an item. It also says you can click "e" to edit one of the entries there. Once you've clicked a key there, the timer to automatically boot stops and you can take your time.

On my laptop the default configuration automatically boots too fast to read all of the sentence there. That's bad, but the tradeoff is I wait less time for my system to boot for all the times I don't need to reconfigure the boot.

That's what I mean by self-documenting, which the "boot>" example you gave is not. (Nor do I see any documentation about why it's not self-documenting.)


OT but why do you keep saying 'click' instead of 'press'?


Oh, good catch!

I don't think it's off topic-- for example, if the grub menu actually wrote "click" there would be a number of people who would grab the mouse and try to move a non-existent pointer to the letter "e".


The man page for boot(8) explains it.

You don't have to hit enter at the boot> prompt. It pauses for a few seconds to allow you a chance to enter boot commands; you can hit enter to boot immediately, or it will proceed with a default boot after a few seconds if you do nothing.


Ah, I think you're right. In my specific case the default "did nothing" because TTY settings were improperly configured. I actually needed the boot menu to be there.


> What am I supposed to do here? Does it require some weird incantation in order to load the kernel? It turns out you just have to hit enter.

Did you really think every server running OpenBSD requires someone to hit enter on the console to boot it?


Apparently I was misremembering and it does continue booting after a few seconds without input. I didn't see any output because I needed to configure TTY settings.

Additionally, the manual tells you how to adjust the default boot settings.

After years of dealing with poor documentation, it takes some habit-building in order to check the openbsd manual every time you have a question. On Linux my first impulse is Google/StackOverflow.


How would a GUI install help you with your router install? I mean, it's not impossible to create one, but given that routers usually don't have video outputs, this is a very specific requirement. You won't even install X11 anyway. I'm not aware of any general-purpose OSes that have a GUI for router installs.

Also, maybe you want to review the "GUI is easier than TUI" premise. Clicking per se is not inherently easier than hitting ENTER. And while I'm aware that I'm biased in that regard since OpenBSD is my favourite Unix, I'm honestly not aware of any other OS that's easier to install, GUI or not. You really have to read, understand, and answer those questions; there's no way around it. At this point, the difference between typing yes/no and ENTER vs clicking is minimal, usability-wise. On the other hand, the additional complexity brought by a GUI is a big burden on developers.

I remember this same discussion years ago on Debian land, when they didn't have a GUI install. Like an email from a random guy stating that if Debian would only fix their install and have a GUI, they would conquer the world. At a certain point, they finally built a GUI install, that was exactly the same questions asked in a GUI that even resembled the TUI. Biggest difference was, you could use your mouse. And eventually have some trouble because the installer wouldn't recognize your video card. In which case you could always fallback to the TUI install, but the question is: were there any usability gains at all by switching to a GUI? My point being, you can improve usability regardless of GUI or TUI.

In OpenBSD, if you are installing several similar machines with the same setup you can also do unattended installs (https://man.openbsd.org/autoinstall), and that would really make the install easier and simpler. I don't think you were referring to this anyway; but other than that, and putting the fallacious GUI vs TUI aside, I'm not sure how else you could make the OpenBSD install much easier. I would be really curious to hear if someone has some specific ideas, and I'm pretty sure OpenBSD devs would also be interested.


> Sometimes old stubborn cranks can really get in the way of progress.

I didn't see anyone getting in the way of progress. Someone said "I think the developers should make X for me," and other people replied "no they shouldn't." They could have said "yes, they should" and still nothing of substance would be done.


Randos on some forum don't speak for the project.


They speak for the attitudes of its community though.


SVN is a relatively painless migration, and is better simply because it gives you a single changelist number to track, like Perforce.

Personally I advocate for Git or, in lieu of that, Mercurial. CVS was so painful, it's just hard for me to imagine that people aren't actively figuring out a way off of it. It's literally one step above RCS...


When all your tooling is built around source repository X, and you understand how it all works, and you understand its quirks and how to avoid them, and you have demonstrated repeatedly the ability to hit release target dates consistently year after year, what then is the sound rationale to switch to source repository Y, and spend the considerable effort to change everything, just so you can keep doing what you are doing now?


I'm impressed that they are able to do that. I'm not going to enumerate all the advantages of Git-like tools over CVS here.

I do know the pain of migration. I've done it at multiple companies, and am currently figuring it out for an extremely large codebase to get off of Perforce and onto Git.

There are significant productivity gains with Git across organizations that make it a worthwhile investment.


I personally prefer Fossil. It's a really nice little SCM with all sorts of neat features like built in wiki and great documentation.


It can't be good for attracting new contributors.


I don't see how this is a barrier for anyone who could contribute to OpenBSD (or any other OS). OS source code complexity isn't in maintaining branches, moving files and managing dozens of dependencies - it's about improving the code and not jinxing stuff in the process.


IIRC, Gentoo still uses CVS as well (which is arguably more fundamental to the project since it's source-based). The original creator of the OS now works on Funtoo, which is based on Gentoo but uses git.


Switched roughly two years ago: https://www.gentoo.org/news/2015/08/12/git-migration.html

It was an interesting migration project to follow seeing how the portage tree is huge.


> IIRC, Gentoo still uses CVS...

Somebody corrected this (Gentoo VC software) later, but NetBSD does also still use CVS. Repo conversions are indeed big, interesting projects. I've closely-witnessed/been-peripherally-involved in a few, and there were some interesting papers written about various conversions years ago[0], when both git and mercurial were new and iirc monotone still had interest.

[0] both Mozilla and Sun were publishing their analysis and growing pains


I don't think they like GPL, so the number of better options decreases rapidly.


Is there some reason this would come into play for the source management tool?


Ideally, the source management tool(s) would be included with the OS (so that the OS can self-host), which means that it's a factor for the same reason why OpenBSD has preferred non-copyleft licenses for (most of) the rest of the system.


This is awesome. It's going to save so many headaches building/porting C++ based code on OpenBSD. Hurrah.


Awesome? Maybe... https://www.bitrig.org/


Note, latest amd64 snapshots reflect the switch to clang if folks are curious and want to experiment:

  > cc -v
  OpenBSD clang version 4.0.0 (tags/RELEASE_400/final) (based on LLVM 4.0.0)
  Target: amd64-unknown-openbsd6.1
  Thread model: posix
  InstalledDir: /usr/bin


This is a big deal! Weren't the OpenBSD pretty clang-hostile a few years back? Maybe I'm mixing up them and the suckless people?

I think I remember that they were even trying to move away from gcc to something home-grown and simpler, since gcc is such a huge dependency and even uses c++ now.


> I think I remember that they were even trying to move away from gcc to something home-grown and simpler

You remember pcc, "Portable C Compiler", which, though currently actively developed [0], has a not-so-dependable history. 1.0 released in 2011, 1.1 in 2014, and no releases since.

pcc was merged in, but with the spotty history, taken out of base again.

> Weren't the OpenBSD pretty clang-hostile a few years back?

Not quite. LLVM was evaluated for the main compiler back in 2013 [1], but the main drawback was that LLVM can't be used for every platform that OpenBSD supports. They also go on to talk about how every open compiler is painful, because there are no LTS releases. Compiler bugs were a major problem for OpenBSD, and I doubt it's changed enough for that to not be the case.

[0] Mailing list is still going strong, and patches are still coming in and merging. (http://marc.info/?l=pcc-list)

[1] Long read, but in-depth: http://marc.info/?l=openbsd-misc&m=137530560232232


I think their biggest grief was the potential switch to the Apache license.


I thought they were fans of pcc. In any case, if they drop gcc, then please keep pcc active.


Pcc was added to the base system (not used for anything), but then pcc development stalled so pcc was removed from the base system.


It is unfortunate.


What's the meaning of "default compiler" in this context? Will the kernel be built with clang or gcc?


cc


Which was gcc, until today. Has that changed?


It's a symlink. And yes, there is no other meaning of "switching the default compiler." The effect is that any tooling that uses `cc` (like the `CC` implicit variable in Make) will now use `clang` rather than `gcc`. So any software built on the system uses clang unless you force it to use something else. No idea if they will still distribute gcc by default.


What I'm getting from your comment is that they're switching everything over to building with clang, including the kernel. That's interesting (it's a bold move) since there have been a few systems over the years that set aside a separate compiler just for building the kernel.


It would have been a bold move 4 years ago. Today most software that builds with gcc also builds with clang. FreeBSD's switch-everything-to-clang project is dead because there is nothing left to do: nearly every open source project builds with clang by default, and the rare exceptions are complex cases not worth fixing.

OpenBSD has spent the last few years ensuring their code all builds with clang, and they support it. All the add-on software you might want to use also works well when built with clang.


The kernel has been buildable with clang for some time now.


What are some cool ways to introduce vulnerabilities into LLVM so that even "secure" languages like Rust are vulnerable to software-based exploits?


Become a regular contributor to LLVM for some time to build trust and then sneak in some complex optimization that under some rareish circumstances leads to exploitable binaries.


I'd imagine they're mostly the same kinds you could have in gcc


> I'd imagine they're mostly the same kinds you could have in gcc

Except that you'll have to provide your real identity for copyright assignments. It may be hard to create fake identities unless you are related to some big 3-letter agencies.


How does FSF verify that the identity you claim is real?



