The Musl Preprocessor Debate (catfox.life)
116 points by pabs3 on April 18, 2022 | 49 comments


The problem with library authors in general is that they, on average, have a better perspective on how client programs should be written. Writing a good library is generally far harder than using one, and musl is pretty decent, so this remains true: most users of musl have far less technical skill and a more myopic view than Rich Felker does.

But, as is true of libraries, and programming languages, and life in general, there is always someone smarter than you are, who genuinely needs to do the thing you say people should generally not want to do. This is why taking a moral stance of "not" allowing the thing is almost always the wrong approach: what you want to do is make the non-recommended path unattractive to a casual newcomer, or have sufficiently good education and guardrails in place that people don't stumble upon it, but there has to be a way to do the non-recommended thing. If you don't, the person who really needs it will find a way to do it anyways, and probably curse your name the entire time.

Anyways, bringing this back to musl, I think it takes an agreeable stance in following POSIX (plus some stuff it couldn't really leave out, because leaving it out breaks too many programs). Cool, it does a good job at that. But sometimes people who are aware of the user-agent problem that musl is trying to avoid would just like to make a special codepath that lets them use some more optimized, nonstandard thing that musl doesn't support. If you need your static initializers to get argv, there isn't really much to be done except provide a macro for detecting musl. So provide that macro, name it THIS_IS_MUSL_BUT_MAKE_SURE_YOU_KNOW_WHAT_YOU_ARE_DOING or whatever, but make it available for cases like these. Sometimes practicality has to win out over idealism, and musl contains a bunch of those tradeoffs already: this should be another one of them.
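
Concretely, the escape hatch being asked for is tiny. A sketch (the macro name is the deliberately scary hypothetical from above; nothing like it exists in musl today):

  #if defined(THIS_IS_MUSL_BUT_MAKE_SURE_YOU_KNOW_WHAT_YOU_ARE_DOING)
  /* musl-specific path, e.g. the static-initializers-need-argv trick */
  #else
  /* portable, standards-only path */
  #endif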


Agreed. Plus:

> Sometimes practicality has to win out over idealism

We're engineering software, not writing constitutions. Practicality always comes first IMHO.


> Practicality always comes first

... famous first words of many people who engineer yet another piece of software which later needs to be rewritten, since they / their company could not be bothered to write a decent library. :-(


Pragmatism of decision is orthogonal to quality of execution :>


Not when the decision is to implement something of inherently low quality, but which seems locally expedient.


Yeah, you are probably right, but go to Rich and tell him about it; you will see what the problem is. Nothing short of Drepper. I wonder if all libc maintainers are like that.


>The problem with library authors in general is that they, on average, have a better perspective how client programs should be written.

Even if this may be true, your users are an important stakeholder and their views are important to consider. An attitude that you always know better than your users is not a good one to have, as users can have more experience actually using the software than the developers themselves.


Isn't the entire point of the comment you replied to exactly that library authors don't always know better than their users, even if they know best most of the time?


His exact intent is unclear to me, so I'm either agreeing or slightly disagreeing with him.


> It’s a bug to assume a certain implementation has particular properties rather than testing.

Adding further weight to the article, I think this quote from Rich Felker is incredibly language-dependent, and categorically incorrect for languages with undefined behavior, such as C and C++. It's a bug to assume that properties of an implementation can be determined solely through testing.

For example, suppose you want to determine what happens when signed integers overflow. You can write a test program that checks the behavior of signed integer overflow in a variety of cases. That only shows that signed integer overflow has the property you want in the cases you've checked for. Even if you were to somehow check every single result of "a + b" for all 2^128 possible combinations of (a,b), that still would only tell you that it works in that particular program, at that particular execution time. It could be that you just haven't hit the particular set of register usage or of allowed optimizations to run into an edge case.

There is no possible test that can distinguish between an implementation that guarantees to have a particular property, and an implementation whose undefined behavior happens to match the property in the cases that have been tested. The only way to know is to have some additional piece of information, some additional communication channel to establish intent, rather than inferring intent from the observable behavior. In C, this is typically done with a preprocessor macro.
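
For example (a sketch; __GLIBC__ and __GLIBC_MINOR__ are real macros, but the version and the gated behaviour here are purely illustrative), a probe program only shows what this particular build happened to do, whereas the macro tells you what the implementation promises:

  #include <limits.h>  /* any libc header pulls in the feature-test macros */
  #if defined(__GLIBC__) && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 34))
  /* rely on behaviour this glibc version documents and guarantees */
  #else
  /* assume nothing beyond what the standard requires */
  #endif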


I think you are confusing undefined behaviour with implementation-defined behaviour.

The compiler is within its rights to optimise your program under the assumption that undefined behaviour is never invoked; this holds irrespective of the particular libc implementation you happen to be using.

You cannot assume that the specific libc implementation makes it "safe" to invoke UB in general.

As far as I'm aware, the specific example you give (signed integer overflow) has nothing to do with the libc implementation; it doesn't even have anything to do with the underlying architecture, since the compiler is still allowed to optimise under the assumption it never happens, even if it has well defined semantics on an x86.

On the other hand it's perfectly valid to rely on implementation-defined behaviour, assuming you can determine the implementation, but signed integer overflow doesn't fall into this category.


I'm reasonably sure signed integer overflow is implementation defined to mean two's complement overflow on x86.

Of course, assuming two's complement is non-portable, and the compiler is free to add a flag that breaks or enforces my assumption.

It's much better to add some casts (that the compiler will compile away on most architectures). However, that doesn't solve the problem of legacy code that assumes it is on an x86.
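
Something like this, for instance (a sketch: unsigned wraparound is well defined, and the conversion back to a signed type is implementation-defined rather than undefined, so this assumes a two's-complement target):

  #include <stdint.h>
  /* wrapping signed add without UB: unsigned arithmetic wraps mod 2^32 */
  static int32_t wrapping_add(int32_t a, int32_t b) {
      return (int32_t)((uint32_t)a + (uint32_t)b);
  }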


> I'm reasonably sure signed integer overflow is implementation defined to mean twos complement overflow on x86.

I think you're mistaken, or are using implementation-defined in a colloquial sense; the term implementation-defined has a specific technical meaning in the context of C that isn't the same as how it might be used in common parlance.

The distinction between implementation-defined behaviour and undefined behaviour is extremely important, because as soon as you invoke the latter your program "doesn't exist" and a standards-conforming compiler is allowed to do whatever it wants.

Signed integer overflow is explicitly given as an example of UB by the C standard [1], in 3.4.3:

> 3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.

By my reading, the specific part of the standard that makes signed overflow UB is in 6.5:

> 5 If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.

This is completely unrelated to what actually happens when you issue an ADD instruction on an x86; the C standard doesn't care about this.

[1] : http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf


Implementation-defined means that an implementation must choose a behavior, document that behavior, and then consistently apply that behavior. Undefined behavior means that an implementation need not have consistent behavior, need not issue any warnings, need not be predictable, need not be repeatable. The classic example is that if you have signed integer overflow, the compiler is allowed to make demons fly out of your nose, and the C Standard would see nothing incorrect about that behavior.

It doesn't matter that the compiler's output is x86. You are entirely correct that x86 has signed integer overflow, but you aren't writing x86 assembly that runs on an x86 processor. You're writing C code, and that runs on the C Abstract Machine. The compiler's job is to output x86 code that will produce the same output as the C Abstract Machine outputs. If the C Abstract Machine doesn't define a behavior, then that means any x86 is acceptable by the Standard.


Do any C or C++ compilers behave like that in practice – have signed integer overflow which doesn't simply depend on data types and compilation options, but varies more dynamically, depending on the values, runtime execution mode, register allocations, optimisations, etc?

In practice, the space of actual undefined behaviours is a lot smaller than the space of legally permitted undefined behaviours – it may be legal per the standard for a compiler in a case of undefined behaviour to make demons fly out of your nose, but I don't think any compiler has ever actually done that, and it would be a mighty strange occurrence if any ever did. Less jocularly, even if some particular type of undefined behaviour is technically permitted by the standard, if all major compilers don't do it, it can be a reasonable engineering decision to assume none ever will, and deal with that unlikely risk if and when it actually occurs.


Here's one example: https://stackoverflow.com/questions/54510094/why-how-does-gc...

Worked as expected on X86, not as expected on ARM64. One of the answers shows the question author probably used a different GCC version, but it still feels like a good example.


   #include <stdbool.h>
   bool foo(int x) {
     return x+1 > x;
   }
This compiles to a function that always returns true on most compilers with optimizations turned on. If the value of `x+1` were merely implementation-defined for `x == INT_MAX`, it would still need to return false in that case, as the result can't be larger than `INT_MAX`. Yet it returns true.

https://godbolt.org/z/hn3de1h49

edit: correction, clang and gcc optimize this to always true, MSVC doesn't.


That’s still “static” in the sense that it is only dependent on compilation options and data types, not the actual value of those data types at runtime. There is nothing there which could not trivially be determined by executing a test program, which is what the comment I was originally replying to was talking about.

Imagine if there was a ‘set_thread_signed_overflow_behavior()’ API which dynamically set signed overflow behaviour at runtime. That would be an example of undefined behaviour which could not be easily detected by executing a test. But although such an API is permitted by the standard, I’ve never heard of one actually being implemented. Well, it exists for floating point (fesetexceptflag with FE_OVERFLOW), but to my knowledge no mainstream platform has an equivalent for integers. It can be a reasonable engineering decision to ignore those (exceedingly?) rare platforms on which such an API exists for integers, and save thinking about them for the unlikely event you are asked to port to one of them, which will probably never happen, especially not any time soon.


Why is this insufficient for testing?

  $ strings /usr/local/musl/lib/libc.a | grep -i musl
  MUSL_LOCPATH
As an aside, I love Rich Felker's POSIX shell tricks.

http://www.etalabs.net/sh_tricks.html


Thanks for sharing this link. As soon as you start needing to read POSIX.* docs, you suddenly realize you need answers to a lot of basic questions.

Sufficiently advanced shell scripts should be moved to a programming language, but until you hit that threshold, however you define it, you still need some of these questions answered.


If you need to coordinate activity between multiple systems in such a way that you are calling remote shells, then the shell is the place to run the show.

I wrote an article on xargs parallel scripting some years ago that was popular here, and I can't imagine trying to do that in C.

The POSIX shell has great flaws. It is not an LR-parsed language that can be implemented with a simple yacc grammar, and this leads to ambiguity.

However, POSIX (as dash) is fast, portable, and small enough for embedded systems. Few of dash's peers are its equal if all of these are required.


Totally. This is lost on a lot of developers. Even having a rudimentary knowledge of Shell Command Language saves a lot of otherwise wasted time.


The general argument isn't so much about "testing" vs "macros", because #ifdef LIBRARY_VERSION is also a test of a macro's value, but about what type of thing should be tested: whether you test the functionality or the labels for it, whether you test the thing the computer actually runs or something that is effectively a comment, whether you test the signifier or the signified. In this context, testing functionality means compiling a test program and observing the results. A macro, or the existence of MUSL_LOCPATH, would both be testing the labels and not the actual code.
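
Concretely (a sketch; backtrace()/execinfo.h is the glibc/uClibc extension mentioned elsewhere in this thread): a configure script tests the signified by compiling and running a probe like this, treating "compiles, links, exits 0" as "feature present".

  /* probe.c */
  #include <execinfo.h>
  int main(void) {
      void *frames[8];
      return backtrace(frames, 8) > 0 ? 0 : 1;
  }

Testing the signifier is just #if defined(__GLIBC__) in a header, with no probe at all.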


Because it may not be testing the features of the target system. I may not have the target's / available on my build machine.


A middle ground could be more specific feature macros. For example, instead of checking for __MUSL__ and assuming that it supports some function, check __MUSL_SOME_FUNCTION or similar.

This is basically what your configure script ends up doing (likely poorly) anyways.
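
Roughly like this (a sketch; both macro names are hypothetical, neither exists in musl today):

  #if defined(__MUSL_SOME_FUNCTION)   /* per-feature macro: "this function exists" */
  /* call some_function() directly */
  #else
  /* portable fallback; infer nothing from a bare __MUSL__ */
  #endif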


Why wouldn't you change the program being built so it doesn't rely on undefined behavior, then?


Writing C or C++ completely without undefined behavior is a daunting task. Signed integer over/underflow in particular is so easy to get wrong, and guarding against it manually is a recipe for disaster. In C++, one can leverage "safe" or "checked" integer types that look and behave like regular signed ints, except they ensure no undefined behavior by checking potentially undefined operations beforehand and appropriately handling the case where the checks fail by, e.g., throwing an exception or aborting.

What makes this more insidious is that you can't just look at a function in isolation, you need to look at how it's invoked to consider cases for UB. Consider: `int add(int x, int y) { return x + y; }` Does this have UB? `add(1, 1)` is fine. But what about `add(1, INT_MAX)` or `add(-1, INT_MIN)`? Yup, these are both UB.
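
In plain C, the usual fix is a small checked helper; a minimal sketch, assuming the GCC/Clang __builtin_add_overflow builtin:

  #include <stdlib.h>
  int checked_add(int x, int y) {
      int result;
      if (__builtin_add_overflow(x, y, &result))
          abort();                  /* plain x + y would be UB here */
      return result;
  }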

Yes, developers should strive to eliminate UB. But, it's a hard problem. Tooling helps, but unfortunately the default settings for every compiler I'm aware of don't help. One really needs to use external tools and sanitizers, but let's be honest, the ergonomics of having to know about and use a separate tool, such as ubsan, isn't great. The tools are wonderful and need wider use, but they'd be of so much more value if they were integrated more tightly in the compiler/linker toolchains and on by default.


The result of musl's attitude is build systems assuming "__linux__ && !__GLIBC__ means musl".

I pity the compatibility nightmares any new libc pushing for mass adoption will have to deal with. Musl's approach adds cost to everybody but themselves.


Do you have examples of this? What __linux__ && !__GLIBC__ should mean, in my opinion, is Linux but without glibc extensions. This does not imply musl, but it does happen to also be what musl aims for, so there are going to be users for whom either check looks the same.


There are more libc's out there than just glibc and musl.

The `backtrace` call is a great example. It is available in glibc and uClibc, but not musl.

Instead of checking whether you're on musl and removing the backtrace call, the lack of a musl define forces you to check for all the other possible libcs and assume musl is the default. This is dumb.
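
You end up writing something like this (a sketch; __GLIBC__ and __ANDROID__ are real macros, but the point is that the list of "every other libc" can never be complete):

  #if defined(__linux__) && !defined(__GLIBC__) && !defined(__ANDROID__)
  /* none of the libcs I know about, so... probably musl? Skip backtrace(). */
  #else
  #include <execinfo.h>
  #endif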


That is probably a bad example. That is a GNU extension, so if handling this via the preprocessor, the right guard should just be #ifdef __GLIBC__, I think. That should cover both glibc and glibc-compatible libcs such as uClibc. (Yes, uClibc does define __GLIBC__ as well.)


There are very few debates where I feel good about upstream moderating downstream's access to capabilities because they're afraid of how those capabilities will be used.

Postgres notably refuses to implement query hints because people might use them wrong => forks like OrioleDB emerge. ECMAScript promises refuse to allow synchronous access because ___ => we end up with endless userland dual sync/async constructs such as AbortSignal.

In general, I think we need more people to take inspiration from perl. "Perl, the first post-modern programming language"[1] is an old old old write-up, but I think it still holds very true that we generally ought to enable. Even when we do set down guidelines, I feel like there should usually be ways to go outside the boundaries. To me, it's a huge feature & logical that there be capabilities in Java to peek into the private fields of a class; maybe not all code should have access to this, but this kind of deep access to a system is still, to me, highly desirable. It makes me sad to see languages like JS, once forerunners in first-class everything, dynamic programming, accrue the trappings of a constrained, restricted, denying language: private fields (with no workarounds), annotations/decorators but load-time compiled away/invisible. Systems that expose what they are & let us monkey around are just better, cooler, have more potential.

Imagining nice pleasant happy well-groomed languages & systems is, to me, the ultimate premature optimization: favoring some internal idea of a plan & binding everyone to it is a horrible mistake, one that often prevents criticism by simply denying the other parties their options, the chance to be explored.

[1] http://www.wall.org/~larry/pm.html


The old adage is "Make the right thing simple, make the complex thing possible."

Thing is, we're talking about a C library here, a piece of code that is potentially being built for a system that is not running yet and has no emulator. The complex case is one of the defaults they must consider, but they made it impossible.

On the upside, that makes a strong case to just not use or support Musl. Glibc and uclibc exist.


Just as a counterexample, I'd like to mention Python. I think Python is very well known for matching your description:

> Systems that expose what they are & let us monkey around are just better, cooler, have more potential.

But Python really suffers performance-wise because of this. There are a lot of optimisations that could be applied to the interpreter if it weren't for the fact that <edge case> is allowed in Python.


Interesting re: OrioleDB, but the project seems to be targeting a number of additional major features vs. just query hints, which would "in theory" seem to be doable with relatively minor changes (e.g. mostly parser tweaks). The project might win a few more stars if they mentioned query hints in the readme!


There's also an extension if you want just the hints: https://pghintplan.osdn.jp/pg_hint_plan.html


I was writing in perl in the 2000s and early 2010s.

I don’t want other languages to be more like perl.


> On my POWER9 Talos workstation, typical ./configure runs take longer than the builds themselves. This is because fork+exec is still a slow path on POWER. It is similar on ARM, MIPS, and many other RISC architectures.

Anyone know how this (the RISC-related slowness) can be true? Larger register file? Even that seems unlikely...


Page table and TLB handling comes to mind, although I'm not sure why ARM was included there, since it fills the TLB automatically in hardware.


This is extra funny coming from musl, which is expected to be used in cross-compiling, embedded scenarios.

If you want feature tests, add more versioned defines not fewer in this case. You can actually version each public API if you want.

Plus, inevitably you will hit a case that cannot be detected using reasonable (i.e. non-flaky) tests.


> If header-only libraries start requiring you to use build-time tests, you lose the main reason to use them in the first place.

Header-only libraries don't really work as advertised anyway. The need for various parts of a codebase (including copy-pasted third party headers) to coordinate on preprocessor settings is a good example of this.

Header-only is great for casual research but pretty much immediately requires extra work on the part of consumers.

The C community would do well to acknowledge that build and packaging system convergence is overdue. Messing around with header-only and preprocessor tricks is leaving the community in an awkward cul-de-sac.


> The C community would do well to acknowledge that build and packaging system convergence is overdue.

This would require an enormous amount of planning and forward thinking to pull off successfully.

I mean, hell, there isn't even a way to trust most libraries. So many of them are littered with type punning that e.g. the moment you move them off of x86 you're greeted with bus errors in stateless functions. Undefined behavior everywhere.

And then you have the "Oh, sizeof (long) will always be the same as sizeof (void *) and I can definitely do this cast!" libraries from 1999. Those are fun.

Even better are the "CHAR_BIT==8 everywhere in the world!" libraries that somehow forget to #error when that's not the case, so you have to manually review each and every assumption in each dependency before building for that fancy DSP.
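
The guard they forget is a few lines; a sketch:

  #include <limits.h>
  #if CHAR_BIT != 8
  #error "this library assumes 8-bit bytes"
  #endif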

It's a mess. Even with the recent popularity of ARM, many C programmers either forget platforms other than GNU/Linux on x86 exist or forget to document what they're targeting.


> And then you have the "Oh, sizeof (long) will always be the same as sizeof (void *) and I can definitely do this cast!" libraries from 1999. Those are fun.

Just out of curiosity: where does this break down in common use? I know of some microcontrollers where e.g. long is 32 bits and pointers are 16... but most things really are ILP32 or LP64.

P.S. Long, long ago you wrote you found Mel Kaye, and I'm always curious about the actual story.


I just encountered this issue with Musl today, trying to make some software that uses backtrace() and backtrace_symbols() work with it.

I wound up wrapping the use in #if !defined(NO_BACKTRACE) in the code and passing CXXFLAGS=-DNO_BACKTRACE in the build script for Alpine Linux. (The symbol name is slightly genericized.)

In my case, whether or not Musl offered a macro, NO_BACKTRACE is a cleaner, more proper solution than adding to the #ifndef _WIN32 #elif defined(__MACH__) /* or __APPLE__? */ #elif defined(__GLIBC__) chain.
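
The shape of it, roughly (a sketch; dump_stack is a made-up name standing in for the real call site):

  #if !defined(NO_BACKTRACE)
  #include <execinfo.h>
  static void dump_stack(void) {
      void *frames[64];
      int n = backtrace(frames, 64);
      backtrace_symbols_fd(frames, n, 2);  /* symbol names to stderr */
  }
  #else
  static void dump_stack(void) { /* no backtrace() on this libc */ }
  #endif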


Why is bash getting easily confused about POSIX support? Aren't there POSIX macros?


There's a thread from April 2020 on the postfix-users mailing list, with Felker whingeing that water is wet and Wietse or Viktor threatening to do conditional compilation if MUSL is present, that sounds sort of like this:

Wietse: > Congratulations! You just gave a new definition of security theatre: using an unauthenticated channel to distribute trust anchors.

Felker: It's not security theater because nobody's claiming it's secure. Rather it's a fairly weak form of hardening that increases the required capabilities an attacker needs to exploit a known-insecure system.

It's brain rot caused by too much systemd!


It occurs to me that someone may want to read the background material.

https://marc.info/?l=musl&m=158680293802966&w=2 I can haz DNS!

https://marc.info/?l=postfix-users&m=158698506811448&w=2 "toy" stub resolver

https://marc.info/?l=postfix-users&m=158986711614688&w=2 I'm really disappointed that you're detecting MUSL...

https://marc.info/?l=postfix-users&m=158716200010400&w=2 Harmful to limit the use of (broken) DANE lookups


Thanks for the downvotes. This guy literally advocates that we don't follow "secure", so fuck everything over.


What started the original thread was "postfix" "failing" security tests because it was installed in a Docker image running Alpine. Insecure! Why? OMG all these people, all this effort.

Felker?



