
> But such behaviour is still unacceptable from a library perspective: a library should never, ever call abort or otherwise terminate the program.

It's true that libraries should not abort for a regular, foreseeable error. I fundamentally disagree that they should never abort.

If an invariant (something we believe absolutely must be true) is violated, the only sensible thing to do is abort the process. For example, if you set a pointer to NULL and then half an hour later it's time to store a value in that pointer but somehow it's not NULL anymore: clearly something has gone terribly wrong, either in your logic, or potentially in totally unrelated code in some other library which has scribbled on the heap. If execution is allowed to continue, the resultant behaviour is totally undefined: you might corrupt data, or you might allow a permissions check to pass that should have failed, etc. We can't always detect when something has gone wrong, but when we do, it's irresponsible to drive on or try to return an error.
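In Rust terms, that stance looks something like this minimal sketch (the `store_once` function and its invariant are made up for illustration):

```rust
// Hedged sketch of the point above: check the invariant, and abort
// (here, panic) rather than drive on if it no longer holds.
// `store_once` and its invariant are illustrative, not from any real library.
fn store_once(slot: &mut Option<u32>, value: u32) {
    // Invariant: the slot we reserved earlier must still be empty.
    assert!(slot.is_none(), "invariant violated: slot unexpectedly occupied");
    *slot = Some(value);
}
```

If the assert fires, continuing would mean writing through state we no longer understand, so stopping is the conservative choice.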



The cultural differences between C developers and non-C developers never cease to amaze me.

The definition of unacceptable behaviour is so different. A program exit on an exploitable vulnerability is considered unacceptable. The program must continue running, even though all hope should have been lost by this point!

Meanwhile on the other side of the ocean, it would be unacceptable for a program to enable the complete take over of a system!

Getting the panics out of a Rust codebase should be much simpler than fuzzing out UB. After a few iterations, there won't be any panics whatsoever.

Honestly, what I'm seeing here is essentially that C is being preferred because it lets you sweep the problem under the rug: UB is such a diffuse concept with unusual consequences. Whereas a panic is concrete and requires immediate attention, attention that will make the program better in the long run.


Many (most?) C developers are also non-C developers.


This logic only works for memory-unsafe languages like C or C++, where the checks are rare and there is a good chance that by the time an abnormal condition is detected, things have been going wrong for a while.

But that's not true for safe languages - in Rust, if you set a pointer to not-null, it won't end up as null (unless there is buggy unsafe, but let's ignore that). Instead, the panics are likely to be caused by logic errors. Take the decompressor buffer overflow errors the author mentioned - the out-of-bounds writes were caused by a bug in bit operations generating a wrong array index. In Rust, this would be caught by the bounds checker, which is good; but Rust would then abort the process, which is bad. A hypothetical language that would throw an exception instead of panicking would be much better from a library perspective - for example, a web server might return 500 on that particular request, but it would stay running otherwise.


Panics can be caught and handled, though: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html. So it is exactly the right choice in the circumstances. Apps like webservers and other things that need to not terminate the process are expected to be compiled with panic=unwind. Everybody else should compile with panic=abort and sleep safe knowing that if their invariants get corrupted, execution will not continue.

Using Result<T, E> - Rust's rough equivalent to checked exceptions in something like Java - would be the wrong choice here, since it effectively forces the client to check and handle invariant violations inside library code - and the client really has no way to handle them other than to do the equivalent of 500 Internal Error, so there's no point doing such checks on every call.
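As a minimal sketch of catching an unwinding panic at a boundary (the `handle` wrapper is illustrative, not a std API, and assumes panic=unwind):

```rust
use std::panic;

// Convert a panic in `f` into an Err instead of tearing down the process.
// This only works when the binary is compiled with panic=unwind (the
// default); with panic=abort there is nothing to catch.
fn handle<T>(f: impl FnOnce() -> T + panic::UnwindSafe) -> Result<T, Box<dyn std::any::Any + Send>> {
    panic::catch_unwind(f)
}
```

A web server would typically wrap each request handler roughly this way (frameworks often do it for you) and map the Err to a 500.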


Just one problem with your argument: we’re not talking about languages that have orderly null pointer exceptions. We’re expressly talking about languages like C, C++ and Rust.

Your web server example is uncompelling, because a panic-based abort is not the only thing that can distress your system. The simplest example is if the library code doesn’t terminate, accidentally triggering an infinite loop. Or (better from some perspectives, worse from others) an infinite loop that allocates until you run out of memory, denying service until maybe an out-of-memory killer kills the process. In such scenarios, your system can easily end up in a state you didn’t write code expecting, where everything is just broken in mysterious ways.

No, if you want your web server to be able to return an orderly 500 in as many situations of unforeseen errors as possible, the plain truth is that you need the code that will produce that to run on a different computer (approximate definition, though with some designs it may not quite need to be separate hardware), and that you’ll need various sorts of supervisors to catch deviant states and (attempt to) restore order.

In short: for such an example, you already want tools that can abort a runaway process, so it’s actually not so abnormal for a library to be able to abort itself.

There’s genuinely a lot to be said for deliberately aborting the entire process early, and handling problems from outside the process. It’s not without its drawbacks, but it is compelling for things like web servers.

I would also note that, if you choose to, you can normally catch panics <https://doc.rust-lang.org/std/panic/fn.catch_unwind.html>, and Rust web servers tend to do this.


> Your web server example is uncompelling, because a panic-based abort is not the only thing that can distress your system.

You seem to be saying that it wouldn't catch 100% of the problems, so catching only 80% is not that useful.

I see that as uncompelling. 80% helps a lot!

> you need the code that will produce that to run on a different computer

Problem is, we're using C, C++ and Rust because latency and performance matter. Otherwise we'd be using Go or Java.

So in order to do what you're proposing, we'd have to do an outbound call on every link of a large filter/processing chain, serializing, transferring, and parsing the whole request data at each recoverable step.


Panics are already corner case territory.

What I’m describing about producing 500s from a different machine is standard practice at scale, part of load balancers. And at small scale, it’s still pretty standard practice to do that from at least a different process, part of reverse proxying.


I disagree. Just as C++ and Rust conceive of "plain old data" (POD) as data that basically has no surprising behaviors on cleanup (probably butchered that explanation), I think of libraries as "plain old code". Unlike services, libraries are part of the address space and should passively serve the main executable portion. That's why they are libraries specifically. A library should, if it detects an error, tell the main executable and then it is up to the main executable to do the responsible thing and address the error seriously. Needless to say, I don't like glibc-style "libraries". Doing too much in the background, in the same address space, no less. A library should be a nice, isolated unit that a programmer chooses to add into their executable at will. My concern is that composability and modularity aren't respected enough.


I consider a panic to be equivalent to a segfault. It is a bug, either in the library itself or in improper usage, and the system just stopped you from doing further damage. The only reasonable thing to do in this case is to fix the bug.

Imagine you are catching the exception. What are you going to do next? From now on, anything can break, and how do you hope to recover if you couldn't do it right in the first place (that's what caused the panic)? You can have a panic handler, in the same way that you can trap SIGSEGV, for some debugging, but that's about it. If a crash is really problematic, use some process wrapper, and maybe a watchdog. Libraries don't work like that for performance and flexibility reasons; they share address space, but the downside is that if they crash, they take everything else with them.


Indeed. It's in the name: libraries are there to be browsed, while services wait for customers to call on them (and do management and other things in the meantime). It should be understood among the developers of applications, libraries, and services that libraries are functional add-ons while services are separate agents. A library has its expertise and should bubble up application errors (to be fixed by the application) and environmental errors (which can't really be fixed by anyone, but the application should decide how to proceed). If a library has internal errors, how is the library itself or the application supposed to fix them? It's like position-independent code: if its basic assumptions are met, it should be plug-and-play anywhere without complaint or knowledge.


That is exactly what panics do in Rust. The difference between Result<T,E> and a panic is basically two-fold:

1. The owner of the process (whoever is compiling the binary) - not the library! - gets to decide whether panics immediately abort or try to unwind in a way that can be handled.

2. A panic is never expected (i.e. it always indicates a bug in the code rather than invalid input or other expected failure conditions), so it's optimized for the success scenario both in terms of syntax and in terms of runtime cost. In practice, it means that syntactically the panic always auto-propagates (whereas you need `try` etc with Result); and the code generated by compiler is zero-cost wrt panics if one never happens, but the unwinding is very slow if a panic does happen.
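A tiny sketch of that syntactic difference (the function name is made up):

```rust
use std::num::ParseIntError;

// With Result, propagation is explicit at each `?`; a panic (e.g. from
// `unwrap`) would propagate with no syntax at all at the call sites.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let n: u16 = s.trim().parse()?; // the error path is visible right here
    Ok(n)
}
```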


While panics can be made to unwind and be caught, at that point the different error handling methods are really abort/Result bubbling/Result bubbling but cooler, and I want the last one to be packaged up nicer than "unwinding panic" is today. Stack unwinding should be orthogonal. And I think Result bubbling the boilerplate way is good unless performance is paramount. Keeps the happy and error paths abundantly clear to the developer. I don't know the details of C++'s zero-cost exceptions, but Rust's certainly are not.


Can you clarify what you mean by "packaged up nicer"? Given that this is a feature that should be used very sparingly, and also one that is very apt to be misused (as experience with exceptions in other languages demonstrates), I would argue that lack of syntactic sugar for it is a good thing. But I'm willing to consider arguments to the contrary, yet the existing API seems broadly fine to me? How would you change it?

As far as cost, it depends on the arch and its ABI, but on x64 they use something called "unwind tables", which is basically a structure that lists all cleanup code that needs to be run for unwinding given a range of addresses inside a function. Such tables can be produced entirely at compile time, and they only need to be checked during unwinding (i.e. if there's a panic), so on the success path you pay no perf penalty. They are not entirely free in that they do make your binary larger, but speed shouldn't be affected.


> Given that this is a feature that should be used very sparingly, and also one that is very apt to be misused

While it should be used carefully, perhaps it need not be used sparingly. My main concern is that unwinding panic occupies a weird role of being both a way to crash and a catchable exception, and I think the former should be distinct and the latter should be integrated more nicely with normal Result bubbling, essentially doing the same thing but focusing on performance and readability on the happy path.

> As far as cost, it depends on the arch and its ABI

In Rust, panic branches are ubiquitous and the compiler's optimizations are hindered by the fact that mostly anything might panic. If there was an easy way to indicate that in this specific instantiation, integer overflow definitely won't happen, a panic could be avoided. In order to avoid unsafe, I imagine it would be something like contracts or mini theorem provers, though, which is only helpful if they're already being used.
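There is at least a workaround for bounds checks today (the function here is illustrative): hoisting a single assert can let the optimizer prove the later panicking branches dead and remove them.

```rust
// Illustrative: one up-front check instead of four panicking branches.
// The optimizer can see `xs.len() >= 4` holds past the assert, so the
// individual bounds checks below can compile away.
fn sum_first_four(xs: &[u64]) -> u64 {
    assert!(xs.len() >= 4);
    xs[0] + xs[1] + xs[2] + xs[3]
}
```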


I'm absolutely sure that this is the clearest statement of how I view libraries I've ever read. I am not actually certain I would have phrased it this well. So I'm commenting to save it in my comments thread.


> the only sensible thing to do is abort the process

Abort whatever depends on the invariant, which may be less than the whole process.


Yeah, but you need to ensure that you have an actual isolation boundary in place between systems with different invariants to ensure that there's no shared state (since any shared state implies shared invariants). Which is exactly what processes are. Not necessarily OS processes, though - e.g. consider Erlang, where the notion of an ultra-lightweight process provided by the runtime instead is exactly what makes this kind of error handling a natural fit.

Now, for some languages it is possible to verify that isolation strictly through code analysis, without any runtime boundary enforcement. And I think that safe Rust might be in that category, but unsafe Rust definitely isn't - and whether any given library contains unsafe code is an implementation detail...


> If an invariant (something we believe absolutely must be true) is violated, the only sensible thing to do is abort the process. For example, if you set a pointer to NULL and then half an hour later it's time to store a value in that pointer but somehow it's not NULL anymore: clearly something has gone terribly wrong, either in your logic, or potentially in totally unrelated code in some other library which has scribbled on the heap.

Then you return an error and let the caller deal with it.

There is no justification, ever, for a library aborting the caller. Even in the scenario you present, it's still better to let the caller deal with the fallout.

> If execution is allowed to continue, the resultant behaviour is totally undefined: you might corrupt data, or you might allow a permissions check to pass that should have failed, etc.

So? That corrupted data or failed permissions check already happened before the library gets to abort anyway.

Let the caller do whatever they can before the process aborts; don't abort for them before the caller has more context than the library does. If you abort without returning an error to the caller, the caller cannot do things like log "Hey, that previous permission check we allowed might have been allowed by accident".

Your way provides absolutely no upside at all.


> Then you return an error and let the caller deal with it.

No, this is absurd. C libraries generally don't do this either. Instead, they just have UB, whereas Rust tends to panic. So in practice, your suggestion ends up preferring vulnerabilities as a result of UB over a DoS.

See: https://news.ycombinator.com/item?id=43300120

Can you show me C libraries you've written that are used by others that follow your strategy of turning broken runtime invariants into error values?


> No, this is absurd.

Just to be clear, you are making the argument that when a library call detects an error of the form of unexpected NULL/not-NULL, that they abort immediately?

To be even more clear, I'm not making the argument that the program should proceed as normal after detecting an error (regardless of the type of error).

That is not the argument that I am making, which is why I find your "That's absurd" condescension extremely confusing.


This is exactly what assert() does, so literally any C library that contains an assert() can produce a nonrecoverable error.

GP's point is that most libraries will not even assert(), but instead just assume the invariant holds and proceed accordingly, resulting in UB. And it is, of course, infeasible for a library to constantly test for invariants holding at every single point in the program. So in practice you have to assume that a library breaking its internal invariants is going to UB. If the library dev added some asserts, they are doing you a favor by making sure that, at least for some particular subset of broken invariants, the would-be UB is guaranteed to be a clean abort rather than running some code doing god knows what.


My position is articulated here, with real examples: https://burntsushi.net/unwrap/

> Just to be clear, you are making the argument that when a library call detects an error of the form of unexpected NULL/not-NULL, that they abort immediately?

There's no blanket answer here because your scenario isn't specific enough. Is the pointer caller provided? Is the pointer entirely an internal detail whose invariant is managed internally? Is this pointer access important for perf? Is the pointer invariant encapsulated by something else?

Instead, I suggest showing examples of C libraries following your philosophy. Then we'll have something concrete.

In the comment I linked, you'll notice that I actually looked at and reviewed real code examples. Maybe you could engage with those.


I happen to agree with @lelanthran's position, but aside from that I think their point is not that there are C libraries following their principle, but rather that libraries should follow it. This is akin to Rust's "get"/"try" style of fallible methods, avoiding both UB and exceptions/panics. It also seems moot to ask this of typical C libraries, as they wouldn't use exceptions either.

(I've edited this multiple times by now, apologies if it's confusing. Only adding things to it, but it may read weirdly as I've reconsidered what I'm trying to say.)

For something like array indexing in Rust, it's not bad to have a panicking operator by default because it's very upfront and largely desired. Similarly, a library may document when it panics just as it would document its error type if it returned error values. But something that I would consider very bad design is if I use a library that spawns another thread and does some file processing and can panic, without making this clear to me.

I think one of your main points is: suppose a library theoretically could index an array OOB and panic; it is not formally verified not to, and so the developer is just covering all bases conveniently. The normal alternative being UB is of course unacceptable. There is a crucial distinction to be made here. If the index is derived from the application, return an error value making this clear to the application. However, at some point the index may be considered only internally relevant. I agree this is fine. The thought is that this will never trigger and the application will be none the wiser. If it is ever triggered, the library should be patched quickly. But I think this is not true of all the panics that people in this thread have in mind, as otherwise panics should be seen basically never, just as a well-designed but otherwise normal C program would have a risk of UB but should exhibit it basically never. There should be an effort to minimize panics to the ones that are just sanity checks and only there for completeness, rather than a convenient way to handle failure.

With panics, either I just let them happen when they will or I have to defensively corral the library into working for me. With error values, the library has set out to state its terms and conditions and I can be happy that the burden is on me to use it properly. I have more control over the application's behavior from the start, and the extra work to surface errors to users properly is more or less equal between both approaches. Yes, panics can also be laid out in the API contract. But it's more enforceable with error values.

If there was a good way to do error-values-as-exceptions (automating Result bubbling with ?) that just panics up until a good boundary and returns a Result, that's basically catch_unwind but cleaner. It's true that oftentimes aborting (perhaps after cleanup) is the best way to handle errors, but it shouldn't be a struggle to avoid that when I know better. Particularly with C's malloc(): maybe I do want to change my program behavior upon failure instead of stopping right then and there.


I seem to be rambling. I will add, particularly to clarify my third paragraph (the big one): a library should not panic. It can have defensive panics, but overall it should not panic, and triggering a defensive panic is to be treated as a bug. The exception is the panics that the application developer can reasonably be considered to have agreed to, and presumably the library should take care to make those panics as easy to handle as possible.


I think the knot you are trying to untangle here is what I untied here: https://burntsushi.net/unwrap/

The issue being addressed in this thread is that OP says this:

> While C++’s std::vector<T>::at() throws an exception which can then be caught and cleanly relayed to the application, a panic!() or an abort() are much more annoying to catch and handle. Moreover, panic!()’s are hiding even in the most innocuous places like unwrap() and expect() calls, which in my perception should only be allowed in unsafe code, as they introduce a surface for a denial-of-service attack.

This is not a nuanced position separating internal runtime invariants from preconditions and whatnot, like what you're doing and like what my blog does. This is a blanket statement about the use of `unwrap()` itself, and presumably, all panicking branches.

This in turn led to this comment in this thread, to which I responded to as being terrible advice:

> That behavior is up to the user. The library should only report the error.

This is an extreme position and it seems to be advocated by several people in this thread. Yet nobody can point to real examples of this philosophy. Someone did point out sqlite, libavcodec, lmdb and zlib as having error codes that suggest this philosophy is employed, but actually looking at the code in question makes it clear that for most internal runtime invariants, the C library just gets UB. In contrast, Rust will usually prefer panics for the same sorts of broken invariants (like out-of-bounds access).

The bottom line here is that I perceive people are suggesting an inane philosophy to dealing with internal runtime invariants. And that instead of going back-and-forth in abstraction land trying to figure out what the fuck other people are talking about, I think it's far more efficient for people to provide concrete examples of real code used by real people that are following that philosophy.

If the philosophy being suggested has no real world examples and isn't even followed by the person suggesting it, then the certainty with which people seem to put this philosophy forward is completely unwarranted.

Asking for real world examples is a shortcut to cutting through all this confusing language for trying to describe the circumstances that can lead to aborts, panics or UB. (Hence why my blog I linked earlier in this comment is so long.) It's my way of trying to get to the heart of the matter and show that these pithy comments are probably not suggesting what you think they're suggesting.

I've written hundreds of thousands of lines of Rust over the years. Many of my libraries are used in production in a variety of places. All of those libraries use `unwrap()` and other panicking branches liberally. And this isn't just me. Other ecosystem libraries do the same thing, as does the standard library.


A Unix kernel will kill your process just because it wrote into a broken pipe.


That behavior is up to the user. The library should only report the error.


This is terrible advice, and people suggesting it should put their money where their mouth is and show real world examples.

Which libraries in widespread use know how to detect all of their possible bugs due to invariant violations and report them as explicit error values?

I'd love to see the API docs for them. "This error value is impossible and this library will never return it. If it does, then there is a bug in the library. Since there are no known bugs related to this invariant violation, this cannot happen."

Nevermind the fact that you're now potentially introducing an error channel into operations that should never actually error. That potentially has performance implications.

Nevermind the fact that now your implementation details (internal runtime invariants are implementation details) have now leaked out into your API. Need a new internal runtime invariant? Now you need to add that fact to the API. Need to remove an invariant? Ah well, you still need to leave that possible error value in your API to avoid breaking your users.


Ok, but let's consider the other extreme: a world where no library can ever assume a runtime invariant holds without dynamically checking it first.

In this world, Vec::index() would need to perform not only a bounds check but also a check that the pointer is not NonNull::dangling(). Sure, RawVec is supposed to guarantee that the pointer will not be dangling when cap is >0, but RawVec could have a bug in it.

I agree that documenting and returning a PtrWasDanglingError error is not good API design. An InternalError for all such cases seems more reasonable. But at some point we need to be able to assume that certain program invariants hold without checking at all (in a release build).


We don't have to live in the extreme though. That's one of the great advantages of Rust. :-)

In `regex`, for example, there are certainly some cases where I use `unsafe` to elide those dynamic checks because 1) I couldn't do it in safe code and 2) I got a performance bump from it. But of all the dynamic checks in `regex`, this was an extremely small subset of them.

And it makes sense to rely on abstractions like `RawVec` to uphold those guarantees.

The point is that you're making that trade-off intentionally and for a specific reason (perf). The idea that I would support dogmatically always checking every runtime invariant everywhere is bonkers. :P In contrast, we have someone here who I responded to that is literally suggesting propagating every possible broken runtime invariant into a public API error value.
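That kind of intentional, perf-motivated trade-off might look like this (illustrative function, not actual `regex` code):

```rust
// Elide the bounds check only where the invariant is locally provable
// and profiling showed a win; debug builds still verify it.
fn byte_at(haystack: &[u8], i: usize) -> u8 {
    debug_assert!(i < haystack.len());
    // SAFETY: callers guarantee i < haystack.len().
    unsafe { *haystack.get_unchecked(i) }
}
```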


I am biased towards thinking that low-level libraries should generally avoid panic. That would mean that all invariants are either assumed true or returned as errors to the user.

I think this is not an unreasonable design: it's how low-level C libraries are traditionally designed. For example SQLite does what I mentioned and has a single SQLITE_INTERNAL error that is documented as:

> The SQLITE_INTERNAL result code indicates an internal malfunction. In a working version of SQLite, an application should never see this result code. If application does encounter this result code, it shows that there is a bug in the database engine. --https://www.sqlite.org/rescode.html#internal

I didn't mean to imply that you are for dogmatic checking of every runtime invariant, but the message that began that thread seems to advocate for that, going so far as to try to detect other buggy code that might have stomped on your memory.


SQLite is really a terrible example of anything other than what you can accomplish when you pour enormous resources into a single C library. Its `SQLITE_INTERNAL` error code is atypical in my experience. My recollection is that its tests are an order of magnitude bigger than SQLite itself. It is nowhere near a typical example.

I don't think `SQLITE_INTERNAL` is how C libraries are typically designed, and even when they are, that doesn't mean they aren't risking UB in places. PCRE2 has its own `PCRE2_ERROR_INTERNAL` error value too, but it's had its fair share of UB related bugs because C is unsafe-everywhere-by-default.

More to the point, the fact that hitting UB-instead-of-abort-or-unwinding is normal C library design is kinda the point: that's almost certainly a good chunk of why you end up with CVEs worse than DoS. How many vulnerabilities would have been significantly limited if C made you opt into explicit bounds check elision?

> but the message that began that thread seems to advocate for that

I agree it is poorly worded. I should have caught that in my initial comment in this thread.

The problem here really is the extremes IMO. The extremes are "libraries should never use `unwrap()`" and "libraries should check every runtime invariant at all points and panic when they break." You've gotta use your good judgment to pick and choose when they're appropriate.

But I have oodles of `unwrap()` in my Rust libraries. Including in the regex crate's parser. And for sure, some people have hit bugs that manifest as panics. And those could in turn feasibly be DoS problems. But they definitely weren't RCEs, and that's because I used `unwrap()`.


> Its `SQLITE_INTERNAL` error code is atypical in my experience.

In my experience it's reasonably common. Here are some other examples in what I would consider quintessential, high-quality C libraries:

- zlib has Z_STREAM_ERROR, which is documented in several places as being returned "if the stream structure was inconsistent"

- libavcodec has AVERROR_BUG, documented as "Internal bug, also see AVERROR_BUG2".

- LMDB has MDB_PANIC, documented as "Update of meta page failed or environment had fatal error".

> And for sure, some people have hit bugs that manifest as panics. And those could in turn feasibly be DoS problems. But they definitely weren't RCEs, and that's because I used `unwrap()`.

I feel this is conflating two things: (1) whether or not an invariant should get a dynamic check, and (2) when a dynamic check is present, how the failure should be reported.

Rust brings safety by forcing (safe) code to use dynamic checks when a safety property cannot be statically guaranteed, which addresses (1). But there's still a degree of freedom for whether failures are reported as panics or as recoverable errors to the caller.

I wrote down some of my thinking in this recent blog entry, which actually quotes your excellent summary of when panics are appropriate: https://blog.reverberate.org/2025/02/03/no-panic-rust.html

(ps: I'm a daily rg user and fan of your work!)


I'd have to look more closely at those examples, but I find it hard to believe that every runtime invariant violation manifests as one of those error codes. It certainly isn't true for PCRE2.

> Rust brings safety by forcing (safe) code to use dynamic checks when a safety property cannot be statically guaranteed, which addresses (1). But there's still a degree of freedom for whether failures are reported as panics or as recoverable errors to the caller.

Sure, you can propagate an error. I just don't really see a compelling reason to do so. Like, maybe there are niche scenarios where maybe it's worthwhile, but I do not see how it would be compelling to suggest it as general practice.

You might point to C libraries doing the same, but I'd have to investigate what exactly those error codes are actually being used for and _why_ the C library maintainers added them. And the trade-offs in C land are totally different than in Rust. Those error codes might not exist if they had a panicking mechanism available to them.

> I wrote down some of my thinking in this recent blog entry, which actually quotes your excellent summary of when panics are appropriate: https://blog.reverberate.org/2025/02/03/no-panic-rust.html

Yes, I've read that. It's a nice blog, but I don't think it's broadly applicable. Like, I don't see why I would write no-panic-Rust outside of extremely niche scenarios. My blog on unwraps is meant to be more broadly applicable: https://burntsushi.net/unwrap/ (It even covers this case of trying to turn runtime invariant violations into error codes.)


> LMDB has MDB_PANIC, documented as "Update of meta page failed or environment had fatal error".

Yes. That doesn't mean there was anything bad in the program logic. It most likely means your storage device had a fatal I/O error. It means there's something physically wrong with your system. Not that there was any bug in any code.


Now that I've slept, I decided to take a look at LMDB. It uses MDB_PANIC in exactly two places:

https://github.com/LMDB/lmdb/blob/f20e41de09d97e4461946b7e26...

https://github.com/LMDB/lmdb/blob/f20e41de09d97e4461946b7e26...

I would say this overall does not even come close to qualifying as an example of a library that "returns errors for invariant violations instead of committing UB."

You don't have to look far to see something that would normally be a panicking branch in Rust be a UB branch in C: https://github.com/LMDB/lmdb/blob/f20e41de09d97e4461946b7e26...

    if (err >= MDB_KEYEXIST && err <= MDB_LAST_ERRCODE) {
      i = err - MDB_KEYEXIST;
      return mdb_errstr[i];
    }
That `mdb_errstr[i]` will have UB if `i` is out of bounds. And `i` could be out of bounds if this code gets out of sync with the defined error constants and `mdb_errstr`. Moreover, it seems quite unlikely that this particular part of the code benefits perf-wise from omitting bounds checks. In other words, if this were Rust code and someone used `unsafe` to opt out of bounds checks here (assuming they weren't already elided automatically), that would be a gross error in judgment IMO.
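To make the contrast concrete, here's a hypothetical Rust analogue of that lookup (error table abbreviated; `MDB_KEYEXIST` really is LMDB's lowest error code, -30799). If `i` ever falls out of sync with the table, indexing panics with a clear message instead of reading out-of-bounds memory:

```rust
// Hypothetical Rust sketch of LMDB's mdb_errstr lookup, not real LMDB code.
const ERRSTR: [&str; 2] = [
    "MDB_KEYEXIST: key/data pair already exists",
    "MDB_NOTFOUND: no matching key/data pair found",
];
const MDB_KEYEXIST: i32 = -30799; // LMDB's lowest error code

fn errstr(err: i32) -> &'static str {
    let i = (err - MDB_KEYEXIST) as usize;
    ERRSTR[i] // a panicking branch in Rust; a UB branch in the C original
}

fn main() {
    assert_eq!(errstr(-30799), ERRSTR[0]);
    assert_eq!(errstr(-30798), ERRSTR[1]);
}
```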

The kind of examples I'm asking for would be C libraries that catch these sorts of runtime invariants and propagate them up as errors.

Instead, at least for LMDB, MDB_PANIC isn't really used for this purpose.

Now looking at zlib, from what I can tell, Z_STREAM_ERROR is used to validate input arguments. It's not actually being used to detect runtime invariants. zlib is just like most any other C library as far as I can tell. There are UB branches everywhere. I'm sure some of those are important for perf, but I've spent 10 years working on optimizing low level libraries in Rust, and I can say for certain that the vast majority of them are not.

libavcodec is more of the same. There are a ton of runtime invariants everywhere that are just UB if they are broken. Again, this is not an example of a library eagerly checking for invariant violations and percolating up errors. From what I can see, AVERROR_BUG is used at various boundaries to detect some kinds of inconsistencies in the data.

IMO, your examples are a total misrepresentation of how C libraries typically work. From my review, my prior was totally confirmed: C libraries will happily do UB when runtime invariants are broken, whereas Rust code tends to panic. Rust code can opt into "UB when runtime invariants are broken," but it is far, far more limited.

And this further demonstrates why "unsafe by default" is so bad.


I think this is moving the goalposts.

My claim was not "these C libraries perfectly avoid UB by dynamically checking every invariant that could lead to UB if broken." Clearly they do not, as you have demonstrated. (Neither does unsafe Rust).

My claim was that in cases where a (low-level, high quality) C library does check an invariant in a release build, it will generally report failure of that invariant as an explicit error code rather than by crashing the process.

To falsify that, you would need to find places where these libraries call abort() or exit() in response to an internal inconsistency, in a release build. I think you are unlikely to find examples of that in these libraries. (After a bit of searching, I see that libavcodec has a few abort()s, but uses AVERROR_BUG an order of magnitude more often).

I agree with you that Rust's "safe by default" is important. I am advocating that Rust can be a powerful tool to provide C-like behavior (no crash on inconsistency) with greater safety (checking all relevant inconsistencies by default). In cases where C-like behavior is desired, that's a really appealing proposition.

Upthread it seemed like you were objecting to the idea of ever reporting internal inconsistencies as recoverable errors. You argued that creating and documenting error codes for this is not common or practical:

> I'd love to see the API docs for them. "This error value is impossible and this library will never return it. If it does, then there is a bug in the library. Since there are no known bugs related to this invariant violation, this cannot happen."

That is exactly what SQLITE_INTERNAL and AVERROR_BUG are.


> My claim was that in cases where a (low-level, high quality) C library does check an invariant in a release build, it will generally report failure of that invariant as an explicit error code rather than by crashing the process.

That just seems very uninteresting though? And it kinda misses the whole point of where this conversation started. It's true that Rust code is going to check more things because of `unwrap()`, but that's a good thing! Because the alternative is clearly what C libraries practice: they'll just have UB. So you give up the possibility of an RCE for the possibility of a DoS. Sounds like a good trade to me.

>> I'd love to see the API docs for them. "This error value is impossible and this library will never return it. If it does, then there is a bug in the library. Since there are no known bugs related to this invariant violation, this cannot happen."

> That is exactly what SQLITE_INTERNAL and AVERROR_BUG are.

I meant that it should reflect the philosophy of handling broken runtime invariants generally in the library. Just because there's one error code for some restricted subset of cases doesn't mean that's how they deal with broken runtime invariants. In all of your examples so far, the vast majority of broken runtime invariants from what I can see lead to UB, not error codes.

This is what I meant because this is what makes Rust and its panicking materially different from C. And it's relevant especially in contexts where people say, "well just return an error instead of panicking." But C libraries generally don't do that either! They don't even bother checking most runtime invariants anyway, even when it doesn't matter for perf.

This is a big knot to untangle and I'm sure my wording could have been more precise. This is why I wanted to focus on examples, because we can look at real world things. And from my perspective, the examples you've given do not embody the original advice that I was replying to:

> That behavior is up to the user. The library should only report the error.

Instead, while there is limited support for "this error is a bug," the C libraries you've linked overwhelmingly prefer UB. That's the relevant point of comparison. I'm not interested in trying to find C libraries that abort. I'm interested in a holistic comparison of actual practice and using that to contextualize the blanket suggestions given in this thread.


> It's true that Rust code is going to check more things because of `unwrap()`, but that's a good thing! Because the alternative is clearly what C libraries practice: they'll just have UB.

I have been consistently advocating for a third alternative that I happen to like more than either of these.

My alternative is: write libraries in No-Panic Rust. That means we have all of the safety, but none of the crashes. It is consistent with the position articulated upthread:

> That behavior is up to the user. The library should only report the error.

No-Panic Rust means always using "?" instead of unwrap(). This doesn't give up any safety! It just reports errors in a different way. Unfortunately it does mean eschewing the standard library, which isn't generally programmed like this.
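As a toy illustration of the two styles (the error enum here is mine, not from any real library), a broken invariant either unwinds via `unwrap()` or surfaces as a value that every caller forwards with `?`:

```rust
#[derive(Debug, PartialEq)]
enum LibError {
    /// Hypothetical "this indicates a bug in the library" variant.
    InternalBug,
}

// No-Panic style: the broken-index case becomes an ordinary error value.
fn inner(table: &[u32], idx: usize) -> Result<u32, LibError> {
    table.get(idx).copied().ok_or(LibError::InternalBug)
}

// Every caller forwards the error with `?` rather than unwrapping,
// so a broken invariant surfaces as a value, not a crash.
fn outer(table: &[u32]) -> Result<u32, LibError> {
    let a = inner(table, 0)?;
    let b = inner(table, 1)?;
    Ok(a + b)
}

fn main() {
    assert_eq!(outer(&[1, 2]), Ok(3));
    assert_eq!(outer(&[1]), Err(LibError::InternalBug));
}
```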

I won't argue that every library should use this strategy. It is undoubtedly much more work. But in some cases, that extra work might be justified. Isn't it nice that this possibility exists?


We're back to square one: show me some real Rust libraries in widespread use actually adhering to this philosophy. And then I want to see some applications built with this philosophy. Then we can look at what the actual user experience difference is when a bug occurs. In one case, you get a panic with a stack trace. In the other, you get an error value that the application does... what with? Prints it as an unactionable error to end users and aborts? If it continues on, does your library make any guarantees about the consistency of its internal state when a runtime invariant is broken?

Panicking branches are everywhere in Rust. And even in your blog, you needed to use `unsafe` to avoid some of them. So I don't really get why you claim it is safer.

Users of my libraries would 100% be super annoyed by this. Imagine if `Regex::find` returned a `Result` purely because a bug might happen.

> But in some cases, that extra work might be justified. Isn't it nice that this possibility exists?

What I said above:

> Sure, you can propagate an error. I just don't really see a compelling reason to do so. Like, maybe there are niche scenarios where maybe it's worthwhile, but I do not see how it would be compelling to suggest it as general practice.

Your blog is an interesting technical exercise, but you spend comparatively little time on whether doing it is actually worth the trouble. And there is effectively no space at all reserved to how this impacts library API design. To be fair, you do acknowledge this:

> I should be clear that I have not yet attempted this technique at scale, so I cannot report on how well it works in practice. For now it is an exciting future direction for upb, and one that I hope will pay off.

From your blog, you list 3 reasons to do this: binary size, unrecoverability and runtime overhead.

I find that binary size is the only legitimate reason here, and for saving 300 KB, I would absolutely call that very niche. And especially so given that you can make panics abort to remove the code size overhead.
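For reference, making panics abort rather than unwind is a one-line profile setting in Cargo, which drops the unwinding machinery from the binary:

```toml
# Cargo.toml: abort immediately on panic instead of unwinding.
[profile.release]
panic = "abort"
```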

I find unrecoverability unconvincing because we are talking about bugs here. Panics are just one very convenient manifestation of a bug. But lots of bugs are silent and just make the output incorrect in some way. I just don't see a problem at all with bugs, generally, causing an abort with a useful error message.

I find runtime overhead very unconvincing because you can opt out of them on a case-by-case basis when perf demands it.

We can go around the maypole all day on this. But I want to see real examples following your philosophy. Because then I can poke and prod at it and point to what I think you're missing. Is the `upb` port publicly available?


I'd like to add another point:

Both panics and error values for invariants add a lot of branches to execution: one for every invariant that is checked, plus more in every indirect caller of the functions that check them.

This means basically all function calls introduce new control flow at the call site, because they may either panic, or return an error value that the programmer will almost always immediately bubble up.

Such a large amount of new control flow is going to be impossible to reason about.

But!

Panics, and specifically catching them, as implemented in Rust, require that the wrapped code is UnwindSafe [1]. This is a trait that is automatically implemented for types that remain in a good state despite panics. It automatically makes sure that if something unexpected does happen, whatever state was being modified either remains in a mostly safe shape, or becomes unreadable and needs to be nuked and rebuilt.

This is massively useful for things like webservers, because simply catching panics (or exhaustive error values) is not enough to recover from them. You need to be able to ensure that no state has been left permanently damaged by the panic, and Rust's implementation of catch_unwind requiring things to be UnwindSafe is a lot better than normal error values.

[1]: https://doc.rust-lang.org/stable/std/panic/trait.UnwindSafe....


I do not claim that No-Panic Rust is popular (or even used at all) in Rust libraries currently. If it was popular, I would not have had to think so hard about it and write a blog entry. I claim that this technique is widespread in C libraries, and I believe I have demonstrated that.

Our conversation was sidetracked because you claimed that panic and unwrap() were essential parts of how Rust provides safety, and that the C precedent doesn't apply because C's approach is unsafe. But I claim that No-Panic Rust is potentially a solution that gives you the best of both worlds: comparable safety without risk of a (detected) bug crashing the entire process. So I do think that the C precedent applies.

I grant that there are applications where panics are a perfectly reasonable way of handling internal errors. Your ripgrep is a perfect example: it's a short-lived process that only does one thing, and users are running it from a terminal (and are probably tech savvy) so they can easily copy and paste the crash into a bug report.

But there are lots of other applications that are not like this. Consider the Linux kernel, where a panic takes down your entire computer. Or consider a mobile (iOS or Android) application where there is no terminal to dump to, and the user experience of a crash is that the app closes unexpectedly and without explanation. Or consider a web browser where it would be very annoying for an entire tab or browser to crash just because one operation (like using the search box) ran into an internal error.

In most of these cases, you want to let the program continue if reasonably possible after an error is encountered, while also logging the error for later inspection/diagnosis and possibly telemetry. Probably you will be abandoning any internal state associated with the failing operation.

It's true that my blog uses unsafe in two cases to get rid of panics. The first is to call libc::printf(), but this is only required because the Rust stdlib does not offer any No-Panic API for printing to stdout. This is really just a symptom of the fact that No-Panic programming has little precedent in Rust. If there was a No-Panic variant of the standard library, it could offer a safe API for printing to stdout.

The second case is an optimization, where we are trying to remove a bounds check for performance reasons. This is an example of "opting out on a case-by-case basis", except what I am proposing is more principled and arguably safer than merely switching to get_unchecked(). I am asserting the underlying invariant of the data structure, and then letting the optimizer infer that the invariant implies that the bounds check is not necessary. I think this is pretty interesting, and very cool that the compiler is able to do this.

So overall I do argue that No-Panic Rust offers comparable safety to panics and unwrap().

The Rust port of upb is on the back burner currently, and nothing is open-sourced yet.


> I do not claim that No-Panic Rust is popular (or even used at all) in Rust libraries currently.

I didn't say you did! Goodness this conversation is super frustrating. I'm not trying to get you to legitimize your opinions by pointing to popularity, but I just want to see some examples of your philosophy actually working in real world scenarios.

> I claim that this technique is widespread in C libraries, and I believe I have demonstrated that.

I have yet to see any such evidence. The C libraries you've shown me have a litany of UB branches where Rust would have panicking branches. None of the C libraries you've linked are coded in the style demonstrated in your blog. If they were, there would be a whole lot more invariant checking (like bounds checks) leading to error codes for those invariant violations.

Instead, the C code primarily just lets UB take over for internal invariant violations. Which may indeed wind up in an abort. Or someone stealing your credit card numbers. ¯\_(ツ)_/¯ That's not at all the style you advocate for in your blog.

The C libraries you link do have some error codes for something resembling internal invariant violations, but from my review, this is not practiced generally and is far more limited than the style you advocate for in your blog.

> Your ripgrep is a perfect example

I specifically did not cite ripgrep as an example. I cited my libraries. I might be best known for my work on ripgrep, but the vast majority of Rust work I've done over the last decade is in libraries. And those are used in all sorts of places.

Moreover, it isn't just my libraries that use this philosophy. It's pretty much all of them, including std.

> Consider the Linux kernel, where a panic takes down your entire computer.

The Linux kernel is one of the few places where I've seen someone argue compellingly for "prefer UB on invariant violations generally, and not panicking." I don't agree with them, but I don't have any practical experience in that specific domain to refute them. Indeed, I view the practice quite skeptically, given that I'd greatly prefer my computer to shut down than to, say, corrupt my data on disk.

> Or consider a mobile (iOS or Android) application where there is no terminal to dump to, and the user experience of a crash is that the app closes unexpectedly and without explanation. Or consider a web browser where it would be very annoying for an entire tab or browser to crash just because one operation (like using the search box) ran into an internal error.

> In most of these cases, you want to let the program continue if reasonably possible after an error is encountered, while also logging the error for later inspection/diagnosis and possibly telemetry. Probably you will be abandoning any internal state associated with the failing operation.

Your suggestion here amounts to asking Rust libraries to guarantee reasonable and consistent behavior when internal runtime invariants have been broken. That's the only way, "return an error for a broken invariant and otherwise continue on" actually works. I don't see how that's tractable and this is why I ask for examples.

There is nothing you can say that's going to convince me. I have to be shown. Because the fundamental component of my skepticism is seeing the practice in the real world and the kinds of effects it has that are not captured by either your analysis or mine. Indeed, my years of experience building fundamental ecosystem libraries in Rust tells me that your approach does not scale. At all.

I, several comments ago, carefully conceded that the style of Rust you advocate may be useful in niche scenarios. So my position is not, "your philosophy is never useful and it should never be used." My position is, "it is not a good idea generally, and it does not match the prevailing convention of C libraries."

I think this conversation has probably run its course. Sincerely, I would like to see examples of your practice more broadly. I want to see how it works and what the real and actual trade-offs are.


If zstd gives you an error and you don't handle it, subsequent calls may cause UB, so it kind of does both things.

https://github.com/facebook/zstd/blob/b16d193512d3ded82fd584...


> SQLite is really a terrible example of anything other than what you can accomplish when you pour enormous resources into a single C library.

That's quite a sweeping, even caustic, indictment.

Can you explain this statement more?


Another way of putting it is that SQLite is uniquely amazing.

But this makes it very atypical: https://www.sqlite.org/testing.html

So it is hard to use as an example of typical practice.


> That would mean that all invariants are either assumed true

What, exactly, is the benefit of assuming the invariant holds without checking it, over checking and aborting if it's not true? In the first case, you're likely to segfault anyway, just at some later point, making it harder to locate the point at which invariant was actually broken - and that's the best case. Worst case, you'll silently compute and return the wrong result based on garbage data.


> Which libraries in widespread use know how to detect all of their possible bugs due to invariant violations and report them as explicit error values?

It's only "hard" because languages make it hard, and keep implicit what the std library itself is doing.

At worst, every single "assert" should be an exception.

Not ALL possible invariants, but a wrong array index covers a lot of them, no? If Rust's `Vec` indexing or allocation (new, default) all returned a `Result<T, E>`, then it would just be a matter of adding another variant to the library's error enum.
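Worth noting that both halves of that wish partly exist in today's Rust: slices have a fallible `get`, and `Vec::try_reserve` surfaces allocation failure as a `Result`. A minimal sketch:

```rust
use std::collections::TryReserveError;

// Fallible allocation: `try_reserve` reports allocation failure
// (or capacity overflow) as a Result instead of aborting.
fn make_buffer(n: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut v = Vec::new();
    v.try_reserve(n)?;
    v.resize(n, 0);
    Ok(v)
}

fn main() {
    // Fallible indexing: `get` returns Option instead of panicking.
    let data = [10, 20, 30];
    assert_eq!(data.get(5), None);

    assert_eq!(make_buffer(16).unwrap().len(), 16);
    assert!(make_buffer(usize::MAX).is_err()); // capacity overflow -> Err
}
```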

I don't have anything public to show this, but at my work, our library written in C++ has gazillions of checks using macros; every single invariant is checked, and they all crash in debug builds and throw exceptions in release builds.

The idea is to eventually trim them down once things have stabilized? Just kidding! The reality is that they have caught bugs introduced by changes multiple times, so they will probably be there forever.


Neither C nor Rust has exceptions. Rust has unwinding, which is kind of like exceptions. And indeed `assert!` will unwind (although applications can be built such that it aborts instead), and that unwinding can be caught.

> Its only "hard" because languages make it hard and implicit what the std library itself is doing.

This is an extraordinary claim that requires extraordinary evidence. I absolutely don't buy this for one second.

> I don't have anything public really to show this but at my work, our library written in C++ have like gazillions of checks using macros, every single invariant is checked and they all crash at debug and throw exceptions at release.

This isn't good enough to demonstrate that your advice is broadly applicable. Nowhere near good enough. Other people in this thread provided examples of what they thought were the same thing, but it actually turned out that most broken internal runtime invariants would just lead to UB, despite having error codes like "this only gets returned if there is a bug." So I can't tell if you're making the same mischaracterization.

Moreover, you don't state the domain you're in. I could absolutely believe that there are some domains where it is acceptable to invest huge resources into eliminating all possible aborts, even for broken internal runtime invariants. But I'd expect them also to eliminate all possible instances of UB as well, because, well, UB can result in aborts! (And often does, via a segmentation fault.) At this point, you're in "prove your code is correct" territory. Sometimes that's warranted, but your comment was made without any of this nuance at all.


> Which libraries in widespread use know how to detect all of their possible bugs due to invariant violations and report them as explicit error values?

We're talking about the cases that are already being caught somehow (bounds checks, unwraps, ...). It isn't necessary to detect all possible invariant violations to do something else instead of panic, and it suffices to have the language represent those failures without aborting the program.


Show me a widely used C library that does even remotely the same thing. I promise you most places where Rust would use unwrap are just straight UB in C.

I note that you provided no real world examples despite my request for them. Where's your code that is following this advice of yours?


I'm not advocating for C or against Rust though. I'm saying that GP's request to report errors instead of crashing is a perfectly fine opinion, and using Rust as an example of a language which already traps most instances of C UB, there aren't any fundamental reasons why Rust (or a fork or a similar language) couldn't use a different mechanism to signal failure states. Your request for code is irrelevant to my point.


You advocate a particular coding style and I ask for real world examples demonstrating your advocacy in the real world. That's absolutely relevant!

In contrast, the style I advocate has dozens of examples at your fingertips running in production right now. Including the Rust standard library itself. The Rust standard library happily uses `unwrap()` all over the place and specifically does not propagate errors that are purely the result of bugs coming from broken internal runtime invariants.


That's how exceptions work, and they come in pretty handy in a lot of circumstances. In such languages, any operation might throw RuntimeException (or equivalent) and the caller must be ready to handle that - or not, in which case it behaves exactly like a language without exception support.

I know that a lot of people hate that idea, but I strongly disagree. In any large programs, there are thousands of possible errors, and only a small part of them we actually want to handle in a special way. The rest? They go to "other" category. Being able to handle "other" errors, what Rust calls "panic", significantly improves user experience:

For CLI, print explanation that this is an unexpected failure, mention where the logs were saved, mention where to get support (forum/issue/etc...), and exit.

For cron-like scheduled service, notify oncall of the crash, re-schedule the job with intelligent timeout, then exit.

For web, upload details to observability platform, return 500 to user, then when possible terminate the worker.

and so on... In practical world, unexpected errors are a thing, and good language should support them to make programmers' lives easier.

One unfortunate downside of this ability is that some programmers abuse it and ignore all unknown errors instead of handling them properly - this makes for terrible UX and introduces many bugs.

Also, for my "web services" example, if the worker is not terminated, there is a chance the internal data structures will get corrupted, and further requests, even the ones which used to pass, will now fail. There are ways to mitigate this - ignore some exception groups but unconditionally fail on others, or use try/finally blocks and immutable data to reduce the chance of corruption even in the case of an unexpected exception. But such code is hard to reason about and hard to test.

Still, if a feature is not a good idea in some specific circumstances, it's not a reason to remove it altogether.


I'm unsure of your point here. And I'm not getting dragged into a debate about exceptions. :-)


I think the point is very clear. Languages with exceptions report invariant violations like IndexOutOfBoundsException or AssertionError via the same error-reporting mechanism as normal, unavoidable errors, namely by throwing exceptions.


OK, sure. If there's a suggestion that this is better, then I wouldn't agree with that necessarily. But as I said, I don't want to get drawn into a more general discussion about exceptions. The nuance of just comparing Rust with C is barely possible to get across (see other discussion in this thread). Adding real exceptions into that mix is just a disaster lol.


Rust panics are pretty much exceptions under a different name. You can even "catch" the object passed to panic!().

The main difference is that with exceptions, they always unwind. With panics, the person building the binary can decide whether the panic should unwind or immediately abort.




