Can anyone think of a use case for creating a second stack, without a thread, to allocate memory? The approach I have in mind:
1) Create a new stack with POSIX makecontext() and an initializer thread.
2) Initializer thread makes function calls to allocate objects. Functions are called in sequence and never return. The thread eventually reaches the end of its initialization sequence and stalls.
3) Use swapcontext() to return to the original thread from #1.
4) Original thread may reference all objects allocated on the stack of the other context.
This seems like a widget that could be used to accomplish something interesting, but I'm not sure what that might be.
A Forth interpreter is probably the simplest and most common example of a two-stack system someone might be familiar with.
Sometimes I'd like to push an event onto a queue from an interrupt (e.g. a signal handler) and the buffer size can balloon some; sbrk() might be used because it's faster than malloc() and I know the consumer is fast enough (even though it's currently busy).
Sometimes I know the memory usage is very temporary, but I don't want to contort my program to use alloca() and so again: using brk can be convenient.
There are other things that are stack-like that we don't often think of as stacks.
One interesting place might be an undo stack: An editor typically has undo and redo buttons and will never free this state anyway, so rather than malloc/realloc'ing my state (with extra copies going on), and without trashing my heap with a long linked list, I might prefer the performance of sbrk(). You could possibly use alloca() as well, but that would make redo very difficult.
Another might be in a consensus algorithm: If you were implementing Raft and wanted to queue messages until your election had a chance to complete, your "stack" might sit for several seconds before you can start processing it. Keeping this on the "other" stack would make it difficult to do any other processing (where's the event loop?), and using malloc/realloc() will introduce copies. Reserving sbrk() for this stage might make everything much more straightforward.
> One interesting place might be an undo-stack: [..] so rather than malloc/realloc my state (and have extra copies going on), and without trashing my heap with a long linked list, I might prefer the performance of sbrk().
You can implement a stack-like thing that avoids copying, and even gives stable addresses, by linking fixed-size chunks (say 128 elements) in a list. A stack "pointer" is then a pointer to a chunk + offset.
That said, editor undo/redo items are not performance critical, and you will be hard pressed to find a reason to optimize their memory consumption, assuming an overhead of 16 bytes per allocation and fewer than 1M edits by a human.
Preferring an obscure API like sbrk() instead is not good general advice: it's not portable, and it precludes the use of basically any library, including libc.
> And you will be hard pressed to find a reason why you need to optimize memory consumption.
You might want to minimize fragmentation and number of underutilized heap pages.
Even if your allocator minimizes fragmentation by allocating similar-size objects from the same area, allocating a large number of objects risks some other code (possibly running in a different thread) allocating long-lived objects in that area at the same time.
This can cause pages to exist just to hold a single small object, preventing optimal use of memory.
In some algorithms, 90-99%+ of runtime can be in allocation and freeing small objects!
On top of that, accessing a sequential stack (array) is significantly faster. CPUs are built for that type of access pattern.
I had to read a little between the lines to understand your point, which I think is this: Even if you have an object type that requires only lowest-performance memory management (say, editor undo/redo items), you should separate these allocation concerns from all the other allocation concerns in the program, because you might risk contention. It's important to group allocations by lifetime.
To which I'll say, shouldn't we rather optimize and separate the high-bandwidth stuff, so we can keep using malloc-per-object for the unimportant stuff?
The rest of your comment does not apply to a simple undo/redo system:
> In some algorithms, 90-99%+ of runtime can be in allocation and freeing small objects!
> On top of that, accessing a sequential stack (array) is significantly faster. CPUs are built for that type of access patterns.
> shouldn't we rather optimize and separate the high-bandwidth stuff, so we can keep using malloc-per-object for the unimportant stuff?
If it helps, sure, but one problem that gets nasty at high bandwidth is variable latency: malloc might be too clever and vary on the order of milliseconds, and in hard-real-time environments you can't amortize the cost over multiple messages, since missing one means you never catch back up (without e.g. throwing data away).
I've had only cursory exposure to it, but my understanding is that TLSF malloc[1] is designed to directly address those concerns. Is it known to be deficient, such that its claims are invalid or worthless for real-world use?
It's useful all the time in serialization and marshalling systems. You end up with very nice code if you can keep a separate stack for each of e.g. your json parser and its paired object unmarshaller.
The nicest outcomes are with coroutines, though, and that begins to be another matter.