If you twist your perspective a little, you can change how you view the stack: instead of treating it as a literal linear blob of data, consider it the 'data' or 'scope' pointer of a closure (closures are frequently implemented as a pair of data and code pointers). A key element of this closure's data is the continuation closure, which is implicitly passed to a closure whenever it is called; the 'RET' type of CPU opcode is the normal way of calling that continuation.
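To make that view concrete, here is a minimal sketch that models a closure as an explicit (code, data) pair and passes the continuation as an ordinary argument. All names (`make_closure`, `call`, `add_code`, `record_code`) are hypothetical and chosen for illustration; this is not how any particular CPU or compiler lays out its frames.

```python
def make_closure(code, data):
    """Model a closure as a pair: a code pointer and a data (scope) pointer."""
    return (code, data)


def call(closure, *args):
    """Invoke a closure by handing its data pointer to its code pointer."""
    code, data = closure
    return code(data, *args)


def add_code(frame, x, ret):
    # 'frame' stands in for the stack frame, i.e. the closure's data pointer.
    # 'ret' is the continuation closure; calling it is the analogue of RET.
    return call(ret, x + frame["offset"])


def record_code(frame, value):
    # The caller's continuation: it receives the "returned" value.
    frame["out"].append(value)


results = []
record = make_closure(record_code, {"out": results})
add_ten = make_closure(add_code, {"offset": 10})

# "Call add_ten with 5, and return to record."
call(add_ten, 5, record)
```

In a real calling convention the continuation is not a visible parameter: the caller's frame and return address play that role, which is why the passing is described as implicit.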
If one carefully adds stack splitting and duplication capabilities to an ordinary procedural language, the result amounts to a conversion to continuation-passing style. Not only does that mitigate some of the drawbacks of massive multithreading on 32-bit systems (address space exhaustion, though not the expense of context switches), but it also permits interesting implementation strategies (e.g. making call/cc work in C, or in languages implemented in terms of C, which can in turn be applied to the implementation of stateless servers).
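As a rough illustration of why continuation-passing style gives you call/cc almost for free, the sketch below writes a tiny computation in CPS and then re-enters a saved continuation more than once. It uses Python closures rather than real stack duplication, so it shows only the idea, not the C-level technique; `add_cps`, `call_cc`, `remember`, and `compute` are hypothetical names.

```python
def add_cps(a, b, k):
    """CPS addition: instead of returning, pass the result to continuation k."""
    return k(a + b)


def call_cc(f, k):
    # In CPS the current continuation is just k, so call/cc hands it to f
    # twice: once as a first-class value, once as f's own return continuation.
    return f(k, k)


saved = []


def remember(kont, k):
    # Stash the captured continuation so it can be re-entered later,
    # then return 1 normally.
    saved.append(kont)
    return k(1)


def compute(k):
    # Direct-style equivalent: 10 + call_cc(remember)
    return call_cc(remember, lambda v: add_cps(10, v, k))


def identity(v):
    return v


first = compute(identity)   # normal run: remember returns 1
second = saved[0](5)        # re-enter "the rest of compute" with 5
third = saved[0](32)        # and again with 32
```

Being able to invoke `saved[0]` repeatedly is the closure-level counterpart of duplicating a stack. Presumably this is also what makes the approach interesting for stateless servers: the pending work is an ordinary value that can be kept and resumed later.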