1. Easy on a 64-bit system. Just reserve dozens of megabytes of virtual address space, with the memory neither readable nor writable. Mark a small portion of it as readable and writable. In response to segfaults, grow the stack by marking more of the reserved memory as readable and writable. My toy implementation does this. It's very easy to do in about 20 lines of code.
2. You will need to reimplement thread-local storage yourself. A production-quality implementation should handle this for you. Thread-local storage just means keeping a special section of memory as a read-only template and then, when a fiber is created, duplicating it into fresh pages, marking those pages as writable, and calling constructors. After that, remember to save the segment registers when switching. Tedious but doable.
3. You need your own locks. This is apparent even in high-level languages like Python. Notice how Python's asyncio provides its own mutexes and semaphores? A production-quality implementation will handle this for you, but you'll need to use its locks instead of the OS-provided ones.
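To make the last two points concrete, here is a minimal sketch using Python's asyncio rather than a native fiber library, so treat it as an analogy and not as how a C or C++ implementation would look. The names `worker`, `request_id`, and `log` are made up for illustration. `asyncio.Lock` is the scheduler-aware mutex: waiting on it suspends only the current task. `contextvars.ContextVar` plays the role of per-task "thread-local" storage, since each task runs in its own copy of the context.

```python
import asyncio
import contextvars

# Per-task state: roughly the coroutine-level analogue of thread-local storage.
# (Hypothetical variable name, for illustration only.)
request_id = contextvars.ContextVar("request_id", default=None)


async def worker(name, lock, log):
    # Each task created by gather() gets its own copy of the context,
    # so this assignment is not visible to the other tasks.
    request_id.set(name)

    # Acquiring an asyncio.Lock suspends only this task while it waits;
    # the underlying OS thread stays free to run the other tasks.
    async with lock:
        log.append(f"{name} enter")
        # Yield to the scheduler while still holding the lock.
        await asyncio.sleep(0)
        log.append(f"{name} exit")

    # Still sees its own value, even though other tasks called set() too.
    return request_id.get()


async def main():
    lock = asyncio.Lock()
    log = []
    results = await asyncio.gather(
        *(worker(n, lock, log) for n in ("a", "b", "c"))
    )
    return log, list(results)


log, results = asyncio.run(main())
print(log)
print(results)
```

If the tasks shared an OS-level lock such as `threading.Lock` instead, a task that yields while holding it would leave the next task on the same thread blocked in a plain `acquire()`, with no way for the holder to run again and release it. That is the reason for needing scheduler-aware locks.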