That's definitely part of it: Apache workers are heavyweight OS threads or processes (no matter how you configure it), rather than lightweight Erlang processes. I would imagine the bigger problem here, however, is context switching. Because Apache (and most other servers) use OS threading or forking, the entire state of the current process has to be put aside and restored later, maybe after only a few milliseconds, because the OS can't presume anything about the state of that process. Erlang's scheduler instead gives each process a fixed budget of "reductions" (roughly, function calls; I think the actual number is in the thousands, but anyway some static number) and then goes on to the next process.
Because the functions are, well, functional, there is almost no state to save and restore when there is a context switch. And because things like iteration are done with recursion, the run-some-calls-then-switch scheme never gets hung up inside some loop somewhere: every loop iteration is itself a function call, so it counts against the budget. The faster the OS tries to keep switching between processes, the more overhead it introduces.
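That scheduling scheme can be sketched in a few lines. This is a toy illustration, not BEAM's actual implementation: each "process" is a generator, one yield stands in for one reduction, and the scheduler requeues a process once its budget runs out. The budget of 20 here is deliberately tiny so the interleaving is visible; BEAM's real budget is in the thousands.

```python
from collections import deque

REDUCTIONS = 20  # per-process budget before switching; illustrative only

def proc(name, work_items):
    # A "process" as a generator: each yield stands in for one reduction
    # (roughly, one function call's worth of work).
    for item in range(work_items):
        yield f"{name}:{item}"

def scheduler(procs):
    run_queue = deque(procs)
    trace = []
    while run_queue:
        p = run_queue.popleft()
        for _ in range(REDUCTIONS):
            try:
                trace.append(next(p))
            except StopIteration:
                break  # process finished; drop it from the run queue
        else:
            run_queue.append(p)  # budget exhausted: back of the run queue
    return trace

trace = scheduler([proc("a", 30), proc("b", 30)])
print(trace[:3], "...", trace[-3:])
```

The point is that preemption happens at call boundaries, so no loop can hold the scheduler hostage, and almost no state needs saving between switches.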
BTW, this is why, in my opinion, you don't see a lot of other systems duplicating this sort of concurrency model: it relies on language "limitations" (recursion only, for example) that guarantee the scheduler regains control at every function call, and that's what makes it possible.
Most Apache, Erlang, or kernel hackers would know way more about this than I do, of course.
In a good OS (I believe Linux qualifies), you only need to save the contents of the registers. Each process has its own address space, each thread has its own stack, so there's no need to muck around with memory beyond that. Stacks aren't actually copied around; the processor simply restores %esp and %ebp from the saved process data structure.
If I remember my OS design course correctly, the big performance hit is the switch from user mode to kernel mode. I'm not sure *why* that's a big hit, but it seems to be a slow operation on most processors.
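A crude way to see that cost is to time a trivial syscall against a trivial user-mode call. This is only a rough sketch; in CPython the interpreter overhead blurs the gap considerably compared to C, so treat the numbers as suggestive, not as a measurement of the mode switch itself.

```python
import os
import time

N = 200_000

def plain():
    return 42            # stays entirely in user mode

t0 = time.perf_counter()
for _ in range(N):
    plain()
user_call = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(N):
    os.getpid()          # traps into the kernel and back each time
syscall = time.perf_counter() - t0

print(f"user-mode call: {user_call / N * 1e9:.0f} ns/call, "
      f"syscall: {syscall / N * 1e9:.0f} ns/call")
```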
You can use user-mode threading libraries in C/C++, but Apache doesn't. Perhaps that's why it's slow. (The main reason it doesn't is probably that user-mode threading blocks the whole process when one thread performs blocking I/O, which obviously doesn't work well in an I/O-bound application like a web server.)
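A toy illustration of that blocking problem, using generators as stand-in green threads and `time.sleep()` as a stand-in blocking syscall: while one green thread is stuck in the "syscall", the kernel sees only one thread, so the whole process, and every other green thread with it, stalls.

```python
import time
from collections import deque

def io_thread():
    # Green thread doing a *blocking* syscall (simulated with sleep):
    # the entire OS process stops, since the kernel only sees one thread.
    time.sleep(0.2)
    yield "io done"

def cpu_thread():
    yield "work"

start = time.perf_counter()
run_queue = deque([io_thread(), cpu_thread()])
results = []
while run_queue:
    t = run_queue.popleft()
    try:
        results.append(next(t))
        run_queue.append(t)   # cooperative round-robin
    except StopIteration:
        pass                  # thread finished
elapsed = time.perf_counter() - start
print(results, round(elapsed, 1))  # cpu_thread had to wait out the whole "syscall"
```

The fix in that model is exactly what the parenthetical above implies: never make a blocking syscall directly; hand all I/O to a readiness-based loop instead.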
There are other C/C++ webservers - like Lighttpd - that use poll/epoll for I/O. These should run even faster than YAWS/Erlang; anyone have any benchmarks to compare them?
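For reference, the shape of that event loop is roughly the following (a minimal sketch using Python's `select.poll` over a pipe rather than real sockets; epoll is the Linux-only, more scalable variant with the same register/wait/read structure):

```python
import os
import select

r, w = os.pipe()                 # stand-in for a client connection
p = select.poll()                # epoll on Linux looks the same but scales better
p.register(r, select.POLLIN)

os.write(w, b"hi")               # pretend a client sent a request
events = p.poll(1000)            # one thread blocks until *some* fd is ready
fd, mask = events[0]
data = os.read(fd, 1024)         # handle only the fd that is actually ready
print(fd == r, data)

p.unregister(r)
for f in (r, w):
    os.close(f)
```

A single thread multiplexes thousands of connections this way, which is how lighttpd and friends avoid thread-per-connection overhead entirely.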
Every time I see this graph I am a little bit tempted to puke.
There is only the barest explanation of what the test means, and as far as I can understand the circumstances of the test are not like those typically encountered by web servers, even those under extremely high load.
On top of that, this graph measures performance, not scalability. So it's saying, if you really want to perform, where by "perform" we mean have 4000 HTTP server sockets open concurrently, use YAWS not Apache.
If you can think of a methodology that won't make you puke, please suggest it. Yaws is easy to install on most platforms (like "apt-get install yaws") and very easy to configure. Maybe in the academic sense you can poke holes in the methodology, but in the practical world I can't think of a more clever way to test real load. When it boils down to it, Yaws would be much more likely to survive a digg effect or DoS attack, period.
I do. Erlang is a blast. It's a language/platform specifically developed to be very scalable and very fault tolerant, because it was designed for telecommunications systems. Here's a nice write-up from some of the guys at Sendmail: http://www.jetcafe.org/~npc/doc/euc00-sendmail.html
It's not a particularly popular language (I guess no functional language is)- and has only started gaining traction the last few years. Which means nothing of course. It's a bit of an oversimplification, but to me Erlang is to pi-calculus what Lisp is to lambda-calculus.
I wonder if the old pre-forking Apache would have done better, since it's not so easy to trick it into consuming unbounded memory.