I guess multiprocessing got a bad reputation because it used to be slow and simple, so it was looked down upon as a primitive tool for less capable developers.
But the world has changed. Modern systems are excellent at multiprocessing: CPUs are fast, cores are plentiful, and memory bandwidth keeps getting better, while single-thread performance has stalled.
It really is time to reconsider the old mantras. Setting up highly complicated containerized environments to manage a fleet of anemic VMs because NodeJS' single-threaded event loop chokes on real traffic is not the future.
That really has nothing to do with the choice to use CGI. You can just as well use Rust with Axum or Actix and get a fully threaded web server without having to fork for every request.
Absolutely, I'm not recommending that everybody go back to using CGI (the protocol). I was responding to this:
> The CGI model may still work fine, but it is an outdated execution model
The CGI model of one process per request is excellent for modern hardware and really should not be scoffed at anymore IMO.
It can utilize big machines, scales to zero, is almost leak-proof since the OS cleans up all memory and file descriptors when the process exits, is language-independent, is dead simple to understand, allows finer-grained resource control (max memory, file descriptor count, chroot) than threads, ...
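For what it's worth, here's a minimal sketch of that model using Go's stdlib net/http/cgi (the handler and output are made up; the idea is that the httpd fork+execs this binary per request, and the OS reclaims everything when it exits):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/cgi"
)

func main() {
	// cgi.Serve reads one request from the CGI environment (env vars + stdin)
	// and writes the response to stdout. The process then exits, so the OS
	// reclaims every byte of memory and every file descriptor it held.
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		fmt.Fprintf(w, "hello from a fresh process, path=%s\n", r.URL.Path)
	})
	if err := cgi.Serve(handler); err != nil {
		log.Fatal(err)
	}
}
```

Resource limits (max memory, file descriptor count, chroot) can then be applied per process by whatever launches it, without the program itself having to cooperate.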
Typically I've run CGI from a directory outside the document root. That's easy, and I think it was the default?
That said, fork+exec isn't the best for throughput: it involves a lot of kernel work, especially if the httpd doesn't isolate the forking into a separate, barebones child process.
FastCGI or some other method to avoid forking for each request is valuable regardless of runtime. If you have a runtime with high startup costs, even more so.
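As a sketch of what I mean (address and handler are illustrative), Go's stdlib net/http/fcgi gives you a single long-lived worker the httpd talks to over FastCGI:

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"net/http/fcgi"
)

func main() {
	// One long-lived worker process: the front-end httpd forwards requests
	// over FastCGI to this socket, so there is no fork+exec per request and
	// the runtime's startup cost is paid only once.
	l, err := net.Listen("tcp", "127.0.0.1:9000") // address is illustrative
	if err != nil {
		log.Fatal(err)
	}
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "served without forking: %s\n", r.URL.Path)
	})
	log.Fatal(fcgi.Serve(l, handler))
}
```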
> FastCGI or some other method to avoid forking for each request is valuable regardless of runtime. If you have a runtime with high startup costs, even more so.
What's the point of using FastCGI compared to a plain HTTP server then? If you are going to have a persistent server running anyway, why not just use the protocol whose semantics you are already using?
I don't generally want or need my application server to serve static files, but I may want to serve them on the same hostname (or maybe I don't).
There are potential benefits to having the httpd manage the specifics of client connections as well: if I'm using a single-threaded, process-per-request execution model, keep-alive connections really ruin that. Similarly with client transfer-encoding requests: does my application server need to know about that? Does my application server need to understand HTTP/2 or HTTP/3?
You could certainly do a reverse proxy and use HTTP instead of FastCGI as the protocol between the client facing httpd and the application server... although then you miss out on some specialty features like X-Sendfile, which lets the application server hand file delivery off to the httpd without actually shoveling the bytes through sockets itself. You could add that to an HTTP proxy too, I suppose.
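A minimal sketch of that split in Go (paths, ports, and the backend address are all made up): the front end serves static files and handles keep-alive and chunked transfer encoding, while the backend only ever sees plain HTTP proxied over loopback.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// The application server only speaks plain HTTP/1.1 on a loopback port;
	// this front end serves static files itself and proxies everything else,
	// absorbing client keep-alive and transfer-encoding details on the way.
	backend, err := url.Parse("http://127.0.0.1:8080") // illustrative address
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(backend)

	mux := http.NewServeMux()
	mux.Handle("/static/", http.StripPrefix("/static/", http.FileServer(http.Dir("./public"))))
	mux.Handle("/", proxy)

	// Swap in ListenAndServeTLS if you want the front end to terminate TLS
	// (and, with net/http, HTTP/2) on behalf of the backend.
	log.Fatal(http.ListenAndServe(":80", mux))
}
```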
> You could certainly do a reverse proxy and use HTTP instead of FastCGI as the protocol between the client facing httpd and the application server
That's what I meant. Things like X-Sendfile (or X-Accel-Redirect in nginx) work with HTTP backends. Why involve a different protocol to transfer an HTTP request to a backend instead of... HTTP? I really don't get the point of FastCGI over plain HTTP when a reverse proxy is talking to an upstream backend server.
I mean, the exact protocol doesn't really matter to me; that's why I said "FastCGI or some other method". The important bit is avoiding fork+exec on every request.
FastCGI is binary-based, which has some benefits, although hopefully a reverse proxy sends well-formed HTTP requests anyway... But maybe having the application runtime provide an HTTP frontend encourages exposing the application software directly to clients, which isn't always wise... some of them are really bad at HTTP.
Yep, that is definitely problematic. But it also allowed a sprawling ecosystem of tons of small applications that people could just download and put on their website via FTP and do the configuration in the browser afterwards.
This is easy enough for non-technical people or school kids and still how it works for many Wordpress sites.
The modern way of deploying things is safer but the extra complexity has pushed many, many folks to just put their stuff on Facebook/Instagram instead of leveling up their devops skills.
Somehow we need to get the simplicity back, I think. Preferably without all the exploits.
What kind of problems? Like, if the administrator put something inside that directory (Unix doesn't have folders) that the web server shouldn't execute? That kind of problem? I've literally never had that problem in my life and I've had web pages for 30 years.
> Like, if the administrator put something inside that directory
Path traversal bugs allowing written files to land in the cgi-bin used to be a huge exploit vector. Interestingly, some software actually relied on being able to write executable files into the document root, so the simple answer of making permissions more restrictive is not a silver bullet.
If you've never seen or heard of this, ¯\_(ツ)_/¯
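For anyone who hasn't, here's a sketch of the pattern in Go (the function and paths are hypothetical): a client-supplied filename containing "../" escapes the upload directory and can land an executable in cgi-bin unless the resolved path is checked.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// destPath is a hypothetical upload handler's target-path logic. A naive join
// lets "../cgi-bin/shell.cgi" resolve outside the upload directory, so we
// reject anything whose cleaned path doesn't stay under uploadDir.
func destPath(uploadDir, clientName string) (string, error) {
	p := filepath.Clean(filepath.Join(uploadDir, clientName))
	if !strings.HasPrefix(p, filepath.Clean(uploadDir)+string(filepath.Separator)) {
		return "", fmt.Errorf("path traversal attempt: %q", clientName)
	}
	return p, nil
}

func main() {
	fmt.Println(destPath("/var/www/uploads", "avatar.png"))           // stays in the upload dir
	fmt.Println(destPath("/var/www/uploads", "../cgi-bin/shell.cgi")) // rejected
}
```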
> Unix doesn't have folders
Great and very important point. Someone should go fix all of these bugs:
I've certainly heard of that problem, but I've never experienced it, because it's easy to avoid. At least, it's easy if you're not running certain pieces of software. I'd suggest not using Wordpress (or, ideally, PHP) and disabling ExecCGI in whatever directories you need to host untrusted executables in.
Of course, disabling ExecCGI in one directory won't help if you do have path traversal holes in your upload-handling code.
I'm not convinced that disabling CGI will help if attackers can use a path traversal hole to upload malicious executables to arbitrary paths you can write to. They can overwrite your .bashrc or your FastCGI backend program or whatever you're likely to execute. CGI seems like the wrong thing to blame for that.
Why are you linking me to a "Sign in to search code on GitHub" page?
> Why are you linking me to a "Sign in to search code on GitHub" page?
GitHub is basically the only service I'm aware of that actually has the ability to grep over the Linux kernel. Most of the other "code search" systems either cost money to use or only search specific symbols (e.g. the one hosted on free-electrons.)
For a similar effect, grep the Linux kernel and be amazed as the term "folder" is actually used quite a lot to mean "directory" because the distinction doesn't matter anymore (and because when you're implementing filesystem drivers you have to contend with the fact that some of them do have "folders".)
I feel it necessary to clarify that I am not suggesting we should use single-threaded servers. My go-to approach for one-offs is a Go HTTP server behind a reverse proxy. That does quite well at utilizing multiple CPU cores, although admittedly Go is still far from optimal.
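A minimal sketch of that (port and handler are illustrative): net/http runs each request in its own goroutine, and the scheduler spreads those across all cores, so one process uses the whole machine without per-request fork+exec.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"runtime"
)

func main() {
	// net/http spawns a goroutine per request; the Go scheduler spreads those
	// goroutines across every CPU core, so a single process can saturate the
	// machine without forking for each request.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "handled by a goroutine; %d cores available\n", runtime.NumCPU())
	})
	log.Fatal(http.ListenAndServe(":8080", nil)) // port is illustrative
}
```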
Still, even when people run single-threaded event-loop servers, they can run an instance per CPU core; I recall this being common for WSGI/Python.