> Think about all the problems associated with the process life cycle - is the process stalled? Should I restart it? Why is that process using so much memory? How should my process count change with demand? All of those go away when the lifecycle is tied to the request.
So the upshot of writing CGI scripts is that you can... ship broken, buggy code that leaks memory to your webserver and have it work mostly alright. I mean look, everyone makes mistakes, but if you are routinely running into problems shipping basic FastCGI or HTTP servers in the modern era you really need to introspect what's going wrong. I am no stranger to writing one-off Go servers for things and this is not a serious concern.
Plus, realistically, this only gives a little bit of insulation anyway. You can definitely still write CGI scripts that explode violently if you want to. The only way you can really prevent that is by having complete isolation between processes, which is not something you traditionally do with CGI.
> It’s also more secure because each request is isolated at the process level. Long lived processes leak information to other requests.
What information does this leak, and why should I be concerned?
> Or you know not every site is about scaling requests. It’s another way you can simplify.
> > but it is an outdated execution model
> Not an argument.
Correct. That's not the argument, it's the conclusion.
For some reason you ignored the imperative parts:
> It's cool that you can fork+exec 5000 times per second, but if you don't have to, isn't that significantly better?
> Plus, with FastCGI, it's trivial to have separate privileges for the application server and the webserver.
> [Having] the web server execute stuff in a specific folder inside the document root just seems like a recipe for problems.
Those are the primary reasons why I believe the CGI model of execution is outdated.
> The opposite trend of ignoring OS level security and hoping your language lib does it right seems like the wrong direction.
CGI is in the opposite direction, though. With CGI, the default behavior is that your CGI process runs with privileges similar to the web server's, under the same user. On a modern Linux server it's relatively easy to set up a separate user with more narrowly tuned privileges, plus various isolation options and resource limits (e.g. cgroups).
I'd push back on some of this. Specifically, the memory management inherent to how a CGI script works is typically easier than in longer-lived processes: you just tear down the entire process instead of having to carefully tear down each thing created while handling the request.
Sure, it is easy to view this as the process being sloppy with how it handles memory. But it can also be seen as just less work: if you can toss the entire allocated range of memory, what benefit is there to carefully walking back each allocated structure? (Notably, arenas and similar allocators are efforts to get this kind of behavior in longer-lived processes.)
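To make that concrete, here's a toy Go sketch of the arena idea (the `Arena` type and its methods are invented for illustration, not any real library): allocations bump an offset into one backing buffer, and "freeing" is a single reset of the whole thing, the same shape as a CGI process reclaiming everything at exit.

```go
package main

import "fmt"

// Arena is a toy bump allocator: every Alloc carves a slice out of one
// backing buffer, and Reset discards all of them at once with no
// per-object teardown -- the same shape as CGI's "just exit" cleanup.
type Arena struct {
	buf  []byte
	next int
}

func NewArena(size int) *Arena { return &Arena{buf: make([]byte, size)} }

// Alloc returns n bytes from the arena, or nil if it is exhausted.
func (a *Arena) Alloc(n int) []byte {
	if a.next+n > len(a.buf) {
		return nil
	}
	b := a.buf[a.next : a.next+n : a.next+n]
	a.next += n
	return b
}

// Reset throws away every allocation in O(1).
func (a *Arena) Reset() { a.next = 0 }

func main() {
	a := NewArena(1 << 20)
	for req := 0; req < 3; req++ {
		msg := fmt.Sprintf("handling request %d", req)
		scratch := a.Alloc(len(msg)) // per-request scratch space
		copy(scratch, msg)
		fmt.Println(string(scratch))
		a.Reset() // "process teardown" in miniature
	}
}
```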
True, it is simpler to never free memory and let process teardown take care of it, but I'm only disagreeing with the notion that it's non-trivial to write servers that simply don't leak memory per-request. With modern tools, it's pretty easy for anyone to accomplish. Hell, if you can just slap Boehm GC into your C program, maybe it's trivial to accomplish with old tools, too.
Fair. My push was less about not leaking memory at all, and more that teardown can scale better. Both using a GC and relying on teardown essentially punt the problem from the request-handling code onto something else, and it was not uncommon to see GC-based systems fall behind under load precisely because their task was more work than tearing down a process.
> So the upshot of writing CGI scripts is that you can... ship broken, buggy code that leaks memory to your webserver and have it work mostly alright
Yes. The code is already shitty. That’s life. Let’s make the system more reliable and fault tolerant.
This argument sounds a lot like “garbage collection is for bad programmers who can’t manage their memory”.
But let me add another reason, using your framing: with fire-and-forget, programmers get used to crashing intentionally at the first sign of trouble. This makes it easy to detect failures and improve the code. The incentive for long-running processes is to avoid crashing, so programs get into bad states instead.
> The only way you can really prevent that is by having complete isolation between processes
Yes. That’s the idea. Separate memory spaces.
> What information does this leak
Anything that might be in a resource, or in memory. Or even in a resource held by a library you use.
> and why should I be concerned
Accessing leaked information from a prior run is a common attack.
> but if you don't have to, isn't that significantly better?
Long running processes are inherently more complex. The only benefit is performance.
> [Having] the web server execute stuff in a specific folder inside the document root just seems like a recipe for problems.
As opposed to? All processes have a working directory. What problems come from using the file system?
> Yes. The code is already shitty. That’s life. Let’s make the system more reliable and fault tolerant.
> This argument sounds a lot like “garbage collection is for bad programmers who can’t manage their memory”.
This is not a "Simply don't make mistakes" type of argument, it's more like a "We've moved past this problem" type of argument. The choice of garbage collection as an example is a little funny, because actually I'd argue heavily in favor of using garbage collection if you're not latency-sensitive; after all, like I said, I use Go for a lot of one-off servers.
It'd be one thing if every person had to sit there and re-solve the basic problems of writing an HTTP server, but you don't anymore. Many modern platforms put a perfectly stable HTTP server right in the standard library, freeing you from even needing to install more dependencies to be able to handle HTTP requests effectively.
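For example, in Go (already mentioned above for one-off servers), a complete working HTTP server out of the standard library is only a few lines; the route and port here are arbitrary:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// net/http runs each request in its own goroutine; anything
		// allocated here is garbage-collected when the handler returns,
		// so nothing carries over to the next request unless you
		// deliberately share it.
		fmt.Fprintf(w, "hello from %s\n", r.URL.Path)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```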
> > The only way you can really prevent that is by having complete isolation between processes
> Yes. That’s the idea. Web server forks, and execs. Separate memory spaces.
That's not complete isolation between processes. You can still starve the CPU or RAM, get into contention over global locks (e.g. a SQLite database), or do conflicting file I/O inside the same namespace. I could go on, but the point is that I don't consider two processes running on the same machine to be "isolated" from each other. ("Process isolation" is typically used to talk about isolation between processes, not the isolation of workloads into separate processes.) If you do it badly, you can wind up with requests that sporadically fail or hang. If you do it worse, you can wind up with corruption, interleaved writes, and so on.
Meanwhile, if you're running a typical Linux distro with systemd, you can slap cgroups and namespacing onto your service with the triviality of slapping some options into an INI file. (And if you're not because you hate systemd, well, all of the features are still there, you just may need to do more work to use them.)
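As a rough sketch of what "some options in an INI file" looks like in practice (the service name and binary path are placeholders, not anything from this thread; all the directives are standard systemd options):

```ini
# /etc/systemd/system/myapp.service -- hypothetical unit file.
# systemd only allows comments on their own lines, so each option
# is annotated above it.
[Service]
ExecStart=/usr/local/bin/myapp
# Run as a transient unprivileged user instead of the webserver's user.
DynamicUser=yes
# Namespace isolation: private /tmp and a read-only view of most of
# the filesystem.
PrivateTmp=yes
ProtectSystem=strict
# The service and its children can never gain new privileges.
NoNewPrivileges=yes
# cgroup resource limits.
MemoryMax=256M
CPUQuota=50%
```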
> > What information does this leak
> Anything that might be in a resource, or in memory. Or even in a resource held by a library you use.
> > and why should I be concerned
> Accessing leaked information from a prior run is a common attack.
I will grant you that you can't help it if one of your dependencies (or God help you, the standard library/runtime of your programming language) is buggy and leaks global state between instantiations. Practically speaking though, if you are already not sharing state between requests this is just not a huge issue.
Sometimes it feels like we're comparing "simple program written in CGI where it isn't a big deal if it fails or has some bugs" to "complex program written using a FastCGI or HTTP server where it is a big deal if it leaks a string between users".
> As opposed to? All processes have a working directory. What problems come from using the file system?
The problem isn't the working directory; it's the fact that anything in a cgi-bin directory (1) will be exec'd if it can be, and (2) exists under the document root, which the webserver typically has privileges to write to.
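The classic mitigation is to map the CGI directory in from outside the document root entirely. A hypothetical Apache config (paths invented for illustration; `ScriptAlias` is the standard directive for this) looks like:

```apache
# CGI lives outside DocumentRoot, so nothing the webserver can write
# into the docroot is ever executable, and nothing in cgi-bin is ever
# served as a plain file.
DocumentRoot "/var/www/html"
ScriptAlias /cgi-bin/ "/usr/lib/cgi-bin/"
<Directory "/usr/lib/cgi-bin">
    Require all granted
</Directory>
```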
> Yes it’s the same amount of effort to configure this.
I actually really didn't read this before writing out how easy it was to use these with systemd, so I guess refer to the point above.
I don’t see why having a long-running process gives you more facilities for process isolation. Every tool you want to use is available with fire-and-forget processes in a CGI config.
It sounds like you’re a container guy. But if you want more process isolation, great! So now you agree it’s a concern to share requests in a process, if you are concerned about these other cases?