> Think about all the problems associated with the process life cycle - is the process stalled? Should I restart it? Why is that process using so much memory? How should my process count change with demand? All of those go away when the lifecycle is tied to the request.
So the upshot of writing CGI scripts is that you can... ship broken, buggy code that leaks memory to your webserver and have it work mostly alright. I mean look, everyone makes mistakes, but if you are routinely running into problems shipping basic FastCGI or HTTP servers in the modern era you really need to introspect what's going wrong. I am no stranger to writing one-off Go servers for things and this is not a serious concern.
Plus, realistically, this only gives a little bit of insulation anyway. You can definitely still write CGI scripts that explode violently if you want to. The only way you can really prevent that is by having complete isolation between processes, which is not something you traditionally do with CGI.
> It’s also more secure because each request is isolated at the process level. Long lived processes leak information to other requests.
What information does this leak, and why should I be concerned?
> Or you know not every site is about scaling requests. It’s another way you can simplify.
> > but it is an outdated execution model
> Not an argument.
Correct. That's not the argument, it's the conclusion.
For some reason you ignored the imperative parts:
> It's cool that you can fork+exec 5000 times per second, but if you don't have to, isn't that significantly better?
> Plus, with FastCGI, it's trivial to have separate privileges for the application server and the webserver.
> [Having] the web server execute stuff in a specific folder inside the document root just seems like a recipe for problems.
Those are the primary reasons why I believe the CGI model of execution is outdated.
> The opposite trend of ignoring OS level security and hoping your language lib does it right seems like the wrong direction.
CGI is in the opposite direction, though. With CGI, the default behavior is that your CGI process runs with privileges similar to the web server's, under the same user. On a modern Linux server it's relatively easy to set up a separate user with more narrowly tuned privileges, plus various isolation options and resource limits (e.g. cgroups).
I'd push back on some of this. Specifically, the memory management inherent to how a CGI script works is typically easier than in longer-lived processes: you just tear down the entire process instead of having to carefully tear down each thing created while handling the request.
Sure, it is easy to view this as the process being sloppy with how it handles memory. But it can also be seen as just less work: if you can toss the entire allocated range of memory, what benefit is there to carefully walking back each allocated structure? (Notably, arenas and similar allocators are efforts to get this kind of behavior in longer-lived processes.)
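To make that concrete, here's a toy Go sketch of the arena idea (the `Arena` type and its methods are invented for illustration, not any real library): allocations bump an offset into one backing buffer, and "freeing" is a single reset of the whole thing, the same shape as a CGI process reclaiming everything at exit.

```go
package main

import "fmt"

// Arena is a toy bump allocator: every Alloc carves a slice out of one
// backing buffer, and Reset discards all of them at once with no
// per-object teardown -- the same shape as CGI's "just exit" cleanup.
type Arena struct {
	buf  []byte
	next int
}

func NewArena(size int) *Arena { return &Arena{buf: make([]byte, size)} }

// Alloc returns n bytes from the arena, or nil if it is exhausted.
func (a *Arena) Alloc(n int) []byte {
	if a.next+n > len(a.buf) {
		return nil
	}
	b := a.buf[a.next : a.next+n : a.next+n]
	a.next += n
	return b
}

// Reset throws away every allocation in O(1).
func (a *Arena) Reset() { a.next = 0 }

func main() {
	a := NewArena(1 << 20)
	for req := 0; req < 3; req++ {
		msg := fmt.Sprintf("handling request %d", req)
		scratch := a.Alloc(len(msg)) // per-request scratch space
		copy(scratch, msg)
		fmt.Println(string(scratch))
		a.Reset() // "process teardown" in miniature
	}
}
```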
True, it is simpler to never free memory and let process teardown take care of it, but I'm only disagreeing with the notion that it's non-trivial to write servers that simply don't leak memory per-request. With modern tools, it's pretty easy for anyone to accomplish. Hell, if you can just slap Boehm GC into your C program, maybe it's trivial to accomplish with old tools, too.
Fair. My push was less about not leaking memory at all, and more that teardown can scale better. Both using a GC and relying on teardown essentially punt the problem from the request-handling code onto something else, and it was not uncommon to see GC-based systems fall behind under load precisely because their task was more work than tearing down a process.
> So the upshot of writing CGI scripts is that you can... ship broken, buggy code that leaks memory to your webserver and have it work mostly alright
Yes. The code is already shitty. That’s life. Let’s make the system more reliable and fault tolerant.
This argument sounds a lot like “garbage collection is for bad programmers who can’t manage their memory”.
But let me add another reason, using your framing: with fire-and-forget, programmers get used to crashing intentionally at the first sign of trouble. This makes it easy to detect failures and improve the code. The incentive for long-running processes is to avoid crashing, so programs get into bad states instead.
> The only way you can really prevent that is by having complete isolation between processes
Yes. That’s the idea. Separate memory spaces.
> What information does this leak
Anything that might be in a resource, or in memory. Or even in a resource held by a library you use.
> and why should I be concerned
Accessing leaked information from a prior run is a common attack.
> but if you don't have to, isn't that significantly better?
Long running processes are inherently more complex. The only benefit is performance.
> [Having] the web server execute stuff in a specific folder inside the document root just seems like a recipe for problems.
As opposed to? All processes have a working directory. What problems come from using the file system?
> Yes. The code is already shitty. That’s life. Let’s make the system more reliable and fault tolerant.
> This argument sounds a lot like “garbage collection is for bad programmers who can’t manage their memory”.
This is not a "Simply don't make mistakes" type of argument, it's more like a "We've moved past this problem" type of argument. The choice of garbage collection as an example is a little funny, because actually I'd argue heavily in favor of using garbage collection if you're not latency-sensitive; after all, like I said, I use Go for a lot of one-off servers.
It'd be one thing if every person had to sit there and re-solve the basic problems of writing an HTTP server, but you don't anymore. Many modern platforms put a perfectly stable HTTP server right in the standard library, freeing you from even needing to install more dependencies to be able to handle HTTP requests effectively.
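For example, in Go (already mentioned above for one-off servers), a complete working HTTP server out of the standard library is only a few lines; the route and port here are arbitrary:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// net/http runs each request in its own goroutine; anything
		// allocated here is garbage-collected when the handler returns,
		// so nothing carries over to the next request unless you
		// deliberately share it.
		fmt.Fprintf(w, "hello from %s\n", r.URL.Path)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```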
> > The only way you can really prevent that is by having complete isolation between processes
> Yes. That’s the idea. Web server forks, and execs. Separate memory spaces.
That's not complete isolation between processes. You can still starve the CPU or RAM, get into contention over global locks (e.g. a SQLite database), or do conflicting file I/O inside the same namespace. I could go on, but the point is that I don't consider two processes running on the same machine to be "isolated" from each other. ("Process isolation" is typically used to talk about isolation between processes, not the isolation of workloads into separate processes.) If you do it badly, you can wind up with requests that sporadically fail or hang. If you do it worse, you can wind up with corruption, interleaved writes, and so on.
Meanwhile, if you're running a typical Linux distro with systemd, you can slap cgroups and namespacing onto your service with the triviality of slapping some options into an INI file. (And if you're not because you hate systemd, well, all of the features are still there, you just may need to do more work to use them.)
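As a rough sketch of what "some options in an INI file" looks like in practice (the service name and binary path are placeholders, not anything from this thread; all the directives are standard systemd options):

```ini
# /etc/systemd/system/myapp.service -- hypothetical unit file.
# systemd only allows comments on their own lines, so each option
# is annotated above it.
[Service]
ExecStart=/usr/local/bin/myapp
# Run as a transient unprivileged user instead of the webserver's user.
DynamicUser=yes
# Namespace isolation: private /tmp and a read-only view of most of
# the filesystem.
PrivateTmp=yes
ProtectSystem=strict
# The service and its children can never gain new privileges.
NoNewPrivileges=yes
# cgroup resource limits.
MemoryMax=256M
CPUQuota=50%
```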
> > What information does this leak
> Anything that might be in a resource, or in memory. Or even in a resource held by a library you use.
> > and why should I be concerned
> Accessing leaked information from a prior run is a common attack.
I will grant you that you can't help it if one of your dependencies (or God help you, the standard library/runtime of your programming language) is buggy and leaks global state between instantiations. Practically speaking though, if you are already not sharing state between requests this is just not a huge issue.
Sometimes it feels like we're comparing "simple program written in CGI where it isn't a big deal if it fails or has some bugs" to "complex program written using a FastCGI or HTTP server where it is a big deal if it leaks a string between users".
> As opposed to? All processes have a working directory. What problems come from using the file system?
The problem isn't the working directory; it's the fact that anything in a cgi-bin directory (1) will be exec'd if it can be, and (2) exists under the document root, which the webserver typically has privileges to write to.
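The classic mitigation is to map the CGI directory in from outside the document root entirely. A hypothetical Apache config (paths invented for illustration; `ScriptAlias` is the standard directive for this) looks like:

```apache
# CGI lives outside DocumentRoot, so nothing the webserver can write
# into the docroot is ever executable, and nothing in cgi-bin is ever
# served as a plain file.
DocumentRoot "/var/www/html"
ScriptAlias /cgi-bin/ "/usr/lib/cgi-bin/"
<Directory "/usr/lib/cgi-bin">
    Require all granted
</Directory>
```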
> Yes it’s the same amount of effort to configure this.
I actually really didn't read this before writing out how easy it was to use these with systemd, so I guess refer to the point above.
I don’t see why having a long-running process gives you more facilities for process isolation. Every tool you want to use is available with fire-and-forget processes in a CGI config.
It sounds like you’re a container guy. But if you want more process isolation, great! So now you agree it’s a concern to share requests in a process, if you are concerned about these other cases?