This is the reason I am a big fan of running any software with separate users and ...

ams6110 · on March 31, 2018

This is the approach I take also. I'm also looking at totally disabling the OOM killer because it seems to be pretty useless. Any time I see something killed by the OOM killer, the culprit is usually, and obviously, some runaway Java process, but the OOM killer inevitably picks the SSH daemon to kill, which doesn't help anything, and the box continues to swap so badly that it seems unrecoverable. I'd rather just have the box panic and reboot if it's truly out of memory.
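
A minimal sketch of the "panic and reboot" behavior, assuming a Linux box and root privileges. The sysctls `vm.panic_on_oom` and `kernel.panic` are real kernel settings; the function names, the `sysctl_root` parameter, and the 10-second delay are illustrative assumptions, not anything the commenter described.

```python
from pathlib import Path


def panic_on_oom_settings(reboot_delay_s: int = 10) -> dict:
    """Return sysctl values that make the kernel panic on OOM, then reboot.

    vm.panic_on_oom = 1  -> panic instead of invoking the OOM killer
                            (2 would also panic for cpuset/memcg-limited OOMs)
    kernel.panic    = N  -> reboot N seconds after a panic (0 = hang forever)
    The 10-second default here is an arbitrary, hypothetical choice.
    """
    return {"vm/panic_on_oom": "1", "kernel/panic": str(reboot_delay_s)}


def apply_settings(settings: dict, sysctl_root: Path = Path("/proc/sys")) -> None:
    """Write each value under sysctl_root (needs root on a real system)."""
    for key, value in settings.items():
        (Path(sysctl_root) / key).write_text(value + "\n")


if __name__ == "__main__":
    apply_settings(panic_on_oom_settings())
```

Values written this way do not survive a reboot; to keep them, the same two settings would normally go in `/etc/sysctl.conf` or a file under `/etc/sysctl.d/`.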

yjftsjthsd-h · on March 31, 2018

I have not looked into it at all, but can you not exempt sshd from the OOM killer?

ams6110 · on March 31, 2018

I looked into it a little bit. There are ways to tune it but I didn't see a way to exempt processes by name. It may be possible.
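
For reference, Linux exposes a per-process knob at `/proc/<pid>/oom_score_adj`, and writing `-1000` to it excludes that process from OOM-killer selection. The kernel has no by-name setting, but a name match can be approximated by scanning `/proc/<pid>/comm`. The sketch below assumes root privileges; `exempt_by_name` and its `proc_root` parameter are hypothetical names for illustration.

```python
from pathlib import Path

# Kernel-defined minimum: a process with this value is never selected.
OOM_SCORE_ADJ_MIN = -1000


def exempt_by_name(name: str, proc_root: Path = Path("/proc")) -> list:
    """Set oom_score_adj to -1000 for every process whose comm equals name.

    Returns the sorted list of PIDs that were updated.
    """
    exempted = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/sys
        try:
            if (entry / "comm").read_text().strip() != name:
                continue
            (entry / "oom_score_adj").write_text(f"{OOM_SCORE_ADJ_MIN}\n")
        except OSError:
            continue  # process exited mid-scan, or permission was denied
        exempted.append(int(entry.name))
    return sorted(exempted)


if __name__ == "__main__":
    print(exempt_by_name("sshd"))
```

This only affects processes that are running at the time of the scan. On systemd-based systems, the `OOMScoreAdjust=` directive in a unit file is one way to apply the same value each time the service starts.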

The scenario I described above is HPC clusters in a university environment. The problem is students running programs that are poorly written. I'd rather reboot the node and tell them to fix their code than deal with trying to accommodate their careless / naive programming.