
> the new capacity had caused all of the servers in the fleet to exceed the maximum number of threads allowed by an operating system configuration. [...] We didn’t want to increase the operating system limit without further testing

Is it because operating system configuration is managed by a different team within the organization?



Nope. It's just a case of "stop the bleeding before starting the surgery."


More likely they need to understand what effect changing the thread limit would have. For example, it could increase kernel memory usage or scheduler latency. It’s not something you want to mess with in an outage.
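For anyone wondering what "the thread limit" looks like in practice, here is a rough sketch of how you might inspect the relevant knobs on a Linux box. This assumes Linux; the postmortem doesn't say which OS setting was actually hit, so `threads-max`, `pid_max`, and `RLIMIT_NPROC` are just the usual suspects, not a claim about AWS's setup. The helper names `read_int` and `thread_limits` are made up for illustration.

```python
import resource
from pathlib import Path


def read_int(path):
    """Return the integer stored in a procfs file, or None if it can't be read."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def thread_limits():
    """Collect common Linux limits that cap how many threads can exist.

    Assumption: these are the typical candidates on Linux, not necessarily
    the setting involved in the outage being discussed.
    """
    # Per-user limit on processes/threads; -1 (RLIM_INFINITY) means unlimited.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        # System-wide cap on the number of threads.
        "threads_max": read_int("/proc/sys/kernel/threads-max"),
        # Largest PID/TID the kernel will hand out.
        "pid_max": read_int("/proc/sys/kernel/pid_max"),
        "nproc_soft": soft,
        "nproc_hard": hard,
    }


if __name__ == "__main__":
    for name, value in thread_limits().items():
        print(f"{name}: {value}")
```

Reading these values is harmless. Raising them is the part you'd want to test first, for the reasons above.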


I’ve heard AWS follows a "you build it, you run it" policy, so a separate team owning the OS configuration seems unlikely. It just seems prudent not to mess with OS settings in a hurry.


If you start haphazardly changing things while firefighting, without testing, you might make things even worse. And there are worse things than downtime: for instance, the system appears to work but you're actually silently corrupting customer data.



