Why Attacking Application Exceptions is Important

jtbigwoo · on Jan 4, 2012

Many development groups treat fixing minor exceptions as a once-in-a-blue-moon event. Sometimes it's just a matter of making exceptions more visible to the developers on a day-to-day basis.

Several years ago I inherited an application that averaged ten customer-discovered problems (with only a few hundred customers) and one system outage each quarter. The logging system was set to email every exception to the developers, and there were a bunch of low-level exceptions every day. The outgoing lead developer had an Outlook rule that funneled all logging emails into the junk mail folder; she suggested I do the same. I never did, and the annoyance of all those messages in my inbox did wonders for my motivation. After two years, the system virtually never went down and we had almost no customer complaints.

dennisgorelik · on Jan 5, 2012

> outlook rule that funneled all logging email messages to the junk mail folder

That's a telling sign of an incapable software developer.

gordonguthrie · on Jan 4, 2012

One of the greatest things about Erlang is that stability is orthogonal to correctness.

Essentially you have a process that does something (a worker) and a second process that will be informed if the first process dies (a supervisor).

With that core construct you build (using OTP) a supervision tree that restarts subsystems.

You can have every worker process crash out (i.e. no correctness) and still have macro stability (the supervision tree restarts the worker processes correctly).
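Python has no isolated lightweight processes or OTP, so the following is only a single-process simulation of the idea, under stated assumptions: a hypothetical `supervise` function feeds messages to a happy-path worker, and when the worker raises, it logs the crash and restarts the worker from freshly built state. It gives up if there are more than `max_restarts` crashes within `period` seconds, loosely modelled on OTP's restart intensity and period. The names `supervise`, `init_counter` and `add` are illustrative, not an Erlang or OTP API, and the `try/except` inside the supervisor merely stands in for the VM notifying a supervisor that its child died.

```python
import time
from typing import Callable, Iterable, List, Tuple


class RestartIntensityExceeded(Exception):
    """Raised when the worker crashes more often than the supervisor allows."""


def supervise(init: Callable[[], dict],
              handle: Callable[[dict, object], None],
              messages: Iterable[object],
              max_restarts: int = 3,
              period: float = 5.0) -> Tuple[dict, List[str]]:
    """Feed messages to a happy-path worker, restarting it after each crash.

    Returns the worker's final state and a crash log to go through later.
    """
    crash_log: List[str] = []
    recent_restarts: List[float] = []
    state = init()  # start the worker from known-good state
    for msg in messages:
        try:
            handle(state, msg)  # the worker itself has no error handling
        except Exception as exc:
            # Stands in for the supervisor being told that its child died.
            crash_log.append(f"{msg!r}: {type(exc).__name__}: {exc}")
            now = time.monotonic()
            # Forget restarts older than `period`, then count this one.
            recent_restarts = [t for t in recent_restarts if now - t < period]
            recent_restarts.append(now)
            if len(recent_restarts) > max_restarts:
                # Too many crashes too quickly: give up instead of looping.
                raise RestartIntensityExceeded(crash_log[-1]) from exc
            state = init()  # discard possibly corrupted state and restart
    return state, crash_log


# Hypothetical worker: keeps a running total and crashes on unexpected input.
def init_counter() -> dict:
    return {"total": 0}


def add(state: dict, msg: object) -> None:
    state["total"] += int(msg)  # int("oops") raises ValueError


state, log = supervise(init_counter, add, ["1", "2", "oops", "3"])
print(state)  # {'total': 3}: the restart threw away the earlier total
print(log)
```

Restarting from `init()` is the part that matters here: the worker never carries on with state left behind by a half-finished operation, and the crash log, rather than a swallowed exception, is what tells you there is a bug to fix.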

The consequence of this is that you only code the happy path. Try/catches are very rare (at hypernumbers we typically wrap end-user input with try/catch, and we only have about one try/catch per 25 kloc or so).

The VM logs all these crashes. You check 'em out and fix them.

Happy path programming is fantastic. If I try to use your library in ways you didn't expect, it will crash (and tell me that our expectations are out of whack).

Every try/catch that you write wraps up and hides low-level bugs that you never fix. In languages where crashes bubble up the call stack, there is a disconnect between correctness and stability: you need to tolerate a certain degree of errors to get stability.

Turns out actually fixing them is good.

luser001 · on Jan 4, 2012

Why are you so sure that merely restarting the crashing process and retrying the operation will make it work correctly the second time?

Did I miss something in your explanation?

gordonguthrie · on Jan 5, 2012

It doesn't work the second time. It does capture the bug and mark it though. The system is stable and up - it just doesn't work.

teaspoon · on Jan 4, 2012

Couldn't a system with N supervisor-actor relationships be rewritten as a single process with N try/catch statements, where the catch block simply logs the exception and jumps back to the start of the try block?

I haven't used Erlang, so to me the two patterns seem roughly equivalent. I've even heard exception handling referred to as "happy path programming", in contrast to return-value handling.
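One way to probe the question is to write out the single-process pattern literally. The Python sketch below is a hypothetical illustration, not code from the thread: a loop whose except block logs the error and goes back to the top. The made-up `transfer` handler updates shared state in two steps and fails between them. Because nothing is restarted, the half-applied change survives the catch. By contrast, a crashed Erlang process takes its private state with it, and its supervisor restarts it from a known initial state.

```python
import logging
from typing import Dict, List, Tuple

logging.basicConfig(level=logging.ERROR)

# Long-lived state shared by every iteration of the loop.
accounts: Dict[str, int] = {"alice": 100, "bob": 0}


def transfer(src: str, dst: str, amount: int) -> None:
    """Hypothetical happy-path handler that updates state in two steps."""
    accounts[src] -= amount  # step 1: the debit always happens
    accounts[dst] += amount  # step 2: raises KeyError if dst is unknown


def serve(requests: List[Tuple[str, str, int]]) -> None:
    """The catch block logs the error and goes back to the top of the loop."""
    for req in requests:
        try:
            transfer(*req)
        except Exception:
            logging.exception("request %r failed", req)
            continue  # next request; the partial debit is still in `accounts`


serve([("alice", "bob", 10), ("alice", "carol", 25)])
print(accounts)  # {'alice': 65, 'bob': 10}: 25 was debited and credited nowhere
```

A supervised Erlang worker would rebuild only its own in-memory state on restart. Side effects it had already made outside itself (a database write, say) would not be undone, so neither pattern is a substitute for transactions.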