Just one of the risks of automation, and a good reminder why human monitoring is necessary.
Having said that, we deployed a system that was mostly automated, with a human operator whose job was to oversee the investments and shut it down if any out-of-the-ordinary transactions (based on her experience) were taking place. She happily sat there approving the recommendations even though they were absolutely outside anything we'd ever generated in the past, and the system bled accounts dry in one evening. So sometimes, even with a human observing, you're still boned.
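For what it's worth, "outside anything we'd ever generated in the past" is exactly the kind of judgment a machine can make too. Here's a minimal sketch (the class, window size, and sigma threshold are all made up for illustration, not what we actually ran) of a gate that halts on out-of-band order sizes instead of trusting the operator's memory:

```python
from collections import deque

class SanityGate:
    """Refuse orders far outside the historical distribution (hypothetical sketch)."""

    def __init__(self, window=10_000, max_sigma=4.0, min_history=100):
        self.history = deque(maxlen=window)  # recent accepted order sizes
        self.max_sigma = max_sigma           # how far "out of the ordinary" we tolerate
        self.min_history = min_history       # baseline needed before judging anything

    def check(self, order_size):
        if len(self.history) >= self.min_history:
            mean = sum(self.history) / len(self.history)
            var = sum((x - mean) ** 2 for x in self.history) / len(self.history)
            std = var ** 0.5 or 1.0          # avoid a zero divisor on flat history
            if abs(order_size - mean) > self.max_sigma * std:
                # Out of band: stop the machine and make a human decide.
                raise RuntimeError(f"order size {order_size} is way outside history; halting")
        self.history.append(order_size)
```

The point being that a hard halt forces a decision, whereas an approval screen invites a rubber stamp.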
You should read the linked PDF - they had systems that were 100% dependent on human monitoring that no one was checking, or where no one recognized anything unusual. If anything, their failures were due to a massive lack of automation in deployment, testing, and monitoring.
Oh, the other amazing thing we used to have: a system that generated so many alerts that everyone just created Outlook rules to automatically delete them all. I mean, really? I asked whether we should modify the system to generate messages only when there were actual exceptional cases. People looked at me like I was an idiot.
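That one, at least, is a one-afternoon fix. A sketch of the obvious routing (the severity scale and send_page() are stand-ins, not a real pager API): page a human only for the rare actionable cases, and log the rest somewhere searchable:

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("monitor")

CRITICAL = 2  # hypothetical scale: 0 = info, 1 = warning, 2 = critical

@dataclass
class Event:
    severity: int
    message: str

def send_page(message: str) -> None:
    # Stand-in for a real on-call/pager integration.
    print(f"PAGE ON-CALL: {message}")

def notify(event: Event) -> None:
    """Alert a human only when action is needed; routine noise goes to the log."""
    if event.severity >= CRITICAL:
        send_page(event.message)                 # rare enough that people still read it
    else:
        log.info("routine: %s", event.message)   # searchable, but in nobody's inbox
```

Once every page means "act now", the Outlook delete-rule stops being the rational response.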
Yeah, the person monitoring ours just rubber-stamped it. Afterwards everyone remarked that you could tell the trades were bad just by looking at what was in front of you. Our theory was that she was too busy watching the Breaking Bad finale or something.
You know how, if the first 100 times a dialog box comes up the correct response is to click 'OK', people start just clicking 'OK' on every dialog box? Then the 201st one comes up saying "Destroy everything? [Cancel] [OK]", and they click 'OK' too, and don't think anything of it.
It's like how they keep the TSA x-ray screeners at the airport from becoming too complacent: every once in a while you have to slip a gun through the scanner and see if they catch it. Otherwise you're just looking for (presumably) rare events, and you become numb to the never-ending stream.
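The same trick works on a transaction monitor. A rough sketch, assuming you keep a pool of known-bad examples to plant (everything here is hypothetical):

```python
import random

def review_stream(transactions, drills, drill_rate=0.01):
    """Occasionally slip a known-bad 'drill' into the reviewer's stream.

    A missed drill tells you the monitoring has gone numb long before
    a real anomaly slips through.
    """
    for tx in transactions:
        if random.random() < drill_rate:
            yield random.choice(drills)  # the planted gun in the suitcase
        yield tx
```

You'd also want to record which drills got flagged, so that missing one triggers retraining rather than silence.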
Sure, you click reflexively, but you should notice the text was bad either after clicking or within a couple more clicks. Letting the first few errors through is reasonable, but letting a wall of them through without noticing anything wrong, when reading them is your job, is inexcusable.
I don't know if it's excusable or not, but it may be incompatible with typical human cognition to expect someone to be able to do that.
Maybe you have to figure out a way to test people for unusually high aptitude at looking at mind-numbingly dull, repetitive things over and over again while still noticing the aberrant ones, and then only put people with that aptitude in the job.
Or have people only do pretty short shifts at that task.
I'm pretty confident this person wasn't unusually negligent. If you have almost anyone doing that job hour after hour, day after day, they will lose the ability to flag the aberrant stuff.
Yours is the correct viewpoint: it is incompatible with human cognition.
If an alert system is not perceived as highly reliable in directing positive action, then the humans involved will inevitably disable the alert system, either by pulling out a screwdriver or rewriting their mental rubrics to ignore the messages as noise.
Knight Capital is just the finance version of Three Mile Island and Deepwater Horizon -- the means to mitigate or prevent disaster were on hand, but the people in charge just dithered by the kill switch because they were confused. Well, if the people in charge are confused, that is a reason to start the emergency procedures.
In the ancestral comment here, it wasn't even an alert system! It was just "your job is to sit there all day, watch every single transaction, and flag the aberrant ones"!!
"Whoa this transaction is way bigger than normal. So is this one. And this one."
People ignore repetitive things, but they usually notice when something changes. They can tell you that it's shaped differently, or explain how it sounds different from normal.
If this system made dissimilar transactions look very similar to the monitor, then it is to blame, not the idea of having a monitor at all.
I would argue that having many more people monitoring can encourage the fear of making the wrong call. "Hey, someone smarter and more experienced than me should make sense of this." "Hey, the smart new guy is supposed to be watching this. I will look more closely later."
What you want is two or three people really in charge, where the individuals are empowered to say: "I am totally confused. If someone cannot explain what is going on to me so that I understand, I am starting shutdown procedures, immediately. Do YOU know exactly what is going on?"
I'm not seeing anything here that makes me think they had any kind of automation at all! In my experience, if they'd had automation, things would have failed consistently on every server the deployment touched.
That said, if you're going to fly the jet liner in full manual mode, you better make sure your co-pilot is watching the instruments.
The parent post may be pointing out that the purpose of this software is to automate trading on the stock market. It's risky if your software testing, rollout, monitoring, and rollback processes are not sufficiently automated. This second kind of automation is the kind that you or I are probably more familiar with. And it was lacking.
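Even a crude canary rollout, for example, would surface a partially failed deployment before it could trade. A sketch, where deploy_to(), health_ok(), and rollback() are stand-ins for whatever tooling you actually have:

```python
SERVERS = ["trade01", "trade02", "trade03"]   # hypothetical hosts

def deploy_to(host, version):
    print(f"deploying {version} to {host}")   # stand-in for real deploy tooling

def health_ok(host):
    return True                               # stand-in for a smoke test / health check

def rollback(host):
    print(f"rolling back {host}")             # stand-in for real rollback tooling

def deploy(version):
    """Deploy to one canary first; roll back everywhere on any failed check."""
    canary, rest = SERVERS[0], SERVERS[1:]
    deploy_to(canary, version)
    if not health_ok(canary):
        rollback(canary)
        raise RuntimeError(f"{version} failed on canary {canary}; rollout aborted")
    for host in rest:
        deploy_to(host, version)
        if not health_ok(host):
            for h in SERVERS:
                rollback(h)
            raise RuntimeError(f"{version} failed on {host}; rolled back everywhere")
```

The Knight-style failure mode (reportedly one server out of eight quietly left running old code) is exactly what the per-host health check is there to catch.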