One of my favourites that I ever heard about was from a friend of mine who used to work on safety-critical systems in defense applications. He told me fighter jets have a safety system that disables the weapons systems if a (weight) load is detected on the landing gear so that if the plane is on the ground and the pilot bumps the wrong button they don't accidentally blow up their own airbase[1]. So anyway when the Eurofighter Typhoon did its first live weapons test the weapons failed to launch. When they did the RCA they found something like[2]
bool check_the_landing_gear_before_shootyshoot(double weight_from_sensor, double threshold) {
    //FIXME: Remember to implement this before we go live
    return false;
}
So when the pilot pressed the button the function disabled the weapons as if the plane had been on the ground. Because the "correctness" checks were against the Z spec and this function didn't have a unit test because it was deemed too trivial, the problem wasn't found before launch, so this cost several millions to redeploy the (one-line) fix to actually check the weight from the sensor was less than the threshold.
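To make the one-line fix concrete, here is a minimal sketch in C of what "check the weight from the sensor was less than the threshold" could look like, together with the unit test that was deemed too trivial to write. As with [2], this is not the real code: the function name `weapons_release_permitted` and the numbers in the test are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch only, not the real avionics code.
 * Returns true when the load on the landing gear is below the threshold,
 * i.e. the aircraft is taken to be airborne and release may proceed. */
bool weapons_release_permitted(double weight_from_sensor, double threshold) {
    return weight_from_sensor < threshold;
}

/* The "too trivial" unit test: one case per side of the threshold.
 * The numbers are made-up values, not real sensor data. */
void test_weapons_release_permitted(void) {
    assert(weapons_release_permitted(0.0, 500.0));      /* airborne: no load */
    assert(!weapons_release_permitted(9000.0, 500.0));  /* on the ground */
    assert(!weapons_release_permitted(500.0, 500.0));   /* at threshold: stay safe */
}
```

Run against a stub that always returns false, the first assertion fails immediately, which is the argument for testing even the functions that look too simple to get wrong.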
[1] Yes this means that scene from the cheesy action movie (can't remember which one) where Arnold Schwarzenegger finds himself on the ground in the cockpit of a Russian plane and proceeds to blow up all the bad guys while on the ground couldn't happen in real life.
[2] Not the actual code, which was apparently in some weird version of Ada.
Torpedoes typically have an inertial switch which disarms them if they turn 180 degrees, so they don't accidentally hit their source. When a torpedo accidentally arms and activates on board a submarine (hot running) the emergency procedure is to immediately turn the sub around 180 degrees to disarm the torpedo.
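The real mechanism is an inertial (gyroscopic) switch, not software, but the logic can be sketched in C as a software analogy. Everything here is an assumption for illustration: the function names, the use of compass headings in degrees within [0, 360), and the configurable `limit_deg` parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Smallest absolute angle between two compass headings, in degrees.
 * Assumes both headings are finite and in the range [0, 360).
 * The result is in [0, 180]. */
double heading_deviation_deg(double launch_heading_deg, double current_heading_deg) {
    double diff = current_heading_deg - launch_heading_deg;  /* in (-360, 360) */
    if (diff < 0.0) {
        diff += 360.0;                                       /* now in [0, 360) */
    }
    return diff > 180.0 ? 360.0 - diff : diff;
}

/* Hypothetical disarm rule: disarm once the weapon has turned away from its
 * launch heading by at least limit_deg (180 in the description above). */
bool should_disarm(double launch_heading_deg, double current_heading_deg, double limit_deg) {
    return heading_deviation_deg(launch_heading_deg, current_heading_deg) >= limit_deg;
}
```

The hot-run procedure falls out of the same rule: the torpedo is still in the boat, so turning the submarine through 180 degrees turns the torpedo's reference by the same amount and trips the switch.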
My understanding is that BuOrd (or BuShips? I don't remember which) "didn't want to waste money on testing it", so instead we wasted hundreds of them fired at Japanese shipping that didn't even impact their target, or never had a hope of detonating.
Remember these kinds of things next time someone pushes for "move fast and break things" in the name of efficiency and speed. Slow is fast.
Pre-war, it was more a case of "penny wise and pound foolish", partly due to budget limitations (they did things like testing only with foam warheads to recover test torpedoes).
But after Pearl Harbor, a somewhat biased BuOrd was reluctant to admit the Mark 14's flaws. It took a few "unauthorized" tests and 2 years to fix the issues.
In fairness, this sure makes for an entertaining story (e.g. the Drachinifel video on YouTube), but I'm not completely sold on the depiction of BuOrd as some sort of arrogant bureaucrats. However, bias and pride (plus other issues like low production) certainly played a role in the early Mark 14 debacle.
Going back to software development, I'm always amazed how bugs immediately pop up whenever I put a piece of software in the hands of users for the first time, regardless of how well I tested it.
I try to be as thorough as possible, but being the developer I'm always biased, often tunnel visioning on one way to use the software I created. That's why, in my opinion, you need some form of external QA/testing (like these "unauthorized" Mark 14 tests).
Aah, thank you on both counts, yes. One interesting feature he told me about is that they wrote a "reverse compiler" that would take SPARK code and turn it into the associated formal (Z) spec, so they could compare that to the actual Z spec to prove they were the same. Kind of nifty.
> Yes this means that scene from the cheesy action movie (can't remember which one) where Arnold Schwarzenegger finds himself on the ground in the cockpit of a Russian plane and proceeds to blow up all the bad guys while on the ground couldn't happen in real life.
I think you meant "Tomorrow Never Dies" and the actor was Pierce Brosnan. Took me forever to find that, is it the right one?
Oh, that's also the sequence with the "peace / war" switch! That did make me laugh. Turns out it's a real thing, though - but they probably wouldn't have flipped it in this situation.
It really threw me because I also remembered the scene but not the actor so I kept looking for Schwarzenegger movies. Good, checked that one off the list :)
> this cost several millions to redeploy the (one-line) fix to actually check the weight from the sensor was less than the threshold
Well maybe this is the other, compounding problem. Engineering complex machines with such a high cost of bugfix deployment seems like a big issue. It's funny that as an industry we now know how to safely deploy software updates to hundreds of millions of phones, with security checks, signed firmwares, etc, but doing that on applications with a super high unit price tag seems out of reach...
Or maybe, a few millions is like only a thousand hours of flying in jet fuel costs alone, not a big deal...
>It's funny that as an industry we now know how to safely deploy software updates to hundreds of millions of phones, with security checks, signed firmwares, etc, but doing that on applications with a super high unit price tag seems out of reach
A bunch of JavaScript dudebros yeeting code out into the ether is not at all comparable to deploying avionics software to a fighter jet. Give your head a shake.
I don't think they're referring to dudebros' js, they're referring to systems software and the ability to deliver relatively secure updates over insecure channels. I've even delivered a signed firmware update to a microprocessor in a goddamn washing machine over UART. Why can't we do this for a jet?
Well, because the software load of an aircraft is certified as part of the approved type design, for one. If you update the software it requires an engineering approval, because the risks inherent to operating an aircraft and the engineering that goes into mitigating those risks and making them acceptably safe are quite a bit more significant than a washing machine.
What's more we're talking about stores clearance (i.e. releasing shit from the aircraft in flight).
The attitude behind "Just write that function and flash the firmware" gets people killed.
I'm not saying just write the function and flash the firmware, but it's not like the super rigid certification process doesn't have its nefarious side effects either. My experience is that the more expensive fixes are, the more humans are willing to turn a blind eye to problems or wish them away.
>but it's not like the super rigid certification process doesn't have its nefarious side effects either.
The system isn't rigid so much as thorough. You can omit portions of the review for Minor Changes (term of art), for example. Unfortunately "writing the code to correctly release deadly explosives from the aircraft in flight" is far from a Minor Change, so gee willikers I guess it required some due diligence.
Maybe sometimes doing things correctly takes time and money for a reason, even if the reason isn't obvious. Maybe there's a good reason not to have OTA firmware update capability on a warplane.
I hear you. I respect the process and practice. I would invite you to ponder what would happen if all the iPhones and iPads in one country were to be bricked overnight by an OTA update - billions of dollars worth of instant economic damage just from the device cost, many billions more in consequences including lives lost. Well this capability probably exists somewhere at Apple. I hope it's well guarded by process and perhaps this process is costly not unlike recertification. Does the ability to deliver critical fixes quickly make it a safer system on balance, versus the Nokia and BlackBerry era where your firmware essentially never changed ever because the cost of delivery was so high? My guess is that it does on balance represent an improvement. But maybe I misunderstood and the millions of dollars in cost of delivering the fix were actually spent on due diligence, as opposed to just mechanically applying the patch.
Many people will hear the usual story of the fixes (for the plane example) being enormously expensive without really diving into what all goes into that figure.
The source code change itself may be trivial, so it's easy to compare that to the multi-million dollar figures thrown around and have criticisms.
We can do OTA updates, there's no technical reason it can't be done other than not allowing it (which I mostly agree with in secure applications). Hell our spacecraft do this now.
We must keep in mind these fixes do not go from dev environment straight to the field (prod), which would be a terrible idea. These are extremely complex integrated systems and must be tested in multiple phases because let's face it, if this supposedly trivial issue made it all the way through, what else may not have been discovered yet?
Not only does the 'easy' fix need to be tested (time and money), but related interactions need to be investigated as well (more time and money). The time cost of people doing the work, investigations, testing adds up from all this. Then there's potentially hardware in the mix which is never cheap, also simply being able to get access to hardware for testing can be a huge hassle.
Keep in mind this comment is only geared towards situations where the end item is a physical system. I would expect fixing a pure software product to have significantly lower costs.
We don’t really know the context of this anecdote, but if you have to completely re-run your test plan on a real plane with real munitions for newly deployed software, which is a pretty good idea, then I could see it costing millions, even if the fix deployed in a minute.
This makes no sense and is difficult to even respond to coherently.
> It's funny that as an industry we now know how to safely deploy software updates to hundreds of millions of phones, with security checks, signed firmwares, etc,
Either you're completely wrong, because we "as an industry" still push bugs and security flaws, or you're comparing two completely different things.
> doing that on applications with a super high unit price tag seems out of reach...
is true because of
> a few millions is like only a thousand hours of flying in jet fuel costs alone
like do you really think they spent millions pushing a line of code? or do you think it's just inherently expensive to fly a jet, and so doing it twice costs more?
I would generally pass this comment by, but it's just so distastefully hostile because you totally missed the point.
GP's comment was expressing sardonic disbelief that a modern jet wouldn't be able to receive remote software updates, considering it's so ubiquitous and reliable in other fields, even those with much, much lower costs. Not that developers don't release faults.
People tend to opine on systems engineering as if we had some sort of information superconductor connecting all minds involved.
Systems are Hard and complex systems are Harder. Thinking of an entire class of failures as 'solved' is kinda like talking about curing cancer. There isn't one thing called cancer, there's hundreds.
There's no way to solve complex systems problems for good. Reality, technologies, tooling, people, language, everything changes all the time. And complex systems failure modes that happen today will happen forever.
Ahh, then I did misread it entirely. Thanks for stopping by to call me out.
It's still probably not a matter of capability... I wouldn't be so cavalier about software updates on my phone if it was holding me thousands of feet above the ground at the time.
I already commented on this elsewhere but I came across a company that did OTA updates on a control box in vehicles without checking if the vehicle was in motion or not. And it didn't even really surprise me, it was just one of those things that came up when prepping for that job from a risk assessment. They never even thought of it.
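The missing check is small. As a rough sketch in C, a precondition gate for starting an update could look like the following. The struct, its field names, and the choice of "speed is exactly zero and the parking brake is on" as the definition of stationary are all assumptions for illustration, not details of that company's control box.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of vehicle state; the field names are made up. */
typedef struct {
    double speed_kmh;         /* last reported road speed */
    bool   speed_valid;       /* false if the speed signal is missing or stale */
    bool   parking_brake_on;
} vehicle_state;

/* Refuse to start an update unless the vehicle is known to be stationary.
 * An unknown or invalid state counts as "not safe to update". */
bool ota_update_allowed(const vehicle_state *s) {
    if (s == NULL || !s->speed_valid) {
        return false;
    }
    /* Deliberately strict; a real system would likely need a tolerance
     * for sensor noise rather than an exact comparison. */
    return s->speed_kmh == 0.0 && s->parking_brake_on;
}
```

The point of the design is the default: when the state cannot be established, the answer is "no", which is the same fail-safe direction as the weight-on-wheels interlock.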
> Or maybe, a few millions is like only a thousand hours of flying in jet fuel costs alone, not a big deal...
Pretty much tbh. For example, the development of the Saab JAS 39 Gripen (JAS-projektet) is the most expensive industrial project in modern Swedish history at a cost of 120+ billion SEK (11+ billion USD).
It was also almost cancelled after a very public crash in central Stockholm at the 1993 Stockholm Water Festival [1]. A crash that should not have happened because the flight should not have been approved in the first place, because they weren't yet confident that they'd completely solved the Pilot-Induced Oscillation (PIO) related issues that wrecked the first prototype 4 years prior (with the same test pilot) [2].
It was basically a miracle that no one was killed or seriously hurt in the Stockholm crash; had the plane hit the nearby bridge or any of the other densely crowded areas, it would've been a very different story.
A few million dollars works out to a surprisingly small amount of time when you add overhead.
Call the bug fix a development team of 20 people taking 3 months end to end from bug discovery to fix deployment. You'll probably have twice that much people time again in project management and communication overhead (a 1:2 ratio of dev time to communication overhead is actually amazing in defense contexts), so call it 60 people in total. Assume a total cost per person of $200k per year (after factoring in benefits, overhead, and pork), so 60 people * 3 months * $200k/12 months = 3,000,000 USD.
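The back-of-the-envelope estimate above can be written out as a small C function. The inputs are the commenter's guesses (20 developers, twice that again in overhead for 60 people in total, 3 months, $200k per person per year); the function name and parameter names are made up for this sketch.

```c
#include <assert.h>

/* Rough cost model: developers plus overhead_ratio times as many people again
 * in management and communication, each costing cost_per_person_year_usd,
 * working for the given number of months. */
double bugfix_cost_usd(int developers, int overhead_ratio, int months,
                       double cost_per_person_year_usd) {
    int headcount = developers + developers * overhead_ratio;  /* 20 + 40 = 60 */
    return headcount * (double)months * cost_per_person_year_usd / 12.0;
}
```

Calling `bugfix_cost_usd(20, 2, 3, 200000.0)` gives 3,000,000, matching the figure in the comment. Changing any one input shows how quickly "a few million" is reached without anyone doing anything extravagant.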
> if a (weight) load is detected on the landing gear
This state, "weight on wheels", is used in a lot of other functionality, not just on military aircraft, as the hard stop for things that don't make sense if we're not airborne. So that makes sense (albeit obviously somebody needed to actually write this function).
Most obviously the gear retraction is disabled on planes which have retractable landing gear.
Yup, watch videos of actual missile launches--the missile descends below the plane that fired it. Can't do that on the ground, although you won't blow up your base because the weapon will not have armed by the time it goes splat.
True Lies was an airborne Harrier, not a Russian plane on the ground. So, while there are many reasons the scene was unrealistic, “weight on gear disables weapons” isn’t one.
I know, such a great movie, total classic of my childhood! It was the only R-rated film my mother ever allowed and even endorsed us watching, "Because Jamie Lee Curtis is hot." :D
I wasn't sure if there might've been a scene I'd forgotten.