Parts installed “upside down” caused Russian rocket to explode last week (arstechnica.com)
44 points by moubarak on July 9, 2013 | 34 comments


"young technician"

I sure hope he doesn't end up being both a scapegoat and deemed a criminal. Problems like this are exactly why you let younger technicians be mentored by older (and ostensibly wiser) ones. It's also why you have visual inspections done by a different party.


And why you design such parts so they cannot be physically installed upside down. http://en.wikipedia.org/wiki/Poka-yoke


It's also why you design the parts so that they are impossible to install upside down. No matter how clearly things are labelled, even the most experienced people will make mistakes.


The shuttle had a similar problem with a large round metal part that could be installed in more than one way, only one of which was right. The mechanics knew this and asked for some markers on the piece to ease the process. The markers were never added, not because of the cost of four paint marks but because of the documentation that would have had to be updated.

So there is your problem - bureaucracy. I'm sure that an engineer knew about the sensor mounting problem but nobody cared enough to make the change because of the paperwork.


Or why this goes into configuration management and is verified before shipping. Seems not just the "young technician" failed but the process failed to catch the defect.


As I've gotten more experienced, I've found it hard to work with people who don't optimize for safety and efficiency...not because I'm a worrywart, but because experience has shown that fucking up (royally) is much less likely to be caused by lack of skill, industriousness, or moral character than by unforeseen weaknesses, stress, and cascading errors. Virtually every ER surgeon has put more energy and time into their work than I have into mine, and every day they deal with life and death, and yet failing to wash their hands is not a rare occurrence...why should I think I'm any more foolproof?


This is an illustration of Murphy's Law, almost perfectly:

"If there's more than one way to do a job, and one of those ways will result in disaster, then he will do it that way."

Defensive design would be to make sure, as others have stated in this thread, that it's impossible to make a mistake (e.g. an asymmetrical mount for the part).


And Murphy coined his Law after being screwed by a part that was installed upside down!


If something wrong happens, don't punish the one who did it wrong - work out why the hell it was possible to do it wrong in the first place and solve that.


http://www.youtube.com/watch?v=ipNjYMJo79o#t=77s -- ignition begins at about 1:20


And now we know why they insist on such large perimeters. Imagine being on the side it was tipping toward.



Note to self: when you see the rocket crash with your eyes, count to 5 seconds then cover your ears.


This is why (in C code) I prefer what some people call "yoda conditions"[1]. They prevent problems where you accidentally assign when you mean to check equality. As mentioned above, you should engineer everything, software and hardware, defensively. If the design simply forbade installing a part upside down, it would make up for the pathetic QA/QE which caused this issue in the first place.

[1] http://en.m.wikipedia.org/wiki/Yoda_Conditions
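The comment above is about C, but the habit can be sketched in Java too, which is the language of the other snippets in this thread. This is only an illustration under assumptions: the class and method names are made up, and in Java the assignment typo only compiles for boolean conditions, so the risk is narrower than in C.

```java
final class YodaDemo {
    // Conventional order: a single '=' still compiles here, because the
    // assignment expression itself has type boolean.
    static boolean conventionalTypo(boolean armed) {
        if (armed = true) { // typo for ==; assigns, then always takes the branch
            return true;
        }
        return false;
    }

    // Yoda order: the constant is on the left, so the same typo
    // ("true = armed") would be rejected by the compiler.
    static boolean yoda(boolean armed) {
        if (true == armed) {
            return true;
        }
        return false;
    }

    // Yoda-style string comparison: calling equals on the literal
    // cannot throw NullPointerException when s is null.
    static boolean isFoobar(String s) {
        return "foobar".equals(s);
    }

    public static void main(String[] args) {
        System.out.println(conventionalTypo(false)); // true (the bug)
        System.out.println(yoda(false));             // false
        System.out.println(isFoobar(null));          // false
    }
}
```

The point is the same as the asymmetrical mount: the wrong form simply does not fit, rather than relying on someone noticing it.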


Clang (and GCC 4.7, I believe) would issue a warning for that. Instead of Yoda Conditions, it's recommended to use -Werror.


GCC has had this warning for a long time. 4.2 certainly does it, and I'm fairly sure I've seen it since at least the 3.x days.


Microsoft C/C++ compilers also pick this up as a warning if the warnings settings are set to max.


This is one (of many) reasons I have a dislike for Java:

    String myString = null;
    if (myString.equals("foobar")) { /* ... */ }
    // This causes a NullPointerException in Java

Here is some equivalent C# code:

    using System;
    
    namespace SampleApplication
    {
        static class Program
        {
            static void Main(string[] args) {
                String myString = null;
    
                if (myString == "foobar") {
                    Console.WriteLine("???");
                }
                else {
                    Console.WriteLine("No good!!");
                }
            }
        }
    }

and sure enough it produces what I consider the correct logical output:

    No good!!


[deleted]


No, jussij is right. It is your code which is not equivalent.

Your Java code is checking reference equality while the C# code is checking string value equality. The == operator works differently between strings in C# compared to Java. The only reason the Java code you wrote might "work" is due to Java interning strings so the references could be the same.
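A minimal sketch of the difference, in case it helps. The class and method names are hypothetical and this is not the deleted comment's code, just an illustration of reference versus value comparison and of interning.

```java
final class StringEqualityDemo {
    // Reference comparison: true only if both variables point at the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Value comparison: true if the characters match (a must not be null).
    static boolean sameValue(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "foobar";
        String copy = new String("foobar"); // distinct object, same characters

        System.out.println(sameReference(literal, "foobar"));      // true: literals are interned
        System.out.println(sameReference(literal, copy));          // false: different objects
        System.out.println(sameValue(literal, copy));              // true: same characters
        System.out.println(sameReference(literal, copy.intern())); // true: intern() returns the pooled object
        System.out.println(sameReference(null, "foobar"));         // false, and no exception
    }
}
```

So a Java == against a literal can appear to behave like the C# version, but only for as long as both sides happen to be the same interned object.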


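With Java 7 there is a static helper method in java.util.Objects. Nice to have in the favourites in Eclipse.

    import java.util.Objects;

    // Called with the class name: an unqualified equals(a, b) via static import
    // does not compile, because the inherited Object.equals(Object) shadows it.
    if (Objects.equals(myString, "foobar")) { ... }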


So rather than fix the obvious problems in the language they resort to a standard library workaround?

I'm an experienced programmer and in my time I've only ever worked on two big Java projects. Those two projects left me feeling like an idiot!

Not long after, I started work on my first C# project, and while it felt a lot like Java, it was clear the designers of that language had created a Java-like language that was so much better than Java. It just felt right!

So is C# better than Java?

Based on my experience with both languages I would say yes.

Does my opinion matter? No!

But to me Java had the lead and it let it go.

With the lead gone it is now trying desperately to play catch-up to C#, which is progressing by leaps and bounds.



A similar issue was responsible for one of the V-22 crashes ([1], page 4). In that case the gyro wires were hooked up backwards, rather than the unit being installed backward. Both cases are good examples of small design details that make or break a project.

[1] - http://www.fas.org/man/dod-101/sys/ac/v22-report.pdf


This reminds me of a comment under the Asiana Flight 214 discussion at https://news.ycombinator.com/item?id=6012214, that outsider views can help. In this case, as in the case of the Mars Climate Orbiter (http://en.wikipedia.org/wiki/Mars_Climate_Orbiter), perhaps review of the work by someone who hasn't been scrutinizing it all day would have caught the errors? You stare at the same thing all day, and it's easy to get lost/not see your own mistakes. This can apply to supervisors, who might be inclined not to review their supervisees' work carefully because they always do such good work that it really doesn't occur to them that something this big could go wrong. And at the same time, they may well have been staring at it too long themselves.

And as an aside, isn't the Russian space program always "beleaguered"?


>And as an aside, isn't the Russian space program always "beleaguered"?

A decade-old story about one of the unsuccessful launches goes this way: "the salary wasn't paid for several months, so not much work was done. Once the launch date came within a month, the government finally paid the salary and the satellite was built really fast, though some testing and some other things didn't really make it ..."


Lest you think this sort of thing is something that could only happen to the Ruskies, consider that the same sort of mistake has happened on NASA space probes, twice.

The Galileo Jupiter atmospheric probe deployed its parachutes late because the accelerometer designed to trigger the pyro charges to release the parachutes was wired backwards. There were two different "g-switches" which provided acceleration data at different scales. When the probe entered Jupiter's atmosphere, the software control system saw that the acceleration data it was getting was all wrong, so it marked one of the channels as failed and attempted to use just the other channel. The scale of that channel was wrong for the job, so the data was bogus, leading the software to think that the probe was in a different stage of its entry than it actually was. The result was that the main parachute was deployed very late, and the only reason it deployed successfully at all, at the speed and atmospheric density it was released into, was more or less sheer luck.

The same mis-wiring happened on the Genesis sample return probe, which caused it to collide with the ground rather than be captured in mid-air as per the plan.

The funny part is that the accelerometers on both probes were tested in a centrifuge, but the wiring harnesses for the tests were wired backwards as well.


The lack of integrated testing was by far the most impressive thing about the Curiosity Rover/MSL landing to me. Most people who followed the news had no idea that the skycrane deployment setup was never actually tested in full. They validated components, like the hydrazine engines, radar, and bridle release mechanism, but not the combination of all.

Software and circuits scale through abstraction and the static discipline. But combining atoms just gets harder and harder.

(This sort of thing also happens when constructing buildings or bridges, but there's usually not a million different points for potential catastrophic failure.)


"each of those sensors had an arrow that was suppose to point towards the top of the vehicle"

It's a little unbelievable to me that aerospace designers today would design a part whose only means of ensuring proper installation and alignment is a sticker with an arrow.

If that was truly the case, the error was one of engineering and not the fault of the technician who installed the parts. Engineers working on such projects should not make these kinds of mistakes.

As an EE I had to learn mechanical design on my own. I quickly learned that it is paramount to have DFM (design for manufacturing) become a part of your design DNA. You need to think DFM for every little part you design. You need to think assembly for every little part you are designing. You need to ensure that parts or assemblies can be put together almost by a blind person and do so correctly.

The tools available are simple: Alignment pins, unequally spaced holes, alignment tabs or machined features, unidirectional mating or clamping methods, non-rotatable insertion of sliding elements (d-pin into d-hole), etc.

The same is true of EE design. Don't make a bunch of connectors exactly the same unless swapping the cables or daughterboards that mate with them is OK. One common technique is to use connectors with different pin counts. If a connector is reversible (such as a pin header used for, say, an RC servo) you need to make sure that reversing whatever mates to it will not cause damage. A more sophisticated approach, when it can be justified, is to make smart connectors that can deal with reversals by actively flipping signals around as required. This is common on Ethernet switches and routers that can deal with both straight and flipped signal pair cables.

Again, I find it hard to believe that aerospace engineers would make such a basic error. You never know.


> Engineers working on such projects should not make these kinds of mistakes.

Sounds like a plan. In fact, it's inspired me in my own professional life. Henceforth, I shall never write another software bug again.

The problem with this kind of thinking is that it's turtles all the way down. You have an assembly mistake that could have been prevented by engineering. But of course now it's an engineering mistake. And you can't simply legislate the absence of mistakes. You can try to treat that with "engineering process", I guess, which mandates the use of funny gadgets or asymmetry or automated software checkers or whatever. And that probably helps. But then you can apply that process mistakenly...

At some point you have to cut bait and ship (or launch, rather).


The idea of DFM is that any reasonably skilled engineer creates a design such that any reasonably skilled manufacturing/assembly shop can put it together.

Without following the practices of DFM, it is more likely than not that some part of the assembly will be done incorrectly, even if the assembly team was highly skilled and executing at a high level of competence.

With DFM, a moderately skilled EE, and a moderately skilled assembly team, will correctly manufacture the designated system.


True, but missing the point. DFM can be misapplied, as it was here. So now you need to propose a process (design review, I guess) that prevents DFM from being misapplied. And that process can break down...

The fact that you can look at the wreckage of a rocket and "see" what the problem was in hindsight is not proof that you know how to prevent any such accident. I really wish more design people understood that.

(It doesn't mean that whatever process you favor is a bad idea either, but that it has limits. Getting to my original point: you can't fix bugs by executive fiat.)


> Sounds like a plan. In fact, it's inspired me in my own professional life. Henceforth, I shall never write another software bug again.

Let me suggest that your sarcasm is misplaced. My guess is that you have never worked on a multidisciplinary product that, in the event of failure, can kill people or one that has to be manufactured in thousands to millions of units per year. Not to minimize your experience, but software engineering is vastly different from designing, assembling and testing complex electromechanical systems.

I am speaking from the vantage point of having extensive experience engineering all aspects of multidisciplinary products during my career. From raw sheet metal through machined parts and injection molding. From analog electronics design to complex multi-gigahertz FPGAs. And, of course, software in embedded, mobile, web, custom real-time OS and workstation. Let's just say I've made enough mistakes in each field to have a reasonable understanding of their respective domains.

Also worth noting: DFM is only ONE of the long list of relevant disciplines at play. The fact that I focused on DFM in my prior post does not mean that DFM alone is the solution to this problem.

Software is different in a number of ways. For one thing, in the software world you don't hand pieces of the product to a non-programming workforce for final assembly. The analogy here would be that you write code that instantiates a thousand different independent objects and then hand those over to an assembly team to "wire" together and manufacture the final product. That, of course, isn't the way it is done. In software development the product is typically designed, engineered, assembled and tested by one or many software engineers. In some cases, such as games, a multidisciplinary team assembles and tests the product. However, at no time are people unskilled in software engineering touching the very code that makes the product work. With electromechanical products your assembly crew quite literally has their hands in the guts of the product. Very different scenarios.

It might be a good mental exercise to think of the hypothetical scenario I painted. Imagine your job was to write code that instantiated a complex object and this object was to be integrated into a finished product by a technician who is not a software engineer. Would you blame the technician if the object was wired into the product with method arguments flipped around and property assignments inverted? You should not. In such a ridiculous hypothetical case it would be your job to ensure that this mythical object cannot be integrated into the greater operating code but one way and that all other options are covered and detected during testing.
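As a rough sketch of what "cannot be integrated but one way" might look like in code, here is one option: give each signal its own type so that swapped arguments fail to compile. Everything here (PitchRate, YawRate, steer, the degrees-per-second unit) is a made-up name or assumption for illustration, not anything from the actual rocket.

```java
final class WiringDemo {
    // Hypothetical wrapper types, one per signal, so they are not interchangeable.
    static final class PitchRate {
        final double degPerSec;
        PitchRate(double degPerSec) { this.degPerSec = degPerSec; }
    }

    static final class YawRate {
        final double degPerSec;
        YawRate(double degPerSec) { this.degPerSec = degPerSec; }
    }

    // Error-prone version: both parameters are plain doubles, so
    // steerUntyped(yaw, pitch) compiles and silently does the wrong thing.
    static String steerUntyped(double pitch, double yaw) {
        return "pitch=" + pitch + " yaw=" + yaw;
    }

    // Keyed version: steer(new YawRate(..), new PitchRate(..)) is a compile
    // error, much like a connector that only mates one way round.
    static String steer(PitchRate pitch, YawRate yaw) {
        return "pitch=" + pitch.degPerSec + " yaw=" + yaw.degPerSec;
    }

    public static void main(String[] args) {
        System.out.println(steerUntyped(-0.5, 1.5)); // swapped by mistake, still compiles
        System.out.println(steer(new PitchRate(1.5), new YawRate(-0.5)));
    }
}
```

It does nothing for a value that is wrong inside the right type, which is where the testing part of the argument comes in.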

Of course this is an imperfect and ridiculous analogy, don't waste any time lost in the minutiae to dissect it. The point is to highlight that software development cannot be directly compared to the process of designing and manufacturing a complex multidisciplinary product.

In typical large-scale electromechanical projects, from a toaster to a TV set, a car or a rocket, there are dozens to thousands of people involved, each with their own domain. For example, one of my acquaintances was the chief engineer for the F-18 fighter project. He had 3,000 engineers working under him. At the other extreme, I know multiple one or two person teams.

In most cases manufacturing is a discipline in and of itself, one where process and tooling are also engineering projects. If you've watched any of the "How it's made" shows you have probably seen how much special-purpose (and sometimes unique and custom) equipment is required for even the seemingly simple products. Example: Aircraft, mechanical and electronic engineers design a wing. Manufacturing engineers design the fixtures and tooling necessary to make the wing. They often work together in a feedback loop in order to integrate manufacturing concerns into the wing design.

The job of the manufacturing engineering team is to ensure that a reasonably skilled workforce can put out a quality product with consistency. However, a good portion of what happens during manufacturing is determined and decided upon during the design phase. Design often has to take manufacturing into account by integrating such things as reference marks, tooling points, easily accessible test connections, failure indicators (such as LEDs), etc.

Aside from some corner cases it is of great value to treat all product quality issues as failures to engineer the product for optimal quality. By doing so one can ensure that the issue has a formal process through which the problem can be studied, analyzed and eliminated from future production. Yes, sometimes failure is required in order to discover what needs to be addressed. This also extends to operational issues. If, for example, users keep hitting the wrong buttons on a device it could very well mean that they are too small and spaced too closely, thereby increasing the probability of a user pressing one button when she meant to press one of the adjoining buttons.

Worth reading:

http://www.amazon.com/Machine-That-Changed-World-Revolutioni...

http://www.amazon.com/Poka-Yoke-Improving-Product-Quality-Pr...

Going back to the subject of this thread, I'd ask you to understand that the idea of designing mechanical assemblies for proper mating, indexing and alignment is, perhaps, at the core of mechanical engineering. For example, there are formal approaches to designing something like two mating parts, both with an array of holes that must align. Manufacturing tolerances are taken into account in order to allow for maximal and minimal errors at all bolt positions and, therefore, produce an assembly that will be easy to put together. The holes on one assembly are made slightly larger than on the other in order to allow for tolerances. Bolts are never used for indexing. If indexing is important, pins are added in order to guarantee alignment. Ignoring these design maxims means that, during manufacturing, someone has to use a manual reaming tool to enlarge holes in order to allow for the parts to go together and, perhaps, a hammer to adjust alignment. This is a failure of design, not a failure of the assembly worker. Mechanical engineers are trained to understand these issues and, unless they are complete morons, should and do work these constraints into their designs.

When it comes to the idea of a mission critical sensor designed into an assembly to be integrated into a rocket by a technician, well, there really is no excuse. This is basic engineering. You consider manufacturing and operational failure modes and seek to eliminate or reduce the probability of running into them. I don't know what these Russian modules look like. I'll just say that in a lot of cases a simple $0.25 alignment pin or an almost trivial machined or sheet-metal feature can ensure that an assembly is never mounted upside-down. There's absolutely no excuse for an engineer to design a critical sensor assembly that can be mounted in any orientation other than that which is required for the proper and safe operation of the system.


It seemed like a pretty resilient system. Those parts were right side up by the time the flight ended.




