I kinda think that's the point. It's not exactly a software problem, but an information theory problem: per Lamport's 3f+1 bound, if you need to tolerate 2 Byzantine failures, you need at least 7 machines.
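A quick aside making that arithmetic explicit (the function name is made up; the bound itself is Lamport's lower bound for Byzantine agreement, cited later in the thread):

```python
def min_replicas(f):
    """Minimum machines for Byzantine agreement with f faulty
    participants, per Lamport's 3f+1 lower bound."""
    return 3 * f + 1

print(min_replicas(1))  # → 4
print(min_replicas(2))  # → 7
```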
Many of us in high-assurance systems have witnessed triple modular redundancy (TMR) fail. I started saying 3 out of 5 for that reason; maybe the same goes for the other commenter. Where possible, I also want the computers to be in different locations, using different hardware, with different developers working against the same API.
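A minimal sketch of 3-out-of-5 voting (the names are made up, and a real voter would also handle timeouts and replicas that never answer):

```python
from collections import Counter

def vote(replies, quorum=3):
    """Majority-vote over replica replies; return the value agreed
    by at least `quorum` replicas, else raise."""
    value, count = Counter(replies).most_common(1)[0]
    if count < quorum:
        raise RuntimeError("no 3-of-5 majority among replies")
    return value

# Two of five replicas return wrong values; the majority still wins.
print(vote([42, 42, 7, 42, 13]))  # → 42
```

Note this tolerates two *wrong values* only because a trusted voter sits outside the replica set; without that, you're back to the 3f+1 generals bound.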
Yes, but those aren't Byzantine failures. Byzantine failures present as incorrect values, not simply the absence of a value (as in a total hardware failure).
See the abstract in Lamport's original paper introducing the Byzantine generals problem [0].
We also see a similar issue in error correction -- an introductory undergrad course might teach this via Lagrange interpolation [1]: you need only n+k of the evaluation points in the presence of erasure errors, but n+2k in the general case (where n is the size of the actual message, and k is the maximum number of errors to correct).
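A toy sketch of the erasure case over GF(97) (the field size and message are arbitrary illustration choices): n message symbols define a degree-(n-1) polynomial, so any n of the n+k transmitted evaluations recover it by Lagrange interpolation.

```python
P = 97  # small prime field for the toy example

def poly_eval(coeffs, x):
    """Evaluate the message polynomial at x, mod P."""
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

def lagrange_interpolate(points, x):
    """Evaluate the unique interpolant through `points` at x, mod P."""
    total = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

msg = [12, 34, 56]                               # n = 3 message symbols
n, k = len(msg), 2
codeword = [(x, poly_eval(msg, x)) for x in range(1, n + k + 1)]
survivors = codeword[k:]                         # worst case: k symbols erased
recovered = [lagrange_interpolate(survivors, x) for x, _ in codeword[:k]]
assert recovered == [y for _, y in codeword[:k]]  # erased symbols restored
```

The key difference from the Byzantine case: with erasures we know *which* symbols are missing, so n surviving points suffice; with arbitrary corruption we don't, hence the extra k of redundancy.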
Partial hardware failures exist: wrong values start going through the system, a bit flip being the easiest example. NonStop's design has been countering both partial and total hardware failures for a long time now.
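A minimal sketch of why that matters: a partial failure produces a wrong value rather than no value, which only an end-to-end integrity check will catch (the payload and the use of CRC32 here are illustrative choices, not NonStop's actual mechanism):

```python
import zlib

payload = b"transfer 100 to account 7"
stored_crc = zlib.crc32(payload)

# Simulate a partial failure: one flipped bit in an otherwise valid message.
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]

assert corrupted != payload
assert zlib.crc32(corrupted) != stored_crc  # the single-bit flip is detected
```

CRC32 detects all single-bit errors by construction, so the check above always fires for a one-bit flip; a crash-only failure model would never even see this case.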