Not a question necessarily about the technical side, but I'm interested in your opinion as to the root cause – is it a desire to achieve certain results for marketing purposes, a lack of understanding or training in the team about distributed systems, just bugs and a lack of testing...? Alternatively, does most of this come down to one specific technical choice, and why might they have made that choice?
Very happy for (informed) speculation here; I recognise we'll probably never know for certain, but I'm interested in avoiding similar mistakes myself.
There are a few things at play here. One is talking only about the positive results from the previous Jepsen analysis while not discussing the negative ones. Vendors often try to present findings in the most positive light, but this was a particularly extreme case. Not discussing default behavior is a significant oversight, especially given that ~80% of people run with the default write concern, and 99% run with the default read concern.
The middle part of the report talks about unexpected but (almost entirely) documented behavior around read and write concern for transactions. I don't want to conjecture too much about motivations here, but based on my professional experience with a few dozen databases, and surveys of colleagues, I termed it "surprising". The fact that there's explicit documentation for what I'd consider counterintuitive API design suggests this is something MongoDB engineers considered, and possibly debated, internally.
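To make the defaults point concrete, here's a minimal PyMongo sketch (connection string, database, and collection names are illustrative) of pinning write and read concern explicitly, at both the client and the transaction level, rather than relying on the defaults:

```python
# Minimal sketch (PyMongo): explicitly pinning read/write concern rather than
# relying on defaults. Connection string, database, and collection names are
# illustrative only.
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

# Client-level settings: without these, writes and reads fall back to the
# server/driver defaults, which may be weaker than you expect.
client = MongoClient(
    "mongodb://localhost:27017/?replicaSet=rs0",
    w="majority",                 # acknowledge writes on a majority of nodes
    readConcernLevel="majority",  # read majority-committed data
)
accounts = client.bank.accounts

# Transaction-level concerns: set on the transaction itself, since a
# transaction's read/write concern governs every operation inside it.
with client.start_session() as session:
    with session.start_transaction(
        read_concern=ReadConcern("snapshot"),
        write_concern=WriteConcern("majority"),
    ):
        accounts.update_one({"_id": 1}, {"$inc": {"balance": -10}}, session=session)
        accounts.update_one({"_id": 2}, {"$inc": {"balance": 10}}, session=session)
```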
The final part of the report talks about what I'm pretty sure are bugs. I'm strongly suspicious of the retry mechanism: it's possible that an idempotency token doesn't exist or isn't properly used, or that MongoDB's client or server layers improperly interpret an indeterminate failure as a determinate one. It seems possible that all four phenomena we observed stem from the retry mechanism, but as discussed in the report, it's not entirely clear that's the case.
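To illustrate what an idempotency token buys you here, a toy sketch (purely conceptual, not MongoDB's actual client or server code): the client attaches one token per logical write and reuses it across retries, so a retry after an indeterminate failure can't apply the write twice.

```python
# Conceptual sketch only -- not MongoDB's implementation. It shows why a retry
# layer needs a unique idempotency token per logical write: without one,
# retrying an indeterminate failure (a timeout where the first attempt may or
# may not have applied) can execute the write twice.
import uuid

class Server:
    def __init__(self):
        self.applied = {}   # token -> result, so duplicate requests are detected
        self.balance = 0

    def execute(self, token, delta):
        if token in self.applied:        # retry of a write we already applied
            return self.applied[token]   # return the original result, don't re-apply
        self.balance += delta
        result = {"ok": True, "balance": self.balance}
        self.applied[token] = result
        return result

def write_with_retry(server, delta, network):
    token = str(uuid.uuid4())            # one token per logical write, reused across retries
    for attempt in range(3):
        try:
            return network(server.execute, token, delta)
        except TimeoutError:
            # Indeterminate failure: the write may or may not have applied.
            # Because the token is reused, a successful retry won't double-apply.
            continue
    raise TimeoutError("write result unknown after retries")

# Simulate a network that delivers the request but drops the first reply.
calls = {"n": 0}
def flaky_network(fn, *args):
    result = fn(*args)                   # request reaches the server
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("reply lost") # client can't tell whether it applied
    return result

server = Server()
print(write_with_retry(server, 10, flaky_network))   # balance ends at 10, not 20
```

Drop the token bookkeeping and the same run applies the increment twice; misreport the timeout as a definite failure and the client may give up on a write that actually committed. Those are the two failure shapes I'm gesturing at above.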
I get the impression that MongoDB may have hyped themselves into a corner in the early days with poorly made (or misleading) benchmarks. Perhaps they have customers with a lot of influence determining how they think about performance vs consistency.
Maybe this, combined with repeatedly patching and re-patching their replication logic and consistency algorithm, means they'll be stuck in this sort of position for a long time.
Possibly! You're right that path dependence played a role in safety issues: the problems we found in 3.4.0-rc3 were related to grafting the new v1 replication protocol onto a system which made assumptions about how v0 behaved. That said, I don't want to discount that MongoDB has made significant improvements over the years. Single-document linearizability was a long time in the works, and that's nothing to sneeze at!