
Zero deserialization? That sounds rather scary. This means absolute trust in data read from disk or received from other nodes?


What is the threat model you're worried about? If an attacker can write data to your disk or authenticate to your cluster, aren't you already screwed?


Yes, these are exactly my threats.

First, because I'm a strong believer in defense in depth. Second, because both disk corruption and network packet corruption happen. Alarmingly often, in fact, if you're operating at large scale.


Ours too!

For example, our deterministic simulation testing injects storage fault corruption up to the theoretical limit of f that our consensus protocol can tolerate.

Details in our other reply to you.


Great question! Joran from TigerBeetle here.

  "This means absolute trust in data read from disk or received from other nodes?"
TigerBeetle places zero trust in data read from the disk or network. In fact, we're a little more paranoid here than most.

For example, where most databases will have a network fault model, TigerBeetle also has a storage fault model (https://github.com/tigerbeetledb/tigerbeetle/blob/main/docs/...).

This means that we fully expect the disk to be what we call “near-Byzantine”, i.e. to cause bitrot, or to misdirect or silently ignore read/write I/O, or to simply have faulty hardware or firmware.

Where Jepsen will break most databases with network fault injection, we test TigerBeetle with high levels of storage faults on the read/write path, probably beyond what most systems, or write ahead log designs, or even consensus protocols such as RAFT (cf. “Protocol-Aware Recovery for Consensus-Based Storage” and its analysis of LogCabin), can handle.

For example, most implementations of Raft and Paxos can fail badly if your disk loses a prepare, because the stable-storage guarantees that the proofs for these protocols assume are then undermined. Instead, TigerBeetle runs Viewstamped Replication, along with UW-Madison's CTRL protocol (Corruption-Tolerant Replication), and we test our consensus protocol's correctness in the face of unreliable stable storage using deterministic simulation testing (à la FoundationDB).
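To give a feel for what "deterministic" buys you here, a minimal sketch of seeded storage fault injection (illustrative only, not our actual simulator; the hand-rolled xorshift and injectStorageFaults are made up for the example):

  const std = @import("std");

  // Tiny xorshift64 PRNG: the point is that a single u64 seed
  // deterministically reproduces the exact same fault schedule.
  const Prng = struct {
      state: u64,
      fn next(p: *Prng) u64 {
          var x = p.state;
          x ^= x << 13;
          x ^= x >> 7;
          x ^= x << 17;
          p.state = x;
          return x;
      }
      fn below(p: *Prng, n: usize) usize {
          return @intCast(p.next() % @as(u64, n));
      }
  };

  const sector_size = 512;

  // Flip random bits in up to `max_faults` sectors of a simulated disk.
  // Replaying the same seed replays the same corruption, so any failure
  // the simulator finds can be reproduced exactly. In practice the fault
  // count is capped at what the consensus protocol can provably tolerate.
  fn injectStorageFaults(disk: []u8, seed: u64, max_faults: usize) void {
      std.debug.assert(disk.len % sector_size == 0);
      var prng = Prng{ .state = seed | 1 }; // xorshift state must be nonzero
      var fault: usize = 0;
      while (fault < max_faults) : (fault += 1) {
          const sector = prng.below(disk.len / sector_size);
          const offset = sector * sector_size + prng.below(sector_size);
          const bit: u3 = @intCast(prng.below(8));
          disk[offset] ^= @as(u8, 1) << bit; // silent bitrot
      }
  }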

Finally, in terms of the network fault model, we do end-to-end cryptographic checksumming, because we don't trust TCP checksums with their limited guarantees.

So this is all at the physical storage and network layers.

  "Zero deserialization? That sounds rather scary."
At the wire protocol layer, we:

  * assume a non-Byzantine fault model (that consensus nodes are not malicious),
  * run with runtime bounds checking (and checked arithmetic!) enabled as a fail-safe,
  * add protocol-level checks to ignore invalid data, and
  * only work with fixed-size structs.
At the application layer, we:

  * have a simple data model (account and transfer structs),
  * validate all fields for semantic errors so that we don't process bad data,
  * for example, here's how we validate transfers between accounts: https://github.com/tigerbeetledb/tigerbeetle/blob/d2bd4a6fc240aefe046251382102b9b4f5384b05/src/state_machine.zig#L867-L952.
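To make both layers concrete, here's a rough sketch in Zig (the struct, result enum, and validate function below are simplified stand-ins for illustration, not our actual definitions; the real validation is in the state_machine.zig link above):

  const std = @import("std");
  const assert = std.debug.assert;

  // Simplified stand-in for a fixed-size Transfer struct (the real one has
  // more fields): the struct *is* the wire format, so there is nothing to
  // deserialize.
  const Transfer = extern struct {
      id: u128,
      debit_account_id: u128,
      credit_account_id: u128,
      amount: u64,
      ledger: u32,
      code: u16,
      flags: u16,
      reserved: [64]u8,
  };

  comptime {
      // Fixed size, no variable-length fields.
      assert(@sizeOf(Transfer) == 128);
  }

  // Illustrative subset of the result codes a transfer can be rejected with.
  const CreateTransferResult = enum {
      ok,
      id_must_not_be_zero,
      accounts_must_be_different,
      amount_must_not_be_zero,
      ledger_must_not_be_zero,
  };

  // Every field is still validated semantically before we act on it.
  fn validate(t: Transfer) CreateTransferResult {
      if (t.id == 0) return .id_must_not_be_zero;
      if (t.debit_account_id == t.credit_account_id) return .accounts_must_be_different;
      if (t.amount == 0) return .amount_must_not_be_zero;
      if (t.ledger == 0) return .ledger_must_not_be_zero;
      return .ok;
  }

  test "zero deserialization still rejects bad data" {
      const bytes = [_]u8{0} ** @sizeOf(Transfer);
      // No parsing step: the incoming bytes are reinterpreted as the struct.
      const transfer = std.mem.bytesToValue(Transfer, &bytes);
      try std.testing.expectEqual(CreateTransferResult.id_must_not_be_zero, validate(transfer));
  }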
No matter the deserialization format you use, you always need to validate user data.

In our experience, zero-deserialization with fixed-size structs, the way we do it in TigerBeetle, is simpler than variable-length formats, which can be more complicated (imagine a JSON codec), if not scarier.


> Where Jepsen will break most databases with network fault injection, we test TigerBeetle with high levels of storage faults on the read/write path, probably beyond what most systems, or write ahead log designs, or even consensus protocols such as RAFT (cf. “Protocol-Aware Recovery for Consensus-Based Storage” and its analysis of LogCabin), can handle.

Oh, nice one. Whenever I speak with people who work on "high reliability" code, they seldom even use fuzz-testing or chaos-testing, which is... well, unsatisfying.

Also, what do you mean by "storage fault"? Is this simulating/injecting silent data corruption or simulating/injecting an error code when writing the data to disk?

> validate all fields for semantic errors so that we don't process bad data,

Ahah, so no deserialization doesn't mean no validation. Gotcha!

> In our experience, zero-deserialization with fixed-size structs, the way we do it in TigerBeetle, is simpler than variable-length formats, which can be more complicated (imagine a JSON codec), if not scarier.

That makes sense, thanks. And yeah, JSON has lots of warts.

Not sure what you mean by variable length. Are you speaking of JSON-style "I have no idea how much data I'll need to read before I can start parsing it" or entropy coding-style "look ma, I'm somehow encoding 17 bits on 3.68 bits"?


> Also, what do you mean by "storage fault"? Is this simulating/injecting silent data corruption or simulating/injecting an error code when writing the data to disk?

Exactly! We focus more on bitrot/misdirection in our simulation testing. We use Antithesis' simulation testing for the latter. We've also tried to design I/O syscall errors away where possible, for example by using O_DSYNC instead of fsync(), so that we can tie errors to specific I/Os.

> Ahah, so no deserialization doesn't mean no validation. Gotcha!

Well said—they're orthogonal.

> Not sure what you mean by variable length. Are you speaking of JSON-style "I have no idea how much data I'll need to read before I can start parsing it"

Yes, and also where this is internal to the data structure being read, e.g. both variable-length message bodies and variable-length fields.

There's actually an interesting example of how variable-length message bodies can go wrong in the design decisions for our wire protocol, where we explain why we have two checksums, one over the header and another over the body (instead of a single checksum over both!): https://github.com/tigerbeetledb/tigerbeetle/blob/main/docs/...
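Roughly, the shape of it (a sketch only: the field names are illustrative rather than our actual header layout, and Wyhash stands in here for a real cryptographic checksum):

  const std = @import("std");

  // The header checksum covers everything in the header except itself,
  // including the body length and the body checksum.
  const Header = extern struct {
      checksum: u64, // covers the rest of the header
      checksum_body: u64, // covers exactly `size - @sizeOf(Header)` body bytes
      size: u32, // total message size: header + body
      command: u32,

      fn hash(bytes: []const u8) u64 {
          return std.hash.Wyhash.hash(0, bytes);
      }

      fn valid(header: *const Header, body: []const u8) bool {
          // 1. Verify the header on its own. Until this passes, `size` is
          //    untrusted and must never be used to size a read.
          const header_bytes = std.mem.asBytes(header);
          if (header.checksum != hash(header_bytes[@sizeOf(u64)..])) return false;
          // 2. Only now trust `size`, and check the body it describes.
          if (header.size < @sizeOf(Header)) return false;
          if (body.len != header.size - @sizeOf(Header)) return false;
          return header.checksum_body == hash(body);
      }
  };

The point of splitting the checksums is that the length field is only trusted after the header verifies on its own, so a corrupted length can't trick the receiver into hashing (or waiting for) the wrong number of body bytes.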


Alright, I'm officially convinced that you've thought this out!

So, how's the experience of implementing this in Zig?


Thanks! I hope so! 🙌

And we're always learning.

But Zig is the charm. TigerBeetle wouldn't be what it is without it. Comptime has been a game changer for us, and the shared philosophy around explicitness and memory efficiency has made everything easier. It's like working with the grain—the std lib is pleasant. I've also learned so much from the community.

My own personal experience is that Andrew has made a truly stunning number of successively brilliant design decisions. I can't fault any of them. It's all the little things together—seeing this level of conceptual integrity in a language is such a joy.


I can't stop being amazed by TigerBeetle's design and engineering.


Thank you Nicolas!



