I'm really out on most of the "async" stuff, after having used it. (Mostly in Node and Tornado)

Remember in the early 90s when Windows and Mac OS were "cooperatively" multitasked? Which is to say, you had to explicitly yield to allow other applications to run (or risk locking up the entire system). And then it was replaced with pre-emptive multitasking, which allowed the scheduler to figure out what process deserved CPU time, while allowing the programmer not to have to think about it. You could call a blocking IO function, and the OS would just go do something else while you waited.

All this "async" stuff seems like a return of cooperative multitasking, only worse. Not only do I have to explicitly yield, but now it's to some event loop that can't even properly use multiple cores, or keep a coherent stack trace. It's a nightmare to debug. It's theoretically fast... except if one request forgets to yield, it can clog up the entire thing. I guess you use multiple processes for that and a dispatcher, but at that point you've basically reinvented preemptive multitasking... badly.

Threads aren't perfect, but, excluding STM and actor models, they definitely suck the least.


Zig is one of the most interesting languages I've seen in a very long time, and possibly the first radical breakthrough in low-level programming design in decades. Maybe it will become a big success and maybe it will tank, but so far we've had two visions for low-level programming -- C's "syntax-sugared Assembly", and C++'s "zero-cost abstractions", where low-level, low-abstraction code appears high-level on the page once you get all the pieces to fit (Rust, while introducing the ingenious borrow checker, still firmly follows C++'s design philosophy and can be seen as a C++ that checks your work) -- and Zig shows a third way by rethinking, from the ground up, how low-level programming could and should be done.

It manages to address all of C++'s biggest shortcomings, which, in my view, are 1. language complexity, 2. compilation time, and 3. safety -- in that order. It does so in a language that is arguably simpler than C and can be fully learned in a day or two (although the full implications of the design might take longer to sink in). It also inspires a new approach to partial evaluation, replacing generics and value templates, concepts/traits/typeclasses, constexprs/procedural macros, macros (or, at least, the "good parts" of macros), and conditional compilation with a single, simple feature.


Great question! Joran from TigerBeetle here.

  "This means absolute trust in data read from disk or received from other nodes?"
TigerBeetle places zero trust in data read from the disk or network. In fact, we're a little more paranoid here than most.

For example, where most databases will have a network fault model, TigerBeetle also has a storage fault model (https://github.com/tigerbeetledb/tigerbeetle/blob/main/docs/...).

This means that we fully expect the disk to be what we call “near-Byzantine”, i.e. to cause bitrot, or to misdirect or silently ignore read/write I/O, or to simply have faulty hardware or firmware.

Where Jepsen will break most databases with network fault injection, we test TigerBeetle with high levels of storage faults on the read/write path, probably beyond what most systems, write-ahead log designs, or even consensus protocols such as Raft (cf. “Protocol-Aware Recovery for Consensus-Based Storage” and its analysis of LogCabin) can handle.

For example, most implementations of Raft and Paxos can fail badly if your disk loses a prepare, because then the stable storage guarantees that the proofs for these protocols assume are undermined. Instead, TigerBeetle runs Viewstamped Replication along with UW-Madison's CTRL protocol (Corruption-Tolerant Replication), and we test our consensus protocol's correctness in the face of unreliable stable storage using deterministic simulation testing (à la FoundationDB).
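
For a rough idea of what deterministic storage fault injection looks like, here's a toy sketch (made-up names, nothing like TigerBeetle's actual simulator): a fake disk driven by a seeded PRNG, so any failing run can be replayed bit-for-bit from its seed.

  import random

  class FaultyDisk:
      # Toy in-memory "disk" that injects faults from a seeded PRNG, so a
      # failing test run can be reproduced exactly by reusing the seed.
      def __init__(self, seed, p_lost=0.01, p_corrupt=0.01):
          self.prng = random.Random(seed)  # same seed => same fault schedule
          self.sectors = {}
          self.p_lost = p_lost
          self.p_corrupt = p_corrupt

      def write(self, sector, data):
          r = self.prng.random()
          if r < self.p_lost:
              return  # write silently lost (misdirected / ignored I/O)
          if r < self.p_lost + self.p_corrupt:
              data = bytes([data[0] ^ 0xFF]) + data[1:]  # bitrot on the way down
          self.sectors[sector] = data

      def read(self, sector):
          return self.sectors.get(sector)  # may be missing or corrupt

  # The harness drives the replicas and this disk from one seed, then asserts
  # that the replicated log still agrees despite the injected faults.
  disk = FaultyDisk(seed=42)
  disk.write(0, b"prepare:42")
  print(disk.read(0))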

Finally, in terms of network fault model, we do end-to-end cryptographic checksumming, because we don't trust TCP checksums with their limited guarantees.
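
As a sketch of what end-to-end checksumming can look like (hypothetical framing, not TigerBeetle's actual wire format): the sender appends a digest over the payload, and the receiver recomputes and compares it before trusting anything, no matter what TCP's 16-bit checksum said along the way.

  import hashlib

  def frame(payload):
      # Sender: append a cryptographic digest so corruption anywhere along the
      # path (NIC, kernel, switch, proxy) is detectable end to end.
      return payload + hashlib.sha256(payload).digest()

  def unframe(message):
      payload, digest = message[:-32], message[-32:]
      if hashlib.sha256(payload).digest() != digest:
          raise ValueError("checksum mismatch: discard and re-request")
      return payload

  assert unframe(frame(b"transfer: debit A, credit B")) == b"transfer: debit A, credit B"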

So this is all at the physical storage and network layers.

  "Zero deserialization? That sounds rather scary."
At the wire protocol layer, we:

  * assume a non-Byzantine fault model (i.e. that consensus nodes are not malicious),
  * run with runtime bounds-checking (and checked arithmetic!) enabled as a fail-safe,
  * add protocol-level checks to ignore invalid data, and
  * work only with fixed-size structs.
At the application layer, we:

  * have a simple data model (account and transfer structs), and
  * validate all fields for semantic errors so that we don't process bad data (for example, here's how we validate transfers between accounts: https://github.com/tigerbeetledb/tigerbeetle/blob/d2bd4a6fc240aefe046251382102b9b4f5384b05/src/state_machine.zig#L867-L952).
No matter the deserialization format you use, you always need to validate user data.

In our experience, zero-deserialization using fixed-size structs the way we do in TigerBeetle is simpler than variable-length formats, which can be more complicated (imagine a JSON codec), if not more scary.
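
To illustrate the fixed-size-struct idea from the two lists above, here's a toy sketch in Python (TigerBeetle itself is Zig, and the real Transfer struct and checks are in the state_machine.zig link above; every name and field here is made up):

  import struct

  # Toy fixed-size "transfer" record: every field at a known offset, so parsing
  # is one unpack of a fixed number of bytes -- no variable-length framing.
  TRANSFER = struct.Struct("<16s16s16sQ")  # id, debit account, credit account, amount
  TRANSFER_SIZE = TRANSFER.size            # always 56 bytes

  def parse_transfer(buf):
      if len(buf) != TRANSFER_SIZE:
          return None, "wrong size"
      tid, debit, credit, amount = TRANSFER.unpack(buf)
      # Semantic validation still happens on every field, exactly as it would
      # for JSON or any other format: the encoding doesn't remove that duty.
      if amount == 0:
          return None, "amount must be positive"
      if debit == credit:
          return None, "accounts must differ"
      return (tid, debit, credit, amount), None

  raw = TRANSFER.pack(b"\x01" * 16, b"\x02" * 16, b"\x03" * 16, 100)
  print(parse_transfer(raw))

The parse step is a single fixed-layout copy; all the real work is the semantic validation, which you would need for a JSON codec too.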

