One idea my brother and I messed about with was "Merkle DAGs". The place we worked at had what amounted to a really fancy workflow/flowchart designer, and the question was always how to make it collaborative (because that would sell like hotcakes on a slide deck, irrespective of whether people used it). Another concern was versioning, and Git is a Merkle DAG; great problems think alike.
We never finished it because of some truly difficult questions, and decided that failure was the most reasonable outcome. There were some truly awesome things, though. For example, if two users independently connected the same two flowchart steps with the same two lines, those two changes would be understood as identical by the system (by nature/emergence). It also gave very granular snapshot consistency, which is incredibly simple to reason about.
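To make the "identical by nature/emergence" bit concrete, here's a minimal sketch in Python (the edge encoding is invented for illustration, not what we actually built): when an edit's ID is a hash of its content, two replicas that make the same edit mint the same ID, so there's nothing to merge.

```python
import hashlib
import json

def edge_id(src_step: str, dst_step: str) -> str:
    """The edge's ID is a hash of what the edge *is*.

    Two users who independently draw the same connection produce
    the same bytes, hence the same ID: the "conflict" never exists.
    """
    payload = json.dumps({"src": src_step, "dst": dst_step}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

# Two replicas, no coordination, one identical change:
assert edge_id("validate", "approve") == edge_id("validate", "approve")
```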
The other cool thing was that CAS (content-addressable storage, again a Git concept) is absurdly cache friendly, and all sorts of fancy stuff can be done there.
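A rough sketch of why CAS caches so well (hypothetical API, nothing Git-specific): blobs are keyed by their own hash and never change, so any pure function of a blob can be memoized forever with zero invalidation logic.

```python
import hashlib
from typing import Any, Callable

class ContentAddressableStore:
    """Blobs keyed by their own SHA-256; entries are immutable."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self._memo: dict[tuple, Any] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data  # same key always means same bytes
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def derive(self, fn: Callable[[bytes], Any], key: str) -> Any:
        """Memoize any pure function of a blob. Immutability means the
        cached result stays valid forever; no invalidation is needed."""
        memo_key = (fn.__name__, key)
        if memo_key not in self._memo:
            self._memo[memo_key] = fn(self._blobs[key])
        return self._memo[memo_key]
```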
We ultimately abandoned it because we had messed up somewhere and wound up with tons of edge cases (the hope had been that we wouldn't need them). I still think that, absent our mistakes, it could work and would make for really elegant code.
RON has progressed over these three years. RON docs now sit in a RON-based revision control system coupled with a RON-based wiki: http://doc.replicated.cc/^Wiki/ron.sm
This is brilliant. One of the best things I've read in a while.
I find it striking that the author's 7 rules (at about 3/4 of the scrollbar on desktop) read much like a manifesto for functional programming. In particular:
> The operations are the data.
That strikes me as a sword through the Gordian Knot of distributed state. Data isn't some mutable black box you poke mutations into and read back out of. Data is the set of operations used to construct it. This feels to me like the principle underlying FP in general, event sourcing, the reactor pattern, and of course these RDTs. Of course, this implies that a piece of data is the causal history of that data (which in the physical world is strictly true).
Ceci n'est pas un byte[].
The fun part comes in all the different ways to optimize going from the territory (all the operations on a data blob) to the map (the "realized" data blob, or view into the system).
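A toy illustration of territory vs. map, assuming a trivial insert/delete op log (the `Op` shape here is made up for the example, not RON's actual encoding): the readable value is just a fold over the operations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    kind: str       # "insert" or "delete"
    pos: int
    char: str = ""

def materialize(ops: list[Op]) -> str:
    """The map: a realized view computed by folding over the territory."""
    text: list[str] = []
    for op in ops:
        if op.kind == "insert":
            text.insert(op.pos, op.char)
        else:  # delete
            del text[op.pos]
    return "".join(text)

log = [Op("insert", 0, "h"), Op("insert", 1, "i"), Op("delete", 0)]
assert materialize(log) == "i"  # the value carries its causal history
```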
This also has flavors of quantum mechanics to me. Causality is anything you can get away with mutating before a meaningful observation occurs.
Let's say you have a causal tree representing an output sequence, and you are interested in running an arbitrary set of finite state machines over this sequence. What is an efficient approach to rewinding and replaying those state machines in the face of updates coming into the sequence? What if those state machines themselves emit (and might want to retract) operations on the underlying sequence? Does anyone know what I should be googling here?
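The naive approach I can think of is periodic checkpoints: an update at position p only forces replay from the nearest snapshot before p. A sketch under the assumption that the step function is pure and states are immutable values (all names hypothetical); "incremental computation", "self-adjusting computation", and "retroactive data structures" might be useful search terms, though I'd welcome better ones.

```python
from typing import Any, Callable

def run_with_checkpoints(seq: list, step: Callable[[Any, Any], Any],
                         init: Any, interval: int = 64):
    """Run a state machine over seq, snapshotting every `interval` items.

    Assumes `step` is pure and states are immutable values, so a
    snapshot is just a reference, not a deep copy.
    """
    checkpoints = {0: init}  # position in seq -> state at that position
    state = init
    for i, item in enumerate(seq, 1):
        state = step(state, item)
        if i % interval == 0:
            checkpoints[i] = state
    return state, checkpoints

def replay_after_edit(seq: list, step, checkpoints: dict, edit_pos: int):
    """Rewind to the last snapshot at or before edit_pos, replay the rest.

    Checkpoints past edit_pos are stale after the edit and would need
    to be dropped or recomputed.
    """
    base = max(p for p in checkpoints if p <= edit_pos)
    state = checkpoints[base]
    for item in seq[base:]:
        state = step(state, item)
    return state

# Toy FSM: count occurrences of "a".
count_a = lambda state, ch: state + (ch == "a")
seq = list("banana")
final, cps = run_with_checkpoints(seq, count_a, 0, interval=2)
assert final == 3
# Edit seq[3], then replay only from the checkpoint at position 2:
seq[3] = "x"
assert replay_after_edit(seq, count_a, cps, edit_pos=3) == 2
```

This says nothing about the harder half of the question, where the state machines themselves emit or retract operations on the underlying sequence; that feedback loop is exactly where I get stuck too.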