Oh cool! I've wanted something like notion for years. Ideally on top of CRDTs (so I own my own data). I really appreciate all the work your company is doing! Feel free to get in touch if you want to have a proper chat about this stuff.
> My biggest open question is how to design my centralized server side storage system for CRDT data. To service writes from old clients I need to retain a total history of a document, but I don’t want to force new clients to download the entire history nor do I want these big histories in my cache, so I end up wanting a hot/cold system; and building that kind of thing and dealing with the edge cases seems like more than 100 lines of code.
Yeah, definitely more than 100 lines of code. I'm sad to report that in diamond types (my own CRDT) I've spent ~12000 lines of code in an attempt to solve some of these problems. I could probably get that down under 3000 loc in a rewrite if I'm happy to throw away some of my optimizations. Doing so would dramatically lower the size of the compiled wasm bundle too - though the wasm bundle is still comfortably under 100kb over the wire, so maybe it's fine?
Regarding history, I have a lot of thoughts. The first is that with the right approach, historical data compresses really well. Martin Kleppmann's automerge-perf data set has 260k edits ending in 100kb of text. The saving system I'm working on can store the entire editing history (enough to merge changes from any version) in this example with just 23kb of overhead on disk. I think that resulting data set might only need to be accessed in the case of concurrent changes, and then only back as far as the common ancestor. But I haven't implemented that optimization yet.
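To make the common-ancestor idea concrete, here's a rough sketch (hypothetical names, not diamond types' actual API) of why you only need history back to the shared frontier: each op records its causal parents, so merging two diverged peers means diffing their ancestor sets and exchanging only the ops outside the common history.

```python
# Hypothetical sketch of common-ancestor merging over an op DAG.
# Op ids, the OpLog class, and all method names are made up for
# illustration; real CRDT libraries use richer version vectors/frontiers.

from dataclasses import dataclass

@dataclass
class Op:
    id: int          # globally unique op id
    parents: tuple   # ids of ops this op causally depends on

class OpLog:
    def __init__(self):
        self.ops = {}  # id -> Op

    def add(self, op):
        self.ops[op.id] = op

    def ancestors(self, frontier):
        """All op ids causally preceding (and including) the frontier."""
        seen, stack = set(), list(frontier)
        while stack:
            oid = stack.pop()
            if oid in seen:
                continue
            seen.add(oid)
            stack.extend(self.ops[oid].parents)
        return seen

    def ops_since_common_ancestor(self, ours, theirs):
        """Ops each side must send: everything outside the shared history.
        Nothing at or before the common ancestor is ever touched."""
        a, b = self.ancestors(ours), self.ancestors(theirs)
        common = a & b
        return sorted(a - common), sorted(b - common)

# Tiny diverged history: op 1 is shared, ops 2 and 3 are concurrent edits.
log = OpLog()
log.add(Op(1, ()))
log.add(Op(2, (1,)))
log.add(Op(3, (1,)))
to_send, to_recv = log.ops_since_common_ancestor([2], [3])  # ([2], [3])
```

The point is that the cold, compressed history only gets read when `common` is older than what's already in the hot cache - in the no-concurrency fast path, the diff is empty on one side and the old history never loads.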
And yeah; I've been thinking a lot about what a CRDT-native database could look like too. There are way too many interesting and useful problems here to explore.