I’ve been wondering something similar. I recently published a Rust rope library based on skip lists. (Ropes are “fat strings” optimized for efficient insertion and deletion of characters at arbitrary positions.)
My rope is backed by a skip list, where each node in the skip list has a gap buffer. Even without the gap buffers, my skip list is ~1.5x faster than the next-fastest library on crates.io (ropey - which is b-tree based). With gap buffers the resulting code is 4x faster. My library is also significantly smaller, both in lines of code and compiled size.
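For anyone who hasn't run into gap buffers before, here's a toy sketch of the idea in Rust - my own illustration, not jumprope's actual code. The content sits on both sides of a movable "gap" of free space; inserting at the gap is O(1), and only relocating the gap costs a copy proportional to the distance moved, which is tiny when consecutive edits are near each other (exactly the locality you get while typing):

```rust
// Toy gap buffer over chars - a hypothetical sketch, not jumprope's
// actual implementation. A real rope leaf would store UTF-8 bytes and
// split the node when full.
struct GapBuffer {
    data: Vec<char>,
    gap_start: usize, // index where the gap begins
    gap_len: usize,   // number of free slots in the gap
}

impl GapBuffer {
    fn new(capacity: usize) -> Self {
        GapBuffer { data: vec!['\0'; capacity], gap_start: 0, gap_len: capacity }
    }

    // Move the gap so it starts at content position `pos`. Cheap when
    // consecutive edits are close together - the common case when typing.
    fn move_gap(&mut self, pos: usize) {
        if pos < self.gap_start {
            self.data.copy_within(pos..self.gap_start, pos + self.gap_len);
        } else if pos > self.gap_start {
            let gap_end = self.gap_start + self.gap_len;
            self.data.copy_within(gap_end..pos + self.gap_len, self.gap_start);
        }
        self.gap_start = pos;
    }

    fn insert(&mut self, pos: usize, c: char) {
        assert!(self.gap_len > 0, "leaf full - a real rope would split the node here");
        self.move_gap(pos);
        self.data[self.gap_start] = c;
        self.gap_start += 1;
        self.gap_len -= 1;
    }

    fn to_string(&self) -> String {
        let gap_end = self.gap_start + self.gap_len;
        self.data[..self.gap_start].iter().chain(&self.data[gap_end..]).collect()
    }
}

fn main() {
    let mut buf = GapBuffer::new(16);
    for (i, c) in "hello".chars().enumerate() {
        buf.insert(i, c);
    }
    buf.insert(4, '!');
    println!("{}", buf.to_string()); // prints "hell!o"
}
```

In the skip list, each node holds one of these as its leaf storage, so an insert is: skip-list search (O(log n)), then a cheap local insert into the leaf's gap buffer.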
I’ve got another b-tree library I’ve been using for CRDT editing in Diamond Types. Based on this result I’m considering rewriting my btree as a skip list to see how it compares. It’d be a direct apples-to-apples comparison - I’ve already tuned my btree as well as I can. I’m very curious how performance would change.
I'd had, but never gotten at all close to implementing, this idea of something like a mutable RRB tree (think https://docs.rs/im/latest/im/struct.Vector.html but, again, mutable) with somewhat chunky (say, few-KB) gap buffers as the leaves. The tree gives you asymptotically not-terrible behavior with zillions of items, and the gap buffers could help practical performance when there's a lot of locality in your accesses.
I figure merging two lists of items or filtering some items out of a list might have the sort of locality of edits gap buffers handle well, and largish gap buffers could allow iteration to mostly spend time walking through contiguous chunks.
I still don't know how well something like that applies to a non-rope use case, but it's incredibly cool to see you actually did the work to build out a hybrid structure, validate it, and show that it does very well.
> might have the sort of locality of edits gap buffers handle well, and largish gap buffers could allow iteration to mostly spend time walking through contiguous chunks.
Yeah this was my insight too. The gap buffers allow the leaves to be larger (because there are fewer memcpys), and large leaves improve the cache behaviour. I was really surprised how well it works in practice - it's crazy fast. Like, 30M keystrokes / second fast, based on some real-world editing traces. I can replay the edits from an 11 page academic paper in 5ms. Doing the same thing naively in javascript takes ~1s - which is 200x slower.
> I still don't know how well something like that applies to a non-rope use case,
My intuition agrees with you. I think the performance gain is mostly thanks to the high locality of edits in a rope. And you'd get that with some data sets in a tree structure, but not all.
But what if the data is semi-random? I'm imagining a BTreeMap-like data structure with gap buffers at the leaves. In this case, you would still get fast O(log n) performance of the btree / skip list. The gap buffer would help a lot less - but it would still halve the number of items memcpy'ed in the average case when inserting or deleting. Performance might still improve a little thanks to the gap buffers.
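To sanity-check that intuition, here's a toy simulation - all numbers hypothetical (a 64-item leaf, uniformly random insert positions, a deterministic LCG so it needs no crates) - comparing how many elements get shifted per insert by a plain array versus a gap buffer whose gap just stays wherever the last edit left it:

```rust
// Back-of-envelope simulation, not a real benchmark: count elements
// shifted per random insert into a fixed-size leaf.

// Tiny deterministic LCG so the sketch needs no external crates.
fn lcg(seed: &mut u64) -> usize {
    *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    (*seed >> 33) as usize
}

// Returns (total shifted by a plain array, total shifted by a gap buffer).
fn simulate(leaf_size: usize, trials: usize) -> (usize, usize) {
    let mut seed = 12345u64;
    let (mut plain, mut gappy) = (0usize, 0usize);
    let mut gap = leaf_size / 2;
    for _ in 0..trials {
        let pos = lcg(&mut seed) % leaf_size;
        plain += leaf_size - pos;       // array: shift the whole tail
        gappy += pos.abs_diff(gap);     // gap buffer: move the gap to pos
        gap = pos;                      // the gap follows each edit
    }
    (plain, gappy)
}

fn main() {
    let (plain, gappy) = simulate(64, 100_000);
    println!("avg shifted, plain array: {}", plain / 100_000);
    println!("avg shifted, gap buffer:  {}", gappy / 100_000);
    assert!(gappy < plain);
}
```

Even with fully random edit positions, the gap buffer shifts noticeably fewer elements per insert; with any locality at all the gap barely moves and the win gets much bigger.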
> I can replay the edits from an 11 page academic paper in 5ms.
That's super cool.
FWIW, what got me curious about RRB trees and so on was the idea of trying for a structure that's fast in some happy cases and degrades to...maybe slow, but not, say, quadratic. The thread around https://news.ycombinator.com/item?id=24357976 got me started, and it mentions some other fun stuff, including a "tiered vector" with sqrt(N) insert/remove anywhere.
Cool! I was surprised to see ropey as a dependency, I agree those utility functions should be inlined.
I'm currently using ropey to keep the server side state of documents in an LSP server. I recommend you add at least a slice type, as that's typically desirable. But looks great otherwise!
Yeah those utility methods will be inlined anyway by the compiler. But copy+paste (with a comment for attribution) is still probably the right tool here.
I like the idea of adding a slice type. What use case do you have in mind for that?
I don't know about other people, but my tokenizer is parameterized over string slice types. So I tokenize from a slice (be it &str or ropey's RopeSlice) and some of the returned tokens have subslices. This allows tokenization to do no heap allocation.
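As a rough sketch of what I mean - hypothetical trait and names, and only byte-indexed ASCII splitting for brevity, not my actual tokenizer:

```rust
use std::ops::Range;

// Minimal abstraction over "something sliceable", so the same tokenizer
// can run over &str or a rope slice type. Hypothetical sketch - a real
// impl for ropey's RopeSlice would forward to its own len/byte/slice APIs.
trait TextSlice: Copy {
    fn len_bytes(self) -> usize;
    fn byte_at(self, i: usize) -> u8;
    fn slice(self, range: Range<usize>) -> Self;
}

impl TextSlice for &str {
    fn len_bytes(self) -> usize { self.len() }
    fn byte_at(self, i: usize) -> u8 { self.as_bytes()[i] }
    fn slice(self, range: Range<usize>) -> Self { &self[range] }
}

// Split on ASCII whitespace. Every token is a zero-copy subslice of the
// input, so tokenization allocates nothing per token (only the Vec itself).
fn tokenize<S: TextSlice>(input: S) -> Vec<S> {
    let mut tokens = Vec::new();
    let mut start = None;
    for i in 0..input.len_bytes() {
        let is_ws = input.byte_at(i).is_ascii_whitespace();
        match (start, is_ws) {
            (None, false) => start = Some(i),
            (Some(s), true) => { tokens.push(input.slice(s..i)); start = None; }
            _ => {}
        }
    }
    if let Some(s) = start {
        tokens.push(input.slice(s..input.len_bytes()));
    }
    tokens
}

fn main() {
    println!("{:?}", tokenize("let x = 42;")); // prints ["let", "x", "=", "42;"]
}
```

The nice part is that the returned tokens borrow the input, so whichever slice type goes in is the slice type that comes back out.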
https://crates.io/crates/jumprope