An improvement on "git not handling non-text files" is a semantic understanding step, i.e. parsing, between the file write and version tracking.
Take a docx, write the file, parse it into entities e.g. paragraph, table, etc. and track changes on those entities instead of the binary blob. You can apply the same logic to files used in game development.
The hard part is making this fast enough. But I am working on this with lix [0].
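A minimal sketch of that parse-and-track idea (hypothetical types, not the actual lix API): parse both versions of a file into entities with stable ids, then diff the entity sets instead of the binary blobs.

```typescript
// Sketch only: assumes entities carry a stable id that survives re-parsing.
interface Entity {
  id: string;          // stable identity, e.g. a paragraph id inside a docx
  type: string;        // "paragraph", "table", ...
  value: unknown;      // the parsed content
}

type ChangeOp = "created" | "updated" | "deleted";

interface Change {
  entityId: string;
  op: ChangeOp;
  snapshot?: unknown;  // new value (absent for deletions)
}

// Compare two parsed versions of the same file entity by entity.
function diffEntities(before: Entity[], after: Entity[]): Change[] {
  const prev = new Map(before.map((e) => [e.id, e]));
  const next = new Map(after.map((e) => [e.id, e]));
  const changes: Change[] = [];

  for (const [id, e] of next) {
    if (!prev.has(id)) {
      changes.push({ entityId: id, op: "created", snapshot: e.value });
    } else if (JSON.stringify(prev.get(id)!.value) !== JSON.stringify(e.value)) {
      changes.push({ entityId: id, op: "updated", snapshot: e.value });
    }
  }
  for (const id of prev.keys()) {
    if (!next.has(id)) changes.push({ entityId: id, op: "deleted" });
  }
  return changes;
}
```

The resulting change list, not the blob, is what gets versioned.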
Simple left or right merge. One overwrites the other one.
The appeal of structured file formats like .docx, .json, etc. is exactly this. Images are unstructured, and a simple "do you want to keep the left or right image?" is good enough.
That doesn't really address the game dev use case then. Artists and designers want to prevent conflicts, not just throw away half the work and redo it.
Ok well what if I draw the foreground and you add something to the background and now my changes visually block your changes? Even if the file is merged, our work is wasted and must be redone. P4 is often popular in industry because artists can lock files and inform others that work is being done in that area.
If you actually want to capture those customers it's a use case that needs to be addressed.
I only diff the changed files. Producing a blob from the BASON AST is trivial (one scan). Things may get slow for larger files, e.g. the tree-sitter C++ parser is a 25MB C file, 750KLoC; it takes a couple of seconds to import. But it never changes, so no biggie.
There is room for improvement, but nothing show-stopping so far. I plan to round-trip the Linux kernel with full history; that should surface all the bottlenecks.
P.S. I checked lix. It uses a SQL database. That solves some things, but also creates an impedance mismatch; must be a 10x slowdown at least. I use a key-value store and a custom binary format, so it works nicely. One could go a level deeper still and use a custom storage engine; it would be even faster. Git is all custom.
Good framing. Source code is already a serialization of an AST; we just forgot that and started treating it as text. The practical problem is adoption: every tool in the ecosystem reads bytes.
This is exactly the reason why weave stays on top of git instead of replacing storage. Parsing three file versions at merge time is fine (it was about 5-67ms); parsing on every read/write would be a different story. I know about Lix, but will check it out again.
I had never heard of Emdash before, and I follow AI tools closely. It just shows how much noise there is and how hard it is to promote apps. Emdash looks solid. I almost went and built something similar because I wasn't aware of it.
I am not sure the multi-agent approach is what it's hyped up to be. As long as we are working on parallel work streams with defined contracts (say, an agreed-upon API definition that the backend implements and the frontend uses), I'd assume that running independent agent coding sessions is faster and in fact more desirable, so that neither side bends the code to comply with under-specified contracts.
Usually I find the hype is centered around creating software no one cares about. If you're creating a prototype for dozens of investors to demo, I seriously doubt you'd take the "mainstream" approach.
Maybe a dumb question on my side, but if you are using a GUI like Emdash with Claude Code, are you getting the full Claude Code harness under the hood, or are you "just" leveraging the model?
I can answer for Conductor: you're getting the full Claude Code, it's just a GUI wrapper on top of CC. It makes it easy to create worktrees (1 click) and manage them.
It can, but the plugins are not developed for production readiness yet. I should clarify that.
The way to write a plugin:
Take an off the shelf parser for pdf, docx, etc. and write a lix plugin. The moment a plugin parses a binary file into structured data, lix can handle the version control stuff.
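As a sketch, a plugin under this model only needs to map bytes to entities and back. The interface and names below are illustrative assumptions, not the real lix plugin API:

```typescript
// Hypothetical plugin contract: the plugin parses, lix versions.
interface DetectedEntity {
  entityId: string;    // stable id, e.g. a docx paragraph id
  schemaKey: string;   // e.g. "docx_paragraph", "docx_table"
  snapshot: unknown;   // parsed content of the entity
}

interface FileFormatPlugin {
  glob: string;        // which files this plugin handles
  // binary file -> structured entities
  parse(data: Uint8Array): DetectedEntity[];
  // structured entities -> binary file (round trip)
  serialize(entities: DetectedEntity[]): Uint8Array;
}

// A toy plugin for a line-based format, just to show the contract:
const txtPlugin: FileFormatPlugin = {
  glob: "*.txt",
  parse(data) {
    const text = new TextDecoder().decode(data);
    return text.split("\n").map((line, i) => ({
      entityId: `line_${i}`,
      schemaKey: "txt_line",
      snapshot: line,
    }));
  },
  serialize(entities) {
    return new TextEncoder().encode(
      entities.map((e) => String(e.snapshot)).join("\n"),
    );
  },
};
```

A real docx or pdf plugin would delegate `parse` to an off-the-shelf parser; the shape of the contract stays the same.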
The merge algebra is similar to git's three-way merge. Given that lix tracks individual changes, the three-way merge is more fine-grained.
In case of a conflict, you can either decide to do last write wins or surface the conflict to the user e.g. "Do you want to keep version A or version B?"
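A sketch of entity-level three-way merging (illustrative, not lix's actual implementation): because changes are tracked per entity, a conflict only arises when both sides changed the same entity, and that conflict can either be surfaced to the user or resolved last-write-wins.

```typescript
type Snapshot = string | null; // null = entity deleted

interface MergeResult {
  merged: Map<string, Snapshot>;
  conflicts: { entityId: string; a: Snapshot; b: Snapshot }[];
}

function threeWayMerge(
  base: Map<string, Snapshot>,
  a: Map<string, Snapshot>,
  b: Map<string, Snapshot>,
): MergeResult {
  const merged = new Map<string, Snapshot>();
  const conflicts: MergeResult["conflicts"] = [];
  const ids = new Set([...base.keys(), ...a.keys(), ...b.keys()]);

  for (const id of ids) {
    const vBase = base.get(id) ?? null;
    const vA = a.get(id) ?? null;
    const vB = b.get(id) ?? null;
    let result: Snapshot;
    if (vA === vB) result = vA;            // both sides agree
    else if (vA === vBase) result = vB;    // only B changed this entity
    else if (vB === vBase) result = vA;    // only A changed this entity
    else {                                 // both changed it: surface conflict
      conflicts.push({ entityId: id, a: vA, b: vB });
      continue;
    }
    if (result !== null) merged.set(id, result);
  }
  return { merged, conflicts };
}
```

For last-write-wins, the conflict branch would simply pick one side instead of recording the conflict.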
The SQL engine is unrelated to merging. Lix uses SQL as its storage and query engine, but not for merges.
> But then the first thing it talks about is diffing files. Which honestly shouldn’t even be a feature of VCS. That’s just a separate layer.
There is nuance between git's line-by-line diffing and what lix does.
For text diffing, it holds true that diffing is a separate layer. Text files are small, which allows on-the-fly diffing by comparing two documents; that's what git does.
On the fly diffing doesn't work for structured file formats like xlsx, fig, dwg etc. It's too expensive. Both in terms of materializing two files at specific commits, and then diffing these two files.
What lix does under the hood is tracking individual changes, _which allows rendering a diff without on the fly diffing_. So lix is kind of responsible for the diffs but only in the sense that it provides a SQL API to query changes between two states. How the diff is rendered is up to the application.
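The difference can be illustrated with a toy change log (a hypothetical data model, not lix's actual schema): if every write records entity-level change rows, rendering a diff or blame is a lookup over stored rows, with no file materialization and no diff algorithm.

```typescript
// Assumed model: one row per touched entity per commit, written at save time.
interface ChangeRow {
  commit: number;
  entityId: string;
  snapshot: string | null; // null = entity deleted
}

// Changes recorded at write time by the parse step:
const changeLog: ChangeRow[] = [
  { commit: 1, entityId: "C43", snapshot: "=SUM(A1:A10)" },
  { commit: 2, entityId: "B2", snapshot: "hello" },
  { commit: 3, entityId: "C43", snapshot: "=SUM(A1:A20)" },
];

// "Blame" for one entity is a filter over stored changes:
// no materializing two file versions, no diffing them.
function entityHistory(log: ChangeRow[], entityId: string): ChangeRow[] {
  return log.filter((row) => row.entityId === entityId);
}
```

The SQL API mentioned above is the query interface over exactly this kind of stored change data.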
> On the fly diffing doesn't work for structured file formats like xlsx, fig, dwg etc. It's too expensive. Both in terms of materializing two files at specific commits, and then diffing these two files.
I don’t think that’s actually true?
How often are binary files being diffed? How long does it take to materialize? How long to run a diff algorithm?
I’ve worked with some tools that can diff images. Works great. Not a problem in need of solving.
In any case I’ll give benefit of the doubt that this project solves some real problem in a useful way. I’m not sure what it is.
My goals in a VCS for binary files seem to be very very very different than yours.
> How often are binary files being diffed? How long does it take to materialize? How long to run a diff algorithm?
If version control is embedded in an app, constantly.
Imagine a cell in a spreadsheet. An application wants to display a "blame" for a cell C43 i.e. how did the cell change over time?
The lix way is this SQL query:

    SELECT * FROM state_history
    WHERE file_id = '<the_spreadsheet>'
      AND schema_key = 'excel_cell'
      AND entity_id = 'C43';
Diffing on the fly is not possible. The information on what changed needs to be available without diffing. Otherwise, diffing the entire spreadsheet file at every commit to find out how cell C43 changed takes ages.
[0] https://github.com/opral/lix