Hacker News | samuelstros's comments

Improving on "git not handling non-text files" requires semantic understanding, i.e. a parse step, in between the file write.

Take a docx, write the file, parse it into entities (paragraphs, tables, etc.), and track changes on those entities instead of the binary blob. You can apply the same logic to files used in game development.
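A minimal sketch of this entity-level tracking, assuming hypothetical shapes (`Entity` and `diffEntities` are illustrative, not lix's actual API):

```typescript
// Sketch: track changes on parsed entities instead of the binary blob.
// Entity shapes here are hypothetical, not lix's actual data model.
type Entity = { id: string; type: "paragraph" | "table"; content: string };

// Diff two parsed versions of a document at the entity level.
function diffEntities(before: Entity[], after: Entity[]) {
  const prev = new Map(before.map((e) => [e.id, e] as const));
  const next = new Map(after.map((e) => [e.id, e] as const));
  const changes: { id: string; kind: "added" | "removed" | "updated" }[] = [];
  for (const [id, e] of next) {
    const old = prev.get(id);
    if (!old) changes.push({ id, kind: "added" });
    else if (old.content !== e.content) changes.push({ id, kind: "updated" });
  }
  for (const id of prev.keys()) {
    if (!next.has(id)) changes.push({ id, kind: "removed" });
  }
  return changes;
}
```

An unchanged paragraph produces no change record at all, which is the whole point: history is stored per entity, not per file.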

The hard part is making this fast enough. But I am working on this with lix [0].

[0] https://github.com/opral/lix


What's the plan for large files that can't be merged? Images, executable binaries, encrypted files, that sort of thing?

Simple left or right merge. One overwrites the other.

That's the appeal of structured file formats like .docx, .json, etc. Images are unstructured, and a simple "do you want to keep the left or right image?" is good enough.


That doesn't really address the game dev use case then. Artists and designers want to prevent conflicts, not just throw away half the work and redo it.

Track the source of the asset and it works. Take UI design: don't track the SVG, track the design file itself.

Ok well what if I draw the foreground and you add something to the background and now my changes visually block your changes? Even if the file is merged, our work is wasted and must be redone. P4 is often popular in industry because artists can lock files and inform others that work is being done in that area.

If you actually want to capture those customers it's a use case that needs to be addressed.


How do you get blob file writes fast?

I built lix [0] which stores ASTs instead of blobs.

Direct AST writing works for apps that are "AST aware". And I can confirm, it works great.

But all software just writes bytes atm.

The binary -> parse -> diff pipeline is too slow.

The parse and diff steps need to get out of the hot path. That somewhat defeats the idea of a VCS that stores ASTs, though.

[0] https://github.com/opral/lix
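One way to get the parse and diff out of the hot path is to acknowledge the byte write immediately and debounce the parse onto a background task. A sketch under that assumption (names are illustrative, not lix's implementation):

```typescript
// Sketch: accept byte writes synchronously, run the expensive parse
// debounced in the background. Names here are illustrative, not lix's API.
type ParseFn = (bytes: Uint8Array) => unknown;

function makeWriteQueue(
  parse: ParseFn,
  onParsed: (ast: unknown) => void,
  delayMs = 50
) {
  let latest: Uint8Array | null = null;
  let timer: ReturnType<typeof setTimeout> | null = null;

  return function write(bytes: Uint8Array) {
    latest = bytes; // hot path: just remember the newest bytes
    if (timer) clearTimeout(timer); // debounce: drop superseded parses
    timer = setTimeout(() => {
      if (latest) onParsed(parse(latest)); // cold path: parse + diff here
    }, delayMs);
  };
}
```

The trade-off is exactly the one described above: between the write and the completed parse, the VCS only has bytes, not an AST.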


I only diff the changed files. Producing a blob out of the BASON AST is trivial (one scan). Things may get slow for larger files; e.g. the tree-sitter C++ parser is a 25 MB C file, 750 KLoC. It takes a couple of seconds to import. But it never changes, so no biggie.

There is room for improvement, but that is not a show-stopper so far. I plan to round-trip the Linux kernel with full history; that should surface all the bottlenecks.

P.S. I checked lix. It uses a SQL database. That solves some things, but also creates an impedance mismatch. Must be a 10x slowdown at least. I use key-value and a custom binary format, so it works nicely. One could go a level deeper still and use a custom storage engine; it would be even faster. Git is all custom.


Good framing. Source code is already a serialization of an AST, we just forgot that and started treating it as text. The practical problem is adoption: every tool in the ecosystem reads bytes.


This is exactly the reason why weave stays on top of git instead of replacing storage. Parsing three file versions at merge time is fine (was about 5-67 ms). Parsing on every read/write would be a different story. I know about Lix, but will check it out again.


Yep “stripe couldnt build it because they rely on http ingestion”. Lol.


Oh interesting. Checks sound similar to lix validation rules [1].

We were coming from an application perspective where blocking the user's intent is a no-go.

Do you have a link to a discussion where the JJ community is discussing checks?

[1] https://github.com/opral/lix/issues/239


It's basically what Emdash (https://www.emdash.sh/), Conductor (https://www.conductor.build/) & co. have been building, but as a first-class product from OpenAI.

Begs the question whether Anthropic will follow up with a first-class Claude Code "multi agent" (git worktree) app themselves.



oh i didn't know that claude code has a desktop app already


And it uses worktrees.


It isn’t its own app, but it’s built in to their desktop, mobile and web apps.


I had never heard of Emdash before, and I follow AI tools closely. It just shows you how much noise there is and how hard it is to promote these apps. Emdash looks solid. I almost went and built something similar because I wasn't aware of it.


I am not sure the multi-agent approach is what it is hyped up to be. As long as we are working on parallel work streams with defined contracts (say, an agreed-upon API definition that the backend implements and the frontend uses), I'd assume that running independent agent coding sessions is faster and in fact more desirable, so that neither side bends the code to comply with under-specified contracts.


Usually I find the hype is centered around creating software no one cares about. If you're creating a prototype for dozens of investors to demo, I seriously doubt you'd take the "mainstream" approach.


Maybe a dumb question on my side, but if you are using a GUI like Emdash with Claude Code, are you getting the full Claude Code harness under the hood, or are you "just" leveraging the model?


I can answer for Conductor: you're getting the full Claude Code, it's just a GUI wrapper on top of CC. It makes it easy to create worktrees (1 click) and manage them.


I don't think this is true. Try running `/skills` or `/context` in both and you'll see.


Hey, Conductor founder here. Conductor is built on Anthropic's Agents SDK, which exposes most (but not all) of Claude Code's features.

https://platform.claude.com/docs/en/agent-sdk/overview


Thanks for clarifying, just wanted to point out it's not 1 to 1 with CC. Happy user of Conductor here btw, great product!


yeah, I wanted a better terminal for operating many TUI agents at once, and none of these worked because they all want to own the agent.

I ended up building a terminal[0] with Tauri and xterm that works exactly how I want.

0 - screenshot: https://x.com/thisritchie/status/2016861571897606504?s=20


looks like we both did haha: https://github.com/saadnvd1/aTerm


Emdash is invoking CC, Codex, etc. natively. Therefore users are getting the raw version of each agent.


until then, https://CommanderAI.app tries to fill the gap on Mac


They have Claude Code web in research preview


It still doesn't support plan mode... I'm really confused why that's so hard to do


It can, but the plugins are not developed to production readiness yet. I should clarify that.

The way to write a plugin:

Take an off-the-shelf parser for pdf, docx, etc. and write a lix plugin. The moment a plugin parses a binary file into structured data, lix can handle the version control stuff.
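A minimal sketch of what such a plugin could look like, using JSON instead of pdf/docx for brevity (the interface below is hypothetical, not lix's actual plugin API):

```typescript
// Illustrative plugin shape: parse a binary file into structured entities
// so the version-control layer can track them. Not lix's actual API.
type DetectedEntity = { entityId: string; schemaKey: string; snapshot: unknown };

interface FilePlugin {
  glob: string; // which files this plugin handles
  detectEntities(fileData: Uint8Array): DetectedEntity[];
}

// Example: a trivial plugin for JSON files, one entity per top-level key.
const jsonPlugin: FilePlugin = {
  glob: "*.json",
  detectEntities(fileData) {
    const doc = JSON.parse(new TextDecoder().decode(fileData));
    return Object.entries(doc).map(([key, value]) => ({
      entityId: key,
      schemaKey: "json_property",
      snapshot: value,
    }));
  },
};
```

The parser (JSON.parse here, an off-the-shelf docx or pdf parser in practice) does the hard work; the plugin only maps its output onto stable entity IDs.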


Merge algebra is similar to git's, with a three-way merge. Given that lix tracks individual changes, the three-way merge is more fine-grained.

In case of a conflict, you can either decide to do last write wins or surface the conflict to the user e.g. "Do you want to keep version A or version B?"

The SQL engine is unrelated to merging. Lix uses SQL as its storage and query engine, but not for merges.
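The per-entity three-way merge can be sketched like this (hypothetical shapes, not lix's internals):

```typescript
// Sketch of a fine-grained three-way merge: merge per entity rather than
// per file. Shapes are hypothetical, not lix's internals.
type Snapshot = string | null; // null = entity deleted

function mergeEntity(
  base: Snapshot, // common ancestor
  a: Snapshot,    // version A
  b: Snapshot     // version B
): { value: Snapshot } | { conflict: [Snapshot, Snapshot] } {
  if (a === b) return { value: a };    // both sides agree
  if (a === base) return { value: b }; // only B changed -> take B
  if (b === base) return { value: a }; // only A changed -> take A
  return { conflict: [a, b] };         // both changed differently:
                                       // last write wins, or ask the user
}
```

Because the merge runs per entity, two people editing different paragraphs of the same docx never conflict; only edits to the same entity do.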


Thanks for the feedback.

AI agents are the pull right now for why version control is needed outside of software engineering.

The mistake in the blog post is triggering comparisons to git, which leads to “why is this better/different than git?”.

If you have a custom binary file, you can write a plugin for it! :)

Lix runs on top of a SQL database because we initially built lix on top of git but needed:

- database semantics (transactions, acid, etc.)

- SQL to express history queries (diffing arbitrary file formats can't be solved with a simple diff() API)


Indeed, if lix were to target version-controlling code, incompatibility with git would be a "dead on arrival" situation.

But Lix's use case is not version-controlling code.

It's embedding version control in applications. Hence the reason lix runs within SQL databases. Apps have databases. Lix runs on top of them.

The benefit for the developer is a version control system within their database, and exposing version control to users.


> But then the first thing it talks about is diffing files. Which honestly shouldn’t even be a feature of VCS. That’s just a separate layer.

There is nuance between git's line-by-line diffing and what lix does.

For text, it holds true that diffing is a separate layer. Text files are small, which allows on-the-fly diffing (that's what git does) by comparing two docs.

On-the-fly diffing doesn't work for structured file formats like xlsx, fig, dwg, etc. It's too expensive, both in terms of materializing two files at specific commits and then diffing those two files.

What lix does under the hood is track individual changes, _which allows rendering a diff without on-the-fly diffing_. So lix is kind of responsible for diffs, but only in the sense that it provides a SQL API to query changes between two states. How the diff is rendered is up to the application.


> On the fly diffing doesn't work for structured file formats like xlsx, fig, dwg etc. It's too expensive. Both in terms of materializing two files at specific commits, and then diffing these two files.

I don’t think that’s actually true?

How often are binary files being diffed? How long does it take to materialize? How long to run a diff algorithm?

I’ve worked with some tools that can diff images. Works great. Not a problem in need of solving.

In any case I’ll give benefit of the doubt that this project solves some real problem in a useful way. I’m not sure what it is.

My goals in a VCS for binary files seem to be very very very different than yours.


I think our goals indeed differ.

> How often are binary files being diffed? How long does it take to materialize? How long to run a diff algorithm?

If version control is embedded in an app, constantly.

Imagine a cell in a spreadsheet. An application wants to display a "blame" for a cell C43 i.e. how did the cell change over time?

The lix way is this SQL query:

    SELECT * FROM state_history WHERE file_id = <the_spreadsheet> AND schema_key = 'excel_cell' AND entity_id = 'C43';

Diffing on the fly is not possible. The information on what changed needs to be available without diffing. Otherwise, diffing an entire spreadsheet file for every commit to see how cell C43 changed takes ages.

