Hacker News | new | past | comments | ask | show | jobs | submit | rao-v's comments | login

I tried crush when it first came out - the vibes were fun but it didn’t seem to be particularly good even vs aider. Is it better now?

Disclaimer: I work for Charm, so my opinion may be biased.

But we did a lot of work on improving the experience: the UX, the performance, and the actual reliability of the agent itself.

I'd suggest giving it a try.


Will do, thanks. Any standout features or clever things I should look out for?

Riding a motorcycle or even a bicycle around Waymos feels surprisingly safer. You can reliably predict so many things about how it will behave and to an extent even its traffic calming effect on other cars.

Yea. Cycling around self-driving cars is obviously much safer and many more people will be encouraged to do it.

I ended up on this journey using Dockge. Inoffensive and you can stick your compose files in a directory and manage with git vs. Portainer’s attempt to hide them.

I generally agree but I value projects like this because there are smaller scale environments where many of these fallacies are perfectly fine working hypotheses. My home lab or a low volume, low 9s service etc.

I’d love to believe this is real, but I’m pretty sure you will lose performance on a “fair” mix of tasks, even after fine tuning. I know multiple teams have explored recurrent layers (great for limited VRAM) but I don’t think it’s ever been found to be optimal.

I’m genuinely surprised to see this not discussed more by the FOSS community. There are so many ways to blow past the GPL now:

1. File by file rewrite by AI (“change functions and vars a bit”)

2. One LLM writes an intermediate-language (or pseudocode) version of each function that a different LLM translates back into code and tests for input/output parity
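A sketch of pipeline (2), where `llm_to_ir` and `llm_from_ir` are hypothetical stand-ins for the two model calls; only the parity check at the end is concrete:

```python
from typing import Callable, Iterable, Tuple

def llm_to_ir(source: str) -> str:
    """Hypothetical stand-in: model A rewrites a function as pseudocode/IR."""
    raise NotImplementedError("placeholder for an LLM call")

def llm_from_ir(ir: str) -> str:
    """Hypothetical stand-in: model B translates the IR back into code."""
    raise NotImplementedError("placeholder for an LLM call")

def io_parity(original: Callable, rewrite: Callable,
              cases: Iterable[Tuple]) -> bool:
    """The testable part: check input/output parity on a set of test cases."""
    return all(original(*args) == rewrite(*args) for args in cases)

# Parity check demonstrated on two independent clamp implementations:
def original_clamp(x, lo, hi):
    return max(lo, min(x, hi))

def regenerated_clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

assert io_parity(original_clamp, regenerated_clamp,
                 [(5, 0, 10), (-3, 0, 10), (42, 0, 10)])
```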

The real danger is that this becomes increasingly undetectable in closed source code and can continue to sync with progress in the GPLed repo.

I don’t think any current license has a plausible defense against this sort of attack.


I’ve never delved fully into IP law, but wouldn’t these be considered derivative works? They’re basically just reimplementing exactly the same functionality with slightly different names?

This would be different from the “API reimplementation” (see Google vs Oracle) because in that case, they’re not reusing implementation details, just the external contract.


Because copyrights do not protect ideas. Thankfully. We are free to express ideas, as long as we do so in our own words. How that principle is applied in actual law, and how it is applied to software, is ridiculously complicated, but that is the heart of the principle at play here. The law draws a line between ideas (which cannot be copyrighted) and particular expressions of those ideas (e.g. the original source code), which are protected. However, it is an almost fractally complicated line which, in many places, relies on concepts of "fairness" and, because our legal system uses a system of legal precedent, depends on interpretation of a huge body of prior legal decisions.

Not being a trained lawyer, or a Supreme Court justice, I cannot express a sensible position as to which side of the line this particular case falls. There are, however, enormously important legal precedents that pretty much all professional software developers use to guide their behaviour with respect to handling of copyrighted material (IBM v. Amdahl and Google v. Oracle, particularly) that seem to suggest to us non-lawyers that this sort of reimplementation is legal. (Seek the advice of a real lawyer if it matters.)


Taking a step back, it seems fairly clear that wherever you set the bar, it should be possible to automate a system that reads code, generates some sort of intermediate representation at the acceptable level of abstraction and then regenerates code that passes an extensive set of integration tests … every day.

At that point our current understanding of open source protections … fails?


Depends whether you sit on the MIT half of open source, or the GPL side of open source, I suppose.

there's usually a test for originality, and it involves asking the jury things like: is it transformative enough?

so if someone tells the LLM to write it in WASM and also make it much faster and use it in a different commercial sector... then maybe

since 2023 the standard is much higher (arguably it was placed too low in 1993)


"change functions and vars a bit" isn't a rewrite. Anything where the LLM had access to the original code isn't a rewrite. This would just be a derivative work.

However, most of the industry willfully violates the GPL without even trying such tricks, so there are certainly issues.


The fact that you are drawing such absolute conclusions is indication enough that you are not qualified to speak on this.


#1 is already possible and always has been; I've never heard of anyone actually trying it. #2 is too nitpicky and unnecessarily costly for LLMs. It would be better to just ask one to generate a spec and tests based on the original, then create a separate implementation based on that. A person can do that today, free and clear. If LLMs become able to do this, we will just need to cope. Perhaps the future is in validating software instead of writing it.


(1) sounds like a derivative work, but (2) is an interesting AI-simulacrum of a clean room implementation IF the first LLM writes a specification and not a translation.


+1 I’ve always had the feeling that training from randomly initialized weights without seeding some substructure is unnecessarily slowing LLM training.

Similarly I’m always surprised that we don’t start by training a small set of layers, stack them and then continue.


Better-than-random initialization is underexplored, but there are some works in that direction.

One of the main issues is: we don't know how to generate useful computational structure for LLMs - or how to transfer existing structure neatly across architectural variations.

What you describe sounds more like a "progressive growing" approach, which isn't the same, but draws from some similar ideas.
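For what it's worth, a minimal numpy sketch of the progressive-growing idea (toy linear layers and a made-up regression target; real LLM training differs in every detail): train the current stack, freeze it, stack a fresh layer on top, and train only the newest layer:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 8, 64
x = rng.normal(size=(n, dim))
# Toy target: a fixed random linear map we try to fit.
target = x @ rng.normal(size=(dim, dim))

def forward(layers, x):
    for w in layers:
        x = x @ w
    return x

def train_last_layer(layers, x, target, steps=200, lr=1e-2):
    """Gradient descent on the newest layer only; earlier layers frozen."""
    h = forward(layers[:-1], x)      # activations feeding the new layer
    w = layers[-1]
    for _ in range(steps):
        err = h @ w - target         # MSE gradient: h.T @ err / n
        w -= lr * h.T @ err / len(x)

layers = [np.eye(dim)]               # start with a single (identity) layer
for _ in range(2):                   # grow: stack a new layer, train it
    layers.append(np.eye(dim) + 0.01 * rng.normal(size=(dim, dim)))
    train_last_layer(layers, x, target)

loss = np.mean((forward(layers, x) - target) ** 2)
assert loss < np.mean(target ** 2)   # fitting beats predicting zero
```

The open question, as you say, is whether structure learned by the shallow stack transfers usefully once new layers are stacked on top, rather than just being re-learned.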


Agree re: progressive growing

In terms of substructure: in the old days of Core War, randomly scattering bits of code that did things could pay off. I'm imagining something similar for LLMs: set 10% of the weights to specific known structures and watch which are retained/utilized by the model and which get treated like random init
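A toy numpy sketch of that experiment (everything here is illustrative: the seeded structure is an identity block, the task is a made-up linear regression): plant a known structure into ~10% of a weight matrix, train, and measure how much the seeded weights move versus the free ones:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 20
W = 0.02 * rng.normal(size=(d, d))

# Plant a known structure into ~10% of the weights: an identity block.
k = int((0.10 * d * d) ** 0.5)       # block side, so k*k is ~10% of entries
W[:k, :k] = np.eye(k)
seeded = np.zeros((d, d), dtype=bool)
seeded[:k, :k] = True
W0 = W.copy()

# Toy objective: regress random inputs onto themselves, which favors W = I.
x = rng.normal(size=(128, d))
for _ in range(100):
    grad = x.T @ (x @ W - x) / len(x)
    W -= 0.05 * grad

drift_seeded = np.abs(W - W0)[seeded].mean()
drift_free = np.abs(W - W0)[~seeded].mean()
# On this toy task the planted identity block is "retained" (it barely moves),
# while the free weights get pulled toward the identity solution.
assert drift_seeded < drift_free
```

Per-weight drift is the crudest possible retention metric; for a real LLM you'd presumably want something functional, like ablating the seeded structure and measuring the loss hit.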


It’s interesting that you invest in mouse movements vs just targeting a click at X in Y milliseconds. CAD and video games are of course a great reason for this, but I wonder how much typical tool use can be modeled by just next click events.

I’d love to see this sort of thing paired with eye tracking and turned into a general purpose precog predictive tool for computer use … but you probably have many better use cases for your world model!


+1 this does seem to be a genuine attempt to actually build an interpretable model, so nice work!

Having said that, I worry that you run into illusion-of-consciousness issues where the model changes attribution from "sandbagging" to "unctuous" when you control its response, because the response is generated outside of the attribution modules (I don't quite understand how cleanly everything flows through the concept modules and the residual). Either way, this is a sophisticated problem to have. Would love to see whether this can be trained to parity with modern 8B models.


Actually, the model is forcing the response to be generated inside the attribution modules.


We're probably a year from self-hostable video models that can identify sexual content etc. with high sensitivity (but probably poor specificity)

