Hacker News | davidsainez's comments

Ever heard of Debian or Linux?


Linux was originally named Freax by Linus, but other people didn't like the name, started calling it Linux instead, and it just stuck.


And Git :)


Excited to put this through its paces. It seems most directly comparable to GPT-OSS-20B. Comparing their numbers on the Together API: Trinity Mini is slightly less expensive ($0.045/$0.15 vs $0.05/$0.20 per million input/output tokens) and seems to have better latency and throughput numbers.
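
For a rough sense of scale, here's a back-of-the-envelope comparison (the per-million-token prices are the Together numbers above; the per-request token counts are made-up assumptions):

  # Cost per request at the quoted Together prices ($ per 1M tokens).
  # The 2k-in/500-out workload is hypothetical, not a measurement.
  PRICES = {
      "trinity-mini": (0.045, 0.15),  # (input, output)
      "gpt-oss-20b": (0.05, 0.20),
  }

  def cost_per_request(model, input_tokens, output_tokens):
      p_in, p_out = PRICES[model]
      return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

  for model in PRICES:
      print(f"{model}: ${cost_per_request(model, 2_000, 500):.6f}")

At those prices the difference per request is small ($0.000165 vs $0.0002 for this workload), but it compounds at volume.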


Why would that undermine its integrity? AFAICT there is a selection of "open" US-based LLMs to choose from: Google's Gemma, Microsoft's Phi, Meta's Llama, and OpenAI's GPT-OSS, with Phi licensed under MIT and GPT-OSS under Apache 2.0.


Because at that point you don't know where the data came from. You could be training on foreign propaganda without realizing it.

Presumably they wouldn't be training on synthetic data produced by anything less than an open frontier model, and those are almost exclusively Chinese.


I find the existence of OpenNext convincing proof of lock-in: https://blog.logrocket.com/opennext-next-js-portability/

Personally, I don’t bother with nextjs at all.


I think the fact that OpenNext can exist speaks to the opposite.

A Next.js project can be deployed as a Docker image very easily [1]. If you want to use a provider that has its own infrastructure setup, then yes, you need to do some work (which OpenNext does for you). But that's true of practically any framework deployed to a host that does more than just serve the Docker container.

[1]: https://nextjs.org/docs/app/getting-started/deploying


Not wanting to review and maintain code that someone didn't even bother to write themselves is childish?


This argument obviously makes no sense, especially when one of the examples is a 7-character diff.

But it's fine to say "this PR makes no sense to me, please explain it better" and close it.


Denying code not on its merits but on its source is childish.


I think most people are in complete agreement.

What people don't like about LLM PRs is typically:

a. The person proposing the PR usually lacks adequate context, which makes communication and feedback, which are essential, difficult if not impossible. They often cannot even explain the reasoning behind the changes they are proposing.

b. The volume/scale is often unreasonable for human reviewers to contend with.

c. The PR may not be in response to an issue but just the realization of some "idea" the author or LLM had, making it even harder to contextualize.

d. The cost asymmetry, generally speaking, is highly unfavorable to the maintainers.

At the moment, LLM-driven PRs have these qualities so frequently that people use LLM bans as a shorthand, since writing out a lengthy policy re-describing the basic tenets of participation in software development is tedious and shouldn't be necessary. But here we are, in 2025, when everyone has seemingly decided to abandon those principles in favor of lazily generating endless reams of pointless code just because they can.


But to determine its merit, a maintainer must first donate their time and read through the PR.

LLMs reduce the effort to create a plausible PR down to virtually zero. Requiring a human to write the code is a good indicator that A. the PR has at least some technical merit and B. the human cares enough about the code to bother writing a PR in the first place.


It's absolutely possible to use an LLM to generate code, carefully review, iterate and test it and produce something that works and is maintainable.

The vast majority of LLM-generated code that gets submitted in PRs on public GitHub projects is not that - see the examples they gave.

Reviewing all of that code on its merits alone in order to dismiss it would take an inordinate amount of time and effort that would be much better spent improving the project. The alternative is a blanket LLM generated code ban, which is a lot less effort to enforce because it doesn't involve needing to read piles and piles of nonsense.


> Denying code not on its merits but on its source is childish.

No, it's pretty standard legal policy actually.


Brandolini's law

Usually I hate quoting "laws," but think about it. I agree it would be awesome if we could scrutinize 10k+ lines of code to bring in big changes, but it's not really feasible, is it?


> works flawlessly

> intermittent outages

Those seem like conflicting statements to me. The last outage was only 13 days ago: https://news.ycombinator.com/item?id=45915731.

Also, there have been increasing reports of open source maintainers dealing with LLM-generated PRs: https://news.ycombinator.com/item?id=46039274. GitHub seems perfectly positioned to help manage that issue, but in all likelihood will do nothing about it: '"Either you have to embrace the AI, or you get out of your career," Dohmke wrote, citing one of the developers who GitHub interviewed.'

I used to help maintain a popular open source library and I do not envy what open source maintainers are now up against.


GitHub: 60% of the time, it works every time.


> GitHub seems perfectly positioned to help manage that issue, but in all likelihood will do nothing about it

I genuinely don't understand this position. Is this not what GitHub issue bots were made for? No matter where your repo is hosted, you take the onus of moderating it onto yourself.
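
And a bot like that doesn't have to be fancy. A minimal sketch (the repo, token, and "short description" heuristic are placeholder assumptions; the endpoints are GitHub's public REST API):

  # Toy PR triage bot. Repo, token, and the heuristic are placeholders;
  # the endpoints are GitHub's standard REST API, called via requests.
  import requests

  API = "https://api.github.com"
  REPO = "owner/repo"  # placeholder
  HEADERS = {
      "Authorization": "Bearer <token>",  # placeholder access token
      "Accept": "application/vnd.github+json",
  }

  def flag_low_context_prs():
      prs = requests.get(f"{API}/repos/{REPO}/pulls", headers=HEADERS).json()
      for pr in prs:
          body = (pr.get("body") or "").strip()
          if len(body) < 20:  # toy heuristic: no real description given
              requests.post(
                  f"{API}/repos/{REPO}/issues/{pr['number']}/labels",
                  headers=HEADERS,
                  json={"labels": ["needs-context"]},
              )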

Downtimes are an issue, which is why I jokingly mentioned them. Besides that, I'm without gripe. Make GitHub a high-nines service and I'll keep using it until the wheels fall off.


AFAICT, Kimi K2 was the first to apply this technique [1]. I wonder if Anthropic came up with it independently or if they trained a model in 5 months after seeing Kimi's performance.

1: https://www.decodingdiscontinuity.com/p/open-source-inflecti...


OpenAI has been doing this since at least o3 in January, and Anthropic has been doing it since Claude 4 in May.

And the July Kimi K2 release wasn't a thinking model; the model in that article was released less than 20 days ago.


There are well documented cases of performance degradation: https://www.anthropic.com/engineering/a-postmortem-of-three-....

The real issue is that there is currently no reliable way for the end user to detect changes in performance (other than being willing to burn the cash and run your own benchmarks regularly).

It feels to me like a perfect storm: the combination of high inference costs, extreme competition, and the statistical nature of LLMs makes it very tempting for a provider to tune their infrastructure to squeeze more volume out of their hardware. I don't mean to imply bad-faith actors: things are moving at breakneck speed and people are trying anything that sticks. But the problem persists: people are building on systems that are in constant flux (for better or for worse).
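
Running your own benchmarks doesn't have to be elaborate to be useful, though. Something like this canary suite, re-run on a schedule against an OpenAI-compatible chat endpoint, would at least give you a time series to point at (the test cases and pass criteria here are toy stand-ins):

  # Canary benchmark sketch: replay a fixed prompt set and log the pass
  # rate over time. Assumes the common OpenAI-compatible chat API shape;
  # the test cases are toy stand-ins for a real regression suite.
  import datetime, json
  import requests

  CASES = [
      ("What is 17 * 23?", "391"),
      ("What is the capital of Australia?", "Canberra"),
  ]

  def run_suite(base_url, api_key, model):
      passed = 0
      for prompt, expected in CASES:
          r = requests.post(
              f"{base_url}/chat/completions",
              headers={"Authorization": f"Bearer {api_key}"},
              json={
                  "model": model,
                  "temperature": 0,
                  "messages": [{"role": "user", "content": prompt}],
              },
          )
          answer = r.json()["choices"][0]["message"]["content"]
          passed += expected in answer
      record = {
          "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
          "model": model,
          "pass_rate": passed / len(CASES),
      }
      print(json.dumps(record))  # append to a log and watch for drift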


> There are well documented cases of performance degradation: https://www.anthropic.com/engineering/a-postmortem-of-three-...

There was one well-documented case of performance degradation, and it arose from a stupid bug, not some secret cost-cutting measure.


I never claimed that it was being done in secrecy. Here is another example: https://groq.com/blog/inside-the-lpu-deconstructing-groq-spe....

I have seen multiple people mention OpenRouter here on HN: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

Again, I'm not claiming malicious intent. But model performance depends on a number of factors, and the end user just sees benchmarks for a specific configuration. For me to have a high degree of confidence in a provider, I would need to see open and continuous benchmarking of the end-user API.


All those are completely irrelevant. Quantization is just a cost optimization.

People are claiming that Anthropic et al. change the quality of the model after the initial release, which is entirely different and which the industry as a whole has denied. When a model is released under a certain version, the model doesn't change.

The only people who believe this are in the vibe coding community, believing there's some kind of big conspiracy, and any time you mention "but benchmarks show the performance stays consistent" you're told you're licking corporate ass.


I might be misunderstanding your point, but quantization can have a dramatic impact on the quality of the model's output.

For example, in diffusion, there are some models where a Q8 quant dramatically changes what you can achieve compared to fp16 (I'm thinking of the Wan video models). The point I'm trying to make is that it's a noticeable model change, and can be make-or-break.
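
To make that concrete, here's a toy numpy demo of the error a naive int8 quant introduces (real inference stacks use smarter per-channel or group-wise schemes, so treat this as illustrative only):

  # Symmetric per-tensor int8 quantization of random weights, and the
  # reconstruction error it introduces. Illustrative only.
  import numpy as np

  rng = np.random.default_rng(0)
  w = rng.normal(0, 0.02, size=(4096, 4096)).astype(np.float32)

  scale = np.abs(w).max() / 127.0             # one scale for the whole tensor
  w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
  w_hat = w_q.astype(np.float32) * scale      # dequantized approximation

  err = np.abs(w - w_hat)
  print(f"max |error|: {err.max():.6f}  mean |error|: {err.mean():.6f}")

Per weight the error looks tiny, but it compounds across layers and denoising steps, which is presumably where the make-or-break differences come from.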


Of course, no one is debating that. What's being debated is whether this is done after a model's initial release, e.g., Anthropic secretly changing the new Opus model a few weeks in to perform worse but be more cost-efficient.


> some secret cost cutting measure

That's not the point. It's just a day in the life of ops to tweak your system to improve resource utilization and performance, which can cause bugs you don't expect in LLMs. It's a lot easier to monitor performance in a deterministic system; it's much harder to see the true impact a change has on an LLM.


Thanks for sharing. I hear people make extraordinary claims about LLMs (not saying that's what you're doing), but it's hard to evaluate exactly what they mean without seeing the results. I've been working on a similar project (a static analysis tool) and I've been using Sonnet 4.5 to help me build it. On cursory review it produces acceptable results, but closer inspection reveals obvious performance or architectural mistakes. In its current state, one-shotted LLM code feels like wood filler: very useful in many cases, but I would not trust it to be load-bearing.


I'd agree with that, yeah. If this were anything more important, I'd give it much more guidance, lay down the core architectural primitives myself, take over the reins more in general, etc., but for what this is, it's perfect.


Access to virtually infinite cash had more to do with Android's success than the source being proprietary.

