
I wish models which we can self-host at home would start catching up. Relying on hosted providers like this is a huge risk, as can be seen in this case.

I just worry that there’s little incentive for big corporations to research optimising the “running queries for a single user on a consumer GPU” use case. I wonder if getting funding for such research is even viable at all.



We already have really strong models that run on a consumer GPU, and really strong frameworks and libraries to support them.

The issue is that (1) the extra size supports extra knowledge and abilities in the model, and (2) a lot of the open-source models are trained in a way that avoids competing with the paid offerings, or lack the training data behind the useful models.

Specifically, it seems like the tool-use heavy “agentic” work is not being pushed to open models as aggressively as the big closed models. Presumably because that’s where the money is.


I think model providers would love to run their models on a single GPU. The latency and throughput of GPU interconnects are orders of magnitude worse than accessing VRAM, so cutting out the interconnect would make the models much more efficient to run, and they wouldn't have to pay for such expensive networking. If they could run on consumer GPUs, even better: consumer GPUs probably cost something like 5-10x less per unit of raw compute than data center ones. New coding-optimized models for single GPUs drop all the time, but it's just a really hard problem to make them good, and while the large models are still in the barely-good-enough phase (I wasn't using agents much before Sonnet 4) it's just not realistic to get something useful locally.


DeepSeek R1 is better than any model released more than 6 months ago. You can plug it into open-source equivalents of Claude Code like Goose, and it runs on a Mac Studio, which you can buy at any consumer electronics store.

The people saying that big tech is going to gatekeep their IDE or pull the plug on them don’t seem to realize that the ship has already sailed. Good LLMs are here to stay and are never going away. You just might have to do slightly more work than whipping out a credit card and buying a subscription product.
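
To make the "plug it into Goose" part concrete: tools like Goose talk to the model through an OpenAI-compatible HTTP API, which local servers such as Ollama and llama.cpp's llama-server expose. A minimal sketch in Python, assuming Ollama's default port and the deepseek-r1:70b tag (swap in whatever model and port your local server actually serves):

    # Point the standard OpenAI client at a locally served model.
    # Assumptions: Ollama is running on its default port (11434) and is
    # serving a DeepSeek R1 model pulled as "deepseek-r1:70b".
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
        api_key="unused",                      # local servers ignore the key
    )

    resp = client.chat.completions.create(
        model="deepseek-r1:70b",
        messages=[{"role": "user",
                   "content": "Write a Python function that parses RFC 3339 timestamps."}],
    )
    print(resp.choices[0].message.content)

If I remember right, Goose lets you pick a local provider such as Ollama when you run its configuration step, so the agent loop then runs against your local weights instead of a hosted API.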


Check out Llama 3.1 70B Instruct: with quantized builds (e.g. Unsloth's GGUFs or llama.cpp's Q4_K_M quantization) it runs on consumer hardware and is surprisingly competitive with Claude for many coding tasks. Note that a Q4_K_M 70B is on the order of 40GB, so on a 24GB card you'd need a more aggressive quant or partial offload to system RAM.
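
For anyone wondering why 24GB is tight, a quick back-of-the-envelope calculation (bits-per-weight figures are approximate averages, and KV cache / runtime overhead is ignored):

    # Rough memory footprint of a 70B-parameter model at common llama.cpp quant levels.
    # Bits-per-weight values are approximate; real GGUF files vary slightly.
    PARAMS = 70e9

    quants = {
        "F16":    16.0,
        "Q8_0":    8.5,
        "Q4_K_M":  4.85,
        "IQ2_XS":  2.3,
    }

    for name, bpw in quants.items():
        gb = PARAMS * bpw / 8 / 1e9
        print(f"{name:8s} ~{gb:5.1f} GB")
    # Q4_K_M lands around 42 GB, so a 24GB card needs either a ~2-bit quant
    # or partial offload of layers to system RAM.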



