
I wish models which we can self-host at home would start catching up. Relying on hosted providers like this is a huge risk, as can be seen in this case.

I just worry that there’s little incentive for big corporations to research optimising the “running queries for a single user on a consumer GPU” use case. I wonder if getting funding for such research is even viable at all.



We already have really strong models that run on a consumer GPU, and really strong frameworks and libraries to support them.

The issue is that (1) the extra size supports extra knowledge and abilities in the model, and (2) a lot of the open-source models are trained in a way that avoids competing with the paid offerings, or lack the training data behind the useful models.

Specifically, it seems like the tool-use heavy “agentic” work is not being pushed to open models as aggressively as the big closed models. Presumably because that’s where the money is.


I think model providers would love to run their models on a single GPU. The latency and throughput of GPU interconnects are orders of magnitude worse than accessing VRAM, so cutting out the interconnect would make the models much more efficient to run, and they wouldn't have to pay for such expensive networking. If they could run on consumer GPUs, even better: consumer GPUs probably cost something like 5-10x less per unit of raw compute than data center ones. New coding-optimized models for single GPUs drop all the time, but it's just a really hard problem to make them good, and while the large models are still in the barely-good-enough phase (I wasn't using agents much before Sonnet 4) it's just not realistic to get something useful locally.


DeepSeek R1 is better than any model released more than 6 months ago. You can plug it into open-source equivalents of Claude Code like Goose, and it runs on a Mac Studio, which you can buy at any consumer electronics store.

The people saying that big tech is going to gatekeep their IDE or pull the plug on them don’t seem to realize that the ship has already sailed. Good LLMs are here to stay and are never going away. You just might have to do slightly more work than whipping out a credit card and buying a subscription product.
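
To make the "plug it into Goose" part concrete: tools like Goose talk to the model through an OpenAI-compatible HTTP API, which local servers such as Ollama and llama.cpp's llama-server expose. A minimal sketch in Python, assuming Ollama's default port and the deepseek-r1:70b tag (swap in whatever model and port your local server actually serves):

    # Point the standard OpenAI client at a locally served model.
    # Assumptions: Ollama is running on its default port (11434) and is
    # serving a DeepSeek R1 model pulled as "deepseek-r1:70b".
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
        api_key="unused",                      # local servers ignore the key
    )

    resp = client.chat.completions.create(
        model="deepseek-r1:70b",
        messages=[{"role": "user",
                   "content": "Write a Python function that parses RFC 3339 timestamps."}],
    )
    print(resp.choices[0].message.content)

If I remember right, Goose lets you pick a local provider such as Ollama when you run its configuration step, so the agent loop then runs against your local weights instead of a hosted API.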


Check out Llama 3.1 70B Instruct: with quantized builds (e.g. Unsloth's GGUFs or llama.cpp's Q4_K_M quantization) it runs on consumer hardware and is surprisingly competitive with Claude for many coding tasks. Note that a Q4_K_M 70B is on the order of 40GB, so on a 24GB card you'd need a more aggressive quant or partial offload to system RAM.
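
For anyone wondering why 24GB is tight, a quick back-of-the-envelope calculation (bits-per-weight figures are approximate averages, and KV cache / runtime overhead is ignored):

    # Rough memory footprint of a 70B-parameter model at common llama.cpp quant levels.
    # Bits-per-weight values are approximate; real GGUF files vary slightly.
    PARAMS = 70e9

    quants = {
        "F16":    16.0,
        "Q8_0":    8.5,
        "Q4_K_M":  4.85,
        "IQ2_XS":  2.3,
    }

    for name, bpw in quants.items():
        gb = PARAMS * bpw / 8 / 1e9
        print(f"{name:8s} ~{gb:5.1f} GB")
    # Q4_K_M lands around 42 GB, so a 24GB card needs either a ~2-bit quant
    # or partial offload of layers to system RAM.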



