Also, generally I think CoreML isn't the best option. The best solution for ORT would probably be to introduce a pure MPS provider (https://github.com/microsoft/onnxruntime/issues/21271), but given they've already bought into CoreML, the effort may not be worth the reward for the core team. Which is fair enough, as it's a pretty mammoth task.
However, one benefit of CoreML is that it's the only way for third parties to execute on the ANE (Apple Neural Engine, aka NPU). For some models the ANE can execute even faster than the GPU/MPS while consuming even less battery.
But I agree CoreML in ONNX Runtime is not perfect - most of the time when I tested models there was too much partitioning, and the whole graph ran slower compared to running the same model purely in CoreML format.
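If anyone wants to check how badly their graph is being split up, here's a minimal sketch (the model path is a placeholder, and log levels may print more or less detail depending on your ORT version):

    import onnxruntime as ort

    # Raise log verbosity so ORT reports which nodes land on the CoreML EP
    # and which fall back to CPU (the source of the partitioning overhead).
    so = ort.SessionOptions()
    so.log_severity_level = 1  # 0 = verbose, 1 = info

    sess = ort.InferenceSession(
        "model.onnx",  # placeholder path
        sess_options=so,
        providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
    )
    print(sess.get_providers())  # confirms the CoreML EP was actually registered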
To be honest it's a shame the whole thing is closed up. I guess that's to be expected from Apple, but I reckon CoreML would benefit a lot from at least exposing the internals / allowing users to define new ops.
Also, the ANE only allows some operators to be run on it, right? There's very little transparency or control over what can and cannot be offloaded to it, which makes using it difficult.
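Right - the most control you get through ORT is a compute-units hint to the CoreML EP, and CoreML still decides per-op whether the ANE is actually used. A rough sketch (the option name and values here are from recent ORT releases and may differ by version; older builds used bit-flags instead):

    import onnxruntime as ort

    # Ask the CoreML EP to prefer CPU + Neural Engine. This is only a hint:
    # ops the ANE can't handle still run elsewhere or fall back to the CPU EP.
    sess = ort.InferenceSession(
        "model.onnx",  # placeholder path
        providers=[
            ("CoreMLExecutionProvider", {"MLComputeUnits": "CPUAndNeuralEngine"}),
            "CPUExecutionProvider",
        ],
    )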
My 2023 MacBook Pro (M2 Max) is coming up on 3 years old, and I can run models locally that are arguably "better" than what was considered SOTA about 1.5 years ago. It's not an exact comparison, of course, but it's close enough to give some perspective.
I suspect Ollama is at least partly moving away from open source as they look to raise capital; when they released their replacement desktop app, they did so as closed source. You're absolutely right that people should be using llama.cpp - not only is it truly open source, it's also significantly faster, has better model support and many more features, is better maintained, and its development community is far more active.
The only issue I've found with llama.cpp is getting it working with my AMD GPU. Ollama almost works out of the box, both in Docker and directly on my Linux box.
I haven't tried agentic coding since I haven't set it up in a container yet, and I'm not going to YOLO it on my system (doing stuff via chat plus a utility to copy and paste directories and files got me pretty far over the last year and a half).
It helps that Codex is so much slower than Anthropic's models; a 4.5-hour Codex session might as well be a 2-hour Claude Code one. I use both extensively, FWIW.
It really depends. When building a lot of new features, the limit comes up quite fast. With some attention to context length I was often able to go for over an hour on the $20 Claude plan.
If you're doing mostly smaller changes, you can go all day on the $20 Claude plan without hitting the limits, especially if you need to thoroughly review the AI's changes for correctness instead of relying on automated tests.
I find that I use it on isolated changes where Claude doesn't really need to access a ton of files to figure out what to do, and I can easily do that without hitting limits. The only time I hit the 4-5 hour limit is when I'm going nuts on a prototype idea and vibe coding absolutely everything, and usually when I hit it I'm pretty mentally spent anyway, so I use it as a sign to go do something else. I suppose everyone has different styles and different codebases, but for me it's easy enough to stay under the limit that it's hard to justify $100 or $200 a month.
The DGX Spark is not good for inference, though - it's very memory-bandwidth limited, around the same as a lower-end MacBook Pro. You're much better off with Apple Silicon for performance and memory size at the moment, but I'd recommend holding off until the M5 Max comes out early in the new year, since the M5 has vastly superior performance to any other Apple Silicon chip thanks to its matmul instruction set.
Oof, I was already considering an upgrade from the M1 but was hoping I wouldn't be convinced to go for the top of the line. Is the performance jump from the M# -> M# Max chips that substantial?
The main jump is from anything to the M5 - not simply because it's the latest, but because it has matmul instructions similar to a CUDA GPU, which fixes the slow prompt processing on all previous-generation Apple Silicon chips.
Thanks! Quick overview: Paths are deterministic, not LLM-generated. I use OpenAI text-embedding-3-large to build a word graph with K-nearest neighbors, then BFS finds the shortest path. No sampling involved. The explanations shown in-game are generated afterward by GPT-5 to explain the semantic jumps. Planning to write up the full architecture in a blog post - will share here when it's ready.
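For anyone curious, a minimal sketch of that pipeline (not the actual game code - the vocabulary, K, and the cosine-similarity choice are my assumptions):

    import numpy as np
    from collections import deque
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def embed(words):
        # One batched call to text-embedding-3-large for the whole vocabulary.
        resp = client.embeddings.create(model="text-embedding-3-large", input=words)
        return np.array([d.embedding for d in resp.data])

    def build_knn_graph(words, k=8):
        # Link each word to its k nearest neighbors by cosine similarity.
        vecs = embed(words)
        vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
        sims = vecs @ vecs.T
        return {
            w: [words[j] for j in np.argsort(-sims[i])[1 : k + 1]]  # skip self
            for i, w in enumerate(words)
        }

    def shortest_path(graph, start, goal):
        # Plain BFS over the word graph - deterministic, no sampling involved.
        queue, seen = deque([[start]]), {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt in graph[path[-1]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None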
Oh, that makes a lot of sense, and I'm glad it works that way. The explanations afterwards left me wondering whether it was truly explaining the connections or just inferring what they would be (leading to a problem a bit like how "thinking" doesn't actually show the real steps taken to reach an answer) - glad it's not doing that. Neat game and learning opportunity. (Sorry for not wording that very well - long day!)
The API key powers Grov's features (Haiku for reasoning extraction + drift detection). It does work alongside Claude Max plans - for example, I use it with my Claude Code instances and I'm a Max user myself - but you still need an API key for Grov's core features.
If this is a deal-breaker for you, in the near future I'll let teams use our API key, so you can just install it and run it normally without having to set anything up other than connecting to your team. If you have any other questions, you can find my email in the repo.