To borrow a concept from cloud server hosting, there's also the factor of overselling. Most open-source LLM hosts probably oversell quite a bit: they don't scale up resources as fast as OpenAI/Anthropic do when requests increase. I notice many OpenRouter providers are noticeably faster during off hours.
In other words, it's not just the model size; it's also the concurrent load and how many GPUs you keep spun up at any given time. I bet the big players' cost is quite a bit higher than the numbers on OpenRouter, even for comparable parameter counts.
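A back-of-envelope sketch of that claim, in Python with every number invented: the same GPU serving tokens at lower average utilization (headroom kept for demand spikes) costs more per token, so a provider that oversells looks cheaper per token than one that scales up aggressively.

  # Back-of-envelope sketch (every number here is invented) of why
  # utilization, not just model size, drives serving cost per token.
  gpu_hour_cost = 2.00      # $/hr for one GPU, assumed
  tokens_per_sec = 1000     # aggregate throughput at full batch, assumed

  for utilization in (0.9, 0.3):  # oversold vs. headroom kept for spikes
      tokens_per_hour = tokens_per_sec * utilization * 3600
      cost_per_million = gpu_hour_cost / tokens_per_hour * 1e6
      print(f"{utilization:.0%} utilization -> ${cost_per_million:.2f} per 1M tokens")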
This is something you can credibly say only if you've never used a popular new non-mainline editor. Sublime Text was "buggy as hell" too; all new editors are. Editors are incredibly difficult to do well, and Zed is doing it on hard mode, cross-platform.
I have this DVD set in my basement. Technically, there are still methods for estimating the probability of unseen n-grams. Backoff (falling back to lower-order n-gram estimates) is an option. You can also impose prior distributions, Bayesian-style, so that you can make "rational" guesses (see the sketch below).
N-grams are surprisingly powerful for how little computation they require. They can be trained in seconds, even with tons of data.
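To make both points concrete, here's a minimal Python sketch; the toy corpus, the add-k value, and the 0.4 backoff weight are all made up for illustration. Training is just counting (hence the speed), add-k smoothing plays the role of a Dirichlet prior for unseen words, and stupid backoff handles unseen bigrams (it yields scores rather than a normalized distribution):

  from collections import Counter

  corpus = "the cat sat on the mat the cat ate".split()
  vocab_size = len(set(corpus))

  # "Training" is just counting -- this is why n-grams fit in seconds.
  unigrams = Counter(corpus)
  bigrams = Counter(zip(corpus, corpus[1:]))

  def p_unigram(w, k=1.0):
      # Add-k smoothing: equivalent to a symmetric Dirichlet prior,
      # so even unseen words get a "rational" nonzero probability.
      return (unigrams[w] + k) / (len(corpus) + k * vocab_size)

  def score_bigram(prev, w, alpha=0.4):
      # Stupid backoff: use the bigram estimate when the pair was seen,
      # otherwise back off to the discounted unigram estimate.
      if bigrams[(prev, w)] > 0:
          return bigrams[(prev, w)] / unigrams[prev]
      return alpha * p_unigram(w)

  print(score_bigram("the", "cat"))  # seen: plain relative frequency
  print(score_bigram("the", "dog"))  # unseen: backed off, still > 0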