You are right about the inference cost. The tasks that add real economic value (for example, replacing a whole team of developers with a technical product manager plus AI code generation) are already almost feasible, enough to see where things are heading, but they are severely constrained by context limits, processing speed, and the difficulty of integrating into development workflows. Since both raw computing power and algorithmic improvements can be brought to bear on those constraints, I expect something like a Moore's law with a very short doubling time.
I spent Christmas playing around with the GPT-4 API and quickly came up with many ideas for using it. But many of those ideas depended on much lower inference costs and much larger context windows.
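As a rough illustration of the kind of back-of-envelope check those ideas kept failing, here is a minimal sketch. The price per 1K tokens and the 8K context limit are assumptions from that period, and the input file name is hypothetical; check current pricing and model limits before relying on any of it:

```python
# Sketch: estimate what it costs to put a document into a GPT-4-class
# context window, and whether it fits at all. Figures below are
# illustrative assumptions, not authoritative pricing.
import tiktoken

PRICE_PER_1K_INPUT = 0.03   # assumed GPT-4 input price at the time (USD)
CONTEXT_LIMIT = 8_192       # assumed base GPT-4 context window (tokens)

def estimate(text: str) -> None:
    enc = tiktoken.encoding_for_model("gpt-4")
    n_tokens = len(enc.encode(text))
    cost = n_tokens / 1000 * PRICE_PER_1K_INPUT
    fits = "fits" if n_tokens <= CONTEXT_LIMIT else "does NOT fit"
    print(f"{n_tokens} tokens, ~${cost:.4f} per call, "
          f"{fits} in a {CONTEXT_LIMIT}-token context")

if __name__ == "__main__":
    # Hypothetical input: e.g. a whole codebase dumped to one file.
    with open("my_codebase_dump.txt") as f:
        estimate(f.read())
```

Even a modest codebase of a few hundred thousand tokens blows past both the window and any reasonable per-call budget, which is exactly the wall I kept hitting.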
I liken this problem to the cost of loading a website in 1996 versus 2024. My wild guess is that we are probably at about 1997 right now.
Therefore, I concluded not only that LLMs will be ubiquitous in the future, but also that I should invest now in companies trying to solve the inference cost problem.