
I'm guessing "the edge" means doing inference work in the browser or on the user's device, as opposed to somewhere in the backend of the web app.

Maybe your local machine can run, I don't know, a model to make suggestions as you're editing a Google Doc, which frees up the Big Machine in the Sky to do other things.

As this becomes more technically feasible, it reduces the effective cost of inference for a new service provider, since you, the client, are now running their code.

The Jevons paradox might then kick in: as inference gets cheaper, LLMs start getting applied to use cases that were too expensive before, so total usage goes up rather than down.
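A toy model makes the two claims concrete: offloading some fraction of requests to the client cuts the provider's cost per request, and if demand is price-elastic enough, total usage (and even total server spend) can rise as costs fall. Every number below is invented purely for illustration; the constant-elasticity demand curve is just one simple way to sketch the Jevons effect.

```python
# Toy model: edge offloading lowers the provider's cost per request; with a
# price-elastic demand curve, cheaper inference means more total usage.
# All figures are made up for illustration.

def provider_cost(server_cost: float, edge_fraction: float) -> float:
    """Cost the provider pays per request when edge_fraction runs client-side."""
    return server_cost * (1.0 - edge_fraction)

def usage(cost: float, base_cost: float, base_usage: float, elasticity: float) -> float:
    """Constant-elasticity demand: usage grows as the effective cost falls."""
    return base_usage * (cost / base_cost) ** (-elasticity)

BASE_COST = 0.01        # $ per request, fully server-side (made up)
BASE_USAGE = 1_000_000  # requests/day at that cost (made up)
ELASTICITY = 1.4        # >1 means total spend rises as cost falls (made up)

for frac in (0.0, 0.5, 0.9):
    c = provider_cost(BASE_COST, frac)
    q = usage(c, BASE_COST, BASE_USAGE, ELASTICITY)
    print(f"edge {frac:>4.0%}: cost/request ${c:.4f}, "
          f"usage {q:,.0f}/day, server spend ${c * q:,.0f}/day")
```

With elasticity above 1, the 90%-offloaded case ends up with the provider spending *more* on servers in absolute terms than the all-server baseline, despite paying a tenth as much per request — which is the Jevons outcome the comment describes.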


