And what is the protocol for the interface between the GPU-based LLM and the API? How does the LLM signal that it wants to make a tool call? What mechanism does it use?
MCP isn’t an API; it’s the protocol that defines how the LLM calls an API in the first place. Without it, all you've got is a chat interface.
A lot of people misunderstand the role of MCP. It’s the signaling the LLM uses to reach outside its context window and do things.
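
To make that concrete, here's a rough sketch of what the signaling looks like on the wire. MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool name and arguments. Everything else here is made up for illustration: the `get_weather` tool, the `TOOLS` registry, and the `build_tool_call` / `handle_request` helpers are hypothetical, and a real setup would go through an MCP SDK and a transport rather than plain function calls.

```python
import json


# Hypothetical tool; the name and behavior are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names to callables.
TOOLS = {"get_weather": get_weather}


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request in the shape MCP uses for tools/call."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def handle_request(raw: str) -> str:
    """Toy server-side dispatch: run the named tool and wrap its output."""
    req = json.loads(raw)
    base = {"jsonrpc": "2.0", "id": req.get("id")}

    if req.get("method") != "tools/call":
        # -32601 is the standard JSON-RPC "method not found" code.
        return json.dumps({**base, "error": {"code": -32601, "message": "Method not found"}})

    params = req["params"]
    tool = TOOLS.get(params["name"])
    if tool is None:
        # -32602 is the standard JSON-RPC "invalid params" code.
        return json.dumps({**base, "error": {"code": -32602, "message": "Unknown tool"}})

    text = tool(**params["arguments"])
    # Tool results come back as a list of content items.
    return json.dumps({**base, "result": {"content": [{"type": "text", "text": text}]}})


request = build_tool_call(1, "get_weather", {"city": "Oslo"})
response = json.loads(handle_request(request))
print(response["result"]["content"][0]["text"])
```

The point of the sketch is the division of labor: the request is just structured data naming a tool and its arguments, and something outside the model runs the tool and hands the result back as text that goes into the context window.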