Yes, for single-user multi-turn chat, KV cache reuse could help a lot. vLLM supports this via Automatic Prefix Caching (APC), so you'd be able to take advantage of it on Strix Halo now. llama.cpp has had a `--prompt-cache` option, but when I last looked it was a bit awkward (it only works for non-interactive use and saves/loads the cache to disk), so it might not help on the Mac side.
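For anyone unfamiliar with why prefix caching matters for multi-turn chat: each new request repeats the whole prior conversation as a prefix, so the KV state for that prefix can be reused instead of recomputed. Here's a toy Python sketch of the idea (not vLLM's actual implementation; the class and method names are made up for illustration):

```python
# Toy sketch of prefix KV cache reuse. In a real engine the "state"
# would be attention key/value tensors; here it's just a placeholder.

class PrefixKVCache:
    """Maps a token prefix (as a tuple) to a precomputed 'KV state'."""

    def __init__(self):
        self._cache = {}            # prefix tuple -> state
        self.recomputed_tokens = 0  # tokens we had to prefill again

    def _compute_state(self, tokens):
        # Stand-in for the real attention prefill work.
        self.recomputed_tokens += len(tokens)

    def prefill(self, tokens):
        tokens = tuple(tokens)
        # Find the longest cached prefix of this request.
        for cut in range(len(tokens), 0, -1):
            if tokens[:cut] in self._cache:
                # Only the uncached suffix needs prefill work.
                self._compute_state(tokens[cut:])
                break
        else:
            self._compute_state(tokens)  # no cached prefix: full prefill
        self._cache[tokens] = ("state-for", tokens)


cache = PrefixKVCache()
turn1 = [1, 2, 3, 4]            # system prompt + first user turn
cache.prefill(turn1)             # full prefill: 4 tokens
turn2 = turn1 + [5, 6]           # history repeated + new turn
cache.prefill(turn2)             # only the 2 new tokens recomputed
print(cache.recomputed_tokens)   # 6 total instead of 10
```

Without the cache, turn 2 would redo all 10 tokens of prefill; with it, only the 2 new ones. That gap grows with conversation length, which is why inter-turn caching is such a big win for single-user chat.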
Wouldn't the time be negligible with inter-turn KV caching? Many inference providers already do this.