Refactoring is really rather well defined: it's "just transformations that are invariant w.r.t. the outcome". The reason refactorings are hard to automate is that 'invariant w.r.t. the outcome' is a lot more lenient than most semantic models can handle. But this kind of well-defined task with a slight amount of nuance (and decently checkable results) seems pretty well suited to an LLM.
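To make the leniency concrete, here is a toy C sketch (hypothetical validate functions, not from any real project): reordering two checks changes which error message is printed when both inputs are bad, so the two versions are not strictly equivalent, yet most reviewers would accept it as an outcome-preserving refactor.

    #include <stdio.h>

    /* Before: width is checked first. */
    int validate_before(int w, int h) {
        if (w <= 0) { fprintf(stderr, "bad width\n");  return -1; }
        if (h <= 0) { fprintf(stderr, "bad height\n"); return -1; }
        return 0;
    }

    /* After: checks reordered during a refactor.  When both inputs are
     * invalid the message printed differs, so a strict equivalence
     * checker flags the change, but the outcome that matters here
     * (bad input is rejected) is preserved. */
    int validate_after(int w, int h) {
        if (h <= 0) { fprintf(stderr, "bad height\n"); return -1; }
        if (w <= 0) { fprintf(stderr, "bad width\n");  return -1; }
        return 0;
    }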
At least for the Linux kernel, QEMU, and other large C projects, this is a solved problem with Coccinelle [1]. Compared to AI, it has the added benefit of not making incorrect changes, hallucinating, falling for prompt injections, and so on.

I guess you could use AI to help create the Coccinelle semantic patch.
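For anyone who hasn't seen one, a semantic patch is basically a diff with metavariables. A rough sketch of the classic kmalloc-plus-memset-to-kzalloc cleanup (simplified; real patches add 'when' constraints on the '...' to be safe) looks like:

    @@
    expression ptr, size;
    @@
    - ptr = kmalloc(size, GFP_KERNEL);
    + ptr = kzalloc(size, GFP_KERNEL);
      ...
    - memset(ptr, 0, size);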
I'm genuinely confused about your point of view. Have you tried refactoring with GPT-4?
I have been refactoring code using GPT-4 for some months now, and the limiting factor has been the context size.

GPT-4 Turbo now has 128k context, so I can provide it with larger portions of the code base for the refactors.

Based on what I'm experiencing now, once we have millions of tokens of context I can see a refactoring like the one made in ffmpeg being possible. Or not? What am I missing here?