Refactoring is really rather well defined: it's "just transformations that are invariant w.r.t. the outcome". The reason refactorings are hard to automate is that 'invariant w.r.t. the outcome' is a lot more lenient than most semantic models can handle. But this kind of well-defined task with a slight amount of nuance (and decently checkable results) seems pretty well suited to an LLM.
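To make the leniency concrete, here is a toy C sketch (hypothetical validate functions, not from any real project): reordering two checks changes which error message is printed when both inputs are bad, so the two versions are not strictly equivalent, yet most reviewers would accept it as an outcome-preserving refactor.

    #include <stdio.h>

    /* Before: width is checked first. */
    int validate_before(int w, int h) {
        if (w <= 0) { fprintf(stderr, "bad width\n");  return -1; }
        if (h <= 0) { fprintf(stderr, "bad height\n"); return -1; }
        return 0;
    }

    /* After: checks reordered during a refactor.  When both inputs are
     * invalid the message printed differs, so a strict equivalence
     * checker flags the change, but the outcome that matters here
     * (bad input is rejected) is preserved. */
    int validate_after(int w, int h) {
        if (h <= 0) { fprintf(stderr, "bad height\n"); return -1; }
        if (w <= 0) { fprintf(stderr, "bad width\n");  return -1; }
        return 0;
    }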
At least for the Linux kernel, QEMU, and other large C projects, this is a solved problem with Coccinelle [1]. Compared to AI, it has the added benefit of not making incorrect changes, hallucinating, falling for prompt injections, and so on.

I guess you could use AI to help create the Coccinelle semantic patch.
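For anyone who hasn't seen one, a semantic patch is basically a diff with metavariables. A rough sketch of the classic kmalloc-plus-memset-to-kzalloc cleanup (simplified; real patches add 'when' constraints on the '...' to be safe) looks like:

    @@
    expression ptr, size;
    @@
    - ptr = kmalloc(size, GFP_KERNEL);
    + ptr = kzalloc(size, GFP_KERNEL);
      ...
    - memset(ptr, 0, size);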
I'm genuinely confused about your point of view. Have you tried refactoring with GPT-4?
I have been refactoring code using GPT-4 for some months now, and the limiting factor has been the context size.

GPT-4 Turbo now has 128k context, so I can provide it with larger portions of the code base for the refactors.

Based on what I'm experiencing now, once we have millions of tokens of context I can see a refactoring like the one made in ffmpeg being possible. Or not? What am I missing here?