If your concern is philosophical, and you are defining LLMs as not having a “why”, then of course they cannot write down “why” because it doesn’t exist. This is the philosophical discussion I am trying to avoid because I don’t think it’s fruitful.
If your concern is practical and you are worried that the “why” an LLM might produce is arbitrary, then my experience so far says this isn’t a problem. What I’m seeing LLMs record in commit messages and summaries of work is very much the concrete reasons they did things. I’ve yet to see a “why” that seemed like nonsense or arbitrary.
If you have engineers checking in overly complex blobs of code with no “why”, that’s a problem whether they use AI or not. AI tools do not replace engineers, and I would not work in any code base where engineers were checking in vibe-coded features without understanding them and vetting the results properly.
I don't care what text the LLM generates. If you wanna read robotext, knock yourself out. It's useless for what I'm talking about, which is "something is broken and I'm trying to figure out what".
In that context, I'm trying to do two things:
1. Fix the problem
2. Don't break anything else
If there's something weird in the code, I need to know if it's necessary. "Will I break something I don't know about if I change this" is something I can ask a person. Or a whole chain of people if I need to.
I can't ask the LLM, because "yes $BIG_CLIENT needs that behavior for stupid reasons" is not gonna be a part of its prompt or training data, and I need that information to fix it properly and not cause any regressions.
It may sound contrived but that sort of thing happens allllll the time.
> If there's something weird in the code, I need to know if it's necessary.
What does this have to do with LLMs?
I agree this sort of thing happens all the time. Today. With code written by humans. If you’re lucky you can go ask the human author, but in my experience if they didn’t bother to comment they usually can’t remember either. And very often the author has moved on anyway.
The fix for this is to write down why the weird code is necessary in a comment, or at least a commit message or PR summary. This is also the fix for LLM code: in the moment, while the context for why the weird code was needed is still fresh, record it.
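Something as small as this, right at the weird spot, is usually enough (a hypothetical example; the client, ticket, and variable name are all made up):

    # Keep the legacy MM/DD/YYYY format here: $BIG_CLIENT's billing import
    # rejects ISO-8601 timestamps. See TICKET-1234 before "fixing" this.
    stamp = run_date.strftime("%m/%d/%Y")

Thirty seconds of writing that down while the reason is still in front of you saves the next person the archaeology.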
You also should shame any engineer who checks in code they don’t understand, regardless of whether it came from an LLM or not. That’s just poor engineering and low standards.
Yeah. I know. The point is there is no Chesterton's Fence when it comes to LLMs. I can't even start from the assumption that this code is here for a reason.
And yes, of course people should understand the code. People should do a lot of things in theory. In practice, every codebase has bits that are duct taped together with a bunch of #FIXME comments lol. You deal with what you got.
The problem is that your starting point seems to be that LLMs can check in garbage to your code base with no human oversight.
If your engineering culture is such that an engineer can prompt an LLM to produce a bunch of weird, nonsensical code, and they can check that weird nonsense in with no comments and no one will say “what the hell are you doing?”, then the LLM is not the problem. Your engineering culture is. There is no reason anyone should be checking in obtuse code that solves BIG_CORP_PROBLEM without a comment to that effect, regardless of whether they used AI to generate the code or not.
Are you just arguing that LLMs should not be allowed to check in code without human oversight? Because yeah, I one hundred percent agree, and I think most people in favor of AI use for coding would also agree.
Yeah, and I'm explaining that the gap between theory and practice is greater in practice than it is in theory, and why LLMs make it worse.
It's easy to just say "just make the code better", but in reality I'm dealing with something that's an amalgam of the work of several hundred people, all the way back to the founders and whatever questionable choices they made lol.
The map is the territory here. Code is the result of our business processes and decisions and history.
You're treating this as a philosophical question, like an LLM can't have actual reasons because it's not conscious. That's not the problem. The problem is mechanical: the processing path that would be needed to output actual reasons just doesn't exist.
LLMs only have one data path, and that path basically computes what a human is most likely to write next. There's no way to make them not do this. If you ask it for a cake recipe, it outputs what it thinks a human would say when asked for a cake recipe. If you ask it for a reason it called for 3 eggs, it outputs what it thinks a human would say when asked why they called for 3 eggs. It doesn't go backwards to the last checkpoint and do a variational analysis to see what factors actually caused it to write down 3 eggs. It just writes down some things that sound like reasons you'd use 3 eggs.
If you want to know the actual reasons it wrote 3 eggs, you can do that, but you need to write some special research software that metaphorically sticks the AI's brain full of electrodes. You can't do it by just asking the model because the model doesn't have access to that data.
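To make that concrete, here's roughly what a decoding loop looks like (a toy greedy sketch using the small gpt2 checkpoint via Hugging Face; real assistants add sampling and a chat template, but the data path is the same):

    # Minimal sketch of next-token generation, assuming the transformers
    # and torch packages and the "gpt2" checkpoint are available.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("Why does this recipe call for 3 eggs? Because", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(30):
            logits = model(ids).logits[:, -1, :]           # scores for the next token, nothing else
            next_id = logits.argmax(dim=-1, keepdim=True)  # pick the most likely continuation
            ids = torch.cat([ids, next_id], dim=-1)
    print(tok.decode(ids[0]))

The answer to the "why" question comes out of the exact same loop the recipe did. Nothing in it ever looks back at what drove the earlier tokens.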
Humans do the same thing by the way. We're terrible at knowing why we do things. Researchers stuck electrodes in our brains and discovered a signal that consistently appears about half a second before we're consciously aware we want to do something!
But this is exactly why it is philosophical. We’re having a discussion about why an LLM cannot really ever explain “why”. And then we turn around and say, but actually humans have the exact same problem. So it’s not an LLM problem at all. It’s a philosophical problem about whether it’s possible to identify a real “why”. In general it is not possible to distinguish between a “real why” and a post hoc rationalization, so the distinction is meaningless for practical purposes.
It's absolutely not meaningless if you work on code that matters. It matters a lot.
I don't care about philosophical "knowing", I wanna make sure I'm not gonna cause an incident by ripping out or changing something or get paged because $BIG_CLIENT is furious that we broke their processes.