> The value of what this emits is already handled by evaluating the diffs in the per-file history.
I mean, for the initial development/contribution/PR workflow, I agree with you: any code reviewer should be reading the diffs anyway, and if you're reading the diffs, these messages (being purely summaries derived from the code itself without the LLM having any info about developer intent) don't add anything.
But that's not the only time commit messages matter. A tool like this, "fixing up" bad commit messages before they're pushed to a PR branch, might still help with later code maintenance after the code is merged:
• When you or someone else is looking at the commit lines after the fact, in e.g. `git log` to find commits to cherry-pick, such summaries would be a substitute for having to go commit-by-commit reading the diffs to find the one you're looking for. Or when doing e.g. a `git bisect`, they'd allow the likely-offender commit to "jump out" at you from the list of remaining commits, after just the first few bisect steps, without having to do 10 more iterations to narrow it down with actual rebuilds+test suite runs.
• And when someone else is looking at `git blame` while bug-hunting, or seeing the latest commit that touched each file when browsing a GitHub repo tree, having these summaries would be the difference between having an opaque timeline of "fix" -> "fix 2" -> "fix again" -> "update" -> "fix" commits to try to keep distinct in one's head (may as well just try to recognize commits by the abbreviated git ref at that point), vs. having commits with descriptive mnemonic "names".
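To make the "opaque timeline" problem concrete, here's a rough sketch of how you might spot those placeholder subjects in `git log --oneline` output. The filler-word list and both function names are my own invention for illustration, not anything the tool under discussion ships with.

```python
import re

# Hypothetical list of placeholder words; tune it for your own history.
_FILLER = {
    "fix", "fixes", "fixed", "update", "updates", "wip", "tmp",
    "again", "stuff", "changes", "more", "misc",
}


def is_uninformative(subject: str) -> bool:
    """True if the subject consists only of filler words (digits are ignored)."""
    words = re.findall(r"[a-z]+", subject.lower())
    return not words or all(word in _FILLER for word in words)


def flag_subjects(log_lines):
    """Return the abbreviated refs whose subjects carry no information.

    log_lines: lines in the `git log --oneline` shape, '<abbrev-ref> <subject>'.
    """
    flagged = []
    for line in log_lines:
        ref, _, subject = line.partition(" ")
        if is_uninformative(subject):
            flagged.append(ref)
    return flagged
```

Anything this flags is a commit you could only tell apart by its abbreviated ref, which is exactly the situation where a derived summary would help in `git log`, `git bisect`, or `git blame`.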
Note that this tool is supposed to be retroactive, not incremental. It rewrites messages for existing commits that already had some other message when they were initially committed; it doesn't have any function that you could use to do `EDITOR=this-program git commit` and have it generate a commit's original commit message just-in-time.
As the author says in the README, this tool was created with the goal of fixing one's private git commit history before making it public. At the point when this tool would be run (i.e. at the point you're trying to "clean up" your private git history for publication), it's often already been long enough since you created these commits that you likely don't actually remember what you were thinking at the time you created them. Any information about "what [you] were thinking when [you] committed" has already been lost. "The rice has been cooked", so to speak.
At that point, there's no value you could add by going back over the commits manually, beyond that which this program could add. In both cases, whether you or the LLM is doing it, the result would just be a reconstruction (i.e. a guess) at what the original developer was thinking/trying to accomplish at the time.
---
Which is not to say that this tool is a substitute for making good commits (instead of random-junk-drawer snapshot-your-work "WIP commits") in the first place. And, in fact, I have a feeling that this tool is not nearly as widely applicable (at least in its current form) as its author thinks it is... because there generally isn't any good way to summarize a "WIP commit": "WIP commits" are often not coherent single-purpose edits of the code.
A better version of this tool, I think, would be one that rewrites a private work branch / PR branch by first squashing it into a patch, and then breaking that patch back apart into a series of commits that each "do one thing", essentially introducing the change in a literate programming style where you're meant to read the commit-series top to bottom.
In other words, exactly what an experienced software engineer who knows they'll need to maintain their own code will already do themselves, at commit time, with `git add -p` (and before commit time, by having the discipline to avoid getting distracted solving a second problem while in the middle of solving the first!). And it's also exactly the format that patch-based workflows like LKML already expect contributors to construct [from whatever they were doing internally] when submitting a patch-series, to allow the patch to be read and considered on the mailing list as essentially a linear literate-programming explanation of the changes.
The tool still wouldn't be able to recover the intent behind each change if it was never written down; so the commit messages would still just be descriptions of the kind a later "software archaeologist" or reverse-engineer would give to the code. But it would at least be generating descriptions for coherent chunks of code, rather than attaching descriptions to commits that were essentially "whatever probably-partial progress on whatever incoherent set of things the programmer was trying to do at the time they needed to sync their work to switch from their laptop to their workstation."
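As a very rough sketch of the squash-then-split idea: assume you already have the branch flattened into one unified diff (e.g. the output of `git diff <base>...HEAD`). The crudest possible split is one chunk per file. That is only a stand-in for "each commit does one thing", since a real version would need to group hunks by purpose, and `split_patch_by_file` is a name I made up for illustration.

```python
def split_patch_by_file(patch: str) -> dict:
    """Split a unified diff into per-file chunks, keyed by path.

    Naive assumption: every file section starts with a
    'diff --git a/<path> b/<path>' header and paths don't contain ' b/'.
    """
    chunks = {}
    current_path = None
    current_lines = []
    for line in patch.splitlines(keepends=True):
        if line.startswith("diff --git "):
            # Close out the previous file's chunk before starting a new one.
            if current_path is not None:
                chunks[current_path] = "".join(current_lines)
            current_path = line.rstrip("\n").split(" b/", 1)[-1]
            current_lines = []
        if current_path is not None:
            current_lines.append(line)
    if current_path is not None:
        chunks[current_path] = "".join(current_lines)
    return chunks
```

Each chunk could then be staged with `git apply --cached` and committed on its own, at which point a generated message would at least be describing something coherent.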
I agree with everything you're saying, and I think other archaeologists feel the same.
My opinionated take is: I wouldn't want to use this space for the information that this tool could provide; I'd rather leave it as the truth.
The truth is that it was committed without a meaningful message, and now I might recognize a chain of message-less commits, representing a moment in time where the authors were trying to figure out where they wanted to end up.
If the tool is producing this info simply by reading the diffs in the code, why not just use it when you need it, to help explain what you're digging through, instead of changing the commit history?
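A minimal sketch of what "use it when you need it" could look like, without touching history. Assumptions, loudly: I'm using a mechanical `--numstat` summary as a stand-in for whatever the LLM would write, and all three function names are hypothetical. The last function stores the text with `git notes`, which attaches an annotation to a commit without changing its message or hash.

```python
import subprocess


def numstat_summary(numstat: str) -> str:
    """Condense `git show --numstat --format=` output into one line."""
    files, added, deleted = [], 0, 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        plus, minus, path = parts
        # Binary files report "-" for both counts; count them as zero.
        added += int(plus) if plus.isdigit() else 0
        deleted += int(minus) if minus.isdigit() else 0
        files.append(path)
    return f"{len(files)} file(s), +{added}/-{deleted}: {', '.join(files)}"


def describe_on_demand(ref: str) -> str:
    """Derive a description from the diff at read time; nothing is rewritten."""
    out = subprocess.run(
        ["git", "show", "--numstat", "--format=", ref],
        check=True, capture_output=True, text=True,
    ).stdout
    return numstat_summary(out)


def attach_as_note(ref: str, text: str) -> None:
    """Keep the derived text next to the commit, leaving the original message intact."""
    subprocess.run(["git", "notes", "add", "-f", "-m", text, ref], check=True)
```

With the note approach, `git log` shows the original "fix" subject and the derived description side by side, so the archaeologist can see both what was actually written and what was reconstructed later.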
Either way, the critical detail is: people should get that detail out before the rice has been cooked. That's what I do for myself in my own private repos, and when others do that for future archaeologists, we all benefit.