maintenance cost on AI code isn't really uniform per line. most of it follows standard patterns, maybe easier to maintain than average human code. but the 5% where something went subtly wrong costs way more to fix because you can't retrace the reasoning, you just re-derive the whole thing from scratch. average looks fine but the tail kills you.
The "can't retrace the reasoning" problem is solvable with deterministic workflow constraints. Agents currently run with carte blanche, and they take a mile if given an inch. If the agent was in a specific phase with only specific tools available (and guarded against mega edits) the decision trace is there in the workflow definition. Moving to the next phase is a result of solving the current phase, with a reasoning. If you have no guardrails then there is no observability.