I think there was a major jump in AI capabilities from Anthropic and OpenAI between the end of 2025 and the start of 2026 that made them far more reliable at programming correctly. I wonder what changed in the secret sauce.
I suspect the big jump came from the release of Claude Opus 4.5/4.6 and GPT-5.x-Codex between Nov ‘25 and Feb ‘26, which were trained with heavy reinforcement learning on long coding projects, rewarding only real success (like running code, using terminals, self-fixing bugs, and passing tests) while adding better memory for huge codebases and extra coding-specific training.
4.5 and 5.2. Transformative. I know dozens of CTOs who were piloting AI in the fall, took a day to do something real over Xmas and then came back to their orgs with a mandate to double down and experiment with software factories once they saw what the November drops enabled.
Nothing drastic I'd say. It's a continuous stream of small improvements just accumulating with each release, and someone just noticed a few releases away from a previous publicized-bad-capabilities release that there's major improvement between those points. So it looks like something major only due to the spacing between the capability surveys on the release timeline.