Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Seems clear to me that Claude 3.7 suffers from overfitting, probably due to Anthropic seeing that 3.5 was a smash hit in the LLM coding space and deciding their North star for 3.7 should be coding benchmarks (which, like all benchmarks, do not properly capture the process of real-world coding).

If it was actually good they would've named it 4.0, the fact that they went from 3.5 to 3.7 (weird jump) speaks volumes imo.



The numbering jump is because there was "Claude 3.5" and then "Claude 3.5 (new)" and they decided to retroactively stop the madness and rename the later to 3.6 (which is what everyone was calling it anyway).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: