is anyone doing online reviews of model performance? (I know artificial analysi...

reckless · 2025-07-17T17:14:01 1752772441

The aggregate picture only tells you so much.

Sites like simonwillison.net/2025/jul/ and channels like https://www.youtube.com/@aiexplained-official also cover new model releases pretty quickly, with some "out of the box thinking/reasoning" evaluations.

For my own usage, I can really only tell once I start using the new model for the tasks I actually use models for.

My personal benchmark, andrew.ginns.uk/merbench, has full code and data on GitHub if you want a starting point!

Eupolemos · 2025-07-17T17:20:32 1752772832

Yeah, GosuCoder is interesting.

https://youtu.be/064VC2gFIGY?si=l0LVtUttVrbiBZ3K