
nobody serious (like OAI) was using the putnam problems to claim generalization. this is a refutation in search of a claim - and many people in the upstream thread are suggesting that OAI is doing something wrong by training on a benchmark.

OAI uses datasets like frontiermath or arc-agi that are actually held out to evaluate generalization.



I would actually disagree with this. To me, the ability to solve frontiermath does imply the ability to solve putnam problems too. The putnam problems are just easier: they have already been seen by the model, and they are also simpler problems. In the same way, putnam problems with simple changes are one of the easier stops on the way to truly generalizing math models, with frontiermath being one of the last stops on the way there.



