
"Greenblatt" shown with 42% in the bar chart is GPT-4o with a strategy: https://substack.com/@ryangreenblatt/p-145731248

So, how well might o1 do with Greenblatt's strategy?



I bet pretty well! Someone should try this. It's likely expensive, but running a sample of tasks first could give you the confidence to keep going. Ryan's approach costs about $10k to run on the full 400-task public eval set at current 4o prices, which is the arbitrary limit we set for the public leaderboard.
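A rough sketch of the arithmetic, assuming the ~$10k is spread evenly across the 400 tasks (about $25 per task). It also shows the "sample first" idea: score a random subset and put an approximate 95% Wilson interval on the solve rate before paying for the full run. The helper names and the pilot size of 40 are hypothetical, not part of anyone's actual setup.

```python
import math
import random

# Figures from the comment: roughly $10k for the full 400-task public eval set.
FULL_RUN_COST_USD = 10_000
NUM_TASKS = 400


def cost_per_task(total_cost: float, num_tasks: int) -> float:
    """Average cost of one task, assuming cost is spread evenly across tasks."""
    return total_cost / num_tasks


def sample_tasks(num_tasks: int, k: int, seed: int = 0) -> list[int]:
    """Hypothetical helper: pick k distinct task indices uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(num_tasks), k)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a solve rate from n sampled tasks."""
    if n == 0:
        raise ValueError("need at least one sampled task")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    per_task = cost_per_task(FULL_RUN_COST_USD, NUM_TASKS)
    print(f"~${per_task:.2f} per task")  # ~$25.00 per task
    pilot = sample_tasks(NUM_TASKS, 40)
    print(f"pilot of {len(pilot)} tasks: ~${per_task * len(pilot):,.0f}")  # ~$1,000
    # Hypothetical pilot outcome for illustration only, NOT a real result.
    lo, hi = wilson_interval(20, len(pilot))
    print(f"solve-rate interval: {lo:.2f} to {hi:.2f}")
```

With only 40 tasks the interval is wide (roughly ±15 points around 50%), so a pilot can tell you whether a run looks promising but not pin down a leaderboard score.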



