
It’s a good question. I think a better benchmark than the current options is “go make $X this quarter.” Right now the models fail at this miserably. Claude can’t even run a vending machine inside Anthropic HQ. So there is still some kind of strategic activity that comes naturally to humans and that LLMs struggle with. I know the big open question is whether “scaling solves this in the next N years,” but my bet is that N > ~20 in this case.

