
ok but preview sucks, run it on o1 pro.

99% of studies claiming some out-of-distribution failure of an LLM use a model already made irrelevant by SOTA. These kinds of studies, with long turnaround times and review periods, are not the best format for making salient points, given the speed at which the SOTA horizon progresses.



I wonder what the baseline OOD generalization is for humans. It takes around 7 years to generalize visual processing to X-ray images. How well does a number theorist respond to algebraic topology questions? How long will it take a human to learn to solve ARC challenges in the JSON format just as well as in the visual form?



