99% of studies claiming some out-of-distribution failure of an LLM use a model already made irrelevant by SOTA. These kinds of studies, with long turnaround and review periods, are not the best format for making salient points given the speed at which the SOTA horizon advances.
I wonder what the baseline OOD generalization for humans is. It takes around 7 years to generalize visual processing to X-ray images. How well does a number theorist respond to algebraic topology questions? How long would it take a human to learn to solve ARC challenges in the JSON format just as well as in the visual form?