> ARC-AGI is a benchmark that’s designed to be simple for humans but excruciatingly difficult for AI. In other words, when AI crushes this benchmark, it’s able to do what humans do.
> I don't think people really appreciate how simple ARC-AGI-1 was, and what solving it really means.
> It was designed as the simplest, most basic assessment of fluid intelligence possible. Failure to pass signifies a near-total inability to adapt or problem-solve in unfamiliar situations.
> Passing it means your system exhibits non-zero fluid intelligence -- you're finally looking at something that isn't pure memorized skill. But it says rather little about how intelligent your system is, or how close to human intelligence it is.
Misunderstanding benchmarks seems to be the first step to claiming human-level intelligence.
Additionally:
> > ARC-AGI is a benchmark that’s designed to be simple for humans but excruciatingly difficult for AI. In other words, when AI crushes this benchmark, it’s able to do what humans do.
This feels like a generalized version of the classic flawed response to 'A computer can now play chess.'
Common non-technical chain of thought after learning this: 'Previously, only humans could play chess. Now, computers can play chess. Therefore, computers can now do other things that previously only humans could do.'
The error is assuming that such problems can only be solved by human-style general intelligence.
This is obviously false, as shown by the way computers calculate arithmetic, optimize via gradient descent, and countless other examples, but it does seem to be a common lay misunderstanding.
That's probably why IBM exploited it in its Watson marketing.
In reality, for reliable reasoning about capabilities, the *how* matters very much.
> Misunderstanding benchmarks seems to be the first step to claiming human level intelligence.
It's known as "hallucination" a.k.a. "guessing or making stuff up", and is a major challenge for human intelligence. Attempts to eradicate it have met with limited success. Some say that human intelligence will never reach AGI because of it.
Thankfully, nobody is trying to sell humans-as-a-service in an attempt to replace the existing AIs in the workplace (yet).
I’m sure such a product would be met with ridicule considering how often humans hallucinate. Especially since, as we all know, the only use for humans is getting responses given some prompt.
Doesn't that turn the entire premise on its head? If passing the benchmark means crossing a lower threshold rather than an upper one, that invalidates most of the claims derived from it.
> ARC-AGI is a benchmark that’s designed to be simple for humans but excruciatingly difficult for AI. In other words, when AI crushes this benchmark, it’s able to do what humans do.
That's a misunderstanding of what ARC-AGI means. Here's what ARC-AGI creator François Chollet has to say: https://bsky.app/profile/fchollet.bsky.social/post/3les3izgd...
> I don't think people really appreciate how simple ARC-AGI-1 was, and what solving it really means.
> It was designed as the simplest, most basic assessment of fluid intelligence possible. Failure to pass signifies a near-total inability to adapt or problem-solve in unfamiliar situations.
> Passing it means your system exhibits non-zero fluid intelligence -- you're finally looking at something that isn't pure memorized skill. But it says rather little about how intelligent your system is, or how close to human intelligence it is.