> They're going to diverge if the underlying manner in which data gets memorized and encoded differs, such as with RNNs like RWKV.
In the original paper (https://arxiv.org/abs/2405.07987) the authors also compared the representations of transformer-based LLMs to convolution-based image models. They found just as much alignment between them as when both models were transformers.
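Their alignment metric is, roughly, mutual nearest neighbors: embed the same inputs with both models and measure how much each point's neighborhood agrees across the two embedding spaces. A simplified sketch of that idea (not their exact code; the random features below are placeholders standing in for real model embeddings):

    import numpy as np

    def knn_indices(feats, k):
        # k nearest neighbors per row by cosine similarity, excluding self-matches
        normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        sims = normed @ normed.T
        np.fill_diagonal(sims, -np.inf)
        return np.argsort(-sims, axis=1)[:, :k]

    def mutual_knn_alignment(feats_a, feats_b, k=10):
        # fraction of k-nearest neighbors shared between the two models'
        # embeddings of the same inputs; higher means more aligned representations
        nn_a, nn_b = knn_indices(feats_a, k), knn_indices(feats_b, k)
        return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]))

    rng = np.random.default_rng(0)
    feats_llm = rng.normal(size=(1000, 768))  # stand-in for a transformer LLM's embeddings
    feats_cnn = rng.normal(size=(1000, 512))  # stand-in for a conv image model's embeddings
    print(mutual_knn_alignment(feats_llm, feats_cnn, k=10))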
Very interesting - the human bias implicit in the structure of the data we collect might be critical, but I suspect there's a great number theory paper somewhere in there that validates the Platonic Representation idea.
How would you correct for something like "the subset of information humans perceive and find interesting" versus "the set of all information available about a thing that isn't noise" and determine what impact the selection of the subset has on the structure of things learned by AI architectures? You'd need to account for optimizers, architecture, training data, and so on, but the results from those papers are pretty compelling.
The word human (usually) refers to all members of the Homo genus. The name Homo sapiens means 'wise man'. Apart from the clever use of thumbs, our genus managed to survive by applying logic consistently. Human reasoning produced tools, technology and science.
So, no, it doesn't say it in the name. We are occasionally dumb, or make dumb comments, but the expectations are high.
So it's understandable why dumb humans would expect to replace workers with AI, which struggles with basic math and reasoning, has zero real-world experience, and can't tell fact from fiction?
I think people comment that LLMs aren't smart in reaction to the claims from the leaders of AI labs that LLMs are so smart they could, or will, lead to mass unemployment.
> A question that was interesting, but didn’t lead to a larger conclusion, was asking what actually happens when you ask a tool like ChatGPT a question. 45% think it looks up an exact answer in a database, and 21% think it follows a script of prewritten responses.
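Neither of those is what actually happens. ChatGPT's internals aren't public, but models in this family answer by repeatedly predicting the next token from learned weights rather than looking anything up. A rough sketch of the mechanism using GPT-2 (a small open stand-in, not ChatGPT itself):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    ids = tokenizer.encode("The capital of France is", return_tensors="pt")
    for _ in range(5):  # generate 5 tokens, greedily picking the most likely one each step
        with torch.no_grad():
            logits = model(ids).logits          # a score for every token in the vocabulary
        next_id = logits[0, -1].argmax()        # no database lookup, no prewritten script
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(ids[0]))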
Finding variations in a constrained haystack with measurable, defined results is what machine learning has always been good at. Tracing the most efficient Trackmania route is impressive, and the resulting route might be original in the sense that a human would never come up with it. But is it actually novel in a creative, critical way? Isn't it simply computational brute force? How big would that force have to be in the physical, or a less constrained, world?
It's interesting that the model generalizes to unseen participants. I was under the impression that everyone's brain patterns were different enough that the model would need to be retrained for new users.
Though I suppose if the model had LLM-like context, keeping track of brain data and speech/typing from earlier in the conversation, it could perform in-context learning to adapt to the user.
Basically correct intuition: the model does much better when we give it, e.g., 30 secs of neural data in the lead-up instead of, e.g., 5 secs. My sense is also that it's learning in context: people's neural patterns are quite different, but there's a higher-level generator that lets the model learn in context (or probably multiple higher-level patterns, each of which the model can learn from in context).
We only got any generalization to new users after we had >500 individuals in the dataset, fwiw. There are some interesting MRI studies finding a similar thing: once you have enough individuals in the dataset, you start seeing generalization.
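The usual way to check this kind of cross-subject generalization is leave-subjects-out evaluation: train on some participants and score only on participants the model has never seen. A generic sketch with placeholder data and a simple linear decoder (not the actual pipeline described above):

    import numpy as np
    from sklearn.linear_model import RidgeClassifier
    from sklearn.model_selection import GroupKFold

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 128))            # stand-in neural features, one row per trial
    y = rng.integers(0, 10, size=5000)          # stand-in decoding targets
    subjects = rng.integers(0, 500, size=5000)  # which participant each trial came from

    scores = []
    for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=subjects):
        clf = RidgeClassifier().fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))  # scored on unseen subjects only

    print(f"held-out-subject accuracy: {np.mean(scores):.3f}")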
> When that is gone and it starts doing web searches -- or it has any mechanisms that mimic actual research when it does not know something
ChatGPT and Gemini (and maybe others) can already perform and cite web searches, and doing so vastly improves their performance. ChatGPT is particularly impressive at multi-step web research. I have also witnessed them saying "I can't find the information you want" instead of hallucinating.
It's not perfect yet, but it's definitely climbing human percentiles in terms of reliability.
I think a lot of LLM detractors are still thinking of 2023-era ChatGPT. If everyone tried the most recent pro-level models with all the bells and whistles then I think there would be a lot less disagreement.