You have far too much confidence in the idea that LLMs are anything like human brains. It’s next to meaningless to try to draw parallels between the two things.
Your assertion that “conscious reasoning” “has elements of a tree-search” is just completely made up. And the idea that human learning is at all similar to what LLM training is doing is completely divorced from reality.
But how do you reason? Because I definitely do brute force tree search in my brain to solve all sorts of problems.
E.g. let's imagine system design or some programming problem.
Based on my past experience, or what I've read in general, my brain brings up potential solutions. To me it's similar to an embeddings search: it tries to pattern-match candidate solutions, and the embeddings are arranged in a tree or graph shape that you narrow down constantly.
My brain then would start to evaluate the solutions in the order of likelihood that they fit the pattern according to my intuition.
Basically, I personally do see how an LLM that has a certain chain of reasoning, algorithmically or otherwise built in, could represent my approach to problem solving. Because my problem solving can definitely be represented by a continuous flow of words.
I don't think current LLMs are exactly capable of that, because they would make too many mistakes somewhere and potentially get stuck, but I can't say they wouldn't be able to do that with more scale, when those mistakes get ironed out due to more ability to do nuanced things.
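The "pattern match by similarity, then evaluate in order of likelihood" process described above can be sketched as a toy ranking over remembered solutions. Everything here (the candidate names, the feature vectors, the problem encoding) is made up purely to illustrate the idea:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical "remembered solutions", each with an invented feature vector.
memory = {
    "cache layer":   [0.9, 0.1, 0.3],
    "message queue": [0.2, 0.8, 0.5],
    "read replica":  [0.7, 0.2, 0.6],
}

# An invented encoding of the current problem.
problem = [0.8, 0.15, 0.4]

# Rank candidates by similarity, i.e. "in the order of likelihood that
# they fit the pattern".
ranked = sorted(memory, key=lambda name: cosine(memory[name], problem),
                reverse=True)
print(ranked)  # → ['cache layer', 'read replica', 'message queue']
```

A real system would of course use learned embeddings rather than hand-written vectors; the point is only that "intuitive" candidate ordering reduces to an associative similarity query.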
My theory: reasoning is the application of analogies. Here `analogy = memorization + pattern matching`. Pattern matching is just an associative memory query; remembering an example with fuzzy enough details to apply generally. Analogies are self-similar/recursive and sometimes transcend contexts -- they're useful for thinking, for thinking about thinking, etc. Analogies pervade our language, our thoughts, our approach to novel problems, and the cached solutions to trivial/practiced problems.
There is no qualitative distinction between System 1 & System 2 thinking. It's a spectrum, quantitative. The more analogy steps required, the more thinking done, and the more System 2-ish that thoughts feel. This (probably wrong) theory has many consequences:
1. LLMs are truly intelligent, albeit barely. They rely heavily on memorization rather than much analogy matching. Their depth of memorized knowledge can substitute for intelligence, and mislead us into misjudging how much of it they actually have.
2. The grokking phenomenon (the sudden jump in validation-set performance) occurs when a new analogy is internalized.
3. Scale really is enough. The scaling laws will continue holding with astonishing accuracy.
4. MCTS heuristics are useful inductive biases to vastly lower compute at the expense of model flexibility. Eventually, The Bitter Lesson will come for MCTS too.
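For context on point 4: the hand-designed heuristic at the heart of most MCTS variants is the UCB1 selection rule, an exploitation term plus an exploration bonus. A minimal sketch of just that rule (toy numbers, not any particular implementation):

```python
from math import log, sqrt

def ucb1(total_reward, visits, parent_visits, c=1.4):
    """UCB1 score: average reward plus an exploration bonus.
    Unvisited children get infinite priority."""
    if visits == 0:
        return float("inf")
    return total_reward / visits + c * sqrt(log(parent_visits) / visits)

# Three children of a node: (total_reward, visits). Values are invented.
children = {"a": (3.0, 5), "b": (1.0, 1), "c": (2.0, 4)}
parent_visits = sum(v for _, v in children.values())

# The barely-explored child "b" wins despite a lower average reward --
# exactly the kind of built-in inductive bias the comment refers to.
best = max(children, key=lambda k: ucb1(*children[k], parent_visits))
print(best)  # → b
```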
I have much more to say about this bonkers theory, but I fear already sounding like a ranting madman. Anyway, just my 2c.
LLMs have a hard time reasoning about adjacent topics.
I have a favorite question for LLMs whose answer can be (and is) learned from papers from 2010-2012 (well before the advent of LLMs), and I have kept asking it for two years now.
LLMs are able to cite relevant papers with word-for-word accuracy; they remember them quite well. Every paper on the subject contains all the relevant definitions. Yet LLMs cannot combine adjacent definitions to come up with the ultimate solution to my question.
The question: "implement blocked clause decomposition in Haskell."
Google "blocked clause decomposition" for papers on subject.
Have fun.
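For readers who don't want to dig through the papers: the core definition involved is the *blocked clause*. A clause C is blocked by one of its literals l if every resolvent of C on l with a clause containing ¬l is a tautology; blocked clause decomposition then splits a CNF formula into blocked sets. Below is a rough sketch of just that single definition (in Python, though the question above asks for Haskell), not the full decomposition:

```python
# Literals as nonzero ints (DIMACS style): -3 means "not x3".
# A clause is a frozenset of literals; a formula is a list of clauses.

def is_tautology(clause):
    """A clause containing both a literal and its negation."""
    return any(-lit in clause for lit in clause)

def resolvent(c, d, lit):
    """Resolve clauses c and d on literal lit (lit in c, -lit in d)."""
    return (c - {lit}) | (d - {-lit})

def is_blocked(clause, lit, formula):
    """clause is blocked by lit if every resolvent on lit is a tautology."""
    return all(
        is_tautology(resolvent(clause, d, lit))
        for d in formula
        if -lit in d
    )

# Tiny example: C = (x1 v x2); the only clause with -x1 is (-x1 v -x2),
# and the resolvent {x2, -x2} is a tautology, so C is blocked by x1.
f = [frozenset({1, 2}), frozenset({-1, -2})]
print(is_blocked(frozenset({1, 2}), 1, f))  # → True
```

This is only the detection predicate; the decomposition itself (and doing it efficiently) is where the papers earn their keep.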
Over time, LLMs seem to lose the ability to even approach the solution to this question in a first answer. They need more and more attention and correction nowadays.
I see it as the knowledge collapse mentioned in the paper I linked to. Instead of an answer I get a gamified pretense of a helping hand, and we all do.
I don't think LLMs are like (complete) human brains; I think they vaguely resemble what we mean by language "intuition".
Brains need several other functions, too. Including search, some kind of motivator or initiator and probably several layers of coordination/orchestration.
None of which need to be strictly separated from the other layers.
Adding some kind of Q-search on top of LLMs means they're not just LLMs anymore, but a composite model that has an LLM as one component.
Generally agree, but I would argue that while the first L in LLM is what it is, the final LM is just what it does: a large model is still a "large language model" when it has other components besides a transformer involved in how it processes language, while a pure transformer model stops being a language model the moment it's trained on anything besides language, such as images (other than sign language), DNA sequences, or financial data.