
  > Its main purpose is to produce synthetic data for Orion, their next big LLM
https://arxiv.org/abs/2404.03502

"This is generally useful, but widespread reliance on recursive AI systems could lead to a process we define as 'knowledge collapse', and argue this could harm innovation and the richness of human understanding and culture... In our default model, a 20% discount on AI-generated content generates public beliefs 2.3 times further from the truth than when there is no discount."



There is a caveat to this. Strawberry/Q* relies on elements similar to an element of AlphaZero to find "strategies" suitable for a problem. That takes it further from "next-word-prediction" than current models, and improves the quality of the output.

The downside is that this requires more compute during inference. That makes it too expensive to deploy directly.

Still, at least to some extent, this could allow a larger model to achieve similar performance to a Strawberry enhanced GPT-4o by adding more parameters, without the impact on speed and compute cost.
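The details of Strawberry/Q* are not public, so the extra inference-time compute described above can only be sketched by assumption. One common form is best-of-n search: propose several candidates, score them with a verifier, keep the best. Everything below (the toy generator and scorer) is invented for illustration, not a real API:

```python
import random

# A hedged sketch of inference-time search, assuming it works roughly
# like best-of-n sampling. The "generator" and "verifier" here are toy
# stand-ins for an LLM and a learned reward/value model.

def generate_candidate(rng):
    # stand-in for a model proposing an answer to "13 * 17 = ?"
    return rng.randint(200, 250)

def score(candidate):
    # stand-in for a verifier; higher is better, 0 means correct
    return -abs(candidate - 13 * 17)

def best_of_n(n, seed=0):
    rng = random.Random(seed)
    candidates = [generate_candidate(rng) for _ in range(n)]
    return max(candidates, key=score)
```

More samples mean more compute at inference time but a better expected answer, which is exactly the cost/quality trade-off being discussed.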

Humans often do the same. When first learning a topic, we tend to use conscious reasoning (which has elements of a tree search) to find a way to solve problems in it.

But if we practice enough times, it becomes "muscle memory".


> Strawberry/Q* relies on elements similar to an element of AlphaZero to find "strategies" suitable for a problem. That takes it further from "next-word-prediction" than current models, and improves quality of the output

Genuine question: is this independently substantiated? Or Altmanspeak?


The whole comment is "Altmanspeak", which we might more directly call "ideological descriptions of AI consistent with increasing stock prices".


The specifics are kept secret, but all the labs appear to work on some variant.

It seems to be related to the DeepMind reference in [1] and most of [2].

[1] https://en.wikipedia.org/wiki/Q-learning

[2] https://arxiv.org/pdf/2403.09629


This sounds like chess AI


More generally, it's part of what we call reasoning.

As opposed to doing what first comes to mind, which is similar to what regular LLMs have been doing.


You have far too much confidence in the idea that LLMs are anything like human brains. It’s next to meaningless to try to draw parallels between the two things.

Your assertion that “conscious reasoning” “has elements of a tree-search” is just completely made up. And the idea that human learning is at all similar to what LLM training is doing is completely divorced from reality.


But how do you reason? Because I definitely do brute force tree search in my brain to solve all sorts of problems.

E.g. let's imagine system design or some programming problem.

Based on my past experience, or what I've read in general, my brain brings up potential solutions. To me it's similar to an embeddings search: it tries to pattern-match solutions, and the embeddings are in a tree or graph shape, where you constantly narrow down.

My brain then would start to evaluate the solutions in the order of likelihood that they fit the pattern according to my intuition.

Basically, I personally do see how an LLM with a certain chain of reasoning built in, algorithmically or otherwise, could represent my approach to problem solving. Because my problem solving can definitely be represented by a continuous flow of words.

I don't think current LLMs are quite capable of that, because they would make too many mistakes somewhere and potentially get stuck. But I can't say they wouldn't be able to do it with more scale, once those mistakes get ironed out by a greater capacity for nuance.
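The loop described above (pattern-match candidate solutions, then examine them in order of intuition) can be sketched as a best-first search. The graph and "intuition" scores used with it are invented for the demo:

```python
import heapq

# A hedged sketch: candidates are explored most-promising-first,
# where "promising" is an intuition score standing in for the
# likelihood estimate described above.

def best_first(start, goal, neighbors, intuition):
    seen = {start}
    # max-heap via negated scores
    frontier = [(-intuition(start), start, [start])]
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path  # first solution reached via best-first order
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-intuition(nxt), nxt, path + [nxt]))
    return None  # all candidates exhausted, no solution found
```

For a toy design problem with candidate approaches "cache" (intuition 0.9) and "index" (0.4), the search tries "cache" first and narrows down from there.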


> But how do you reason?

My theory: reasoning is the application of analogies. Here `analogy = memorization + pattern matching`. Pattern matching is just an associative memory query; remembering an example with fuzzy enough details to apply generally. Analogies are self-similar/recursive and sometimes transcend contexts -- they're useful for thinking, for thinking about thinking, etc. Analogies pervade our language, our thoughts, our approach to novel problems, and the cached solutions to trivial/practiced problems.
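The "associative memory query" in this sense can be illustrated with a toy retrieval over hand-made feature vectors standing in for embeddings; all names and numbers below are invented:

```python
# A toy associative-memory query: return the stored item whose
# feature vector best matches the probe. This is fuzzy matching by
# similarity, not an exact key lookup.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def recall(memory, probe):
    # memory: {name: feature_vector}; best match wins
    return max(memory, key=lambda name: dot(memory[name], probe))
```

A probe vector close to a stored pattern recalls that pattern even when the details don't match exactly, which is the "fuzzy enough to apply generally" property described above.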

There is no qualitative distinction between System 1 and System 2 thinking; it's a spectrum, a quantitative one. The more analogy steps required, the more thinking is done, and the more System 2-ish the thoughts feel. This (probably wrong) theory has many consequences:

1. LLMs are truly intelligent, albeit barely. They rely heavily on memorization rather than analogy matching. Their depth of memorized knowledge can substitute for intelligence and mislead us into misjudging it.

2. The validation set grokking phenomenon occurs when a new analogy is internalized.

3. Scale really is enough. The scaling laws will continue holding with astonishing accuracy.

4. MCTS heuristics are useful inductive biases to vastly lower compute at the expense of model flexibility. Eventually, The Bitter Lesson will come for MCTS too.

I have much more to say about this bonkers theory, but I fear already sounding like a ranting madman. Anyway, just my 2c.


Analogy as the fuel and fire of thinking is basically Hofstadter's view on intelligence. He (with Emmanuel Sander) actually has a book with that subtitle, Surfaces and Essences. Also check GEB.


LLMs have a hard time reasoning across adjacent topics.

I have a favorite question for LLMs, one whose answer can be (and is) learned from papers from 2010-2012 (well before the advent of LLMs), and I have been asking it for two years now.

LLMs are able to cite relevant papers with word-for-word accuracy; they remember them quite well. Every paper on the subject has all the relevant definitions in it. Yet LLMs cannot combine adjacent definitions to come up with the ultimate solution to my question.

The question: "implement blocked clause decomposition in Haskell."

Google "blocked clause decomposition" for papers on subject.

Have fun.
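For flavor, here is a sketch of just the core textbook definition in Python, not the Haskell implementation the question asks for, and not the full decomposition: a clause C is blocked on a literal l in C if every resolvent of C on l with a clause containing -l is a tautology.

```python
# Literals are nonzero ints, clauses are frozensets (DIMACS-style);
# a toy sketch of the blocked-clause check only.

def is_tautology(clause):
    # contains some literal together with its negation
    return any(-lit in clause for lit in clause)

def is_blocked(clause, lit, formula):
    assert lit in clause
    resolvents = (
        (clause - {lit}) | (d - {-lit})
        for d in formula
        if d != clause and -lit in d
    )
    return all(is_tautology(r) for r in resolvents)
```

The full blocked clause decomposition builds on repeated application of this check to split a CNF formula into parts, which is exactly the "combine adjacent definitions" step the papers spell out.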

Over time, LLMs seem to lose the ability to even approach the solution to this question in a first answer. They need more and more attention and correction nowadays.

I see it as the knowledge collapse mentioned in the paper I linked above. Instead of an answer I get a gamified pretense of a helping hand, and we all do.


We're unsure how minds or LLMs work. So let's also not dismiss potential parallels either. It's okay to not know stuff. We'll get there!


I don't think LLMs are like (complete) human brains; I think they vaguely resemble what we mean by language "intuition".

Brains need several other functions, too. Including search, some kind of motivator or initiator and probably several layers of coordination/orchestration.

None of which need to be strictly separated from the other layers.

Adding some kind of Q search on top of LLMs means they're not just LLMs anymore, but a composite model that has an LLM as one component.


Generally agree, but I would argue that while the first L in LLM is what it is, the final LM is just what it does: a large model is still a "large language model" when it has other components besides a transformer involved in how it processes language, while a pure transformer model stops being a language model the moment it's trained on anything besides language: images (other than sign language), DNA sequences, financial data, etc.


> Still, at least to some extent, this could allow a larger model to achieve similar performance to a Strawberry enhanced GPT-4o by adding more parameters, without the impact on speed and compute cost.

I see a contradiction here, do you?


If it's proven to be a real issue, we might expect to see models trained on a lot of synthetic data, with less knowledge but highly capable of reasoning, alongside other models less capable of reasoning but with broader knowledge.


Yeah, I'm by no means an LLM engineer, but even with basic knowledge of how they work, I can see why it's a bad idea to feed an LLM data from another LLM. Sure, you'll probably sanitize it and it will have fewer hallucinations, but at the same time its scope will be much more limited.

For instance, they talk about product marketing strategies. That requires creativity, which AI is not capable of, though currently it can still borrow human creativity. With LLM data fed to another LLM, that creativity gets diluted even more, and what's left is extremely standard, common knowledge. It could be interesting for a corporation running a chatbot, but even here there is always the small risk it hallucinates and screws the company big time, which is a deal breaker.

No, I don't see the benefit.


It becomes less bad if the LLM is learning from something that is not a (pure) LLM, though.

Imagine if you let an LLM-like model learn to predict the next move from 1 billion AlphaZero self-play chess games.

The next-move prediction it ended up with might represent a much better chess player than a model trained only on human online games.

And it might ALSO be faster than AlphaZero, meaning it could possibly even beat AlphaZero if time controls were restricted to something like 1 minute per side for the whole game.
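A toy, tabular version of that distillation idea: fit a direct position-to-move mapping from (hypothetical) self-play records, so answering becomes a lookup instead of a search. Positions and moves below are opaque strings, not real chess data:

```python
from collections import Counter, defaultdict

# A hedged sketch of distilling search-produced games into a fast
# policy: remember the most frequently played move per position.
# A real system would generalize with a neural net; this is a lookup.

def fit_policy(games):
    counts = defaultdict(Counter)
    for position, move in games:
        counts[position][move] += 1
    # keep the most frequently played move for each position
    return {pos: moves.most_common(1)[0][0] for pos, moves in counts.items()}
```

Answering from the fitted table is O(1) per move, which is the speed advantage over re-running a tree search at play time.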


Chess has such a tiny scope and tightly defined rules that it’s hilarious to map this idea of generated chess games as training input onto anything resembling real-world concepts.

Also, do you have any evidence whatsoever for your bald assertion that an LLM might get faster than and/or defeat AlphaZero? When has such a thing occurred in a simpler problem space?


A chess move is a single token. A network that predicts that token directly, without traversing a search space, would most likely be faster than a model that needs to do some kind of recursive search.

This kind of benefit is what sets AlphaZero apart from earlier engines in the first place. It was better able to evaluate a position just from the pattern on the board, and needed to search a much smaller part of the search space than the older chess engines did.

An engine that could predict at a glance which move AlphaZero would make given enough time would take this principle to the next level.

You see the same in humans like Magnus Carlsen. Even if he doesn't "calculate" a single move (meaning traversing part of the search space in chess lingo), he can beat most good amateurs by just going for the move that looks "obvious" to him.

Anyway, the point isn't chess. The point is that the search part of the Alpha family of models (which tend to be specialized) seems to be making its way to multi-modal models.

And that this makes them slower and more expensive to run. That's fine for some applications, but since their output is not really just regurgitating the input, their output MAY be more useful as training data for other models than other available training data.

Now there is another difference between AlphaZero and traditional LLMs, and that is the RL through self-play. I don't really know whether something like Strawberry would also require RL training to actually outperform the original training data.


Having one inscrutable AI train another should just thrill the safetyists.


Wasn't aware this has already become an -ism...


Aka "doomers".



