Yeah, I'm by no means an LLM engineer, but even with basic knowledge of how it works, I can understand why it's a bad idea to feed an LLM data from another LLM. Sure, you'll probably sanitize it and it will hallucinate less, but at the same time its scope will be much more limited.
For instance, they talk about product marketing strategies. That requires creativity, which AI is not capable of, though right now it can still borrow human creativity. With LLM data fed to another LLM, that gets diluted even further, and what's left is extremely standard, common knowledge. It could still be interesting for a corporation running a chatbot, but even there, there's always a small risk it hallucinates and screws the company big time, which is a deal breaker.
It becomes less bad if the LLM is learning from something that is not a (pure) LLM, though.
Imagine if you let an LLM-like model learn to predict the next move from 1 billion AlphaZero self-play chess games.
The next-move predictor it ended up with might represent a much better chess player than a model trained only on human online games.
And it might ALSO be faster than AlphaZero, meaning it could possibly even beat AlphaZero under tight time controls, say 1 minute per side for the whole game.
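The distillation idea above can be sketched in miniature. This is a toy, not AlphaZero: Nim stands in for chess, exhaustive minimax stands in for MCTS self-play, and a plain dict stands in for the student policy network. The point it illustrates is the one being argued: once the teacher's search results are recorded as (position, move) pairs, the student answers with a single lookup instead of a recursive search.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # Nim: take 1-3 stones; whoever takes the last stone wins

@lru_cache(maxsize=None)
def search_value(n):
    """Teacher: full game-tree search (standing in for AlphaZero's search).

    Returns +1 if the player to move from a pile of n stones wins
    with perfect play, -1 otherwise.
    """
    if n == 0:
        return -1  # the previous player took the last stone and won
    return max(-search_value(n - m) for m in MOVES if m <= n)

def teacher_move(n):
    """Pick the move the search says is best (first best on ties)."""
    return max((m for m in MOVES if m <= n),
               key=lambda m: -search_value(n - m))

# "Self-play data": record the teacher's chosen move for many positions...
dataset = {n: teacher_move(n) for n in range(1, 200)}

# ...and "train" a student on it. A dict lookup plays the role of the
# distilled policy net's single forward pass: no search at inference time.
student = dict(dataset)

assert all(student[n] == teacher_move(n) for n in range(1, 200))
```

A real policy network would also generalize to unseen positions rather than memorize, but the speed argument is the same: the student's cost per move is constant, while the teacher's grows with the depth of the search.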
Chess has such a tiny scope and such tightly defined rules that it's hilarious to map this idea of generated chess games as training input onto anything resembling real-world concepts.
Also, do you have any evidence whatsoever for your bald assertion that an LLM might get faster than and/or defeat AlphaZero? When has such a thing occurred in a simpler problem space?
A chess move is a single token. A network that predicts that single token without traversing a search space would most likely be faster than a model that needs to do some kind of recursive search.
This kind of benefit is what sets AlphaZero apart from earlier engines in the first place. It was better able to evaluate a position just from the pattern on the board, and needed to search a much smaller part of the search space than the older chess engines did.
An engine that could, at a glance, predict what move AlphaZero would make given enough time would take this principle to the next level.
You see the same in humans like Magnus Carlsen. Even if he doesn't "calculate" a single move (meaning traversing part of the search space in chess lingo), he can beat most good amateurs by just going for the move that looks "obvious" to him.
Anyway, the point isn't chess. The point is that the search part of the Alpha family of models (which tend to be specialized) seems to be making its way to multi-modal models.
And that this makes them slower and more expensive to run. That's fine for some applications, but since their output is not really just regurgitating the input, it MAY be more useful as training data for other models than other available training data.
Now there is another difference between AlphaZero and traditional LLMs, and that is the RL-through-self-play. I don't really know whether something like Strawberry would also require RL training to actually outperform its original training data.
No, I don't see the benefit.