Hmm, it seems that the author takes very clear (and sometimes cynical) positions on some controversial questions. For example, "They don't have the capacity to think through problems logically." is a hotly debated claim, and with the advent of reasoning models I think this is something one should no longer state in entry-level material, which should reflect common understanding rather than the author's personal opinion in an ongoing discussion.
There are more claims like this about what language models can't do "because they just predict the next token". This line of reasoning, while superficially plausible, rests on a lot of assumptions that have been questioned. The heavy lifting here is done by the word "just": if you can correctly predict the next token in every situation (including novel challenges), does that not require an excellent world model, somehow reflected in the weights? This is not a settled question, but the last few years of LLM success have been squarely on the side of those who think that token prediction is quite general.
The material also makes several comparisons to human intelligence. While it is obvious that humans are different from language models, we do not really understand how the things claimed to be "impossible" for the machine to have (consciousness, morality, etc.) emerge in humans; it just so happens we are all human, so we all agree we have them. Furthermore, it is not clear to me that something can only be called 'intelligent' if it perfectly mimics humans in every way. That is perhaps just bias toward our own experience, and it risks a "submarines can't swim" debate which is really about language.
Many of these philosophical objections have been questioned by people in the field and, more importantly, by the rapid progress of the models on tasks they were supposed to be incapable of performing according to the philosophical objectors. Over the last few years, every time somebody claims models "can't do X", a new model is released and, lo and behold, X is now easy and solved. (If you read a six-month-old paper of "impossible" benchmarks, expect 75% of them to be solved already.) In fact, benchmark saturation is a problem now. In other words, the goalposts are having trouble keeping up, despite moving at high speed.
I don't think you are doing the general public any service by simply claiming that it is a lot of hype and marketing; these models really are advancing rapidly, and nobody knows where it will end. The philosophical objections seem rather weak and are in rapid retreat with every new model. On the other hand, the argument in favor of further progress is just "we had progress so far by scaling, so if we keep scaling surely we will have more progress" (induction), which is not a strong guarantee of further progress.
The claim that the labs are "marketing geniuses" for releasing language models as chat instead of autocomplete (which they "really" are according to the text - what does that mean?) also seems a bit silly, given that the obvious utility of the models is already much higher than 'autocomplete'. This seems to be another instance of the common bias that a model that "just" predicts the next token is not allowed to be as successful as it clearly is across all kinds of tasks.
I don't think a lot of these opinions are particularly well founded, and they probably should not be presented in entry-level material as if they were facts.
Edit: just to add a positive note, I do think it is extremely useful to educate people on the reliability problem, which is surely going to lead to lots of problems in the wrong hands.
Many claims don't stand up to scrutiny, and some look suspiciously like training to the test.
The Apple study was clear about this: LLMs and their related multimodal models lack the ability to abstract information from noisy text inputs.
This is really obvious if you play with any of the art generators. For example - the understanding of basic prepositions just isn't there. You can't say "Put this thing behind/over/in front of this other thing" and get the result you want with any consistency.
If you create a composition you like and ask for it in a different colour, you get a different image.
There is no abstracted concept of a "colour" in there. There's just a lot of imagery tagged with each colour name, and if you select a different colour you get a vector in a space pointing to different images.
Text has exactly the same problem, but it's less obvious because the grammar is usually - not always - perfect and the output has been tuned to sound authoritative.
There is not enough information in text as a medium to handle more than a small subset of problems with any consistency.
> There is no abstracted concept of a "colour" in there. There's just a lot of imagery tagged with each colour name, and if you select a different colour you get a vector in a space pointing to different images.
It has been observed in LLMs that the distance between embeddings for colors follows the same similarity patterns that humans experience - colors that appear similar to humans, like red and orange, are closer together in the embedding space than colors that appear very different, like red and blue.
While some argue these models 'just extract statistics,' if the end result matches how we use concepts, what's the difference?
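To make that concrete, here is a minimal sketch of how you could check those pairwise distances yourself. It assumes the sentence-transformers library and the small all-MiniLM-L6-v2 checkpoint; the colour list is arbitrary, and the exact numbers will depend on the model.

```python
# Hypothetical probe: are perceptually similar colours closer together
# in embedding space? Model name and colour list are just examples.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
colours = ["red", "orange", "blue"]
emb = model.encode(colours, convert_to_tensor=True)

# Cosine similarity between every pair of colour embeddings.
sim = util.cos_sim(emb, emb)
for i, a in enumerate(colours):
    for j, b in enumerate(colours):
        if i < j:
            print(f"{a} vs {b}: {sim[i][j].item():.3f}")

# If the claim above holds, red/orange should score higher than red/blue.
```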
Part of this is that the art generators tend to use CLIP, which is not a particularly good text model - often only slightly better than a bag of words - which makes many interactions and relationships pretty difficult to represent. Some of the newer ones have better text frontends, which improves the situation, though.
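If you want to see how weakly CLIP's text encoder captures relations, one rough probe (a sketch, assuming the Hugging Face transformers CLIP checkpoint; the prompts are arbitrary) is to compare embeddings of two order-swapped captions. A pure bag-of-words encoder would give identical vectors, so the closer the similarity is to 1, the less the relationship is being represented.

```python
# Rough probe of how order-sensitive CLIP's text encoder is. The checkpoint
# is the standard OpenAI ViT-B/32 release; the prompts are arbitrary examples.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

prompts = ["a dog behind a cat", "a cat behind a dog"]
inputs = tokenizer(prompts, padding=True, return_tensors="pt")
with torch.no_grad():
    feats = model.get_text_features(**inputs)

# A pure bag-of-words encoder would produce identical embeddings here.
sim = torch.nn.functional.cosine_similarity(feats[0], feats[1], dim=0)
print(f"cosine similarity: {sim.item():.3f}")
```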
I think color is fairly well abstracted, but most image generators are not good for edits, because the generator more or less starts from scratch, and from a new random seed each time (and even if the seed is fixed, the initial stages of the generation, where things like the rough image composition form, tend to be quite chaotic and so sensitive to small changes in prompt). There are tools that can make far more controlled adjustments of an image, but they tend to be a bit less user-friendly.
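As a rough illustration of that seed-sensitivity point, here is a hedged sketch using the diffusers text-to-image API; the model ID, prompts, and filenames are placeholders rather than recommendations. Even with the generator seeded identically, a one-word prompt change typically reshuffles the composition, not just the requested detail.

```python
# Sketch of the seed/composition point: even with a fixed seed, a small prompt
# change perturbs the early denoising steps and hence the whole layout.
# Model ID and prompts are illustrative; any diffusers text-to-image pipeline works.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def generate(prompt, seed=42):
    # Re-create the generator each call so both runs start from the same noise.
    g = torch.Generator("cuda").manual_seed(seed)
    return pipe(prompt, generator=g).images[0]

a = generate("a cat wearing red boots, standing in a meadow")
b = generate("a cat wearing brown boots, standing in a meadow")
a.save("red_boots.png")
b.save("brown_boots.png")
# Despite the identical seed, the two images often differ in composition,
# not just boot colour - which is why "change just the colour" edits drift.
```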
> I think color is fairly well abstracted, but most image generators are not good for edits, because the generator more or less starts from scratch
It’s unlikely that these models have been trained on “similarity”. Ask one to swap red boots for brown boots and it will happily generate an entirely different image, because it was never trained on the concept of images being similar.
That doesn’t mean it’s impossible to train an LLM on the concept of similarity.
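For what it's worth, instruction-tuned image editors already train on something like that notion of similarity (paired "before/after" edits plus an instruction). A minimal sketch, assuming the diffusers InstructPix2Pix pipeline; the checkpoint name, input filename, and guidance values are illustrative:

```python
# Counterexample sketch: instruction-tuned editors such as InstructPix2Pix are
# trained on (image, instruction, edited image) triples, i.e. on a notion of
# "the same image, but changed". Checkpoint and filenames are examples only.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = load_image("cat_with_red_boots.png")  # hypothetical input image
edited = pipe(
    "make the boots brown",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # higher values keep the output closer to the input
).images[0]
edited.save("cat_with_brown_boots.png")
```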
I just asked Midjourney to do precisely that, and it swapped the boots with no issue, although it didn't seem to quite understand what it meant for a cat to _wear_ boots.
> cynical () positions on some controversial questions.
I feel "cynical" is an inappropriate word here.
We may have to, for the same (ecumenical) reasons that thinkers like Churchland, Hofstadter, Dennett, Penrose and company have all struggled with, eventually accept the impossibility of proof (of existence or non-existence) for any hypothesis of "machine mind". The pragmatic response is, "does it offer utility for me?", and that's all that can be said. Anyone's choice to accept or reject ideas of machine intelligence will remain inviolably personal and beyond appeal to "proof" or argument. I think that's something we'd better get used to sooner rather than later, in order to avoid a whole lot of partisan misery and wasted breath.
I think the way he sketches the AI labs as "marketing geniuses" for not just releasing their models as autocomplete is a bit cynical, as is the general implication that these labs are muddying the waters on purpose by not agreeing with <the author's position> and by engaging in "hype" (believing in the technology).
Sorry, "inappropriate" might have been inappropriate :) What am I
trying to say here?....that we're soon gonna find ourselves in an
insoluble and exhausting debate around machine thinking and its value.
The choice unfortunately seems to correlate with the person's age. Younger generations will have no trouble treating LLMs as actually intelligent. Yet another example of "Science progresses one funeral at a time."
Definitely a "citation needed" moment, I think. On Friday I was with a lot of 12-year-olds, all firmly of the opinion that it's a "way to get intelligence/information" but not actually intelligent. (FWIW, in the UK they say "for real life intelligent".) I noted this distinction - or rather, I noted it because that's what they're taught: teachers naturally pass on the commonsense position that "it's still just a computer". That means waiting for funerals will not settle the matter either. That's not to say a significant sect of more credulous "AI worshippers" won't emerge.
There was a paper a while back on AI usage at work among engineers, and it was very strongly correlated with age. This is not surprising: technology adoption is always very dependent on age. (None of this tells you whether the technology is a net good.)