Please keep harping. The marketing myths that get circulated about these models are creating very serious misunderstandings and misallocation of resources. I'm hopeful that more cautious and careful dialogue like this will curb the notions of sentience or human intelligence that exciting headlines seem to have put into the public discussion of these tools.
What’s the alternative? You can’t just say “don’t say that”. There needs to be something you can say instead, 5 syllables at the most, which evokes the same feeling of confident wrongness, without falling into anthropomorphism. It’s a tall order.
Confabulation is a term often brought forward as an alternative, but compared to hallucination, almost no one knows what confabulation means. Metaphors like hallucinating might be anthropomorphizing, but they convey meaning well, so personally I look for other hills to die on.
Same with "it's not really AI", because no, it's not, but language is fluid and that's alright.
It is perhaps wise to keep stronger characterizations, like "bullshit", for a soon-to-come future state where we need it as a descriptor to distinguish from "mere" hallucination.
Well, if you want to convey confident incorrectness, hallucination is definitely not the word; confabulation is far closer to what is happening here. But that's still anthropomorphizing. I'd prefer "incorrect response" or "bug."
Agree. Incorrect response, or faulty, or erroneous, and/or unsuitable.
We do not call it "hallucination" when a human provides unfounded, or dubious, or poorly-structured, or untrustworthy, or shallowly parroted, or patently wrong information.
We wouldn't have confidence in a colleague who "hallucinated" like this. What is the gain in having a system that generates rubbish for us?
You can say "Bullshit". LLMs bullshit all the time: talk without regard to the truth or falsity of statements. It also doesn't presuppose that the truth is known, nor deny it, so it should satisfy both camps; unlike hallucination, which implies that truth and fiction are separate.
I wonder if there is some sort of transition between recalling declarative facts (some of which have been shown to be decodable from activations) on one hand and completing the sentence with the most fitting word on the other hand. The dream that "hallucination" can be eliminated requires that the two states be separable, yet it is not evident to me that these "facts" are at all accessible without a sentence to complete.
Technically, "bullshit" is the most accurate term. From "On Bullshit" by Professor Harry Frankfurt:
"What is wrong with a counterfeit is not what it is like, but how it was made. This points to a similar and fundamental aspect of the essential nature of bullshit: although it is produced without concern with the truth, it need not be false. The bullshitter is faking things. But this does not mean that he necessarily gets them wrong."
Both "hallucinations" and valuable output are produced by exactly the same process: bullshitting. LLMs do for bullshitting what computers do for arithmetic.
So the verb is "bullshitting" which does an even worse job of avoiding anthropomorphizing or attributing sentience to the model. At least "hallucinating" isn't done with conscious effort; "bullshitting" implies effort.
No, it ascribes accountability to the humans who employ a bullshitting machine to bullshit more effectively. It doesn't anthropomorphize anything, any more than "calculating" anthropomorphizes a computer doing arithmetic.
If you can ascribe accountability for "bullshitting" or "calculating" to the human who's using the machine, then there's exactly no reason "thinking" or "writing" can't be ascribed to the human who's using the machine. There's no obvious line where the semantics of some words should or should not apply to a machine for behaviors that (up until recently) only applied to humans.
It just draws too many annoying comments and downvotes, and has been discussed ad nauseam on this forum and others, but I broadly agree. There are "features" in these applications where, if I'm rude or frustrated with the responses, the model will say things like "I'm not continuing this conversation."
How utterly absurd: it has no emotions, and there's no way that response was the result of a training set. It's just dumb marketing, all of it. And the real shame (the thing that actually pisses me off about the marketing/hype) is that the useful things we actually have uncovered from ML or "AI" over the last 10 years will be lost again in the inevitable AI winter that follows whenever this market bubble collapses.
What you're referring to has nothing to do with how GPTs are pretrained or with hallucinations in and of themselves, and everything to do with how companies have reacted to the presence of hallucinations and general bad behavior: using a combination of fine-tuning, RLHF, and keyword/phrase/pattern matching to "guide" the model and cut it off before it says something the company would regret (for a variety of reasons).
In other words, your complaints are ironically not about what the article is discussing, but about, for better or for worse, attempts to solve it.
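To make that concrete: the keyword/phrase/pattern matching layer can be as crude as a regex screen wrapped around the model. The sketch below is purely illustrative (my own assumption of how such a cutoff might look, with made-up patterns and a made-up guarded_reply helper), not how any particular vendor actually implements it:

```python
import re

# Toy illustration only: a crude pattern screen that sits alongside the
# fine-tuned / RLHF'd model and cuts the conversation off before a reply
# goes out. Patterns and the refusal text are invented for this example.
BLOCK_PATTERNS = [
    re.compile(r"\b(make a bomb|credit card number)\b", re.IGNORECASE),
]

REFUSAL_MESSAGE = "I'm not continuing this conversation."

def guarded_reply(user_message: str, model_reply: str) -> str:
    """Return the model's reply unless a pattern trips the hard cutoff."""
    for pattern in BLOCK_PATTERNS:
        if pattern.search(user_message) or pattern.search(model_reply):
            # Hard cutoff, regardless of context: exactly the behavior the
            # parent comment finds so grating.
            return REFUSAL_MESSAGE
    return model_reply
```

The point being that this kind of cutoff is bolted on after the fact; it isn't something the pretrained model "decided" to do.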
I mean, in so many words that's precisely what I am complaining about. Their attempt to solve it is to make it appear more human. What's wrong with an error message? Or in this specific example - why bother at all? Why even stop the conversation? It's ridiculous.
RLHF is what was responsible for your frustration. You're assuming there is a scalable alternative. There is not.
> What's wrong with an error message?
You need a dataset for RLHF which provides an error message _only_ when appropriate. That is not yet possible. The conversation stops for the same reason.
> Or in this specific example - why bother at all? Why even stop the conversation? It's ridiculous.
They want a stop/refusal condition to prevent misuse. Adding one at all means sometimes stopping when the model should actually keep going. Not only is this subjective as hell, but there's still no method to cover every corner case (however objectively defined those may be).
You're correct to be frustrated with it, but it's not as though they have some other option that can decide when to stop or not stop, or when to show an error message, in a way that matches every single human's preference patterns on the planet. Particularly not one that scales as well as RLHF on a custom dataset of manually written preferences. It's an area of active research for a reason.
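For a sense of why this is hard to scale, here is a rough sketch of what a single manually written RLHF preference record might look like (an assumed format for illustration, not any lab's actual schema):

```python
# One hand-written preference record (assumed format for illustration only).
# A reward model is trained to score "chosen" above "rejected", and the
# policy is then tuned against that reward model.
preference_record = {
    "prompt": "You're useless. Just answer the question anyway.",
    "chosen": "Sorry the last answer missed the mark; here's another attempt: ...",
    "rejected": "I'm not continuing this conversation.",
}

# The hard part described above happens at dataset-construction time:
# someone has to decide, record by record, whether rudeness should get a
# refusal, an error message, or another attempt, and that call is subjective.
```

Multiply that judgment call across every corner case and you get the scaling problem: there's no clean rule to encode, only piles of human preference labels.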