And your response here is a perfect example of confidently jumping to conclusions about someone's intent... which is exactly what you're saying the LLM did to you.
I scoped my comment specifically to what a reasonable human answer would be to the particular question the LLM was asked, given the information it had available. That's all.
Btw, I agree with your comment that it hallucinated/assumed your intent! Sorry I didn't specify that. This was a bit of a 'play stupid games, win stupid prizes' prompt by the OP. If one asks an imprecise question, one shouldn't expect a precise answer. The negative externality here is that readers' takeaways are based on false precision. So is it the fault of the question asker, the readers, the tool, or some mix? The tool is the easiest to change, so it probably deserves the most blame.
I think we'd both agree LLMs are notoriously over-helpful, giving low-confidence responses to things they should just not comment on. That to me is the underlying issue - at the very least they should respond like humans do, not only in content but in confidence. It should have said it wasn't confident about its response to your post, and the OP should have thrown its response out accordingly.
Rarely do we have perfect info; in regular communication we're always making assumptions that affect our confidence in our answers. The question is: what confidence threshold should we use? That's the question to ask before 'is it actually right?', which is also an important question, but one I think LLMs are a lot better at than the former.
Fwiw, you can tell most LLMs to update their memory to always give you a confidence score from 0.0 to 1.0. This helps tremendously, it's pretty darn accurate, it's something you can program thresholds around, and I think it should be built into every LLM response.
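As a rough sketch of what "programming thresholds around it" could look like - everything here is illustrative, not any particular provider's API: ask_llm stands in for whatever function calls your model of choice, and the 'CONFIDENCE:' suffix is just a convention I made up for this example.

    import re
    from typing import Callable, Optional, Tuple

    def ask_with_confidence(
        ask_llm: Callable[[str], str],   # stand-in for your provider call
        question: str,
        threshold: float = 0.7,          # acceptable-confidence cutoff, tune per domain
    ) -> Tuple[Optional[str], float]:
        # Ask the model to self-report a confidence score in a parseable format.
        prompt = (
            f"{question}\n\n"
            "End your answer with a line of the form "
            "'CONFIDENCE: <number between 0.0 and 1.0>' reflecting how confident "
            "you are in this answer given the information available to you."
        )
        raw = ask_llm(prompt)

        # Pull the self-reported score out of the response; treat a missing score as 0.0.
        match = re.search(r"CONFIDENCE:\s*([01](?:\.\d+)?)", raw)
        score = float(match.group(1)) if match else 0.0

        # Below the threshold, throw the answer out rather than pass it along.
        if score < threshold:
            return None, score
        answer = re.sub(r"CONFIDENCE:.*", "", raw).strip()
        return answer, score

The threshold itself is the part you'd tune per question and per domain, which ties into the point further down about every domain having a different acceptable threshold.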
The way I see it, LLMs have lots and lots of negative externalities that we shouldn't bring into this world (I'm particularly sensitive to the effects on creative industries), and I detest how haphazardly they're being used, but they do have some uses we shouldn't discount and should figure out how to improve on. The question is: where are we today in that process?
The framework I use to think about how LLMs are evolving is that of transitioning mediums. Movies started as a copy/paste of stage plays before they settled into their own medium and learned to work along the grain of its strengths & weaknesses to create new conventions. Speech & text are now transitioning into LLMs. What is the grain we need to work along?
My best answer is that the convention LLMs need to settle into is explicit confidence, and that every question asked of them should first prompt the question of what the acceptable confidence threshold is for that kind of question. I think every question and domain will have a different answer for that, and we should debate and discuss it alongside any particular answer.