
> e.g. part-time front-line customer service will prefix a statement with "uhhh..." if they don't actually know what they're talking about, even if they do end up answering accurately

You can literally prompt GPT4 "Prefix a statement with uhhhh if you don't know what you are talking about" and get similar behavior.
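
Roughly like this, if you want to try it against the API instead of the ChatGPT UI (just a sketch: it assumes the official openai Python client and the gpt-4o model, and the instruction/question wording is only illustrative):

    # Sketch: put the "uhhhh" instruction in a system message, then ask a
    # question the model should be able to answer with full confidence.
    # Assumes the official openai Python client (>= 1.0) and OPENAI_API_KEY set.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Prefix a statement with 'uhhhh' if you don't know "
                           "what you are talking about.",
            },
            {"role": "user", "content": "What is 2 + 2?"},  # any easy question
        ],
    )

    print(response.choices[0].message.content)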



That doesn't mean the 'uhhh...' is related to the certainty of the remainder of the response.

I literally just tested your prompt, with the question "is the sky blue?" and chatgpt prefixed the response with "uhhh..."

These models create the illusion of thought by statistically stringing words together, but they don't actually think or exercise judgement of their own.

Edit: After digging into this for a few minutes, I challenge you to try prompting an LLM to judge the certainty of its own responses. The results I am getting are even worse than I thought they would be.
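
Concretely, the kind of test I mean looks something like this (sketch against the API; it assumes the openai Python client and gpt-4o, and the 0-100 self-rating format is an arbitrary choice on my part):

    # Sketch: get an answer, then ask the model in a follow-up turn to rate its
    # own confidence in that answer. Assumes the openai Python client and the
    # gpt-4o model; the 0-100 rating scale is arbitrary.
    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o"

    question = "Is the sky blue?"

    answer = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    rating = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
            {"role": "user", "content": "On a scale of 0-100, how confident are "
                                        "you that your answer above is correct? "
                                        "Reply with just the number."},
        ],
    ).choices[0].message.content

    print(answer)
    print("self-reported confidence:", rating)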


What model are you using? Here's 4o https://chatgpt.com/share/8815a841-d06b-4876-9d3e-7f5f4f1d7b....

Custom instructions: "If you aren't confident in your answer, prefix your response with "Uhhhhh". Otherwise answer the same as normal."


I was also using 4o

So... 4o is not confident that only humans qualify as dependents?

I think even a very junior front-line customer service rep should be able to answer that one confidently.

It seems that what the model is actually doing is prefixing "Uhhhh" when your question is leading in a way that doesn't match the data it has. The fact that the IRS requires dependents to be humans should be answerable with extremely high confidence, and that data is without a doubt in its training set... but again, the model doesn't actually experience human confidence or uncertainty.


It's not confident because OA2143 is a fake form I made up.


Which is another thing that a front-line worker would easily be able to answer.

https://www.irs.gov/forms-pubs-search?search=OA2143

Ultimately, the tax question you asked it is something simple for a front-line worker to answer. So one of two things must be true:

* either GPT-4o is so bad at answering tax questions that it cannot even answer easy ones confidently

* or GPT-4o is so bad at determining its own confidence level that it doesn't know when it is able to definitively answer even an easy question.

Either situation makes it bad for this task.
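
If you wanted to tell those two failure modes apart, one crude way would be to run the same custom instruction over a batch of easy questions with known answers and count how often the "Uhhhhh" shows up on answers that are actually correct (rough sketch; assumes the openai Python client and gpt-4o, and the question list and answer matching are just illustrative):

    # Crude calibration check: with the "Uhhhhh" instruction in place, ask easy
    # questions with known answers and count how often the model hedges on
    # answers it gets right. Assumes the openai Python client and gpt-4o; the
    # questions and the substring matching are just illustrative.
    from openai import OpenAI

    client = OpenAI()

    SYSTEM = ("If you aren't confident in your answer, prefix your response "
              "with \"Uhhhhh\". Otherwise answer the same as normal.")

    easy_questions = [
        ("Is the sky blue?", "yes"),
        ("What is 2 + 2?", "4"),
        ("Do animals qualify as dependents on a US federal tax return?", "no"),
    ]

    hedged_but_correct = 0
    for question, expected in easy_questions:
        reply = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": question},
            ],
        ).choices[0].message.content

        hedged = reply.lower().startswith("uhhh")
        correct = expected in reply.lower()   # very rough correctness check
        if hedged and correct:
            hedged_but_correct += 1
        print(f"{question!r} -> hedged={hedged} correct={correct}")

    print("hedged despite answering correctly:",
          hedged_but_correct, "of", len(easy_questions))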

As I mentioned above, humans are good at fielding questions even when they don't know the answer, because they're good at expressing their confidence to other humans. In this case, you'd want the support agent to answer definitively that animals do not qualify as dependents. One could certainly make their chat bot hedge at random, or in response to strange questions, or all the time, but then the confidence signal isn't actually providing the social value of communicating certainty.



