Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

https://cdn.openai.com/o1-system-card-20240917.pdf

Check out the "CoT Deception Monitoring" section. In 0.38% of cases, o1's CoT shows that it knows it's providing incorrect information.

Going beyond hallucinations, models can actually be intentionally deceptive.



Please detail what you mean by "intentionally" here, because obviously, this is the ultimate alignment question...

...so after having a read through your reference, the money-shot:

Intentional hallucinations primarily happen when o1-preview is asked to provide references to articles, websites, books, or similar sources that it cannot easily verify without access to internet search, causing o1-preview to make up plausible examples instead.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: