Hacker News

Aren't they using RLHF? The feedback from humans might not always be the ~right~ feedback. Couldn't that possibly degrade the quality of its responses?

