
I believe they are using much more synthetic data than others. That's the secret sauce, alongside their larger training set and model: synthetic data makes the model more consistent.


For all the fresh interest in the tech and service side, I can't understand why so many overlook the value of the datasets. I've been promoting them as potential products to license, similar to textbook publishing, and a bit reminiscent of how Google used its search appliances. Automating dataset creation will be an immensely valuable service.

I've already introduced instruct-formatting methods within our org, because even as the architectures change, what we learn from the work of organizing and accounting for our internal knowledge will keep paying off.



