Hacker News

> The real issue is they tested on data in their training set.

Hm, no.

They trained on a part of their synthetic set and tested on another part of the set. Or at least that's what they said they did:

> from which 1,000 were held out as a benchmark test set.

Emphasis mine.



Yes, but because the benchmark is derived from the same underlying source dataset, it is effectively an evaluation on the training distribution, not an independent validation/test dataset.

The difference is subtle but important. If we expect the model to truly outperform a general model, it should generalize to a completely independent set, not just to another slice of the same generation process.
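To make the distinction concrete, here's a toy sketch (my own construction, not the paper's actual pipeline): a synthetic pool is generated from a fixed set of templates, 100 rows are randomly held out, and a "model" that simply memorizes per-template rules scores perfectly on the hold-out while failing completely on an independently generated set with unseen templates. All names and the template/offset setup are illustrative assumptions.

```python
# Toy sketch (assumed setup, not the paper's pipeline): why holding out part
# of the same synthetic set is an in-distribution test, not an independent one.
import random

rng = random.Random(0)

# Hidden per-template rule: y = x + offsets[template]. Offsets are nonzero,
# so a fallback guess of y = x is always wrong on unseen templates.
offsets = {t: rng.randint(1, 9) for t in range(30)}

def generate(templates, n):
    """Synthesise (template, x, y) examples from the given templates."""
    out = []
    for _ in range(n):
        t = rng.choice(templates)
        x = rng.randint(0, 100)
        out.append((t, x, x + offsets[t]))
    return out

pool = generate(list(range(20)), 1000)    # one generation run, 20 templates
rng.shuffle(pool)
train, held_out = pool[:900], pool[900:]  # random split of the same pool

# Independent set: same task, but built from templates never seen in training.
independent = generate(list(range(20, 30)), 200)

# "Model": memorise each template's offset from the training examples.
learned = {}
for t, x, y in train:
    learned[t] = y - x

def predict(t, x):
    # Fall back to the identity rule when the template was never seen.
    return x + learned.get(t, 0)

def accuracy(data):
    return sum(predict(t, x) == y for t, x, y in data) / len(data)

print(f"held-out from the same pool: {accuracy(held_out):.2f}")
print(f"independent templates:       {accuracy(independent):.2f}")
```

Every hold-out template also appears in training, so the memorizer looks flawless there; on the independent templates it gets nothing right. That gap is exactly what a random split of a single synthetic set cannot detect.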


Thanks, rereading it makes it clear that you are correct.





