
Recommender system benchmarks are less reliable than they seem: this study shows that changing the data splitting strategy can significantly alter model rankings. Evaluating 17 state-of-the-art models under different splits (Leave One Last Item, Temporal User Split, etc.), the authors find that rankings shift due to data leakage and test-set inconsistencies. Kendall’s τ correlations between 0.52 and 0.76 across splits suggest that some supposed improvements in deep learning recommenders may be artifacts of evaluation rather than genuine gains. The paper urges standardizing splits, favoring global temporal splitting, and releasing public splits for reproducibility.
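For readers unfamiliar with why the split matters: below is a minimal sketch (not from the paper) contrasting a leave-one-last-item split with a global temporal split on a toy interaction log, and computing Kendall's τ between two hypothetical model rankings. The dataframe columns, cutoff choice, and rank values are all illustrative assumptions, not the authors' setup.

    # Sketch only: toy data and illustrative split logic, not the paper's code.
    import pandas as pd
    from scipy.stats import kendalltau

    # Toy interaction log: one row per (user, item, timestamp).
    log = pd.DataFrame({
        "user":      [1, 1, 1, 2, 2, 3, 3, 3, 3],
        "item":      [10, 11, 12, 10, 13, 11, 12, 14, 15],
        "timestamp": [1, 5, 9, 2, 8, 3, 4, 6, 10],
    })

    # Leave-one-last-item: each user's most recent interaction goes to test.
    # Leakage risk: a test interaction can predate the training interactions
    # of other users, so the model may "see the future".
    last_idx = log.groupby("user")["timestamp"].idxmax()
    loo_test = log.loc[last_idx]
    loo_train = log.drop(last_idx)

    # Global temporal split: everything after one global cutoff goes to test,
    # so no training interaction occurs after any test interaction.
    cutoff = log["timestamp"].quantile(0.8)   # cutoff choice is arbitrary here
    gts_train = log[log["timestamp"] <= cutoff]
    gts_test = log[log["timestamp"] > cutoff]

    print("leave-one-last test items:", sorted(loo_test["item"]))
    print("global temporal test items:", sorted(gts_test["item"]))

    # Kendall's tau between the rankings of the same models under two splits
    # (ranks are made up; values near 1.0 mean the splits agree on ordering).
    ranks_split_a = [1, 2, 3, 4, 5]
    ranks_split_b = [2, 1, 5, 3, 4]
    tau, _ = kendalltau(ranks_split_a, ranks_split_b)
    print(f"Kendall's tau between rankings: {tau:.2f}")

The point of the sketch is that the two strategies put different interactions in the test set, so model comparisons made under one split need not carry over to the other, which is what the low-to-moderate τ values in the paper quantify.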

