I'm still honestly confused here. Yes, the general success of neural networks shows they do more than memorize. Some of the experiments here show they can also just memorize, and I guess some describe something in between.
But all this seems obvious. Is something actually being quantified here?
You can measure the generalization gap: the difference between training and test performance. With good generalization that gap is small; when the model fits random labels, it is large.
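To make that concrete, here is a minimal sketch of measuring the gap with the true labels and with shuffled ones. It uses scikit-learn's MLPClassifier on synthetic data; the dataset, model size, and hyperparameters are illustrative assumptions, not the setup from the experiments being discussed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data, purely illustrative.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

def generalization_gap(train_labels):
    # A wide hidden layer so the network has enough capacity to fit noise.
    net = MLPClassifier(hidden_layer_sizes=(512,), max_iter=2000, random_state=0)
    net.fit(X_tr, train_labels)
    # Gap = accuracy on the labels it was trained on, minus accuracy on the
    # held-out set (scored against the true test labels in both runs).
    return net.score(X_tr, train_labels) - net.score(X_te, y_te)

rng = np.random.RandomState(0)
print("gap, true labels:   %.3f" % generalization_gap(y_tr))
print("gap, random labels: %.3f" % generalization_gap(rng.permutation(y_tr)))
```

With the true labels the gap should come out small; with shuffled labels the network can still drive training error down, but test accuracy stays near chance, so the gap is large.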
Classical statistical learning theory (from Vapnik and Chervonenkis, among others) predicts that low-capacity models will generalize, i.e. have small generalization gaps. It makes no promises for high-capacity models, i.e. those that can fit random labels. Yet here we are.
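For reference, one standard form of the VC bound (constants differ across statements of it) says that, with probability at least $1 - \delta$ over the draw of $n$ training samples, every hypothesis $f$ in a class of VC dimension $d$ satisfies

$$
R(f) \;\le\; \hat{R}_n(f) \;+\; \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}},
$$

where $R$ is the test risk and $\hat{R}_n$ the training risk, so the square-root term bounds the generalization gap. Once $d$ is comparable to or larger than $n$, as it is for networks big enough to fit random labels, the bound exceeds one and guarantees nothing.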