I think my ignorance is showing here, but that paper's TL;DR seems to be: neural network performance is not a monotonic function of network width. The conclusion and the problem statement read as trivially equivalent.
They admit that this law is only useful if you already know where the 'breaks' are: "If an additional break of sufficient sharpness happens at a scale that is sufficiently larger than the maximum (along the x-axis) of the points used for fitting, there does not (currently) exist a way to extrapolate the scaling behavior after that additional break."
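To make that quoted limitation concrete, here is a minimal sketch (Python with numpy/scipy) that fits the paper's single-break functional form, y = a + b·x^(-c0)·(1 + (x/d1)^(1/f1))^(-c1·f1), to synthetic data whose true curve has a second, sharper break beyond the fitting range. Only that functional form comes from the paper; the function names (`bnsl_one_break`, `true_curve`), all parameter values, and the synthetic data are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Single-break (n = 1) form of the broken power law from the paper:
#   y = a + b * x^(-c0) * (1 + (x / d1)^(1 / f1))^(-c1 * f1)
# Parameter names follow the paper's notation; everything numeric
# below is synthetic and purely illustrative.
def bnsl_one_break(x, a, b, c0, c1, d1, f1):
    return a + b * x ** (-c0) * (1.0 + (x / d1) ** (1.0 / f1)) ** (-c1 * f1)

# "True" curve: one break near x = 1e3 inside the fitting range, plus a
# second, sharp break near x = 1e6 that the fit never sees.
def true_curve(x):
    seen = bnsl_one_break(x, a=0.1, b=2.0, c0=0.3, c1=0.4, d1=1e3, f1=0.5)
    unseen_break = (1.0 + (x / 1e6) ** 2.0) ** (-0.25)
    return seen * unseen_break

rng = np.random.default_rng(0)
x_fit = np.logspace(1, 5, 40)                 # fit only on x in [1e1, 1e5]
y_fit = true_curve(x_fit) * rng.lognormal(0.0, 0.01, x_fit.size)

popt, _ = curve_fit(
    bnsl_one_break, x_fit, y_fit,
    p0=[0.1, 1.0, 0.3, 0.3, 1e3, 0.5],
    bounds=(1e-3, np.inf), maxfev=20000,
)

# In-range predictions track the data; past the unseen second break the
# extrapolation diverges from the true curve, as the quote warns.
for x in np.logspace(5, 8, 4):
    print(f"x={x:.0e}  true={true_curve(x):.4f}  fit={bnsl_one_break(x, *popt):.4f}")
```

Within the fitting range the fit looks fine; it only goes wrong after the break it never observed, which is exactly the failure mode the authors are acknowledging.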
The scaling laws have broken: https://arxiv.org/abs/2210.14891