
My question is more about how many shallow neural networks are still in use. The "deep" in deep learning typically means a depth much greater than three.


You're absolutely right. But this is still a huge step forward. There has been work on the VC dimension of neural networks for a long time (and it has been shown to be finite); finite VC dimension is a necessary but not sufficient condition for efficient PAC learnability.

If it can be done for 3 layers, then maybe it can be done for more. And I happen to really like it when my problems have polynomial time guarantees.

[VC dimension is a quantitative measure of the capacity a model's parameters give it: roughly, the largest number of points it can label in every possible way ("shatter"). I think of it as analogous to entropy for physical systems.]
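
For context, the classic realizable-case PAC bound ties sample complexity to VC dimension (textbook result, not anything from this paper):

    m(\varepsilon, \delta) \;=\; O\!\left(\frac{d \,\log(1/\varepsilon) + \log(1/\delta)}{\varepsilon}\right),
    \qquad d = \mathrm{VCdim}(\mathcal{H})

A finite d makes the sample complexity finite, but it says nothing about whether a polynomial-time learner exists, which is why it's necessary but not sufficient.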


Apart from depth-bragging contests, this might become less relevant if we go by the notion "poly time == solved", because 3-layer neural networks are universal approximators: they can uniformly approximate any continuous function on a compact set.
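
A toy illustration of that approximation property (not the paper's algorithm; the target function, layer width, and the trick of fitting only the output layer are choices I made up for the demo): one hidden layer of tanh units with random input weights, output weights solved by least squares.

    import numpy as np

    rng = np.random.default_rng(0)

    def target(x):
        # an arbitrary smooth function to approximate
        return np.sin(3 * x) + 0.5 * x

    x = np.linspace(-2, 2, 400).reshape(-1, 1)   # training grid
    y = target(x).ravel()

    # hidden layer: random weights and biases, tanh activation
    n_hidden = 50
    W = rng.normal(scale=2.0, size=(1, n_hidden))
    b = rng.normal(scale=2.0, size=n_hidden)
    H = np.tanh(x @ W + b)                       # (400, n_hidden) activations

    # output layer: linear least squares for the output weights
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)

    print("max abs error on the grid:", np.abs(H @ beta - y).max())

The point is just that the representation exists; the hard part the paper is about is learning it efficiently from data, with guarantees.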

I still have to read the paper carefully for the assumptions made, but if the result holds for the entire class of 3-layer NNs, or for a class big enough not to sacrifice universal approximation, then this would be a Big Deal.

Of course, for practical applications poly-time may not mean the problem is solved, or that the poly-time algorithm is the best one to use on a typical instance. The exponent or the leading constant could be very high, and deeper networks may well have more favorable properties.
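
Back-of-the-envelope with made-up numbers, just to make that concrete:

    n = 10^{6} \text{ examples}, \qquad n^{4} = 10^{24} \text{ steps}
    10^{24} / (10^{9} \text{ steps/s}) \approx 10^{15} \text{ s} \approx 3 \times 10^{7} \text{ years}

So even a degree-4 polynomial with a leading constant of 1 can be hopeless at dataset scale.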


I think this assessment of what's "typical" may just be based on posturing. For some reason you can't brag about training a relatively shallow neural network (is efficiency not valued in this field?).

My counter-assessment is that the space of problems you solve just by making your NN deeper is very small. Such problems clearly exist, but I've seen few compelling examples outside of image classification.



