It's ignorant to say that attention or convolution comes anywhere close to the expressivity of RNNs (LSTMs). Instead of reading a few cherry-picked results from a model with extremely tuned hyperparameters, pick any random set of tasks and experiment yourself. In my experience, LSTMs always perform better than everything else unless you brute-force search over hundreds of hyperparameter configs.
> It's ignorant to say that attention or convolution comes anywhere close to the expressivity of RNNs (LSTMs).
But that's exactly the point of the Transformer model, whose paper is aptly titled "Attention Is All You Need" [1]. And the BERT architecture, based on this idea, seems to be doing well. Its authors claim it is very flexible, too [2].
Maybe that's what you meant by "unless you brute-force search over hundreds of hyperparameter configs", but then again, isn't that what NNs are about anyway?
The success of Transformers aside, I'm not sure you should rely on paper titles for anything, lest we forget papers like "One Model To Learn Them All" [1].