It really depends on what you're working on and what was included in the model's training data. From an architecture point of view, they're basically all the same; the main difference lies in the training data.
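For what it's worth, that's roughly true: strip away the details (RoPE vs. learned positions, RMSNorm vs. LayerNorm, SwiGLU vs. GELU) and the GPT-style models are all a stack of the same block. A minimal PyTorch sketch, where the dimensions and layer choices are illustrative rather than any particular model's:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm transformer block: causal self-attention + MLP, both residual."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        # Causal mask: each token may only attend to itself and earlier tokens.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, attn_mask=mask)[0]
        x = x + self.mlp(self.norm2(x))
        return x

# Architecturally, an LLM is little more than embeddings -> N of these blocks -> LM head.
x = torch.randn(1, 16, 512)                         # (batch, tokens, channels)
y = nn.Sequential(*[Block() for _ in range(4)])(x)
print(y.shape)                                      # torch.Size([1, 16, 512])
```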
I agree a fair bit here, not that I'm an expert or anything. I'd like to see some progress on the neuronal modelling side. It seems that since 'Attention Is All You Need', the field has locked into this LLM stack, gluing models together as data pipelines rather than integrating different NNs at a deeper level.
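To make the distinction concrete: "gluing" means one model runs to completion and hands the next a lossy surface form like text, while "deeper integration" means projecting one network's hidden states directly into another's latent space, the way LLaVA-style multimodal models bridge a vision encoder into an LLM. A toy sketch of the contrast, with made-up stubs (`speech_to_text`, `encoder`, `decoder` are placeholders, not a real API):

```python
import torch
import torch.nn as nn

# Style 1: data pipeline -- each model finishes, and only a string is passed on.
# Internal state, uncertainty, and gradients are all discarded at the boundary.
def pipeline(audio, speech_to_text, llm):
    transcript = speech_to_text(audio)   # audio -> text
    return llm(transcript)               # text -> text

# Style 2: deeper integration -- one network's hidden states are projected
# straight into the other's embedding space, so the whole thing trains end to end.
class FusedModel(nn.Module):
    def __init__(self, encoder, decoder, enc_dim=256, dec_dim=512):
        super().__init__()
        self.encoder = encoder                       # e.g. an audio or vision encoder
        self.project = nn.Linear(enc_dim, dec_dim)   # bridge into the LLM's latent space
        self.decoder = decoder                       # an LLM consuming continuous embeddings

    def forward(self, signal, token_embeddings):
        latents = self.project(self.encoder(signal))            # (B, S, dec_dim)
        fused = torch.cat([latents, token_embeddings], dim=1)   # prefix the prompt
        return self.decoder(fused)

# Smoke test with stand-in modules.
fused = FusedModel(encoder=nn.Linear(80, 256), decoder=nn.Identity())
out = fused(torch.randn(1, 50, 80), torch.randn(1, 16, 512))
print(out.shape)  # torch.Size([1, 66, 512])
```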