Because the difference is that statistical models are by definition somewhat stochastic. Some incorrect answers are to be expected, even if you do everything right.
In software engineering you have test code. 100% of your tests should pass. If one doesn’t, you can debug it until it does.
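A minimal sketch of what this difference means for testing, assuming a hypothetical `noisy_classifier` that flips its answer 10% of the time. Individual predictions can be wrong even though nothing is broken, so the check asserts an aggregate accuracy threshold (0.85 here, an arbitrary choice) instead of requiring every prediction to be correct.

```python
import random


def noisy_classifier(x: float, rng: random.Random, noise: float = 0.1) -> int:
    """Hypothetical model: predicts whether x >= 0, but flips its answer with probability `noise`."""
    label = 1 if x >= 0 else 0
    return 1 - label if rng.random() < noise else label


def accuracy(n: int = 10_000, seed: int = 0) -> float:
    """Fraction of n random inputs the hypothetical model labels correctly."""
    rng = random.Random(seed)  # fixed seed so the test is reproducible
    correct = 0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        truth = 1 if x >= 0 else 0
        if noisy_classifier(x, rng) == truth:
            correct += 1
    return correct / n


acc = accuracy()
# Not every answer is right, so we test the aggregate, not each prediction.
assert acc > 0.85, acc
print(f"accuracy = {acc:.3f}")
```

Choosing the threshold is itself a judgment call: set it too tight and the test fails on random noise, set it too loose and it misses real regressions.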
> How is this different from any other field of science or engineering?
The difference is that in most cases it is not clear how well a given approach will work in a given scenario. Often the only option is to try it, and if performance is unsatisfactory, it is not easy to find out why. Besides bugs or a wrong choice of model, the cause could be poorly chosen training parameters, or the quality or quantity of the data (and who knows how much more data you would need).
It's not necessarily different from SWE; problem solving is a general skill. The difficulty comes from the fact that there is no clear definition of "it works", and there are no guidelines or templates to follow to find out what is wrong, if anything. In particular, many issues are not in the code at all.
Obviously. How is this different from any other field of science or engineering?
> you need to understand the theoretical motivation of things and check all operations one by one.
Again, this is true when debugging any complex system. How else would you debug it?
> a bug there where we were normalizing on the wrong dimension of a tensor
If you describe the methodology you used to debug it, it will probably be applicable to debugging a complicated issue in any other SWE domain.
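To make the quoted normalization bug concrete, here is a minimal pure-Python sketch. The `normalize` and `rows_sum_to_one` helpers and the score values are hypothetical, not the original code. Normalizing along the wrong axis still runs without error, but an explicit invariant check on the output catches it.

```python
from typing import List

Matrix = List[List[float]]


def normalize(m: Matrix, axis: int) -> Matrix:
    """Divide each entry by the sum along `axis` (0 = down columns, 1 = across rows)."""
    if axis == 1:
        return [[x / sum(row) for x in row] for row in m]
    col_sums = [sum(col) for col in zip(*m)]
    return [[x / s for x, s in zip(row, col_sums)] for row in m]


def rows_sum_to_one(m: Matrix, tol: float = 1e-9) -> bool:
    """Invariant: each sample's (row's) values should form a distribution."""
    return all(abs(sum(row) - 1.0) <= tol for row in m)


# Hypothetical batch: 2 samples x 3 classes.
scores = [[1.0, 2.0, 1.0],
          [3.0, 1.0, 4.0]]

correct = normalize(scores, axis=1)  # intended: normalize per sample
buggy = normalize(scores, axis=0)    # bug: normalizes per class instead

print(rows_sum_to_one(correct))  # True
print(rows_sum_to_one(buggy))    # False
```

Both calls return a 2x3 result, so a shape check alone would not flag the bug; the row-sum invariant does.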