Voice recognition has gotten a lot better. I'm almost impressed by my Amazon Echo. That said, for an arbitrary recording of, say, a conference presentation, you still need to either use a human transcriber or expect to spend a LOT of time cleaning things up.
(A lot of the improvement probably has to do with switching to more data-driven approaches.)
Right now we're at the stage of the "seven blind men and the elephant". Over time, our eyes will open and things will start making sense.