Yes. Not only easier, but more reliable. The examples you gave are perfectly static sound bites: they don't change. It doesn't make sense to transcribe them to text; just match the audio. SoundHound, Shazam, etc. do this easily. I'm pretty sure YouTube already has a similar mechanism in place.
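To make "just match the audio" concrete, here is a rough sketch that slides a known clip along a longer recording and scores each position with normalized cross-correlation. Everything in it is an assumption for illustration: the `find_clip` helper, the synthetic "jingle", and the noise level are made up, and real services are generally understood to use more robust spectral fingerprints rather than raw sample matching.

```python
import math
import random


def find_clip(recording, clip):
    """Return (offset, score) of the best normalized cross-correlation match.

    Hypothetical helper: brute-force search, fine for a toy demo only.
    """
    n = len(clip)
    clip_mean = sum(clip) / n
    clip_centered = [x - clip_mean for x in clip]
    clip_norm = math.sqrt(sum(x * x for x in clip_centered))

    best_offset, best_score = -1, -1.0
    for offset in range(len(recording) - n + 1):
        window = recording[offset:offset + n]
        win_mean = sum(window) / n
        win_centered = [x - win_mean for x in window]
        win_norm = math.sqrt(sum(x * x for x in win_centered))
        if clip_norm == 0 or win_norm == 0:
            continue  # a flat signal cannot be correlated
        dot = sum(a * b for a, b in zip(clip_centered, win_centered))
        score = dot / (clip_norm * win_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Synthetic demo: a short, unchanging "jingle" planted in noise at sample 300.
rng = random.Random(0)
jingle = [
    math.sin(2 * math.pi * 5 * t / 100) * math.sin(math.pi * t / 100)
    for t in range(100)
]
recording = [rng.uniform(-0.2, 0.2) for _ in range(1000)]
for i, sample in enumerate(jingle):
    recording[300 + i] += sample

offset, score = find_clip(recording, jingle)
print(offset, round(score, 2))
```

Because the clip never changes, the matcher only has to find where the known waveform lines up, with no speech recognition involved. Open-ended search over what people say has no fixed waveform to match against, so this shortcut does not apply there.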
This technology gets a lot more interesting if you want to search for people talking about you or your products.