After watching Starfire[0], I came to the conclusion that computer voice commands were another form of mystery meat navigation[1], with the added problem of interrupting the people around me. I have used them on my phone a few times, because the touch targets are small, but that's about it.
I first started experimenting with voice commands when Chrome's speech input field came out, because it exposed Google's endpoint for transcribing audio clips. I wrote a Python recorder that would start recording on an utterance, stop when it ended, curl the clip to Google, do some primitive NLP on the transcript, route the request to Yelp/Google/Wikipedia, and return a response to a web frontend (a sketch of the pipeline is below).
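For the curious, here is a minimal sketch of that pipeline. It is not the original code: the speech endpoint URL and its JSON response shape are placeholders (the Chrome-era Google endpoint is long gone), the RMS thresholds are arbitrary, and the keyword routing is just illustrative. It assumes pyaudio and requests are installed.

    # Sketch of an utterance recorder + speech-API + keyword router.
    # SPEECH_URL and the response format are hypothetical placeholders.
    import audioop
    import io
    import wave

    import pyaudio
    import requests

    RATE = 16000
    CHUNK = 1024
    SILENCE_THRESHOLD = 500   # RMS level below which a chunk counts as silence
    SILENCE_CHUNKS = 30       # roughly 2 seconds of silence ends the utterance
    SPEECH_URL = "https://example.com/speech-api/recognize"  # placeholder

    def record_utterance() -> bytes:
        """Wait for speech, record until silence, return the clip as WAV bytes."""
        pa = pyaudio.PyAudio()
        stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                         input=True, frames_per_buffer=CHUNK)
        frames, started, silent = [], False, 0
        try:
            while True:
                data = stream.read(CHUNK)
                loud = audioop.rms(data, 2) > SILENCE_THRESHOLD
                if not started:
                    if loud:
                        started = True
                        frames.append(data)
                    continue
                frames.append(data)
                silent = 0 if loud else silent + 1
                if silent > SILENCE_CHUNKS:
                    break
        finally:
            stream.stop_stream()
            stream.close()
            pa.terminate()

        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(RATE)
            wav.writeframes(b"".join(frames))
        return buf.getvalue()

    def transcribe(wav_bytes: bytes) -> str:
        """POST the clip to the (hypothetical) speech endpoint, return the text."""
        resp = requests.post(SPEECH_URL, data=wav_bytes,
                             headers={"Content-Type": "audio/l16; rate=16000"})
        resp.raise_for_status()
        return resp.json().get("transcript", "")

    def route(text: str) -> str:
        """Very primitive keyword routing, standing in for the 'NLP' step."""
        lowered = text.lower()
        if "restaurant" in lowered or "food" in lowered:
            return "yelp: " + text
        if lowered.startswith(("what is", "who is")):
            return "wikipedia: " + text
        return "google: " + text

    if __name__ == "__main__":
        clip = record_utterance()
        print(route(transcribe(clip)))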
What did I learn? Even Google's voice transcriber, which is trained on an immense corpus of n-grams, is not very good. There's a lot of advanced signal processing plus n-gram heuristics going on just to figure out what people are saying. And since it's so computationally demanding, Google hosts it as a web API rather than doing the transcription locally, so the internet itself becomes a bottleneck to timely transcription. It's noticeably laggy.
We're not yet close to having computers capable of hearing with the same effectiveness as humans. Hopefully it happens eventually though -- that would unlock a world of interfaces. At the moment it's far too finicky to be practical.
If you're curious and want to try this yourself, Apple's endpoint for Siri voice commands is also out there; someone has found it.
[0] http://asktog.com/starfire/
[1] http://en.wikipedia.org/wiki/Mystery_meat_navigation