After watching Starfire[0], I came to the conclusion that computer voice commands were another form of mystery meat navigation[1], with the added problem of interrupting the people around me. I have used them on my phone a few times, because the touch targets are small, but that's about it.
I first started experimenting with voice commands when Chrome's speech input field came out, because it exposed Google's endpoint for transcribing audio clips. I wrote a Python recorder that would start recording on an utterance, stop when it ended, curl the clip to Google, do some primitive NLP on the transcript, route the request to Yelp/Google/Wikipedia, and return a response to a web frontend (a sketch of the pipeline is below).
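For the curious, here is a minimal sketch of that pipeline. It is not the original code: the speech endpoint URL and its JSON response shape are placeholders (the Chrome-era Google endpoint is long gone), the RMS thresholds are arbitrary, and the keyword routing is just illustrative. It assumes pyaudio and requests are installed.

    # Sketch of an utterance recorder + speech-API + keyword router.
    # SPEECH_URL and the response format are hypothetical placeholders.
    import audioop
    import io
    import wave

    import pyaudio
    import requests

    RATE = 16000
    CHUNK = 1024
    SILENCE_THRESHOLD = 500   # RMS level below which a chunk counts as silence
    SILENCE_CHUNKS = 30       # roughly 2 seconds of silence ends the utterance
    SPEECH_URL = "https://example.com/speech-api/recognize"  # placeholder

    def record_utterance() -> bytes:
        """Wait for speech, record until silence, return the clip as WAV bytes."""
        pa = pyaudio.PyAudio()
        stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                         input=True, frames_per_buffer=CHUNK)
        frames, started, silent = [], False, 0
        try:
            while True:
                data = stream.read(CHUNK)
                loud = audioop.rms(data, 2) > SILENCE_THRESHOLD
                if not started:
                    if loud:
                        started = True
                        frames.append(data)
                    continue
                frames.append(data)
                silent = 0 if loud else silent + 1
                if silent > SILENCE_CHUNKS:
                    break
        finally:
            stream.stop_stream()
            stream.close()
            pa.terminate()

        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(RATE)
            wav.writeframes(b"".join(frames))
        return buf.getvalue()

    def transcribe(wav_bytes: bytes) -> str:
        """POST the clip to the (hypothetical) speech endpoint, return the text."""
        resp = requests.post(SPEECH_URL, data=wav_bytes,
                             headers={"Content-Type": "audio/l16; rate=16000"})
        resp.raise_for_status()
        return resp.json().get("transcript", "")

    def route(text: str) -> str:
        """Very primitive keyword routing, standing in for the 'NLP' step."""
        lowered = text.lower()
        if "restaurant" in lowered or "food" in lowered:
            return "yelp: " + text
        if lowered.startswith(("what is", "who is")):
            return "wikipedia: " + text
        return "google: " + text

    if __name__ == "__main__":
        clip = record_utterance()
        print(route(transcribe(clip)))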
What did I learn? Even Google's voice transcriber, which is trained on an immense corpus of n-grams, is not very good. There's a lot of advanced signal processing plus n-gram heuristics going on just to figure out what people are saying. And since it's so computationally demanding, Google hosts it as a web API rather than doing the transcription locally, so the internet itself becomes a bottleneck to timely transcription. It's noticeably laggy.
We're not yet close to having computers capable of hearing with the same effectiveness as humans. Hopefully it happens eventually though -- that would unlock a world of interfaces. At the moment it's far too finicky to be practical.
If you're curious and want to try this yourself, Apple's endpoint for Siri voice commands is also out there; someone has found it.
[0] http://asktog.com/starfire/
[1] http://en.wikipedia.org/wiki/Mystery_meat_navigation