
> Why do people seem to gloss over the fact that we can implement these technologies without losing privacy? e.g. voice recognition has been possible on home computers for decades now. You don't need the cloud for it.

The experience is not quite the same. We have had voice recognition since at least the late '90s, but you had to spend long hours training it (> 20 hrs) to get a decent result (and even then it was not comparable). The fact that it is cloud-based now enables the software to adapt better to different accents and pronunciations.

Another thing is that personalization is not really possible on today's devices if you want more than 3 hrs of battery life.



I had better results in the late 2000s with less than an hour of training the MS Speech API than I have with Google Now today. Either offline speech recognition isn't that bad or my English really sucks.


I don't know about Google Now, but Cortana does have way better results than the local MS Speech API. My English does suck, so it is very impressive to me that Cortana gets me right most of the time.


Maybe I'm not using it right, but Cortana on my Lumia 640 Windows Phone seems to blow Google Now right out of the water in terms of capability and usability except for punctuation in voice recognition.

On the other hand, generally I'm pretty impressed with just the voice recognition/transcription by Google on my Android phones (exception: "ferociously"). Transcriptions in Google Voice on the other hand are, hm, marginally good enough to often get a general gist of a call before I return it, but if I need the actual details of the message there's no choice but to listen to it. This includes calls made by me, from my phone that I also do voice recognition on, into a Google Voice number that I use for some tracking.

It is interesting that the transcriptions in the web interface show how confident they are of the quality for each word by how dark the word is.


>We have had voice recognition since at least the late '90s, but you had to spend long hours training it

Microsoft could ship their pre-trained dataset with the computer, or make it available as a download. They choose not to.


gok covered this fairly well in this comment: https://news.ycombinator.com/item?id=9978755

It's a big, really big dataset, and things get far better the more data you have. In order to even have it, let alone keep it up to date, a significant amount of space and memory would be needed.


That dataset contains languages and accents that are not relevant to me. It could easily be culled to a size where it's no compromise on language+region of birth alone.


It's multiple gigabytes for a single language and accent. I wasn't even talking about the full dataset across languages.

You don't seem to understand the size requirements for getting a good dataset. Don't you think Microsoft would have loaded up the dataset if it was easy and cheap? They didn't have a desktop cloud-based recognition service until literally yesterday, so they had many, many years to include this magical dataset that solves all your problems without cannibalizing another one of its products. They didn't because it's not feasible right now. In the future? Maybe, hell, probably.


>It's multiple gigabytes for a single language and accent.

I have 602 GB free on my first hard drive, 519 GB free on my second, 699 GB on my third, 1.06 TB free on my fourth, 405 GB free on my fifth and 46 GB free on my sixth.

If Microsoft would be kind enough to release it to me, I think I can probably find a corner to squeeze it into.

>Don't you think Microsoft would have loaded up the dataset if it was easy and cheap?

No, I don't. Microsoft wants our voice data; it's extremely valuable to them. They've figured out that there are gullible people like you who will swallow the "it can't be moved onto a local computer" tale hook, line and sinker, and thus give it to them for free.

Why are you doing that? Grow some cynicism.


Do you also have that much memory? To be useful, the dataset would need to stay loaded in memory whenever recognition is in use.

I never said storage is the limiting factor, in fact, I even said you need a significant amount of "space and memory".


> That dataset contains languages and accents that are not relevant to me.

You are assuming that they have a different model for each language and region, which I don't think is true, since Cortana understands my foreign accent even though I have the region set to USA (the Canadian version works really well too).

> I have 602 GB free on my first hard drive, 519 GB free on my second, 699 GB on my third, 1.06 TB free on my fourth, 405 GB free on my fifth and 46 GB free on my sixth.

Good for you, but I don't have that much free space. Gee, I only have 20 GB free on my laptop. I think you might be biased by your own situation; not everyone has 1 TB+ of free space waiting to be used for voice commands.


My colleague wrote his diploma thesis with voice recognition software (the market leader) because he sucks at typing. Desktop voice recognition can't be that bad.


I have second-hand experience with commercial support for a market-leading product (that is, I had coworkers doing the commercial support), and the amount of bugs and trickery some users had to go through with it makes you wonder how they can sell any copies at all.



