Hacker News

What's interesting is how much data they store for no apparent reason. They could perform speech-to-text on the fly and throw away the recording.
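The "transcribe, then discard" idea could look something like this minimal sketch. The `transcribe` function is a hypothetical stand-in for a real ASR engine call; the point is only the shape of the pipeline, where the raw audio is deleted as soon as the text exists.

```python
import os
import tempfile

def transcribe(audio_path: str) -> str:
    """Hypothetical speech-to-text call; a real system would invoke an
    ASR engine here. Stubbed out for illustration."""
    return "turn on the lights"

def handle_utterance(audio_path: str) -> str:
    """Transcribe the recording, then delete the audio immediately so
    only the text is retained."""
    try:
        text = transcribe(audio_path)
    finally:
        os.remove(audio_path)  # discard the raw audio whether or not ASR succeeded
    return text

# A throwaway file standing in for a captured utterance.
fd, path = tempfile.mkstemp(suffix=".wav")
os.close(fd)
text = handle_utterance(path)
```

After `handle_utterance` returns, the `.wav` file is gone and only `text` remains, which is exactly the trade the parent comment is proposing.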

Machine learning? Do it with people paid to talk to Alexa.



1.) It's not actually scalable to pay enough people to talk to Alexa to train these models.

2.) That data would not be representative of what real users are doing so it would bias the models.


I'm not necessarily disagreeing with the thrust of your argument (do you really need to store all that?), but constraining your sample to people paid to talk to Alexa can introduce huge swathes of bias. You'd need to make sure the paid speakers reflect all the accents and languages of the people who actually use Alexa. On top of that, without some amount of real voice data, how would you even know what that accent breakdown looks like? It's a near-impossible task.
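The circularity above can be made concrete with a toy reweighting example. The accent labels and all the proportions below are invented for illustration: to correct a paid-speaker sample toward the real user base, you compute importance weights from the population's accent mix, but that mix is precisely what you don't know without real voice data.

```python
# Invented numbers: share of utterances per accent group.
paid_sample = {"US": 0.80, "UK": 0.15, "IN": 0.05}   # who you paid to talk
population  = {"US": 0.50, "UK": 0.20, "IN": 0.30}   # true user base (unknown in practice)

# Importance weight per accent: how much to up- or down-weight each
# paid utterance so the sample matches the population.
weights = {accent: population[accent] / paid_sample[accent]
           for accent in paid_sample}
# Indian-accented utterances would need roughly a 6x weight (0.30 / 0.05),
# and computing that requires already knowing the population breakdown.
```

So even the standard statistical fix (post-stratification) presupposes the very data the paid-only approach refuses to collect.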




