I'm not necessarily disagreeing with the thrust of your argument (do you really need to store all that?), but constraining your sample to people paid to talk to Alexa can introduce huge swathes of bias. You'd need to make sure the people you pay reflect all the accents and languages of the people who actually use Alexa. On top of that, without some amount of real voice data, how would you even know what that accent breakdown looks like? That's a near-impossible task.
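To make the problem concrete: matching paid speakers to the user base is essentially stratified quota sampling against a target accent distribution, and that target distribution is exactly the thing you can't estimate without the voice data in question. Here's a minimal sketch of the quota side, assuming entirely made-up accent categories and proportions just for illustration:

```python
# Minimal sketch: allocate a paid-speaker budget across accent strata
# in proportion to the (estimated) distribution of real users.
# The categories and shares below are hypothetical placeholders --
# estimating them accurately is the chicken-and-egg problem above.
from collections import Counter

# Assumed (made-up) share of each accent among actual users.
target_distribution = {
    "US General": 0.45,
    "US Southern": 0.15,
    "British": 0.10,
    "Indian English": 0.15,
    "Spanish-accented": 0.10,
    "Other": 0.05,
}

def allocate_quotas(n_paid_speakers: int, distribution: dict) -> Counter:
    """Split the paid-speaker budget proportionally across strata."""
    quotas = Counter()
    for accent, share in distribution.items():
        quotas[accent] = round(n_paid_speakers * share)
    return quotas

if __name__ == "__main__":
    # e.g. a budget of 200 paid speakers
    print(allocate_quotas(200, target_distribution))
```

The allocation itself is trivial; the hard (and arguably impossible) part is getting `target_distribution` right without already having recordings from the real user population.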
Machine learning? Do it with people paid to talk to Alexa.