Having done only a small amount of work trying to apply academic research, I have to say that the standard approach across a lot of AI is to develop good ideas only about as far as needed to get a few papers out (though I don't know NLP in particular).
The work of putting this stuff together into a system that works consistently and at scale is hard.
Basically, it is unfortunate that the standard in academia is publishing papers (or PDF files) rather than publishing libraries. Under this standard, academics can't even readily use each other's algorithms.
But given that standard, it seems dumb to hear the complaint "uh no, you didn't do anything but apply our ideas..."
I am not an expert on speech recognition, but I am somewhat involved with machine translation. For MT, it is very much architectural and systems issues that limit us, first in trying different models and algorithms, but also in general.
How do you know ideas are "good" enough to be publishable unless you run plenty of experiments involving billion- (or trillion-) word corpora? I have a hard time imagining that research in other fields doesn't require validation.
I'm not saying that the papers that get published aren't good or valid. "Good enough to publish a paper" is indeed good.
It's just that once the paper is published, it becomes a cul-de-sac: a nice little city with no roads leading in or out. Other researchers can only use the result by reproducing the idea by hand (or, at best, through crufty Matlab code).
Yes, I'm sure the papers I've scanned involved considerable work and data (I worked in computer vision). But that work is often, if not generally, unavailable to the reader of the paper.
The point is that in creating a working system, Google has to do more than extend academic research, even if that research involved good ideas that had been thoroughly tested in isolation.
> Basically, it is unfortunate that the standard in academia is publishing papers (or PDF files) rather than publishing libraries. Under this standard, academics can't even readily use each other's algorithms.
I am not sure code should be the uniform standard for judging (computer) scientific work. It is much harder to review and validate a theory when it is supplied as code rather than as a paper.
Of course, code can serve as an addendum to your work or may be useful as a demonstration of it. (Many academic authors do seem to do this.)