I'm not much of an expert on this stuff... what do you all think about their cha...

jwp · on Feb 26, 2007

I work on speech recognition. What Blinkx is doing isn't novel. (Sorry, Blinkx.) Google has top speech researchers working on search for speech and video. Same for MSFT. Remember the Kai-Fu Lee thing? The guy who built CMU Sphinx, an open-source HMM-based speech recognizer, in the late 80s? He and many other solid speech people are at Google to work on searching audio/video.

And there are other companies in this space, but they tend to center around US gov customers. Virage is one. It's owned by the Autonomy group, where, according to the article, the founder of this company used to work.

There's also Podzinger, a subsidiary of BBN, which is another company that gets a lot of gov business. Podzinger runs BBN's speech recognition system on podcasts and videos, and pipes the output to a search engine: <http://podzinger.com/>.

I could go on... And if people are interested, I'd be happy to post links to some relevant papers and tools.

To my mind, two interesting things are going on here. 1) The company appears to be thriving by applying 20-year-old stuff from the lab to a new problem, in apparently no special way. (And that's not a bad thing!) 2) They got an article in the NYT business section to talk about Hidden Markov Models. Although maybe that's not so surprising, since hedge funds have recently started speaking out about using machine learning.

danielha · on Feb 25, 2007

All I know is that it works. I tried out a few terms and got what I had in mind every time.

They heavily emphasize speech recognition, I think. For what this is, it's very cool. The technology is there and the product works. I think this is going places.

dangrsmind · on Feb 26, 2007

I'd say I'm a little skeptical.

The first question I'd have is how fast they can parse video. The second is how much it costs to do it.

It seems you would have to be able to do recognition much faster than real-time for a realistic web video search capability (see for example http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=599600) and you would certainly need a lot of hardware to do this at scale for millions of video clips.
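A rough way to put numbers on the speed and cost question is the real-time factor (RTF): processing time divided by audio duration, so RTF below 1.0 means faster than real time. The sketch below is back-of-the-envelope arithmetic only. The function name and every input value are hypothetical, not measurements of Blinkx or of any real recognizer.

```python
import math

def machines_needed(audio_hours_per_day, real_time_factor, cores_per_machine):
    """Estimate machines required to keep up with a daily stream of new audio.

    Assumes one recognizer process per core and perfect parallelism across
    clips, which is an optimistic simplification.
    """
    # CPU-hours of recognition work generated each day.
    cpu_hours = audio_hours_per_day * real_time_factor
    # Each core supplies 24 CPU-hours per day.
    cores = cpu_hours / 24.0
    return math.ceil(cores / cores_per_machine)

# Hypothetical inputs: 10,000 hours of new video per day, a recognizer
# running at RTF 0.5, and 8 cores per machine.
estimate = machines_needed(10_000, 0.5, 8)
```

Halving the RTF halves the hardware, which is why recognition speed and cost are really the same question.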

See also: http://www.newmediamusings.com/blog/2005/09/blinkx_a_citize.html

jwp · on Feb 26, 2007

The first link you cited is spot on. The authors are from Univ of Cambridge, and work on HTK <http://htk.eng.cam.ac.uk/>.

That paper is 10 years old. As I'm sure you can imagine, there have been improvements in the field since then. To be completely honest, I don't stay on top of search applied to speech, but the keyword you want is "Spoken Document Retrieval" (SDR). Ciprian Chelba and TJ Hazen do cool stuff in this area; they are giving a tutorial on SDR at ICASSP this year.

An aside. Both of these approaches use the fact that when you process speech, you essentially form a graph of words (or phonemes). Paths through the graph represent possible transcriptions. So, since the graph is a denser, richer thing to search than the transcript, and we've got graph algorithms sitting around, there are neat tricks you can do to build a search engine index for speech...
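A minimal sketch of one such trick, assuming a toy word lattice with made-up probabilities: run forward-backward over the graph to get each word's posterior (the share of path probability passing through it), then index every word by that score instead of indexing only the single best transcript. The lattice, clip ID, and function names are all hypothetical, and this is not a description of any particular system's index.

```python
from collections import defaultdict

# Hypothetical lattice for one clip: (from_node, to_node, word, probability).
# Nodes are numbered in topological order; 0 is the start, 5 is the end.
LATTICE = [
    (0, 1, "wreck", 0.4),
    (0, 4, "recognize", 0.6),
    (1, 2, "a", 1.0),
    (2, 3, "nice", 1.0),
    (3, 5, "beach", 1.0),
    (4, 5, "speech", 1.0),
]

def word_posteriors(edges, start, end):
    """Forward-backward over a DAG: path probability mass through each word."""
    alpha = defaultdict(float)  # mass of partial paths from start to a node
    beta = defaultdict(float)   # mass of partial paths from a node to end
    alpha[start] = 1.0
    beta[end] = 1.0
    # Forward pass: visit edges in increasing order of source node.
    for src, dst, _, p in sorted(edges, key=lambda e: e[0]):
        alpha[dst] += alpha[src] * p
    # Backward pass: visit edges in decreasing order of source node.
    for src, dst, _, p in sorted(edges, key=lambda e: e[0], reverse=True):
        beta[src] += p * beta[dst]
    total = alpha[end]
    posteriors = defaultdict(float)
    for src, dst, word, p in edges:
        posteriors[word] += alpha[src] * p * beta[dst] / total
    return dict(posteriors)

def build_index(clips):
    """Inverted index: word -> {clip_id: posterior score}."""
    index = defaultdict(dict)
    for clip_id, (edges, start, end) in clips.items():
        for word, score in word_posteriors(edges, start, end).items():
            index[word][clip_id] = score
    return index

def search(index, word):
    """Return (clip_id, score) pairs for a word, best score first."""
    hits = index.get(word, {})
    return sorted(hits.items(), key=lambda kv: kv[1], reverse=True)

INDEX = build_index({"clip-1": (LATTICE, 0, 5)})
```

In this toy lattice the best path is "recognize speech", so a transcript-only index would never match a query for "beach". The lattice index still returns the clip for that query, just with a lower score.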

I've recently been reading some interesting work that uses locality-sensitive hashing to search audio. The Google speech people are presenting a lot of it at ICASSP this year. See this post for more, and chase the links in their papers for even more: <http://googleresearch.blogspot.com/2007/02/hear-here-sample-of-audio-processing.html>
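For anyone who hasn't met locality-sensitive hashing, here is a minimal sketch of the general idea using random-hyperplane hashing, a generic LSH family for cosine similarity. It is not claimed to be the scheme used in the papers linked above. The feature vectors, clip IDs, and class name are hypothetical; real systems hash features derived from the audio and typically use several hash tables.

```python
import random
from collections import defaultdict

class LshIndex:
    """Single-table random-hyperplane LSH over fixed-length feature vectors."""

    def __init__(self, dim, num_bits=16, seed=0):
        rng = random.Random(seed)
        # Each hyperplane is a random Gaussian vector; one signature bit per plane.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.buckets = defaultdict(set)

    def signature(self, vec):
        """Pack the sign of each hyperplane projection into an integer."""
        bits = 0
        for plane in self.planes:
            dot = sum(p * v for p, v in zip(plane, vec))
            bits = (bits << 1) | (1 if dot >= 0 else 0)
        return bits

    def add(self, clip_id, vec):
        self.buckets[self.signature(vec)].add(clip_id)

    def query(self, vec):
        """Return clips whose signature matches exactly (candidate set)."""
        return set(self.buckets.get(self.signature(vec), set()))

# Hypothetical 4-dimensional audio features for two clips.
index = LshIndex(dim=4, num_bits=8, seed=42)
index.add("clip-a", [0.9, 0.1, 0.3, 0.7])
index.add("clip-b", [-0.9, -0.1, -0.3, -0.7])

# The same clip played twice as loud points in the same direction,
# so it hashes to the same bucket as clip-a.
louder = [2 * x for x in [0.9, 0.1, 0.3, 0.7]]
candidates = index.query(louder)
```

The point is that similar vectors collide with high probability, so a query only has to compare against one bucket rather than every clip in the collection.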

dangrsmind · on Feb 26, 2007

Thanks for the information and links. My background is in video and image processing, well originally multiple target tracking, sensor management, and sensor fusion, but now I work in biometrics and video analytics. Understood about processing the information into a graph.

Your point about Google raises one of the obvious questions about this company... if Google is doing leading-edge research in this field, it seems unlikely they need to buy a "video search destination" site employing lesser technologies, unless it gets really, really big (i.e. YouTube). They might be interested in some deep technology, but my impression from the reading I've done and the links you've posted is that Blinkx is using standard, well-known techniques to achieve their results.

FWIW: I was applying Markov modeling to areas such as mission planning and modeling integrated air defense networks back almost twenty years ago now. We didn't call them HMMs, but there were some very similar ideas employed.

jwp · on Feb 26, 2007

Hmm, perhaps we should talk. Email me at e40.32313371@bloglines.com if you're interested.