Microsoft makes AI tool for better search available as an open source project

danielcampos93 · on May 16, 2019

I work on the team that built this and much of our other ranking tech, and I'm happy to answer any questions. Also, a shameless plug: using this tech, we released some artificial search sessions as an exploratory dataset. https://github.com/dfcf93/MSMARCO/tree/master/ConversationlS...

mv4 · on May 16, 2019

The GitHub link gave me a 404.

Update: you made a typo; it should be https://github.com/dfcf93/MSMARCO/tree/master/Conversational...

sinkpoint · on May 16, 2019

Great work! How does this compare to hierarchical navigable small world (HNSW) graphs, like nmslib?

danielcampos93 · on May 16, 2019

I wasn't really familiar with nmslib, but I think this is a little faster and scales to billions of items.
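For context on the HNSW question: navigable-small-world indexes like the one in nmslib answer queries by greedy routing over a proximity graph. The sketch below shows only that core step on a single flat graph. The toy graph (points on a line, each linked to the points 1 and 3 steps away) is a made-up assumption, real HNSW adds a layer hierarchy and a candidate beam, and none of this says how SPTAG compares.

```python
import math

def greedy_graph_search(graph, vectors, query, entry):
    """Greedy routing on a proximity graph: keep hopping to whichever
    neighbor is closer to the query until no neighbor improves. This can
    stop at a local minimum, which is one reason the search is approximate."""
    current = entry
    current_dist = math.dist(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = math.dist(vectors[neighbor], query)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:  # no neighbor is closer: stop here
            return current
        current, current_dist = best, best_dist

# Hypothetical toy data: 10 points on a line; short links plus "skip" links.
vectors = [[float(i)] for i in range(10)]
graph = {i: [j for j in (i - 1, i + 1, i - 3, i + 3) if 0 <= j < 10] for i in range(10)}

print(greedy_graph_search(graph, vectors, [7.2], entry=0))
```

The skip links are what make the graph "navigable": the search reaches the neighborhood of the query in a few long hops instead of walking one point at a time.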

mv4 · on May 16, 2019

Cool project! One of the challenging use cases mentioned in the description is people taking a picture and asking the search engine, "What is this?" Has this been solved? (It's a very hard problem if taken beyond simple object classification.)

danielcampos93 · on May 16, 2019

In many ways, yes. For images, instead of using an NLP parser and a text vectorization technique, you vectorize the image and then do a similar nearest-neighbor lookup.
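A minimal sketch of the "vectorize, then look up" idea, assuming the image embeddings have already been computed. The item names and 3-dimensional vectors are made up; a real system would get its vectors from a trained model and use an approximate index instead of this exact brute-force scan.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, index, k=3):
    """Exact (brute-force) k-nearest-neighbor lookup by cosine similarity.

    `index` maps an item id to its precomputed vector; returns the k ids
    most similar to `query`, most similar first.
    """
    return heapq.nlargest(k, index, key=lambda item_id: cosine(query, index[item_id]))

# Hypothetical precomputed image embeddings.
index = {
    "red_mug": [0.9, 0.1, 0.0],
    "blue_mug": [0.8, 0.2, 0.1],
    "bicycle": [0.0, 0.1, 0.9],
}
photo = [0.85, 0.15, 0.05]  # hypothetical embedding of the user's photo
print(top_k(photo, index, k=2))
```

The lookup itself is the same whether the vectors came from text or from pixels; only the model that produces them changes.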

danielcampos93 · on May 16, 2019

If you want to play with the tech there is a developer kit. https://www.bingvisualsearch.com/develop

mv4 · on May 16, 2019

How effective is it at identifying mysterious objects, the way Reddit's "/Whatisthis" crowdsources them? (In other words, how big is the index?)

13of40 · on May 16, 2019

LOL - that's exactly the use case I was thinking of. But wouldn't an AI have to be trained on lots of examples of the item in question to get a high-quality detection? If so, it might not be able to find that one image that antique shop in Whereverston has on their web site.

danielcampos93 · on May 16, 2019

I don't work on the image side, but from what I understand the entire index is vectorized, so it's not categorizing images the way an ImageNet classifier would so much as finding a nearest neighbor that can be categorized.
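To make the contrast with a fixed-class classifier concrete, here is a sketch of labeling by nearest neighbors: the query inherits the majority label of its k closest labeled vectors. The 2-D vectors and labels are hypothetical, and this illustrates the general k-NN idea rather than how Bing's image index actually works.

```python
import math
from collections import Counter

def knn_label(query, labeled_index, k=3):
    """Label a query by majority vote over its k nearest labeled vectors.

    `labeled_index` is a list of (vector, label) pairs. There is no training
    step: adding a pair to the index makes its label reachable immediately.
    """
    neighbors = sorted(labeled_index, key=lambda pair: math.dist(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled embeddings.
labeled = [
    ([1.0, 1.0], "antique clock"),
    ([1.2, 0.9], "antique clock"),
    ([5.0, 5.0], "bicycle"),
    ([5.1, 4.8], "bicycle"),
]
print(knn_label([1.1, 1.0], labeled, k=3))

# A single new example makes a brand-new label findable, with no retraining.
labeled.append(([9.0, 0.0], "butter churn"))
print(knn_label([9.1, 0.1], labeled, k=1))
```

With k=1, one indexed photo of an item is enough for a close query to match it, which is the "one image on an antique shop's site" case; the catch is that the embedding has to place both photos near each other.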

bobosha · on May 16, 2019

1. How is this different from pysparnn or Faiss?

2. Does it support ANN search over both sparse and dense vectors?
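On the sparse/dense question: the difference is mostly in how vectors are stored and how similarity is computed. pysparnn, as its name suggests, targets sparse data such as bag-of-words text vectors. Below is a sketch of cosine similarity over sparse `{dimension: value}` dicts; the example vectors are made up, and nothing here describes what SPTAG itself supports.

```python
import math

def sparse_cosine(a, b):
    """Cosine similarity for sparse vectors stored as {dimension: value} dicts.

    Only dimensions present in both vectors contribute to the dot product,
    so the cost grows with the number of nonzeros, not the full dimensionality.
    """
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    dot = sum(value * b[dim] for dim, value in a.items() if dim in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical term-weight vectors over a large vocabulary: only 2 nonzeros each.
doc_a = {3: 1.0, 17: 2.0}
doc_b = {17: 1.0, 42: 1.0}
print(sparse_cosine(doc_a, doc_b))  # only dimension 17 overlaps
```

A dense index would instead store every dimension of every vector, which is cheap for a few hundred learned dimensions but wasteful for a vocabulary-sized one.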

anewhnaccount2 · on May 16, 2019

Good press release title. Now just add it to the pile ;) https://github.com/erikbern/ann-benchmarks

riyadparvez · on May 16, 2019

Yes, exactly! This seems like a totally overblown title. It's akin to saying Google open-sourced its key search tech by releasing Kubernetes, which is an open-source rendition of Borg, the system all Google workloads run on.

suyash · on May 16, 2019

lol - good point, foiled the Great Microsoft PR play

WanderPanda · on May 16, 2019

[flagged]

bryanrasmussen · on May 16, 2019

No, it turns out it's "A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search scenario."

But Math.random() is a really good guess.
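For anyone wondering what "approximate nearest neighbor" buys over the brute-force scan: an ANN index only compares the query against a promising subset of the data. The sketch below uses inverted-file (IVF) partitioning, a common ANN technique swapped in for illustration; it is not SPTAG's own algorithm. `IVFIndex`, `n_lists` and `nprobe` are hypothetical names chosen here, and the data is random.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            buckets[nearest].append(v)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

class IVFIndex:
    """Toy inverted-file index: partition vectors by nearest centroid at
    build time, then scan only the `nprobe` partitions closest to the query."""

    def __init__(self, vectors, n_lists=4, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_lists, seed=seed)
        self.lists = [[] for _ in self.centroids]
        for item_id, v in enumerate(vectors):
            self.lists[self._nearest_lists(v, 1)[0]].append(item_id)

    def _nearest_lists(self, v, n):
        order = sorted(range(len(self.centroids)), key=lambda c: sq_dist(v, self.centroids[c]))
        return order[:n]

    def search(self, query, k=1, nprobe=1):
        candidates = [i for c in self._nearest_lists(query, nprobe) for i in self.lists[c]]
        candidates.sort(key=lambda i: sq_dist(query, self.vectors[i]))
        return candidates[:k]

rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(200)]
index = IVFIndex(data, n_lists=8)
query = [0.5, 0.5]
approx = index.search(query, k=5, nprobe=2)  # scans roughly 2/8 of the data
exact = sorted(range(len(data)), key=lambda i: sq_dist(query, data[i]))[:5]
print(approx, exact)
```

Raising `nprobe` trades speed for recall; probing every partition degenerates into the exact scan. Serving this at the "billions of items" scale mentioned upthread is where the distributed part of such a library comes in.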

westpfelia · on May 16, 2019

Open sourced the print statement