
I read this article because it was written by Nate Silver, so I expected something substantial. There isn't anything.

In a lot of words, he essentially says the following:

1. Microsoft's defense contains no information because they don't say how they weight the data learned from Google searches.

2. It's hard to say how Microsoft is actually using the data learned from Google searches.

For the crowd here, this is essentially common sense.

(edit: in point two I changed 'impossible' to 'hard')



This is classic Nate Silver. He will happily write a 1,000-word essay explaining, in great detail and with very high certainty, why, despite the fact that pundits across the world are flapping their jaws about it, there is simply nothing to say about a topic.


That's the sort of pundit we need more of. Pretty much everything I've read on this story, from Google, from Bing, from various commenters on HN, has been pretty nonsensical. I'll take 1000 words of "hold your horses" over 100 words of nonsense any day of the week.

Though I admit I may only read the first couple hundred words and skim to about halfway through before closing the tab.


Where are you getting point 2 from?

In many cases where they have no other data, they clearly weight Google's single data point high enough to create a whole SERP around it. The claim that 'this is just one input among many' implies that some kind of cross-referencing or corroboration takes place. In this experiment that obviously didn't happen.


> they clearly weight Google's single data point high enough to create a whole SERP around it

This really doesn't tell us anything about how highly it is weighted relative to other data though, since obviously there will be no other data to weigh in on a nonsense search term.
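
To make that concrete, here's a toy linear scorer (the feature names and weights are invented for illustration; nobody outside Microsoft knows the real model). With every other feature at zero, even a tiny weight on the Google-derived signal produces the top result:

    # Toy ranker: weighted sum of per-page feature values.
    weights = {"link_graph": 5.0, "page_text": 3.0, "toolbar_clicks": 0.01}

    def score(features):
        return sum(weights[name] * value for name, value in features.items())

    # Gibberish query: every signal except the toolbar click is zero.
    honeypot_page = {"link_graph": 0.0, "page_text": 0.0, "toolbar_clicks": 1.0}
    print(score(honeypot_page))  # 0.01 -- the only nonzero score, so it tops the SERP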


Google managed to inject "7 to 9" of their honeypots into Bing. In 91 or more of the 100 cases, apparently, there was enough to outweigh the Google signal on gibberish terms, despite the efforts of 20 Google engineers between December 17 and December 31.


What were the terms of the other 91 cases? You make it sound like the Google engineers were slaving away for two weeks. My impression was that they ran the experiment during that time.


Google doesn't say how long they were working on it or anything at all about the methods they used. Indeed, they don't even seem to be certain how many honeypots they injected.

On the other hand, they issued twenty engineers laptops, which is more consistent with an ongoing effort than a single afternoon of Bing and beer - particularly when you consider that if there was a single method applied, and it was known beforehand, one engineer could have automated the whole exercise.

The inability to accurately measure honeypot injections is a bit odd. As pure speculation, it may be that they were not sure if the methods they used to get Bing to show honeypots number 8 and 9 were legitimate.


While we're all speculating on certain bits of what happened in this she said/he said, I would presume that in many, if not most, of those 91 cases what happened wasn't that other inputs outweighed Google's, but rather that no user running the Bing toolbar (with it set to the phone-home defaults) took the action that would create the Google-related input to begin with.


My point was that the algorithmic conclusion 'oh well, let's just print it anyway, since it's from Google' is pretty weighty on its own.


While we have no idea why this result is good, Google seems to think it's fine. If it's good enough for Google, it's good enough for us.


"Since it's from Google" is an unproven assertion. They may do this for other data sources as well in similar circumstances.


I didn't assert that they don't. I asserted that they only had one data source in that case.

Whether they throw up new SERPs in reaction to data from other single sources might be interesting. But if true, it's also not very complimentary.


From the article's conclusion: "How much value Microsoft’s engineers are ultimately adding is hard to say. Both they and Google are extremely circumspect about revealing any detail about their algorithms."

It's common sense that they are using data from Google. That has pretty much been proven conclusively. Nate Silver telling us this again is not interesting. The possibility that he might say something new was.


So you paraphrase 'hard' as 'impossible'.

I actually think Silver makes a useful contribution, mainly because of his background and gift for illustrating data perception problems. If you're looking for hard facts and (more) smoking guns then you would naturally be disappointed.


Of course search engines don't like to reveal algorithm and weighting details; it would be a gold mine for rivals and SEO riggers. Why should Bing release such critical information to Google over 8 of 100 search queries?


The key question seems to be: if n is the information content of the page (i.e., the number of bits needed to distinguish it from a random search-engine results page), what fraction of n would not be there without Google's ranking algorithm?

In the case of the honeypot pages, clearly the fraction is 100%.

In the case of other results pages, we merely have Google's claim that the similarity has been rising over time, and their attribution of that rise to info extracted from Google.
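
As a back-of-the-envelope illustration (purely hypothetical; neither engine publishes anything like this), you could proxy that fraction by the overlap between the two engines' result lists for the same query:

    # Rough proxy: what fraction of Bing's results also appear in Google's?
    def overlap_fraction(google_results, bing_results):
        google_set = set(google_results)
        return sum(1 for url in bing_results if url in google_set) / len(bing_results)

    # Honeypot case: Bing's whole page is the single Google-planted URL.
    print(overlap_fraction(["http://example.com/planted"],
                           ["http://example.com/planted"]))  # 1.0, i.e. 100%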

Using information that people provide about where they click (hopefully with some kind of informed consent) in order to improve your algorithm seems reasonable enough. If a large part of the info ends up coming from Google's algorithm, it starts to seem sketchy, and it's legitimate for Google to whinge.
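
For what it's worth, here's a minimal sketch of what "using where people click" might look like; the record format, queries, and URLs are all invented, not anything Microsoft has disclosed:

    from collections import Counter, defaultdict

    # Made-up (query, clicked_url) records a toolbar might phone home.
    clicks = [
        ("gibberishquery123", "http://example.com/planted"),
        ("cheap flights", "http://example.com/flights"),
        ("cheap flights", "http://example.com/deals"),
    ]

    # Aggregate clicks into a per-query popularity signal.
    signal = defaultdict(Counter)
    for query, url in clicks:
        signal[query][url] += 1

    # For a gibberish query with no other data, one click is the whole signal.
    print(signal["gibberishquery123"].most_common(1))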

Not that whinging ever stopped any other company, particularly Microsoft, from trying to take advantage of other companies' IP.



