
If you don’t believe summary statistics of the data, why would you believe the data?

If you think Facebook is lying in the report, why wouldn’t they lie if releasing the data?



It's one thing to fake or massage a summary; it's an entirely different thing to fake the whole data set. The gap in difficulty grows with the amount of data.

Your question is essentially equivalent to: "If you don't believe what the politician said about the event, why would you believe the video recording of it?"


If the politician is holding the camera and editing the footage, it's a valid question.

Anyway, you wouldn't need to fake a whole data set; you just need to employ a little bias in what data you choose to collect, how you collect it, how you process it, and how you present it. Those things happen all the time. Or, at an only slightly more extreme level, you could also selectively censor it using automation. People are used to thinking data is truth, but even the best data is always filtered through a human source.


Were the first point accurate, it would instantly become not so upon recognition of the existence of the concept of delegation.

The point about selectively choosing data, how to process it, etc., is important and often overlooked. People are accustomed to working with what they're given, but objectivity may be a step further back.

Regardless, such things can only be better revealed by providing the data.

If the goal is greater illumination, there is simply no argument to be made against greater transparency.


Yep, transparency leads to illumination, so in that way those two analogies work together like sunlight and window panes. But you can lie with data, was my singular point. More data is harder to fake, seems to be yours, and I suppose I would agree. But once faked, albeit at whatever difficulty, more fake data is more dangerous and less illuminating than less fake data. (Edit: Not only because there's more of it, but because it ironically has that very property of being or seeming more truthy or trustworthy because there's more of it.)

Anyway, I have no idea what you're saying in your first sentence, I gotta say. I recognize the existence of delegation, and yet still trust any party's data (and the completeness, honesty and transparency thereof) in direct proportion to some estimation of that party's general trustworthiness and whatever I know or can surmise about their aims, agendas and interests in relation to the subject of the data. And when the subject of the data is the very party collecting it, you can surmise immediately some of the probable interests and aims. They probably want to look good and not bad, for example, or make more money and not less, etc.


It's also easier for someone to figure out they are faking it if they actually release the data. Or at the very least to say, "Some of this data seems off."


Convention, basically.

Releasing fraudulent data is different, culturally and legally, from releasing a PowerPoint-style report summarizing curated "key points," selected, defined and quantified in a non-transparent way.

Every year, public companies release annual reports. Hundreds of pages of largely BS. A few dozen pages of real financial data.

It's not uncommon for a page 1 chart of the company's market share, provided by a 3rd party, to be total nonsense. The page 114 figure summarizing tax liabilities needs to be auditable. That's part way to transparent. Not everyone can see the data, but someone can.


There are a lot more techniques for checking whether a large data set has been altered than for checking whether a summary has.
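To make that concrete, here is a rough sketch of one such technique: a first-digit check against Benford's law. This is my own pick for illustration; nobody upthread named a specific method, and the function names and sample data below are made up. It only makes sense for quantities that span several orders of magnitude, and a large deviation is a reason to look closer, not proof of tampering.

```python
import math
from collections import Counter

def first_digit(x):
    """Return the leading nonzero digit of a nonzero number."""
    # str() of an int or float keeps the true leading digit; strip any
    # leading zeros and the decimal point (e.g. '0.00032' -> '32').
    return int(str(abs(x)).lstrip("0.")[0])

def benford_chi_square(values):
    """Chi-square statistic of observed first digits vs. Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        # Benford's law: P(first digit = d) = log10(1 + 1/d)
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

# Hypothetical inputs, for illustration only.
natural = [2 ** k for k in range(1, 200)]    # spans many orders of magnitude
suspicious = [500 + k for k in range(199)]   # every value starts with 5 or 6

# With 8 degrees of freedom, the 5% critical value is about 15.507.
print(benford_chi_square(natural))
print(benford_chi_square(suspicious))
```

You can't run anything like this on a summary, which is the point: a single reported average gives you no distribution to test.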



