
NHST, which is part of frequentist statistics, is wrong, plain and simple. It answers the wrong question (it tells you the probability of the data given the hypothesis, when what you actually want is the probability of the hypothesis given the data), and it will favor H1 under conditions that can be manipulated in advance, for instance simply by collecting a large enough sample.
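To make that distinction concrete, here is a minimal Bayes'-rule sketch in Python; the numbers are entirely made up for illustration:

  # Made-up numbers: P(data | H0) is not P(H0 | data).
  p_data_given_h0 = 0.04   # "significant" at the 5% level
  p_data_given_h1 = 0.30   # the data are only moderately likely under H1 too
  prior_h0 = 0.90          # H0 was quite plausible before the experiment

  posterior_h0 = (p_data_given_h0 * prior_h0) / (
      p_data_given_h0 * prior_h0 + p_data_given_h1 * (1 - prior_h0))
  print(round(posterior_h0, 2))  # ~0.55: H0 is still more likely than not

A p-value under 0.05 and a null hypothesis that is still more likely true than false are perfectly compatible.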

There is a near-total lack of understanding of how it works, yet people think they know how to use it. There are numerous articles out there containing statements like "there were no differences in age between the groups (p > 0.05)", as if failing to reject H0 were evidence that H0 is true. Consequently, it is the wrong thing to teach.

That's apart from the more philosophical question: what does it mean when I say that there's a 40% chance that team A will beat team B in the match tomorrow?



NHST is not wrong. It’s widely misused by people who barely understand any statistics.

Reducing frequentist statistics to testing and p-values is a huge mistake. I have always wondered whether that is how it is introduced to some people, and whether that is why they don't get the point of the frequentist approach.

Estimation theory makes a lot of sense - to me a lot more than pulling priors out of thin air. It is also a lot of relatively advanced mathematics if you want to teach it well, since defining random variables properly requires a fair bit of measure theory. I think the perceived gap comes from there: people have a somewhat hand-wavy understanding of sampling and an overall poor grounding in theory, and then think Bayes is better because it looks simpler at first.


> Estimation theory makes a lot of sense - to me a lot more than pulling priors out of thin air.

You're "pulling priors out of thin air" whether you realize it or not; it's the only way that estimation makes sense mathematically. Frequentist statistics is broadly equivalent to Bayesian statistics with a flat prior distribution over the parameters, and what expectations correspond to a "flat" distribution ultimately depends on how the model is parameterized, which is in principle an arbitrary choice - something that's being "pulled out of thin air". Of course, Bayesian statistics also often involves assigning "uninformative" priors out of pure convenience, and frequentists can use "robust" statistical methods to exceptionally take prior information into account; so the difference is even lower than you might expect.

There's also a strong argument against NHST specifically that works from both a frequentist and a Bayesian perspective: NHST violates the likelihood principle (https://en.wikipedia.org/wiki/Likelihood_principle), so one could even ask whether NHST is "properly" frequentist at all.
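The standard illustration (Lindley and Phillips' coin example) is easy to reproduce; here is a sketch using scipy, assuming 9 heads and 3 tails were observed and H0 is a fair coin:

  # Same data, same likelihood, different p-values: the answer depends on
  # the stopping rule, which the likelihood principle says should not matter.
  from scipy.stats import binom, nbinom

  # Stopping rule A: flip exactly 12 times, count heads.
  p_fixed_n = binom.sf(8, 12, 0.5)     # P(>= 9 heads) ~ 0.073

  # Stopping rule B: flip until the 3rd tail; 9 heads happened to come first.
  p_stop_rule = nbinom.sf(8, 3, 0.5)   # P(>= 9 heads before the 3rd tail) ~ 0.033

  print(p_fixed_n, p_stop_rule)        # one crosses 0.05, the other does not

Same observed data, one "significant" result and one not, purely because of the experimenter's intentions.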


> You're "pulling priors out of thin air" whether you realize it or not

No, you are not. That's an argument I have often seen put forward by people who want the Bayesian approach to be the one true approach. There are no priors whatsoever involved in a frequentist analysis.

People who say that are generally referring to the MLE being equivalent to MAP estimation with a uniform prior on the parameter region. That's true, but it is the usual mistake I'm complaining about: reducing frequentist estimation to the MLE.

The assertion in itself doesn’t make sense.
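For reference, the coincidence being referred to is just that a constant prior drops out of the argmax; a toy sketch:

  # MAP with a flat prior coincides with the MLE on the prior's support,
  # because adding a constant log-prior does not move the argmax.
  import numpy as np

  rng = np.random.default_rng(1)
  x = rng.normal(loc=2.0, scale=1.0, size=50)
  grid = np.linspace(-5.0, 5.0, 1001)        # grid search over the mean

  log_lik = np.array([-0.5 * np.sum((x - m) ** 2) for m in grid])
  log_post = log_lik + 0.0                   # flat prior on [-5, 5], up to a constant

  print(grid[np.argmax(log_lik)], grid[np.argmax(log_post)])  # identical

That says something about the MLE, not about confidence intervals, sufficiency, unbiasedness, or the rest of frequentist estimation theory.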

> Of course, Bayesian statistics also often involves assigning "uninformative" priors out of pure convenience

That's very hand-wavy. The issue is that priors have a significant impact on posteriors, an impact which is often deeply misunderstood by casual statisticians.


Frequentists' big complaint about priors is that they are subjective and influence the conclusions of the study. But the frequentist approach is equivalent to using a non-informative prior, which is itself a subjective choice that influences the conclusions of the study. It amounts to assuming that we know literally nothing about the phenomenon under examination beyond the collected data, which is almost never true.


> There are no prior whatsoever involved in a frequentist analysis.

It may not be everywhere, but even in the simplest case of NHST there certainly is: the entire calculation is carried out under the assumption that H0 is true, i.e. that there is no difference. And NHST is basically the topic of this entire thread: it's what we should have stopped teaching a long time ago.


Let's say you run the most basic regression Y = X beta + epsilon. The X is chosen out of the set of all possible regressors Z (say you run income ~ age + sex, when you could also have used education, location, whatever).

Is that not equivalent to a prior that the coefficients on the variables in Z but not in X are zero?
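A quick numerical sketch of that point (made-up data, variable names following the example above): omitting a relevant, correlated regressor acts exactly like pinning its coefficient to zero, and the resulting bias shows up in the coefficients you keep.

  # Dropping a regressor == fitting the full model with its coefficient
  # forced to zero; if that implicit "prior" is wrong, the retained
  # coefficients absorb the bias.
  import numpy as np

  rng = np.random.default_rng(0)
  n = 5000
  age = rng.normal(40, 10, n)
  education = 10 + 0.1 * age + rng.normal(0, 2, n)    # correlated with age
  income = 100 + 50 * age + 200 * education + rng.normal(0, 50, n)

  X_full = np.column_stack([np.ones(n), age, education])
  X_small = np.column_stack([np.ones(n), age])        # implicit prior: beta_education = 0

  b_full, *_ = np.linalg.lstsq(X_full, income, rcond=None)
  b_small, *_ = np.linalg.lstsq(X_small, income, rcond=None)
  print(b_full[1], b_small[1])   # ~50 vs ~70: the zero "prior" biases the age coefficient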


NHST is wrong as a matter of theory. It is a weird amalgamation of significance testing (Fisher) and hypothesis testing (Neyman and Pearson). Those two approaches by themselves are correct and theoretically sound, provided the appropriate assumptions are met.

NHST is not associated with any particular statistician, and you will find no author claiming to be its inventor. It is a misunderstanding of statistics apparently originating in psychology back in the 1960s, or at least that's as far back as I have been able to trace it.



