How did academics support hypotheses before the null hypothesis significance testing (NHST) framework was introduced, in part, and popularized by Fisher and by Neyman & Pearson? Suppose NHST had never existed: what are some plausible frameworks academics could use to support their hypotheses today? Are there alternatives based on mathematics outside of statistics and probability?
  • A little like asking how we would get along if no one had invented multiplication; I'm not sure how to get around that. Maybe confidence intervals could substitute for some tests of hypotheses.
    – BruceET
    Commented Sep 21, 2020 at 5:56
  • There were many tests performed before either Fisher or Neyman and Pearson. The earliest hypothesis test was probably Arbuthnot (1710), who essentially performed a sign test (a binomial test with $p_0=\frac12$). In any case, science mostly managed pretty well without statistical hypothesis tests: careful experiment, observation, and repeated replication can render conclusions evident enough.
    – Glen_b
    Commented Sep 21, 2020 at 9:31
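As an aside, the sign test Glen_b mentions is simple enough to sketch directly. Here is a minimal exact two-sided binomial test in Python; the function name and the 14-of-20 example are made up for illustration, not from the thread:

```python
from math import comb

def sign_test_p(successes, n, p0=0.5):
    """Exact two-sided binomial (sign) test, in the spirit of Arbuthnot (1710).

    Sums the probabilities of all outcomes whose probability under the null
    is no larger than that of the observed outcome.
    """
    pmf = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    observed = pmf[successes]
    return min(1.0, sum(p for p in pmf if p <= observed + 1e-12))

# e.g. 14 of 20 pairs favour treatment A: surprising under p0 = 1/2?
p = sign_test_p(14, 20)  # roughly 0.115, so not strong evidence
```

With $p_0=\frac12$ the distribution is symmetric, so the two-sided p-value is just twice the one-tail probability.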

2 Answers

Neither Fisher nor Neyman and Pearson proposed a "null hypothesis significance testing framework". Instead, Fisher demonstrated the significance testing framework and Neyman and Pearson later demonstrated the hypothesis testing framework. They are not the same and they are not similar in their objectives. The significance testing framework attempts to quantify the evidence in the data against a null hypothesis, and uses a continuous p-value. The hypothesis testing procedure entails a decision to reject or not reject the null hypothesis and it does not use a p-value.

The NHST hybrid that you ask about is an incoherent mixture of two incompatible approaches. Neither Fisher nor Neyman and Pearson would be happy to have their names attached.

Please see this paper for a more complete explanation: https://link.springer.com/chapter/10.1007/164_2019_286
  • Another useful reference (to Aris Spanos' textbook chapter on the matter) is included in this answer by Alecos Papadopoulos.
    Commented Sep 21, 2020 at 7:15
  • Thank you for pointing out the distinction between the two frameworks; the paper was a good read.
    – fool
    Commented Sep 21, 2020 at 18:20

(While Michael Lew's answer is informative, it does not seem to answer the question posed.)

Glen_b's comment points out that statistical hypothesis tests aren't always needed. But they have their uses when data are expensive or otherwise limited. They are especially useful when there is also substantial variation that cannot be completely controlled (e.g. human behavior), or that you don't want to control because your question is about variable real-world settings (e.g. studying how fast a crop tends to grow out in fields with varied weather and soil, even though you could measure the same crop's growth in a controlled greenhouse if you wanted to).

In these small-sample, high-variability settings, we need some way to tell whether the signal in our data is trustworthy or likely to be swamped by noise. For some history on this topic, I recommend Stephen Stigler's book "The Seven Pillars of Statistical Wisdom," specifically the fourth pillar, "Intercomparison." Hypothesis testing and confidence intervals often let us make valid statistical comparisons "interior to the data," rather than against an external standard: tools like the t-test let you compare two groups in a dataset and assess the precision of that comparison using the variation in that same dataset.

As an alternative, without this statistical framework, scientists could instead have relied on past experience, or on external data from other studies, to judge the quality and precision of a comparison made in the present study.

For example, instead of a t-test, you'd need to build up extensive experience with the population you're studying: "In samples of this size from this population, sample means usually don't vary by more than XYZ. So if we compare two groups, and they have sample means which are much further apart than XYZ, we can be pretty confident about which mean really is bigger; meanwhile if they're closer together, we'll reserve judgment. But in samples of this other size, or from this other population, it's a different story..."
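That "build up experience" idea can be sketched as a simulation. Everything here is hypothetical (the population, the sample size, and the choice of a 95th-percentile cutoff are my own illustrative choices, not anything from the thread):

```python
import random

random.seed(1)  # fixed seed so the hypothetical numbers are reproducible

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical "past experience": a large pool of measurements from one population.
population = [random.gauss(50, 10) for _ in range(10_000)]

# Repeatedly draw two same-size samples from that single population, so any gap
# between their means is pure sampling noise, and record the size of each gap.
n = 15
null_gaps = sorted(
    abs(mean(random.sample(population, n)) - mean(random.sample(population, n)))
    for _ in range(2_000)
)

# The "XYZ" of the answer: a gap that chance alone rarely exceeds
# (here, the 95th percentile of the noise-only gaps).
xyz = null_gaps[int(0.95 * len(null_gaps))]
```

A new two-group comparison whose mean gap exceeds `xyz` would then be judged a real difference; a smaller gap would mean reserving judgment. Note that the whole table of benchmarks has to be rebuilt for every new sample size and population, which is exactly the burden the answer describes.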

The t-test gives you a way to skip all that, by using sample standard deviations to estimate standard errors from the same samples whose means you're comparing. So you can make reasonably-safe comparisons even in new situations where you don't already have experience. (But of course having substantial prior experience is also helpful whenever possible---not only as a safeguard against errors in hypothesis testing, but also against other problems with study design & interpretation!)
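To make the mechanism concrete, here is a minimal sketch of the Welch t statistic computed by hand, showing how the two samples' own standard deviations supply the standard error. The data and the function name are invented for illustration:

```python
from math import sqrt
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t statistic: the gap between sample means, scaled by a
    standard error estimated from those same samples' standard deviations."""
    se = sqrt(stdev(a)**2 / len(a) + stdev(b)**2 / len(b))
    return (mean(a) - mean(b)) / se

g1 = [5.1, 4.9, 5.6, 4.7, 5.3, 5.0]  # made-up yields, group 1
g2 = [4.2, 4.5, 4.1, 4.8, 4.3, 4.0]  # made-up yields, group 2
t = welch_t(g1, g2)  # roughly 4.45: a gap many standard errors wide
```

No external benchmark is consulted: the yardstick for judging the mean difference comes entirely from within the two samples being compared.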

