$\begingroup$

I am writing a review about analyzing data sampled from lognormal distributions, and I want to explain the problem with first running a normality test and using that result to choose which test to run.

The same problem comes up with first running a test for equality of variances, and using that result to choose the second test.

These kinds of "two-stage" procedures (there must be a better name) will distort the results of the second test.

What I am asking for are some citations to papers that explain this problem and give examples of how two-stage testing can be misleading.
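For concreteness, a minimal sketch of the kind of two-stage procedure I mean (the specific choices here, a Shapiro-Wilk pre-test routing to Student's t-test or the Mann-Whitney U test, are just one common combination, used for illustration):

```python
# Sketch of a two-stage procedure: pre-test each sample for normality
# (Shapiro-Wilk), then run Student's t-test if both samples "pass",
# otherwise fall back to the Mann-Whitney U test. The data are lognormal
# and the two groups are identically distributed, so the null is true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_sim, n = 3000, 15
rejections = 0
used_t = 0
for _ in range(n_sim):
    x = rng.lognormal(0.0, 1.0, n)
    y = rng.lognormal(0.0, 1.0, n)
    looks_normal = (stats.shapiro(x).pvalue >= 0.05 and
                    stats.shapiro(y).pvalue >= 0.05)
    if looks_normal:
        used_t += 1
        p = stats.ttest_ind(x, y).pvalue       # stage 2a: Student's t
    else:
        p = stats.mannwhitneyu(x, y).pvalue    # stage 2b: Mann-Whitney U
    rejections += p < 0.05

overall_rate = rejections / n_sim
print(f"fraction of simulations routed to the t-test: {used_t / n_sim:.3f}")
print(f"overall rejection rate:                       {overall_rate:.3f}")
```

The point is that the reported p-value comes from whichever branch the pre-test selects, so the second test's behaviour is conditional on the first test's outcome; that conditioning is exactly what the papers I am looking for should analyse.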

$\endgroup$
  • $\begingroup$ There is no problem with testing the distribution first beyond the fact that tests for normality usually lack the power to be useful unless the samples are large enough that tests like the t-test and ANOVA no longer care about the distribution! The distributional test hypotheses are in a different space from any hypotheses tested by the second-stage tests, and I argue that any 'meta' null hypothesis regarding hypotheses in different spaces is plain silly and can be ignored. $\endgroup$ Commented Jun 23 at 21:30
  • $\begingroup$ 1. Are you talking about testing normality on the original data or the logs? 2. Are there constraints on which tests the users might choose between? $\endgroup$
    – Glen_b
    Commented Jun 27 at 5:20
  • $\begingroup$ Beyond the very good references offered in the answers below, you will want to look into the literature on "the garden of forking paths." Andrew Gelman's blog is a good starting point. $\endgroup$
    – rolando2
    Commented Jun 27 at 23:55

4 Answers

$\begingroup$

This one has detailed discussion of the issue and rich literature review:

Shamsudheen, Iqbal, and Christian Hennig. 2023. “Should We Test the Model Assumptions Before Running a Model-Based Test?” Journal of Data Science, Statistics, and Visualisation 3 (3). https://doi.org/10.52933/jdssv.v3i3.73.

Also the problem leads to a nice "paradox" (which is in fact also explained in the paper above, but there is an older paper where I explain this): Misspecification paradox

$\endgroup$
$\begingroup$

The paper "A note on preliminary tests of equality of variances" discusses the downsides of two-stage testing used to choose which version of the $t$-test to run:

Simulations disclosed that the two-stage procedure fails to protect the significance level and usually makes the situation worse.

and

Furthermore, the validity of the Welch test deteriorates if it is used only on those occasions where a preliminary test indicates it is needed. Optimum protection is assured by using a separate-variances test unconditionally whenever sample sizes are unequal.


Zimmerman D. W. (2004). A note on preliminary tests of equality of variances. The British journal of mathematical and statistical psychology, 57(Pt 1), 173–181. https://doi.org/10.1348/000711004849222
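The effect is easy to see by simulation. Here is a minimal sketch of my own (not taken from the paper), assuming a two-sided F-test as the preliminary variance test: under equal means but unequal variances and unequal sample sizes, the two-stage procedure's Type I error rate exceeds that of the unconditional Welch test.

```python
# Monte Carlo comparison: two-stage procedure (F-test for equal
# variances, then Student's t if "equal" is not rejected, Welch's t
# otherwise) versus the unconditional Welch test. Equal means (H0 true),
# unequal variances, unequal sample sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def two_stage_pvalue(x, y, alpha_pre=0.05):
    """Preliminary two-sided F-test on the variances, then pick the
    pooled (Student) or separate-variances (Welch) t-test."""
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    dfx, dfy = len(x) - 1, len(y) - 1
    p_var = 2 * min(stats.f.cdf(f, dfx, dfy), stats.f.sf(f, dfx, dfy))
    return stats.ttest_ind(x, y, equal_var=(p_var >= alpha_pre)).pvalue

n_sim = 4000
n1, n2 = 8, 25        # unequal sample sizes
sd1, sd2 = 1.5, 1.0   # the smaller group has the larger variance

rej_two_stage = rej_welch = 0
for _ in range(n_sim):
    x = rng.normal(0.0, sd1, n1)   # equal means, so H0 is true
    y = rng.normal(0.0, sd2, n2)
    rej_two_stage += two_stage_pvalue(x, y) < 0.05
    rej_welch += stats.ttest_ind(x, y, equal_var=False).pvalue < 0.05

rate_two_stage = rej_two_stage / n_sim
rate_welch = rej_welch / n_sim
print(f"two-stage procedure Type I error rate: {rate_two_stage:.3f}")
print(f"unconditional Welch Type I error rate: {rate_welch:.3f}")
```

The two-stage rate is inflated because the pre-test often lacks the power to detect the variance difference, and the pooled test is liberal precisely when the smaller group has the larger variance; the unconditional Welch test stays near the nominal 5%, which is the paper's recommendation.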

$\endgroup$
  • $\begingroup$ That paper is unavailable to me, so can you add some information? First quote: worse than what? Second quote: by how much? $\endgroup$ Commented Jun 24 at 22:13
  • $\begingroup$ @MichaelLew: The paper is in sci-hub sci-hub.se/10.1348/000711004849222 $\endgroup$ Commented Jun 24 at 22:38
    $\begingroup$ Thanks @kjetilbhalvorsen, the deleterious effects of the two-stage procedure in that paper are larger than I expected, but they focussed entirely on situations where the sample sizes for the two groups are unequal. I do not see any description of the effect of the preliminary variance test on the power of the overall procedure, but their bottom line is that if the sample sizes are unequal, just use the Welch-Satterthwaite t-test. $\endgroup$ Commented Jun 24 at 23:16
    $\begingroup$ @MichaelLew: Should this be treated differently with randomized experiments versus preexisting groups? See my stats.stackexchange.com/questions/434928/… With randomized groups, a difference in variance is in itself an indication of a treatment effect. $\endgroup$ Commented Jun 24 at 23:51
    $\begingroup$ @kjetilbhalvorsen You raise an interesting point, and one that I suspect is overlooked too often. If there is reason to expect that the variances should be equal, then a finding that they differ between treatment groups does indeed point to an effect. A thoughtful approach to inference would not ignore that, and so the prescription to always use the Welch-Satterthwaite t-test would give a false sense of having done a 'good' analysis. $\endgroup$ Commented Jun 25 at 7:02
$\begingroup$

I think @ChristianHennig's paper is great, here are a few more from my collection (the contexts are clear from the titles):

Campbell, H., and C.B. Dean. 2014. “The Consequences of Proportional Hazards Based Model Selection.” Statistics in Medicine 33 (6): 1042–56. https://doi.org/10.1002/sim.6021.

Campbell, Harlan. 2021. “The Consequences of Checking for Zero-Inflation and Overdispersion in the Analysis of Count Data.” Methods in Ecology and Evolution 12 (4): 665–80. https://doi.org/10.1111/2041-210X.13559.

Rochon, Justine, Matthias Gondan, and Meinhard Kieser. 2012. “To Test or Not to Test: Preliminary Assessment of Normality When Comparing Two Independent Samples.” BMC Medical Research Methodology 12 (1): 81. https://doi.org/10.1186/1471-2288-12-81.

$\endgroup$
$\begingroup$

This is not an answer and might be better thought of as a comment, but I'm sure that it will get too long.

I know from personal experience that researchers often know a great deal about the data-generating systems on which they work before they gather the final data that are subject to statistical analysis in support of their inferences: far more than can be gleaned from those final datasets alone by even the most sophisticated statistician. Choosing the statistical analysis on the basis of that prior knowledge will not affect the error rate properties of any subsequent testing, will it?

In experimental pharmacology there are often several times as many preliminary datasets as final ones, as the project is developed and a biological system is explored before the final experimental design is formalised. In many cases there are far, far more, given that research laboratories will often work with particular biological systems for many years on end. Those pre-existing datasets and that experience would usually allow the researchers to know with some confidence the distributional form of the final data before they are gathered. Furthermore, the researchers will usually know how reasonable it might be to assume i.i.d. sampling directly from the experimental design.

I do not often see answers that suggest that prior information be used in the choice of test procedure (or even for thinking about what inferences might be drawn), and so I suspect that the papers being suggested here likewise ignore the possibility of using that preliminary information (researchers' knowledge). I may be mistaken in that, particularly as I have read only one relevant paper in detail (Zimmerman 2004, cited by Frans Rodenberg) and skimmed another (Shamsudheen and Hennig 2023), so please correct me if I'm wrong.

$\endgroup$
  • $\begingroup$ I totally agree. Decisions about parametric or nonparametric, or to log or not to log, should be based on what is known about the variable in general, not just the data in one experiment. $\endgroup$ Commented Jun 28 at 17:02
