I want to make a distinction that doesn't seem to have come up in previous answers. There is a difference between

  • examining the results of several complete analyses (estimates, p-values, confidence intervals) and deciding post hoc which one you prefer, or when to stop and accept the results of the last analysis because they're 'good enough' (this is terrible)
  • vs. examining the results of significance tests of the assumptions (or even graphical evaluations of the assumptions) to decide what to do next.

The latter is not as bad, because you're not directly conditioning on outcomes you want to find, although there are still plenty of reasons (frequently discussed on Cross Validated, e.g. here) to avoid this kind of testing. The bottom line is that the properties of the conditional procedure (e.g., do a parametric test if we fail to reject Normality of the residuals, otherwise do a nonparametric test) are different from, and much harder to analyze than, the unconditional procedures ("always parametric" or "always nonparametric"). Two-stage testing may inflate the Type I error rate and/or fail to increase overall power, but it depends on the details; the simulation sketch below illustrates the comparison.
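
To make the conditional-vs.-unconditional contrast concrete, here is a minimal simulation sketch (my own illustration, not taken from the answer or the references): under a true null it estimates the Type I error rate of a "Shapiro-Wilk pretest, then t-test or Mann-Whitney" procedure and of an "always t-test" procedure, for a Normal and for a skewed (lognormal) data-generating distribution. The sample size, pretest level, and the particular choice of tests are arbitrary assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, nsim = 0.05, 20, 5000  # pretest/test level, per-group sample size, replicates


def draw(dist, size):
    # Both groups are drawn from the same distribution, so the null is true
    # and every rejection is a Type I error.
    return rng.normal(size=size) if dist == "normal" else rng.lognormal(size=size)


def one_rep(dist):
    x, y = draw(dist, n), draw(dist, n)
    p_t = stats.ttest_ind(x, y).pvalue                      # parametric test
    # Stage 1: Shapiro-Wilk pretest of Normality in each group.
    normal_ok = min(stats.shapiro(x).pvalue, stats.shapiro(y).pvalue) > alpha
    # Stage 2: keep the t-test if Normality is not rejected, otherwise switch.
    p_cond = p_t if normal_ok else stats.mannwhitneyu(
        x, y, alternative="two-sided").pvalue               # nonparametric fallback
    return p_cond < alpha, p_t < alpha                      # conditional vs. "always t-test"


for dist in ("normal", "lognormal"):
    rejections = np.array([one_rep(dist) for _ in range(nsim)])
    print(f"{dist:>9}: conditional Type I error = {rejections[:, 0].mean():.3f}, "
          f"always t-test = {rejections[:, 1].mean():.3f}")
```

Whether the conditional error rate lands above or below the nominal level depends on exactly these details (sample size, pretest level, the true distribution), which is the point the references below work out carefully.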

Many statisticians have looked at particular cases (see the references below); I found Hennig's (2022) presentation especially clear.


Campbell, Harlan. 2021. “The Consequences of Checking for Zero-Inflation and Overdispersion in the Analysis of Count Data.” Methods in Ecology and Evolution 12 (4): 665–80. https://doi.org/10.1111/2041-210X.13559.

Campbell, H., and C. B. Dean. 2014. “The Consequences of Proportional Hazards Based Model Selection.” Statistics in Medicine 33 (6): 1042–56. https://doi.org/10.1002/sim.6021.

Hennig, Christian. 2022. “Testing in Models That Are Not True.” University of Bologna. https://www.wu.ac.at/fileadmin/wu/d/i/statmath/Research_Seminar/SS_2022/2022_04_Hennig.pdf.

Rochon, Justine, Matthias Gondan, and Meinhard Kieser. 2012. “To Test or Not to Test: Preliminary Assessment of Normality When Comparing Two Independent Samples.” BMC Medical Research Methodology 12 (1): 81. https://doi.org/10.1186/1471-2288-12-81.

Shamsudheen, Iqbal, and Christian Hennig. 2023. “Should We Test the Model Assumptions Before Running a Model-Based Test?” Journal of Data Science, Statistics, and Visualisation 3 (3). https://doi.org/10.52933/jdssv.v3i3.73.

Zimmerman, Donald W. 2004. “A Note on Preliminary Tests of Equality of Variances.” British Journal of Mathematical and Statistical Psychology 57 (1): 173–81. https://doi.org/10.1348/000711004849222.
