Refers to situations where one is concerned about achieving the intended power and size when more than one hypothesis test is performed.

In statistical hypothesis testing, the size is the greatest chance of rejecting the null when the null is true (a "false positive" error). The power is the chance of rejecting the null when it is false; it depends on the "effect size" (a measure of how far reality actually departs from the null). Ceteris paribus, power and size are directly related (one must decrease if the other is decreased), so considerations often focus on size, which is simpler to analyze.
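
To see why the two move together, consider a simple illustration (a sketch only, assuming a one-sided z-test of $H_0: \mu = \mu_0$ against $\mu > \mu_0$ with known standard deviation $\sigma$ and standardized effect size $\delta = (\mu - \mu_0)\sqrt{n}/\sigma$, none of which is specified above):

$$\text{power}(\alpha, \delta) = \Phi\!\left(\delta - z_{1-\alpha}\right),$$

where $\Phi$ is the standard normal CDF and $z_{1-\alpha}$ its $(1-\alpha)$ quantile. Shrinking the size $\alpha$ increases $z_{1-\alpha}$ and therefore decreases the power: making a test more stringent costs power.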

When more than one hypothesis test is performed to make a binary decision, the chance of a false positive is usually greater than the size of any of the tests used for that decision. For example, suppose groups of "control" and "treatment" subjects are randomly selected from the same population and each subject is given a questionnaire comprising 20 yes-no questions. Let the groups be compared separately for each question using a test of size 0.05. If the comparisons are independent, then the chance of at least one of them rejecting the null equals $1 - (1 - 0.05)^{20} \approx 0.64$. Thus a nominal false positive rate of 0.05 in each test is inflated to a decision false positive rate of 0.64.
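
A quick numerical check of this calculation (the function and names below are illustrative, not part of the text):

```python
def familywise_error(alpha: float, k: int) -> float:
    """Chance that at least one of k independent size-alpha tests rejects a true null."""
    return 1 - (1 - alpha) ** k

print(familywise_error(0.05, 20))  # ~0.6415, the 0.64 quoted above
print(familywise_error(0.05, 1))   # 0.05, a single test
```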

To avoid unacceptably large chances of reaching mistaken conclusions in such "multiple comparisons" cases, either an overall test of significance is initially conducted or the sizes of the individual tests leading up to the decision are decreased (that is, the tests are made more stringent). Examples of the former are the F-test in an ANOVA setting and Tukey's HSD test. An example of the latter approach is the Bonferroni correction.
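
A minimal sketch of the Bonferroni idea, continuing the 20-question example above (the p-values and variable names are illustrative):

```python
# Bonferroni correction: run each of the k tests at size alpha / k, which keeps
# the familywise false-positive rate at or below alpha.
alpha, k = 0.05, 20
per_test_alpha = alpha / k                   # 0.0025 per question
familywise = 1 - (1 - per_test_alpha) ** k   # ~0.0488 under independence

# A question's comparison is declared significant only if its p-value falls
# below the corrected threshold.
p_values = [0.001, 0.03, 0.20]               # illustrative p-values
rejections = [p < per_test_alpha for p in p_values]
print(per_test_alpha, round(familywise, 4), rejections)
```

By the union bound, the guarantee that the familywise error rate stays at or below $\alpha$ holds even when the individual tests are not independent, which is why the Bonferroni correction is so widely applicable (at the cost of reduced power in each individual test).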