
I performed one-way ANOVA tests with post-hoc Tukey's test to look at multiple comparisons and got highly significant results, but I noticed that the $F$-value on the ANOVA was very high. The SDs of the groups were unequal, so I performed a Welch ANOVA with post-hoc Dunnett's T3 (to account for the unequal variances). Some comparisons that I would expect to be significant from the values and the bar graphs (e.g. my highest concentration) are coming back non-significant, even though the difference from my lowest concentration is much larger than for other concentrations that are coming up significant. How can that be?
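
For concreteness, here is a minimal sketch of the kind of workflow I mean, with made-up data (this is not my actual pipeline; it assumes Python with the pingouin package and uses its Games-Howell procedure as a stand-in for Dunnett's T3, since both allow for unequal variances):

```python
# Made-up data and package choices, not my actual pipeline: Welch's ANOVA followed
# by an unequal-variance post-hoc (Games-Howell here, as a stand-in for Dunnett's T3).
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "conc": np.repeat(["low", "mid", "high"], 10),
    "resp": np.concatenate([
        rng.normal(10, 1, 10),   # lowest concentration, small SD
        rng.normal(14, 2, 10),   # middle concentration
        rng.normal(20, 8, 10),   # highest concentration, large SD
    ]),
})

print(pg.welch_anova(data=df, dv="resp", between="conc"))           # omnibus Welch ANOVA
print(pg.pairwise_gameshowell(data=df, dv="resp", between="conc"))  # pairwise comparisons
```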

  • Welcome to CV. It may help you to get a good answer if you provide some more details. 1) What were the p-values of the two ANOVAs, prior to any multiple comparisons? 2) How different were the SDs? Can you share their values? 3) Are your group sample sizes balanced (close to equal) or not? 4) How many groups (and comparisons) are you running? For example, with unbalanced samples and very different SDs, the "traditional" one-way ANOVA can exhibit inflated $\alpha$ error rates. In any case, providing more context will help you get a good answer.
    – jginestet
    Commented May 17 at 20:04
  • A common cause of unequal variance is that the data are actually lognormal. Do the SDs get larger as the mean gets larger (so that the ratio SD/mean is almost constant)? If so, consider log-transforming the data and running the ANOVA on the logarithms. Commented May 18 at 17:49

1 Answer


As noted in the comments, more information would be needed to say definitively what is going on here, but I can still point to the likely culprits.

First off, I want to address some phrasing that you should avoid:

I performed one-way ANOVA tests with post-hoc Tukey's test to look at multiple comparisons and got highly significant results...

There is no such thing as "highly" significant. Under the usual Fisher-Neyman-Pearson hybrid NHST framework, a result either is or is not significant at the chosen alpha cutoff. Calling a $p$-value "highly" significant treats it as if it measured the magnitude of the effect, but it does not: a very small $p$-value can arise from an extremely small effect, for example in an overpowered design.
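
To make this concrete, here is a small sketch (simulated data, arbitrary numbers) showing how a negligible effect can still produce a very small $p$-value once the sample size is large enough:

```python
# Sketch with simulated data: with n = 200,000 per group, a true mean difference of
# only 0.02 SD still yields a tiny p-value -- "very small p" is not "large effect".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.00, 1, 200_000)   # control
b = rng.normal(0.02, 1, 200_000)   # true effect of 0.02 SD

t, p = stats.ttest_ind(a, b, equal_var=False)
print(f"t = {t:.2f}, p = {p:.1e}")
```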

Some comparisons that I would expect to be significant from the values and the bar graphs (e.g. my highest concentration) are coming back non-significant, even though the difference from my lowest concentration is much larger than for other concentrations that are coming up significant.

There could be a number of things going on here. Even when one bar is clearly higher or lower than another (I assume your bars show group means with standard error bars), any of the following can explain a non-significant comparison (a small simulation after this list illustrates the first and fourth points):

  • High standard errors / wide confidence intervals: a noisy estimate makes it harder for a comparison to reach statistical significance.

  • Overlapping confidence intervals: two groups can have clearly different means and still have intervals that overlap substantially, in which case the difference is unlikely to be statistically significant.

  • Very unequal error bars (e.g. one group with a wide confidence interval and another with a tiny one): this can also produce differences in statistical significance even when the group differences in the response look clear.

  • Low sample size: even with very clear group differences, a small sample may leave the design underpowered to detect a statistically significant effect.

  • Many pairwise comparisons: with lots of groups and an alpha correction applied to every comparison, the individual tests become underpowered, which can contribute to the issue as well.
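
Here is the small simulation mentioned above, a sketch with made-up data: one group differs from control by a large amount but has a large SD and few observations, while another differs by less but is estimated precisely. Welch's $t$-test is used for the pairwise comparisons as a simple stand-in for the unequal-variance post-hoc tests:

```python
# Sketch with made-up data: a larger mean difference can still be non-significant
# when its estimate is noisier (large SD, small n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(10, 1, 12)
group_a = rng.normal(12, 1, 12)   # +2 units, small SD, n=12  -> usually significant
group_b = rng.normal(20, 15, 5)   # +10 units, huge SD, n=5   -> often not significant

for name, g in [("A vs control", group_a), ("B vs control", group_b)]:
    t, p = stats.ttest_ind(g, control, equal_var=False)  # Welch's t-test
    print(f"{name}: mean diff = {g.mean() - control.mean():5.2f}, p = {p:.4f}")
```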

