$\begingroup$

I am analyzing a dataset with 3 categories (the independent variable) and one continuous value (the dependent variable). I am testing whether the continuous value differs significantly between the three groups. I have 65 values in each category. The 3 groups are homoscedastic, but the dependent variable is not normally distributed in any of the groups. I currently use a Kruskal-Wallis test, with Dunn's test for post-hoc comparisons, where the p-values are adjusted with a Benjamini-Hochberg correction.
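
The pipeline described above (Kruskal-Wallis omnibus test, Dunn's pairwise z tests on mean ranks, Benjamini-Hochberg adjustment) can be sketched with the Python standard library alone. This is a minimal illustration, not the implementation any particular package uses; the function names `midranks`, `kruskal_dunn_bh` and `benjamini_hochberg` are hypothetical. It assumes exactly three groups, because then the chi-square reference distribution has 2 degrees of freedom and its tail probability is simply exp(-H/2). Both H and the Dunn variance include the usual correction for tied values.

```python
import math
from itertools import combinations


def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def kruskal_dunn_bh(groups):
    """Kruskal-Wallis H and p-value, plus BH-adjusted Dunn p-values per pair.

    Assumption: exactly three groups, so the chi-square(2) tail is exp(-H/2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch assumes exactly three groups")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = midranks(pooled)
    # Split the pooled ranks back into per-group mean ranks.
    mean_ranks, start = [], 0
    for g in groups:
        mean_ranks.append(sum(ranks[start:start + len(g)]) / len(g))
        start += len(g)
    # Tie term: sum of (t^3 - t) over every run of t tied values.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    tie_sum = sum(t ** 3 - t for t in counts.values())
    h = 12 / (n * (n + 1)) * sum(
        len(g) * (r - (n + 1) / 2) ** 2 for g, r in zip(groups, mean_ranks)
    )
    h /= 1 - tie_sum / (n ** 3 - n)  # tie correction
    p_kw = math.exp(-h / 2)
    # Dunn's test: z statistic on the difference in mean ranks.
    var_base = n * (n + 1) / 12 - tie_sum / (12 * (n - 1))
    pairs, raw_p = [], []
    for i, j in combinations(range(3), 2):
        se = math.sqrt(var_base * (1 / len(groups[i]) + 1 / len(groups[j])))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        pairs.append((i, j))
        raw_p.append(math.erfc(abs(z) / math.sqrt(2)))  # two-sided normal p
    return h, p_kw, dict(zip(pairs, benjamini_hochberg(raw_p)))
```

Usage would be `h, p, dunn = kruskal_dunn_bh([g1, g2, g3])` with the three lists of 65 values; `dunn[(0, 2)]` is then the adjusted p-value for groups 1 versus 3. The omnibus H and p should agree with `scipy.stats.kruskal`, which is a convenient cross-check if SciPy is available.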

Apart from the p-values, do I need to consider residuals, coefficients, or other values for this test?

Given that I have more than 30 observations per group, should I use an ANOVA instead of the Kruskal-Wallis test, since ANOVA is said to be robust to departures from normality?
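
For comparison, the classical one-way ANOVA F test for three groups can also be sketched with the standard library. The function name `one_way_anova_three_groups` is hypothetical, and the sketch assumes exactly three groups: the between-group degrees of freedom are then 2, and the F(2, d2) distribution has the closed-form tail probability (1 + 2F/d2)^(-d2/2). It also returns the residuals (each value minus its group mean), which are the quantities the comments on this question recommend examining.

```python
def one_way_anova_three_groups(groups):
    """Classical one-way ANOVA F test for exactly three groups.

    Returns (F, p, residuals); residuals[g] holds each value of group g
    minus that group's mean.
    """
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below needs exactly 3 groups")
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares is the sum of squared residuals.
    residuals = [[x - m for x in g] for g, m in zip(groups, means)]
    ss_within = sum(r * r for res in residuals for r in res)
    df_between, df_within = 2, n - 3
    f = (ss_between / df_between) / (ss_within / df_within)
    # Tail probability of the F(2, d2) distribution: (1 + 2F/d2) ** (-d2/2).
    p = (1 + df_between * f / df_within) ** (-df_within / 2)
    return f, p, residuals
```

With 65 values per group, `df_within` is 192. If SciPy is available, `scipy.stats.f_oneway(g1, g2, g3)` should return the same F and p. Running this alongside the Kruskal-Wallis test on the same data shows whether the choice of test changes the conclusion, but it does not by itself settle which test's assumptions are better met.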

$\endgroup$
  • You seem to switch the meanings of "dependent" and "independent." What likely matters the most is how the group distributions depart from Normality; and for this purpose, examining the residuals is one of the most effective and insightful things you can do. Perhaps you could describe those residual distributions to us?
    – whuber
    Commented Jun 13, 2023 at 17:14
  • @whuber thank you! I'm actually running this same comparison on 50 different datasets, which all have the same categorical independent variable (with 3 levels) and the same continuous dependent variable. I've already run a Kruskal-Wallis test and I am happy with the results. Is ANOVA preferable? In that case, if the data in the groups themselves are not normal but the residuals are normal, should I use ANOVA or K-W?
    – user81371
    Commented Jun 13, 2023 at 18:05
  • It is mathematically inconsistent for the groups to appear to come from non-Normal distributions while the residuals do not! You might stick with K-W, but even so it's always a good idea to examine the residuals, because even K-W makes distributional assumptions. The issue with 50 datasets is a new, separate one. It raises the concern about correcting for multiple comparisons, but it also suggests the possibility of borrowing strength to the extent you can assume some common behavior among the datasets.
    – whuber
    Commented Jun 13, 2023 at 18:09
  • @whuber thanks! Do you have any resources on examining the residuals of a K-W test? I couldn't find any. Regarding the 50 datasets, there's no overlapping data between them. They are disjoint, and I need to run this analysis on each of them, so I think family-wise error rate (FWER) corrections don't apply. Please correct me if I am wrong.
    – user81371
    Commented Jun 13, 2023 at 18:12
  • What matters are the consequences of incorrect decisions and the rate at which you can tolerate them. As for examining residuals, the assumption under the null is that each group is an independent sample from the same distribution, so the purpose of examining the residuals is to assess that situation, especially if you might not be rejecting the null (you should worry, for instance, that some outliers might hide clear overall differences). That can be done graphically in many ways (such as QQ plots) as well as formally using nonparametric distribution tests.
    – whuber
    Commented Jun 13, 2023 at 18:30
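
The residual checks suggested in these comments (QQ plots and nonparametric distribution tests) can be sketched with the standard library. Because each residual is a value minus a constant for its group, a group's residuals have exactly the same shape as its raw values, which is why non-Normal groups cannot have Normal residuals in this one-way layout. The sketch below is an illustration under stated assumptions: the names `median_residuals`, `qq_pairs` and `ks_two_sample` are hypothetical; centering on group medians is my choice, so that the comparison looks at shape and spread separately from location (applying `ks_two_sample` to the raw groups instead checks the full "same distribution" null); and the Kolmogorov-Smirnov p-value is the asymptotic approximation, which assumes continuous data without many ties.

```python
import math
from statistics import median, quantiles


def median_residuals(group):
    """Residuals of one group about its own median (removes location)."""
    m = median(group)
    return [x - m for x in group]


def qq_pairs(a, b, n_quantiles=20):
    """Matching quantiles of two samples; plot one against the other for a QQ plot."""
    return list(zip(quantiles(a, n=n_quantiles), quantiles(b, n=n_quantiles)))


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and its asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    # Step through the distinct pooled values, tracking the largest ECDF gap.
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    lam = d * math.sqrt(len(a) * len(b) / (len(a) + len(b)))
    if lam < 0.2:
        return d, 1.0  # Kolmogorov tail is indistinguishable from 1 here
    # Asymptotic tail: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2).
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

For the three groups one would compare, for example, `ks_two_sample(median_residuals(g1), median_residuals(g2))` for each of the three pairs, and plot `qq_pairs` for the same pairs. Across 50 datasets this produces many tests, so the multiple-comparison concern raised above applies to these checks as well.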
