$\begingroup$

A. I have read online that, for a design with one binary variable and one continuous variable that is NOT normally distributed, I can use BOTH the point-biserial correlation (which is essentially the parametric Pearson correlation formula) and the rank-biserial correlation (which is equal to the nonparametric Spearman or Kendall τ correlations).

B. I also have read about linear and monotonic correlations, which implies that even the Pearson coefficient (and of course, point-biserial) is OK for nonnormal distributions.

C. I also understand that with one binary variable and one continuous variable, it might even be better to use an independent-samples comparison test (e.g., the unpaired t-test or the Mann-Whitney U test) instead of a correlation coefficient.

D. I know that the Mann-Whitney test will yield the same p-value as the Spearman coefficient, while the t-test will give the same p-value as the Pearson coefficient (and point-biserial).

E. I know when the groups are not large enough AND when the error terms are not normally distributed, I should use the nonparametric Mann-Whitney instead of t-test.
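Premise D can be checked numerically. The following is a minimal sketch in Python, assuming NumPy and SciPy are available. The simulated lognormal data, the seed, and the function name `compare_tests` are illustrative assumptions, not real patient data. As far as I can tell, the pooled-variance t-test and the Pearson / point-biserial test give the same p-value up to floating-point error, whereas Mann-Whitney and Spearman rely on different approximations, so their p-values should be close but not necessarily identical.

```python
import numpy as np
from scipy import stats


def compare_tests(seed=0, n_per_group=20):
    """Run the four tests from premise D on simulated skewed data (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Hypothetical non-normal outcomes for two groups (lognormal is an assumption).
    a = rng.lognormal(mean=0.0, sigma=1.0, size=n_per_group)
    b = rng.lognormal(mean=0.5, sigma=1.0, size=n_per_group)

    # Stack the outcome and code group membership as 0/1 for the correlations.
    y = np.concatenate([a, b])
    group = np.concatenate([np.zeros(n_per_group), np.ones(n_per_group)])

    # Parametric pair: pooled-variance t-test vs. Pearson / point-biserial.
    _, t_p = stats.ttest_ind(a, b)
    _, pearson_p = stats.pearsonr(group, y)
    _, pb_p = stats.pointbiserialr(group, y)

    # Rank-based pair: Mann-Whitney U vs. Spearman.
    _, mwu_p = stats.mannwhitneyu(a, b, alternative="two-sided")
    _, spearman_p = stats.spearmanr(group, y)

    return {
        "t_test_p": t_p,
        "pearson_p": pearson_p,
        "point_biserial_p": pb_p,
        "mann_whitney_p": mwu_p,
        "spearman_p": spearman_p,
    }


results = compare_tests()
for name, p in results.items():
    print(f"{name}: {p:.4f}")
```

Running it with a few different seeds is a quick way to see how closely each pair of p-values tracks the other.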

Two Questions:

The above assumptions cause some inconsistencies and confusion in the following case:

I am analyzing a design with 2 groups of 20 patients each. The independent variable is Treatment (A or B) and the dependent variable is the continuous variable Length, measured in each group. Length is NOT normally distributed (the groups and the error terms are all NON-normal).

The problem is that in this particular design, the nonparametric Spearman and Mann-Whitney tests yield statistically significant p-values, while the parametric point-biserial [Pearson] and t-test yield clearly non-significant p-values (> 0.1).
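One way such a disagreement can arise is a single extreme value that inflates the variance used by the t-test while barely affecting the ranks. The sketch below uses made-up numbers chosen purely for illustration (they are not the Length measurements from this study) and assumes NumPy and SciPy are available. It is only a demonstration that the two tests can diverge, not a claim about what is happening in the real data.

```python
import numpy as np
from scipy import stats

# Made-up data: group A sits below group B except for one extreme outlier.
group_a = np.array(list(range(1, 20)) + [1000], dtype=float)  # 1..19 and 1000
group_b = np.arange(20, 40, dtype=float)                      # 20..39

# Pooled-variance t-test: the outlier inflates the standard error.
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Mann-Whitney U: based on ranks, so the outlier counts as just one large rank.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t-test:       t = {t_stat:.3f}, p = {t_p:.4f}")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {u_p:.2e}")
```

Here the t-test p-value is well above 0.1 while the Mann-Whitney p-value is very small, because the two tests respond to different features of the data.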

Question 1. Which one should I use: the nonparametric Spearman / Mann-Whitney, or the parametric point-biserial [Pearson] / t-test? On the one hand, assumption E dictates that I must use the nonparametric Mann-Whitney. On the other hand, assumptions A and B allow me to use the parametric point-biserial [which is actually Pearson] correlation and, by extension, the t-test. So what should I use?

Question 2. Assumption E seems to be in total conflict with assumptions A and B: the results of Spearman / Mann-Whitney are identical, and so are the results of the point-biserial [Pearson] / t-test. So if I am allowed to use the point-biserial [Pearson] in the absence of normality (assumptions A and B), why not the t-test, which gives the EXACT SAME result as the point-biserial?

$\endgroup$
  • $\begingroup$ "I also have read about linear and monotonic correlations, which implies that even the Pearson coefficient (and of course, point-biserial) is OK for nonnormal distributions" -- sure, if you're measuring linear correlation, that doesn't necessarily require you to have normality. Normality can come into testing it but you can use tests that don't assume normality (and the usual test for a null of zero correlation is fairly robust to the usual assumption in any case). Beware, however - a lot of texts seem to get the assumption wrong (you can derive it under two distinct sets of assumptions). $\endgroup$
    – Glen_b
    Commented Feb 27, 2022 at 23:12
  • $\begingroup$ To address another of your premises: "I know when the groups are not large enough AND when the error terms are not normally distributed, I should use the nonparametric Mann-Whitney instead of t-test." -- I am not at all convinced that the word "should" belongs in that sentence. It depends. Indeed I don't think I'd agree with the general intent there either. It depends. $\endgroup$
    – Glen_b
    Commented Feb 27, 2022 at 23:14
  • $\begingroup$ "Beware, however - a lot of texts seem to get the assumption wrong" Thanks for the clarifying comment and the heads up. ---- I can't claim that I understood your second comment completely. $\endgroup$
    – Vic
    Commented Mar 1, 2022 at 16:34
  • $\begingroup$ This imperative: "when the groups are not large enough AND when the error terms are not normally distributed, I should use the nonparametric Mann-Whitney instead of t-test" seems to be based on a misunderstanding. I disagree with it as stated. However, it requires a longish discussion to unpack all the mistaken premises in it, well beyond a comment. Some relevant answers are on site already, though. This sort of mistaken idea probably comes from the same source as the issue discussed in my first comment - authors who make one such error typically misunderstand a host of other things. $\endgroup$
    – Glen_b
    Commented Mar 1, 2022 at 22:43
