$\begingroup$

I have seen in published literature (and in posts on here) that the asymptotic relative efficiency of the Wilcoxon signed rank test is at least 0.864 when compared to the t test. I have also heard that this only applies to large samples, although some books don't mention this (what's with that?).

Anyway, my question is, how small do things need to get before the above paragraph no longer applies?

In my case I have 4 pairs of data. If all assumptions hold, I know I have at least 90% power to detect an effect size of 2SD under the paired t test if I use an alpha of 0.1 and have moderately correlated data. However, I would like to use the Wilcoxon signed rank test due to the small sample size and inability to check assumptions but I'm concerned the test will have too little power if I do. Thanks!

$\endgroup$
  • $\begingroup$ "Asymptotic" anything in "small samples" does not make sense: it's a contradiction in terms. I suspect you are asking for the actual relative efficiency in small samples, period. The answer depends on the underlying distributions you are comparing and so will be complicated unless you have two specific distributions in mind. Many people choose the Normal for reference, but that might not necessarily be right for your applications. $\endgroup$
    – whuber
    Commented Oct 5, 2013 at 1:44
  • $\begingroup$ Yes I am looking for relative efficiency in small samples. Thanks for pointing that out. I want to know what is the worst I could do power-wise. I don't really have any underlying distributions in mind but if I were to use the normal, as you suggest, how would I proceed? I know that it will also depend on how correlated the data is. $\endgroup$
    – Jimj
    Commented Oct 5, 2013 at 2:00
  • $\begingroup$ What's "moderately correlated data"? $\endgroup$
    – Glen_b
    Commented Oct 5, 2013 at 9:07
  • $\begingroup$ Note that your above 90% power will be at the normal, not at the distribution where ARE is 0.864. As such the calculation should be done at the normal. $\endgroup$
    – Glen_b
    Commented Oct 5, 2013 at 9:14
  • $\begingroup$ @Glen_b: You're right, I should specify what I was thinking by moderate correlation. I was thinking of a correlation of at least 0.4. So how then would I do the calculation? Also, in terms of my original question about comparing efficiency of the two tests at small sample sizes, I did a bit of research on this topic. A couple of sources indicated that the answer is not completely clear in smaller samples but the Wilcoxon test performs reasonably well. Maybe I will just have to live with that type of answer for now. $\endgroup$
    – Jimj
    Commented Oct 5, 2013 at 18:00

1 Answer

$\begingroup$

Klotz looked at small sample power of the signed rank test compared to the one sample $t$ in the normal case.

[Klotz, J. (1963) "Small Sample Power and Efficiency for the One Sample Wilcoxon and Normal Scores Tests" The Annals of Mathematical Statistics, Vol. 34, No. 2, pp. 624-632]

At $n=10$ and $\alpha$ near $0.1$, the relative efficiency to the $t$ at the normal tends to be quite close to the ARE there ($3/\pi \approx 0.955$). How close depends on the mean shift, and at smaller $\alpha$ the efficiency will be lower. (An exact $\alpha$ of 0.1 isn't achievable, of course, unless you go the randomization route, which most people avoid in practice, and I think with reason.) At sample sizes smaller than 10 the efficiency is generally a little higher.

At $n=5$ and $n=6$ (both with $\alpha$ close to 0.05), the efficiency was around 0.97 or higher.

So, broadly speaking ... the ARE at the normal is an underestimate of the relative efficiency in the small sample case, as long as $\alpha$ isn't small. I believe that for a two-tailed test with $n=4$ your smallest achievable $\alpha$ is 0.125. At that exact significance level and sample size, I think the relative efficiency to the $t$ will be similarly high (perhaps still around the 0.97-0.98 or higher) in the area where the power is interesting.
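The 0.125 figure can be checked by enumerating the null distribution directly. The following is a minimal sketch, assuming no ties and no zero differences; the function name `achievable_two_sided_levels` is made up for this illustration. Under the null each of the $2^n$ sign patterns is equally likely, so the two-sided levels are just counts of patterns in the two tails.

```python
from fractions import Fraction
from itertools import product


def achievable_two_sided_levels(n):
    """Exact two-sided levels of the signed rank test for n untied, nonzero differences.

    Entry c is the level of the rule: reject when W+ <= c or W+ >= n(n+1)/2 - c.
    """
    max_w = n * (n + 1) // 2
    counts = [0] * (max_w + 1)
    # Under H0 every assignment of signs to the ranks 1..n is equally likely.
    for signs in product((0, 1), repeat=n):
        w_plus = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        counts[w_plus] += 1

    total = 2 ** n
    levels = []
    lower_tail = 0
    for c in range((max_w + 1) // 2):  # keep the two tails disjoint
        lower_tail += counts[c]
        levels.append(Fraction(2 * lower_tail, total))  # both tails, by symmetry
    return levels


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(n, [str(level) for level in achievable_two_sided_levels(n)])
```

For $n=4$ the first entry is 2/16 = 1/8, so no two-sided test at a level below 0.125 (such as 0.1 or 0.05) can ever reject without randomization.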

I should probably come back and talk about how to do a simulation, which is relatively straightforward.
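A minimal sketch of such a simulation follows. It is not the code behind the results reported here, and it uses plain Monte Carlo with no control variate. It assumes normal differences with unit SD, so `delta` is the mean difference in units of the SD of the differences; all function names are made up for this illustration. It is hard-wired to $n=4$ and the 0.125 level: the $t$ critical value comes from the closed-form CDF for 3 degrees of freedom, and the signed rank test rejects only when $W^+$ is 0 or 10, which happens exactly when all four differences share a sign. That also gives an exact power, $\Phi(\delta)^4 + (1-\Phi(\delta))^4$, to check the simulation against.

```python
import math
import random

N = 4          # number of paired differences
ALPHA = 0.125  # smallest two-sided level the signed rank test attains at n = 4


def t3_cdf(t):
    """Closed-form CDF of Student's t with 3 degrees of freedom."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi


def t3_quantile(p, lo=0.0, hi=100.0):
    """Quantile (for p >= 0.5) of t with 3 df, found by bisection on the CDF."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t3_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t_rejects(d, t_crit):
    """Two-sided one-sample t-test on the differences d."""
    mean = sum(d) / N
    var = sum((x - mean) ** 2 for x in d) / (N - 1)
    return abs(mean) / math.sqrt(var / N) > t_crit


def signed_rank_rejects(d):
    """Two-sided signed rank test at level 0.125 for n = 4."""
    order = sorted(range(N), key=lambda i: abs(d[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if d[i] > 0)
    # Only the two extreme values of W+ have two-sided p-value 2/16 = 0.125.
    return w_plus in (0, N * (N + 1) // 2)


def simulate_power(delta, reps=20000, seed=1):
    """Return (t power, signed rank power) estimated from the same samples."""
    rng = random.Random(seed)
    t_crit = t3_quantile(1.0 - ALPHA / 2.0)
    t_hits = w_hits = 0
    for _ in range(reps):
        d = [rng.gauss(delta, 1.0) for _ in range(N)]
        t_hits += t_rejects(d, t_crit)
        w_hits += signed_rank_rejects(d)
    return t_hits / reps, w_hits / reps


def exact_signed_rank_power(delta):
    """P(all four differences share a sign) when differences are N(delta, 1)."""
    p = 0.5 * (1.0 + math.erf(delta / math.sqrt(2.0)))
    return p ** 4 + (1.0 - p) ** 4


if __name__ == "__main__":
    for delta in (0.0, 0.5, 1.0, 1.5, 2.0):
        pt, pw = simulate_power(delta)
        print(f"delta={delta:.1f}  t={pt:.3f}  signed rank={pw:.3f}  "
              f"exact signed rank={exact_signed_rank_power(delta):.3f}")
```

This compares power at a fixed sample size only. Turning the power gap into a relative efficiency takes one more step: finding the (generally fractional) sample size at which the $t$ would match the signed rank test's power.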

Edit:

I've just done a simulation at the 0.125 level (because it's achievable at this sample size). Across a range of differences in mean, the typical efficiency for $n=4$ looks a bit lower than I guessed above, more like 0.95-0.97 or so, which is similar to the asymptotic value.


Update

Here's a plot of the power (2 sided) for the t-test (computed by power.t.test) in normal samples, and simulated power for the Wilcoxon signed rank test - 40000 simulations per point, with the t-test as a control variate. The uncertainty in the position of the dots is less than a pixel:

[Figure: power curve for t and power for Wilcoxon]


To make this answer more complete I should actually look at the behavior for the case for which the ARE actually is 0.864 (the beta(2,2)).
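As a quick check on that 0.864 figure, here is the arithmetic using the standard Pitman efficiency expression for the signed rank test relative to the $t$, where $f$ is a symmetric density with variance $\sigma^2$ (taking the beta(2,2) on $[0,1]$ is a convenience, since the ARE does not change under shifts and rescaling):

$$\mathrm{ARE}(W,t) = 12\,\sigma^2\left(\int f(x)^2\,dx\right)^2$$

For the beta(2,2), $f(x) = 6x(1-x)$ on $[0,1]$, so $\sigma^2 = \tfrac{1}{20}$ and $\int_0^1 f(x)^2\,dx = 36\,B(3,3) = \tfrac{36}{30} = \tfrac{6}{5}$, giving

$$\mathrm{ARE}(W,t) = 12\cdot\frac{1}{20}\cdot\left(\frac{6}{5}\right)^2 = \frac{108}{125} = 0.864.$$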

$\endgroup$
