
It is well-known that the asymptotic relative efficiency (ARE) of the Wilcoxon signed rank test is $\frac{3}{\pi} \approx 0.955$ compared to Student's t-test, if the data are drawn from a normally distributed population. This is true for both the basic one-sample test and the variant for two independent samples (the Wilcoxon-Mann-Whitney U). It is also the ARE of a Kruskal-Wallis test compared to an ANOVA F-test, for normal data.

Does this remarkable (for me, one of the "most unexpected appearances of $\pi$") and remarkably simple result have an insightful, remarkable or simple proof?

  Given the appearance of $\pi$ in the normal cdf, the appearance of $\pi$ in the ARE shouldn't really be all that surprising. I'll hazard an answer but it will take a while to make a good one.
    
    
    @Glen_b Indeed - I've seen a "why does $\pi$ appear so much in statistics" discussion before (though I can't remember if it was on CV or not), and I know "because of the normal distribution" crops up a lot, but $3/\pi$ is still pleasantly surprising the first time you see it. For comparison, the ARE of the Mann-Whitney test vs the two-sample t-test is 3 on exponential data, 1.5 on double exponential and 1 on uniform - much rounder!
    
    
    @Silverfish See page 197 of van der Vaart, "Asymptotic Statistics". In the one-sample case, the sign test has ARE $2/\pi$ relative to the t-test.
    
    
    @Silverfish ... and at the logistic it's $(\pi/3)^2$. There are quite a few of the well-known AREs (in either one or two sample cases) involving $\pi$ and quite a few that are simple ratios of integers.
    
    
    For the one-sample signed rank test, it seems to be $3/\pi$; for the one-sample sign test, it is $2/\pi$. So, we clarified our position. I think it is a good sign.
    
    

Brief sketch of ARE for the one-sample $t$-test, sign test and signed-rank test

I expect the long version of @Glen_b's answer will include a detailed analysis of the two-sample (Wilcoxon-Mann-Whitney rank-sum) test along with the intuitive explanation of the ARE, so I'll skip most of the derivation. For the one-sample case, you can find the missing details in Lehmann's TSH (Testing Statistical Hypotheses).

Testing Problem: Let $X_1,\ldots,X_n$ be a random sample from the location model $f(x-\theta)$, where $f$ is symmetric about zero. We want the ARE of the sign test and of the signed-rank test, relative to the t-test, for testing $H_0: \theta=0$ against the one-sided alternative $H_1: \theta>0$.

To assess the relative efficiency of tests, we consider only local alternatives, because a consistent test has power tending to 1 against any fixed alternative. Local alternatives that give rise to nontrivial asymptotic power are typically of the form $\theta_n=h/\sqrt{n}$ for fixed $h$; this is sometimes called a Pitman drift.
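To see the Pitman drift in action, here is a minimal Monte Carlo sketch in R. Several choices are illustrative assumptions, not part of the argument: standard normal data ($\sigma=1$), $h=2$, $\alpha=0.05$, 4000 replications, and the helper name `local_t_power`. Under $\theta_n=h/\sqrt{n}$ the simulated power of the one-sided t-test should stay roughly constant in $n$, near $1-\Phi(z_\alpha-h/\sigma)$, instead of tending to 1.

```r
# Monte Carlo sketch: power of the one-sided one-sample t-test under the
# local alternative theta_n = h / sqrt(n).
# Illustrative assumptions: N(theta, 1) data (sigma = 1), h = 2,
# alpha = 0.05, 4000 replications per n.
set.seed(1)

local_t_power <- function(n, h = 2, alpha = 0.05, reps = 4000) {
  theta <- h / sqrt(n)
  # each row is one simulated sample of size n
  x <- matrix(rnorm(n * reps, mean = theta), nrow = reps)
  tstat <- sqrt(n) * rowMeans(x) / apply(x, 1, sd)
  # reject when t_n > z_alpha (upper-alpha standard normal quantile)
  mean(tstat > qnorm(1 - alpha))
}

ns <- c(25, 100, 400)
power_sim <- sapply(ns, local_t_power)
power_limit <- 1 - pnorm(qnorm(0.95) - 2)  # 1 - Phi(z_alpha - h / sigma)

print(rbind(n = ns, simulated_power = round(power_sim, 3)))
print(power_limit)
```

Holding $\theta$ fixed instead of shrinking it like $1/\sqrt{n}$ would make the simulated power climb towards 1, which is why only local alternatives can separate consistent tests.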

Our task is to

  • find the limit distribution of each test statistic under the null
  • find the limit distribution of each test statistic under the alternative
  • compute the local asymptotic power of each test

Test statistics and asymptotics

  1. t-test (assuming $\sigma^2=\mathrm{Var}(X_1)<\infty$) $$t_n=\sqrt{n}\frac{\bar{X}}{\hat{\sigma}}\to_dN(0,1)\quad \text{under the null}$$ $$t_n=\sqrt{n}\frac{\bar{X}}{\hat{\sigma}}\to_dN(h/\sigma,1)\quad \text{under the alternative }\theta_n=h/\sqrt{n}$$
    • so the test that rejects if $t_n>z_\alpha$, where $z_\alpha$ is the upper-$\alpha$ quantile of $N(0,1)$, has local asymptotic power $$1-\Phi\left(z_\alpha-\frac{h}{\sigma}\right)$$
  2. sign test $S_n=\frac{1}{n}\sum_{i=1}^{n}1\{X_i>0\}$ $$\sqrt{n}\left(S_n-\frac{1}{2}\right)\to_dN\left(0,\frac{1}{4}\right)\quad \text{under the null}$$ $$\sqrt{n}\left(S_n-\frac{1}{2}\right)\to_dN\left(hf(0),\frac{1}{4}\right)\quad \text{under the alternative}$$ and the test has local asymptotic power $$1-\Phi\left(z_\alpha-2hf(0)\right)$$
  3. signed-rank test $$W_n=n^{-3/2}\sum_{i=1}^{n}R_i\,\mathrm{sign}(X_i)\to_dN\left(0,\frac{1}{3}\right)\quad \text{under the null}$$ where $R_i$ is the rank of $|X_i|$ among $|X_1|,\ldots,|X_n|$. Since $\mathrm{sign}(X_i)=2\cdot 1\{X_i>0\}-1$, $W_n$ is an affine function of the usual statistic $\sum_{i}R_i1\{X_i>0\}$. Under the alternative, $$W_n\to_dN\left(2h\int f^2,\frac{1}{3}\right)$$ and the test has local asymptotic power $$1-\Phi\left(z_\alpha-\sqrt{12}\,h\int f^2\right)$$
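A rough Monte Carlo check of the sign-test and signed-rank limits at the standard normal can be sketched in R. At the standard normal, $f(0)=1/\sqrt{2\pi}$ and $\int f^2=1/(2\sqrt{\pi})$. The sketch uses the signed-rank statistic in the form $n^{-3/2}\sum_i R_i\,\mathrm{sign}(X_i)$, the normalization under which its null variance is $1/3$. The following are illustrative assumptions: $n=400$, $h=2$, 5000 replications, the seed, and the helper `stats_once`. Finite-$n$ moments only approximate the limits. The null variances should come out near $1/4$ and $1/3$, and the alternative means near $hf(0)$ and $2h\int f^2$.

```r
# Monte Carlo sketch: sign and signed-rank statistics at the standard normal.
# Illustrative assumptions: n = 400, h = 2, 5000 replications, seed 2.
set.seed(2)
n <- 400; h <- 2; reps <- 5000

stats_once <- function(theta) {
  x <- rnorm(n, mean = theta)
  r <- rank(abs(x))  # R_i = rank of |X_i| among |X_1|, ..., |X_n|
  c(sign        = sqrt(n) * (mean(x > 0) - 1/2),  # sqrt(n) (S_n - 1/2)
    signed_rank = n^(-3/2) * sum(r * sign(x)))    # n^(-3/2) sum R_i sign(X_i)
}

null_draws <- replicate(reps, stats_once(0))           # 2 x reps matrix
alt_draws  <- replicate(reps, stats_once(h / sqrt(n)))

int_f2 <- 1 / (2 * sqrt(pi))  # integral of dnorm(x)^2
summary_table <- rbind(
  null_mean      = rowMeans(null_draws),
  null_var       = apply(null_draws, 1, var),
  alt_mean       = rowMeans(alt_draws),
  limit_alt_mean = c(h * dnorm(0), 2 * h * int_f2)  # h f(0), 2 h int f^2
)
print(round(summary_table, 3))
```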

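Each local power function has the form $1-\Phi(z_\alpha-hc)$, where the slope $c$ (the efficacy) is $1/\sigma$ for the t-test, $2f(0)$ for the sign test and $\sqrt{12}\int f^2$ for the signed-rank test. To match the t-test's power at a fixed alternative $\theta$, a test with slope $c$ needs $(c\sigma)^{-2}$ times as many observations, so its ARE relative to the t-test is $(c\sigma)^2$. Therefore, $$ARE(S_n)=(2f(0)\sigma)^2$$ $$ARE(W_n)=\left(\sqrt{12}\,\sigma\int f^2\right)^2$$ If $f$ is the standard normal density, $ARE(S_n)=2/\pi$ and $ARE(W_n)=3/\pi$.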

If $f$ is uniform on $[-1,1]$ (so $\sigma^2=1/3$ and $\int f^2=1/2$), $ARE(S_n)=1/3$ and $ARE(W_n)=1$.
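Both ARE formulas, $4\sigma^2f(0)^2$ for the sign test and $12\sigma^2(\int f^2)^2$ for the signed-rank test, are easy to evaluate numerically. This minimal R sketch uses `integrate` for $\int f^2$; the helper `are_one_sample` is hypothetical, written only for illustration. It should give $2/\pi$ and $3/\pi$ at the normal, and $1/3$ and $1$ at the uniform on $[-1,1]$.

```r
# Sketch: ARE(S_n) = 4 sigma^2 f(0)^2 and ARE(W_n) = 12 sigma^2 (int f^2)^2
# for a density f symmetric about zero with variance sigma2.
are_one_sample <- function(f, sigma2, lower = -Inf, upper = Inf) {
  int_f2 <- integrate(function(x) f(x)^2, lower, upper)$value
  c(sign = 4 * sigma2 * f(0)^2,
    signed_rank = 12 * sigma2 * int_f2^2)
}

normal  <- are_one_sample(dnorm, sigma2 = 1)
uniform <- are_one_sample(function(x) dunif(x, -1, 1), sigma2 = 1/3,
                          lower = -1, upper = 1)

print(rbind(normal, theory_normal = c(2/pi, 3/pi), uniform))
```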

Remark on the derivation of distribution under the alternative

There are, of course, many ways to derive the limiting distribution under the alternative. One general approach is to use Le Cam's third lemma. A simplified version of it states:

Let $\Delta_n$ be the log of the likelihood ratio. For some statistic $W_n$, if $$ (W_n,\Delta_n)\to_d N\left[\left(\begin{array}{c} \mu\\ -\sigma^2/2 \end{array}\right),\left(\begin{array}{cc} \sigma^2_W & \tau \\ \tau & \sigma^2 \end{array}\right)\right] $$ under the null, then $$W_n\to_d N\left(\mu+\tau,\sigma^2_W\right)\quad\text{under the alternative.}$$

For location families that are differentiable in quadratic mean, local asymptotic normality (LAN) and contiguity hold automatically, so Le Cam's third lemma applies. Using this lemma, we only need to compute $\mathrm{cov}(W_n,\Delta_n)$ under the null. By LAN, $$\Delta_n\approx \frac{h}{\sqrt{n}}\sum_{i=1}^{n}l(X_i)-\frac{1}{2}h^2I_0$$ where $l=-f'/f$ is the score function for the location parameter and $I_0=\mathrm{E}\,l(X_1)^2$ is the Fisher information. Then, for instance, for the sign test $S_n$ $$\mathrm{cov}\left(\sqrt{n}(S_n-1/2),\Delta_n\right)=-h\,\mathrm{cov}\left(1\{X_i>0\},\frac{f'}{f}(X_i)\right)=-h\int_0^\infty f'(x)\,dx=hf(0)$$
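As a quick numerical sanity check on the key step, this R sketch verifies that $\mathrm{cov}(1\{X>0\},l(X))=f(0)$ for the score $l=-f'/f$. Since $\mathrm{E}\,l(X)=0$, the covariance reduces to $\mathrm{E}[1\{X>0\}\,l(X)]$. The two densities are illustrative choices: the standard normal (score $l(x)=x$) and the standard logistic (score $l(x)=2F(x)-1$, with $F$ the logistic cdf). `cov_sign_score` is a hypothetical helper.

```r
# Sketch: cov(1{X > 0}, l(X)) = f(0) under the null, with l = -f'/f.
# Because E[l(X)] = 0, the covariance equals E[1{X > 0} l(X)].
cov_sign_score <- function(f, score) {
  integrate(function(x) score(x) * f(x), lower = 0, upper = Inf)$value
}

# standard normal: l(x) = x
normal_cov <- cov_sign_score(dnorm, function(x) x)
# standard logistic: l(x) = 2 F(x) - 1, with F = plogis
logistic_cov <- cov_sign_score(dlogis, function(x) 2 * plogis(x) - 1)

print(c(normal = normal_cov, normal_f0 = dnorm(0),
        logistic = logistic_cov, logistic_f0 = dlogis(0)))
```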

  +1 I wasn't going to go into quite this much detail (indeed, with your answer covering things quite nicely already, I probably won't add anything to what I have now), so if you want to put in more detail, don't hold back on my account. It would have taken me several days yet (and still for less than you have already), so it's a good thing you came in.
    
    
  This is a nice answer particularly for adding in Le Cam's lemma (+1). It seems to me there is quite a big jump between establishing the asymptotics in 1, 2, and 3, and the "therefore" bit where you write the AREs. I think if I were writing this up, I'd define asymptotic efficiency at this point (or maybe earlier, so the upshot of points 1, 2 and 3 would be the AEs not just local asymptotic powers in each case) and then the step to the AREs would be much easier for future readers to follow.
    
    
  Perhaps it is worth specifying your $H_1$? One-sided and two-sided cases have different-looking asymptotic powers (though they lead to the same AREs).
    
    
  Feel free to edit my answer or append it to the OP.
    
    
    @Khashaa Thanks. I shall edit your post when I have the right stuff in front of me. Would you mind clarifying the meaning of the $*$ in the final equation?
    
    

This has nothing to do with explaining why $\pi$ appears (which was explained nicely by others) but may help intuitively. The Wilcoxon test is a $t$-test on the ranks of the data, whereas the parametric test is computed on the raw data. The efficiency of the Wilcoxon test with respect to the $t$-test is the square of the correlation between the scores used by the two tests. As $n\rightarrow \infty$ the squared correlation converges to $\frac{3}{\pi}$. You can easily see this empirically using R:

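```r
# ranks 1:n versus the normal quantile scores qnorm(i / (n + 1))
n <- 1000000; x <- qnorm((1:n)/(n+1)); cor(1:n, x)^2; 3/pi
## [1] 0.9549402
## [1] 0.9549297
n <- 100000000; x <- qnorm((1:n)/(n+1)); cor(1:n, x)^2; 3/pi   # memory-hungry
## [1] 0.9549298
## [1] 0.9549297
```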
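Extending the transcript over a grid of $n$ (a sketch; the grid $10,10^2,\ldots,10^6$ is an arbitrary choice) should show the squared correlation sitting above $3/\pi$ for finite $n$ and approaching it as $n$ grows.

```r
# Sketch: squared correlation between ranks 1:n and normal scores
# qnorm(i / (n + 1)) over a grid of n, compared with the limit 3/pi.
ns <- 10^(1:6)
r2 <- sapply(ns, function(n) cor(1:n, qnorm((1:n) / (n + 1)))^2)
print(data.frame(n = ns, r2 = r2, excess = r2 - 3/pi))
```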
  This is indeed a very helpful comment. Is it slightly conceptually closer to do `n <- 1e6; x <- rnorm(n); cor(x, rank(x))^2` (which produces essentially the same result)?
    
    
  (People intrigued by Frank's comment may want to look at this question about the equivalence of Wilcoxon-Mann-Whitney U and a t-test on the ranks.)
    
    
  Something I don't understand about this answer is that the correlation is higher for smaller values of $n$ (I think the proximate reason is that we don't see the tails very well for smaller $n$). Naively, that implies that the relative efficiency of the Wilcoxon test is higher for small $n$, which surprises me. (I might do some simulations, but (a) is there an easy answer, and (b) am I missing a conceptual point somewhere?)
    
    
  To my recollection the small sample efficiency of both the Wilcoxon signed rank test and the W-M-W are a bit lower than the asymptotic value on shift alternatives at the normal distribution.
    
    

Short version: The basic reason, for the Wilcoxon-Mann-Whitney under a shift alternative, is that finding the asymptotic relative efficiency (WMW/t) corresponds to evaluating $12\sigma^2[\int f^2(x) dx]^2$, where $f$ is the common density under the null and $\sigma^2$ is the common variance.

So at the normal, $f^2$ is proportional to another normal density (with variance $\sigma^2/2$). Its integral therefore has a $\frac{1}{\sqrt{\pi}}$ term, and when squared, that is the source of the $\pi$ in the denominator.

The same term - with the same integral - is involved in the ARE for the signed rank test, so it takes the same value.

For the sign test relative to t, the ARE is $4\sigma^2f(0)^2$, and $f(0)^2$ again has a $\pi$ in the denominator.
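Writing out the normal case, with $f$ the $N(0,\sigma^2)$ density, makes both sources of $\pi$ explicit:

$$\int_{-\infty}^{\infty} f^2(x)\,dx=\frac{1}{2\pi\sigma^2}\int_{-\infty}^{\infty} e^{-x^2/\sigma^2}\,dx=\frac{\sigma\sqrt{\pi}}{2\pi\sigma^2}=\frac{1}{2\sigma\sqrt{\pi}},\qquad 12\sigma^2\left[\frac{1}{2\sigma\sqrt{\pi}}\right]^2=\frac{3}{\pi}$$

$$f(0)=\frac{1}{\sigma\sqrt{2\pi}},\qquad 4\sigma^2 f(0)^2=\frac{4\sigma^2}{2\pi\sigma^2}=\frac{2}{\pi}$$

In both cases $\sigma$ cancels, so neither ARE depends on the scale of the normal.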

So essentially it's as I said in comments; $\pi$ is in the ARE for the Wilcoxon-Mann-Whitney vs the two-sample t test, for the Wilcoxon signed rank test vs the one-sample t and the sign test vs the one-sample t test (in each case at the normal) quite literally because it appears in the normal density.
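The same integral reproduces the WMW/t values quoted in the comments on the question. In this R sketch, `are_wmw` and the standard Laplace density `dlaplace` are hypothetical helpers defined here; base R has no `dlaplace`. Each density is paired with its own variance $\sigma^2$. The results should match $3/\pi$, $(\pi/3)^2$, $1.5$, $3$ and $1$ respectively.

```r
# Sketch: ARE of Wilcoxon-Mann-Whitney vs the two-sample t-test under a
# shift, 12 * sigma^2 * (integral of f^2)^2, for several densities f.
are_wmw <- function(f, sigma2, lower = -Inf, upper = Inf) {
  int_f2 <- integrate(function(x) f(x)^2, lower, upper)$value
  12 * sigma2 * int_f2^2
}

# standard Laplace (double exponential) density; not part of base R
dlaplace <- function(x) 0.5 * exp(-abs(x))

ares <- c(
  normal      = are_wmw(dnorm, sigma2 = 1),
  logistic    = are_wmw(dlogis, sigma2 = pi^2 / 3),
  laplace     = are_wmw(dlaplace, sigma2 = 2),
  exponential = are_wmw(dexp, sigma2 = 1, lower = 0),
  uniform     = are_wmw(dunif, sigma2 = 1/12, lower = 0, upper = 1)
)
print(round(ares, 4))
```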


J. L. Hodges and E. L. Lehmann (1956),
"The Efficiency of Some Nonparametric Competitors of the t-Test",
Ann. Math. Statist., 27:2, 324-335.

  I like the explanation for the intuition for the appearance of $\pi$ in the denominator; is it essentially coincidence that the Renyi entropy turns up in the WMW/Wilcoxon integrals?
    
    
  @Silverfish That $\int f^2 dx$ turns up is certainly not coincidence. However, that's not because that's connected to Rényi entropy, or at least I don't see any direct connection. We're getting into stuff I don't really know about now, though.
    
    
  @Silverfish It's only a Renyi entropy for $\alpha=2$. Otherwise, it is just a plain old square that can come up in a million different ways.
    
    
  I should perhaps add to my answer that once you attribute it to a normal distribution, from there it's not hard to find a connection to a circle. Often when you find a $\pi$ you can find a circle hidden in there somewhere.
    
    
  More on $\pi$ and circles: 3Blue1Brown discusses some lovely $\pi$-circle connections on their YouTube channel, including a way that $\zeta(2)$ fairly naturally comes to relate to a circle.
    
    

