
Are there well-known formulas for the order statistics of certain random distributions? In particular, the first and last order statistics of a normal random variable, but a more general answer would also be appreciated.

Edit: To clarify, I am looking for approximate formulas that can be more-or-less explicitly evaluated, not the exact integral expression.

For example, I have seen the following two approximations for the first order statistic (i.e., the minimum) of a normal random variable:

$e_{1:n} \geq \mu - \frac{n-1}{\sqrt{2n-1}}\sigma$

and

$e_{1:n} \approx \mu + \Phi^{-1} \left( \frac{1}{n+1} \right)\sigma$

The first of these, for $n=200$, gives approximately $e_{1:200} \geq \mu - 10\sigma$, which seems like a wildly loose bound.

The second gives $e_{1:200} \approx \mu - 2.58\sigma$, whereas a quick Monte Carlo gives $e_{1:200} \approx \mu - 2.75\sigma$, so it's not a bad approximation but not a great one either; more importantly, I don't have any intuition about where it comes from.
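A quick R sketch that reproduces these numbers (a minimal check of my own, assuming $\mu=0$, $\sigma=1$):

    # Monte Carlo estimate of E[min] of n = 200 standard normals,
    # compared with the two approximations quoted above
    set.seed(1)
    n <- 200
    mean(replicate(2e4, min(rnorm(n))))   # Monte Carlo: about -2.75
    -(n - 1) / sqrt(2 * n - 1)            # first formula's bound: about -9.96
    qnorm(1 / (n + 1))                    # second formula: about -2.58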

Any help?

Comments:

  • If you use R, see the ppoints function. – cardinal, Mar 31, 2011 at 13:41
  • @probabilityislogic has given some good intuition for the approximations you list. Would it be helpful at all if I gave some more from an alternative viewpoint, or have you satisfied your curiosity on this matter? – cardinal, Apr 1, 2011 at 19:59

4 Answers

Answer 1 (score 40)

The classic reference is Royston (1982) [1], which gives algorithms going beyond explicit formulas. It also quotes a well-known formula by Blom (1958): $E(r:n) \approx \mu + \Phi^{-1}(\frac{r-\alpha}{n-2\alpha+1})\sigma$ with $\alpha=0.375$. For $n=200$, $r=1$, this formula gives a multiplier of $-2.73$.
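In R this is a one-liner (my own sketch, not from the paper); the ppoints function mentioned in the comments computes the same plotting positions:

    # Blom's approximation for the expected r-th of n standard normal order statistics
    blom <- function(r, n, alpha = 0.375) qnorm((r - alpha) / (n - 2 * alpha + 1))
    blom(1, 200)                      # about -2.73, the multiplier quoted above
    qnorm(ppoints(200, a = 3/8))[1]   # identical via ppoints (its default a is 1/2
                                      # for n > 10, so pass a = 3/8 explicitly)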

[1]: Royston, J. P. (1982). Algorithm AS 177: Expected Normal Order Statistics (Exact and Approximate). Journal of the Royal Statistical Society, Series C (Applied Statistics), 31(2), 161–165.

Comments:

  • Thanks for the reference and the answer! I was wondering whether there's any such result on the (possibly normal) approximation of the CDFs/PDFs of these order statistics, and secondly a decent approximation for the ratio of the last to first order statistics. You can assume that the samples are from a normal distribution, just like the OP assumed. – Mathmath, Aug 21, 2020 at 10:43
Answer 2 (score 29)

$$\newcommand{\Pr}{\mathrm{Pr}}\newcommand{\Beta}{\mathrm{Beta}}\newcommand{\Var}{\mathrm{Var}}$$The distribution of the $i$th order statistic of any continuous random variable with a PDF is given by the "beta-F" compound distribution. The intuitive way to think about this distribution is to consider the $i$th order statistic in a sample of size $N$. For the $i$th order statistic of a random variable $X$ to be equal to $x$, we need three conditions:

  1. $i-1$ values below $x$; each observation lies below $x$ with probability $F_{X}(x)$, where $F_X(x)=\Pr(X<x)$ is the CDF of the random variable $X$.
  2. $N-i$ values above $x$; each observation lies above $x$ with probability $1-F_{X}(x)$.
  3. one value inside an infinitesimal interval containing $x$; this has probability $f_{X}(x)\,dx$, where $f_{X}(x)\,dx=dF_{X}(x)=\Pr(x<X<x+dx)$ and $f_{X}(x)$ is the PDF of the random variable $X$.

There are ${N \choose 1}{N-1 \choose i-1}$ ways to make this choice (pick which observation lands at $x$, then which $i-1$ of the remaining $N-1$ fall below it), so we have:

$$f_{i}(x_{i})=\frac{N!}{(i-1)!(N-i)!}f_{X}(x_{i})\left[1-F_{X}(x_{i})\right]^{N-i}\left[F_{X}(x_{i})\right]^{i-1}$$
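This density is easy to check by simulation; an illustrative R sketch of mine, with arbitrary values $i=5$, $N=20$:

    # Compare the order-statistic density above against simulation, X ~ N(0,1)
    f_order <- function(x, i, N) {
      exp(lfactorial(N) - lfactorial(i - 1) - lfactorial(N - i)) *
        dnorm(x) * (1 - pnorm(x))^(N - i) * pnorm(x)^(i - 1)
    }
    sims <- replicate(1e4, sort(rnorm(20))[5])   # 5th of 20
    hist(sims, breaks = 50, freq = FALSE)
    curve(f_order(x, 5, 20), add = TRUE, col = "red")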

EDIT: In my original post I made a very poor attempt at going further from this point, and the comments below reflect this. I have sought to rectify that below.

If we take the mean value of this PDF we get:

$$E(X_{i})=\int_{-\infty}^{\infty} x_{i}f_{i}(x_{i})dx_{i}$$

In this integral, we make the change of variable $p_{i}=F_{X}(x_{i})$ (taking @Henry's hint), and it becomes:

$$E(X_{i})=\int_{0}^{1} F_{X}^{-1}(p_{i})\Beta(p_{i}|i,N-i+1)dp_{i}=E_{\Beta(p_{i}|i,N-i+1)}\left[F_{X}^{-1}(p_{i})\right]$$

So this is the expected value of the inverse CDF, which can be well approximated using the delta method to give:

$$E_{\Beta(p_{i}|i,N-i+1)}\left[F_{X}^{-1}(p_{i})\right]\approx F_{X}^{-1}\left(E_{\Beta(p_{i}|i,N-i+1)}\left[p_{i}\right]\right)=F_{X}^{-1}\left[\frac{i}{N+1}\right]$$
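Numerically, the exact beta-expectation and this first-order approximation are easy to compare in R (a sketch of mine, normal case):

    # Exact: E[X_(i:N)] as an expectation of qnorm under Beta(i, N - i + 1);
    # Approx: first-order delta method, qnorm(i / (N + 1))
    e_exact <- function(i, N) {
      integrate(function(p) qnorm(p) * dbeta(p, i, N - i + 1), 0, 1)$value
    }
    e_exact(1, 200)   # about -2.746
    qnorm(1 / 201)    # about -2.58, the OP's second approximation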

To make a better approximation, we can expand to second order (primes denoting differentiation), noting that the second derivative of an inverse function is:

$$\frac{\partial^{2}}{\partial a^{2}}F_{X}^{-1}(a)=-\frac{F_{X}^{''}(F_{X}^{-1}(a))}{\left[F_{X}^{'}(F_{X}^{-1}(a))\right]^{3}}=-\frac{f_{X}^{'}(F_{X}^{-1}(a))}{\left[f_{X}(F_{X}^{-1}(a))\right]^{3}}$$
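(A quick finite-difference check of this identity for the standard normal, sketch of mine:)

    # d^2/da^2 qnorm(a) should equal qnorm(a) / dnorm(qnorm(a))^2,
    # since the normal density satisfies f'(x) = -x f(x)
    a <- 0.1; h <- 1e-4
    (qnorm(a + h) - 2 * qnorm(a) + qnorm(a - h)) / h^2   # about -41.6
    qnorm(a) / dnorm(qnorm(a))^2                         # about -41.6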

Let $\nu_{i}=F_{X}^{-1}\left[\frac{i}{N+1}\right]$. Then we have:

$$E_{\Beta(p_{i}|i,N-i+1)}\left[F_{X}^{-1}(p_{i})\right]\approx \nu_{i}-\frac{\Var_{\Beta(p_{i}|i,N-i+1)}\left[p_{i}\right]}{2}\frac{f_{X}^{'}(\nu_{i})}{\left[f_{X}(\nu_{i})\right]^{3}}$$ $$=\nu_{i}-\frac{\left(\frac{i}{N+1}\right)\left(1-\frac{i}{N+1}\right)}{2(N+2)}\frac{f_{X}^{'}(\nu_{i})}{\left[f_{X}(\nu_{i})\right]^{3}}$$

Now, specialising to the normal case, where $\mu$ and $\sigma$ denote the mean and standard deviation of $X$, we have $$f_{X}(x)=\frac{1}{\sigma}\phi\!\left(\frac{x-\mu}{\sigma}\right)\implies f_{X}^{'}(x)=-\frac{x-\mu}{\sigma^{3}}\phi\!\left(\frac{x-\mu}{\sigma}\right)=-\frac{x-\mu}{\sigma^{2}}f_{X}(x)$$ $$F_{X}(x)=\Phi\!\left(\frac{x-\mu}{\sigma}\right)\implies F_{X}^{-1}(x)=\mu+\sigma\Phi^{-1}(x)$$

Note that $f_{X}(\nu_{i})=\frac{1}{\sigma}\phi\left[\Phi^{-1}\left(\frac{i}{N+1}\right)\right]$, and the expectation approximately becomes:

$$E[x_{i}]\approx \mu+\sigma\Phi^{-1}\left(\frac{i}{N+1}\right)+\frac{\left(\frac{i}{N+1}\right)\left(1-\frac{i}{N+1}\right)}{2(N+2)}\frac{\sigma\Phi^{-1}\left(\frac{i}{N+1}\right)}{\left[\phi\left[\Phi^{-1}\left(\frac{i}{N+1}\right)\right]\right]^{2}}$$

And finally:

$$E[x_{i}]\approx \mu+\sigma\Phi^{-1}\left(\frac{i}{N+1}\right)\left[1+\frac{\left(\frac{i}{N+1}\right)\left(1-\frac{i}{N+1}\right)}{2(N+2)\left[\phi\left[\Phi^{-1}\left(\frac{i}{N+1}\right)\right]\right]^{2}}\right]$$
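In code, this second-order formula reads (my R sketch of the expression above, standard normal case):

    # Second-order delta-method approximation to E[X_(i:N)], mu = 0, sigma = 1
    e_approx2 <- function(i, N) {
      p <- i / (N + 1)
      z <- qnorm(p)
      z * (1 + p * (1 - p) / (2 * (N + 2) * dnorm(z)^2))
    }
    e_approx2(1, 200)   # about -2.73, versus qnorm(1/201) = -2.58 at first order
                        # and an exact value of about -2.746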

Although, as @whuber has noted, this approximation will not be accurate in the far tails. In fact I think it may be worse there, because of the skewness of a beta distribution with very unequal parameters.

Comments (5 of 13 shown):

  • "Maximum likelihood estimator of a random variable"? Not sure what that is, but I think you've (almost) calculated the mode. – cardinal, Mar 31, 2011 at 14:39
  • Something mysterious happens about two-thirds of the way through when suddenly $\mu$ and $\sigma$ appear without warning or definition. – whuber, Mar 31, 2011 at 15:17
  • I don't mean to "pile on", but it's also hard for me to see how the quantity in brackets can be approximated by a negative number. – cardinal, Mar 31, 2011 at 15:34
  • @probabilityislogic, while at the level of calculus, you might say that in this case we're considering a bivariate function and simply maximizing over one variable instead of another, I think there are reasons mathematical, statistical, and pedagogical not to call what you've done "maximum likelihood estimation". They are too numerous to enumerate in this space, but a simple one that I think is compelling enough is that we use a particular, arcane vocabulary in statistics for a reason. Changing that on a whim for a single problem can lead to misunderstanding(s).../... – cardinal, Apr 1, 2011 at 11:52
  • @probabilityislogic (+1) for the revised answer. One suggestion: maybe $\Rightarrow$ is better than $\to$ to mean "implies". It took staring at a couple lines for a few seconds to realize you weren't making some convergence claim. – cardinal, Apr 1, 2011 at 20:25
Answer 3 (score 16)

Aniko's answer relies on Blom's well-known formula, which involves a choice of $\alpha = 3/8$. It turns out that this formula is itself a mere approximation of an exact result due to G. Elfving (1947), "The asymptotical distribution of range in samples from a normal population", Biometrika, Vol. 34, pp. 111–119. Elfving's formula is aimed at the minimum and maximum of the sample, for which the correct choice is $\alpha = \pi/8$. Blom's formula results when we approximate $\pi$ by $3$.

By using the Elfving formula rather than Blom's approximation, we get a multiplier of -2.744165. This number is closer to Erik P.'s exact answer (-2.746) and to the Monte Carlo approximation (-2.75) than is Blom's approximation (-2.73), while being easier to implement than the exact formula.
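The two choices of $\alpha$ side by side in R (my sketch):

    # Blom vs. Elfving plotting positions for the minimum of n = 200
    mult <- function(alpha, r = 1, n = 200) qnorm((r - alpha) / (n - 2 * alpha + 1))
    mult(3/8)    # -2.7348 (Blom)
    mult(pi/8)   # -2.7442 (Elfving)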

Comments:

  • Could you provide a bit more detail as to how $\alpha=\pi/8$ is arrived at through Elfving (1947)? It's not obvious in the article. – Anthony, May 18, 2015 at 14:12
  • Anthony - I am relying on the textbook Mathematical Statistics by Samuel Wilks (Wiley, 1962). Exercise 8.21 on p. 249 states: "If $x_{(1)}, x_{(n)}$ are the smallest and largest order statistics of a sample of size $n$ from a continuous c.d.f. $F(x)$... the random variable $2n\sqrt{F(x_{(1)})\left[1-F(x_{(n)})\right]}$ has a limit distribution as $n \to \infty$, with mean $\pi/2$ and variance $4-\pi^{2}/4$." For a symmetric distribution, $F(x_{(1)}) = 1-F(x_{(n)})$. Thus $F(x_{(1)})$ is about $\pi/(4n)$, or $x_{(1)}$ is about $F^{-1}(\pi/(4n))$. The Blom formula uses the approximation $3/(4n)$. – Hal M. Switkay, May 18, 2015 at 15:45
  • This reminds me of the infamous "$\pi=3$" bill attributed to the Indiana State Legislature. (Though the Wikipedia article suggests that the popular version of the story is not accurate.) – Oct 2, 2019 at 22:27
Answer 4 (score 11)

Depending on what you want to do, this answer may or may not help - I got the following exact formula from Maple's Statistics package.

    with(Statistics):                          # load Maple's Statistics package
    X := OrderStatistic(Normal(0, 1), 1, n):   # 1st order statistic (minimum) of n i.i.d. N(0,1)
    m := Mean(X):                              # exact mean, returned as an integral
    m;

$$\int_{-\infty}^{\infty}\frac{n!\,\sqrt{2}\,t\,e^{-t^{2}/2}}{2\,(n-1)!\,\sqrt{\pi}}\left(\frac{1}{2}-\frac{1}{2}\operatorname{erf}\!\left(\frac{t\sqrt{2}}{2}\right)\right)^{n-1}dt$$

(writing $t$ for Maple's `_t0`; since $n!/(n-1)!=n$, $\frac{\sqrt{2}}{2\sqrt{\pi}}e^{-t^{2}/2}=\phi(t)$, and $\frac{1}{2}-\frac{1}{2}\operatorname{erf}(t\sqrt{2}/2)=1-\Phi(t)$, this is just $n\int_{-\infty}^{\infty}t\,\phi(t)\left[1-\Phi(t)\right]^{n-1}dt$.)

By itself this isn't very useful (and it could probably be derived fairly easily by hand, since it's the minimum of $n$ random variables), but it does allow for quick and very accurate approximation for given values of $n$ - much more accurate than Monte Carlo:

    evalf(eval(m, n = 200));       # evaluate numerically at n = 200 (default 10 digits)
    evalf[25](eval(m, n = 200));   # same, to 25 digits

gives -2.746042447 and -2.746042447451154492412344, respectively.
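For readers without Maple, the simplified form of the integral noted above is easy to evaluate in R as well (a sketch of mine):

    # E[min] of n iid N(0,1): n * integral of t * phi(t) * (1 - Phi(t))^(n-1)
    e_min <- function(n) {
      integrate(function(t) n * t * dnorm(t) * pnorm(t, lower.tail = FALSE)^(n - 1),
                -Inf, Inf)$value
    }
    e_min(200)   # -2.746042..., agreeing with the Maple result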

(Full disclosure - I maintain this package.)

Comments:

  • @ProbabilityIsLogic derived this integral for all order statistics in the first half of his reply. – whuber, Mar 31, 2011 at 19:42
