$\begingroup$

Here's a problem that seems rather peculiar to me. This time I have no initial idea about how to solve it.

Let $X_1$, ..., $X_n$ be independent, real valued random variables with density $f$ and CDF $F$. Let $F_i$ denote the CDF of $X_{(i)}$.

a) What is the distribution of $F(X_{(i)})$ and $F_i(X_{(i)})$?

b) What is the variance of $F(X_{(i)})$?

[edit: I have realised by now that my approach was flawed and furthermore that my question needs clarification. So here is my second try.]

As I understand the question, we have real valued, i.i.d. random variables $X_1$, ..., $X_n$ with density $f$ and CDF $F$.

$X_{(1)}< ... < X_{(n)}$ are the corresponding order statistics with CDF $F_i:=F_{X_{(i)}}$.

I know the general formula for the CDF of order statistics. It is given by $$F_i(t)=\sum_{k=i}^n \binom{n}{k}F(t)^k(1-F(t))^{n-k}.$$

Now, the CDF of a random variable is a measurable function. Thus $F(X_{(i)})$ and $F_i(X_{(i)})$ are real valued random variables again. And a) asks for their distribution.

By definition, we have $$F(t)=\mathbb{P}(X_i\leq t).$$ Hence $$F(X_{(i)})=\mathbb{P}(X_i\leq X_{(i)}).$$ This is the probability of the event that any $X_i$ is less than or equal to $X_{(i)}$. By definition of order statistics, we have $$F(X_{(n)})=\mathbb{P}(X_i\leq X_{(n)})=1,$$ as $X_{(n)}$ is the maximum of the $X_i$. But how do I derive the distribution of $F(X_{(i)})$ for $i \in \{1, ..., n-1\}$?

I think, if they were uniformly distributed, the answer would simply be $$F(X_{(i)})=i/n.$$ But their distribution is unknown, so I'm stuck and have no idea how to proceed from here.

$\endgroup$

2 Answers

$\begingroup$

First recall the most important result in order statistics:

For every random variable $Z$ with continuous CDF $H$, $H(Z)$ is uniform on $(0,1)$.

Thus, if the $X_i$ have a continuous CDF $F$ and $(U_i)_{1\leqslant i\leqslant n}$ is an i.i.d. sample uniform on $(0,1)$, then, for every $1\leqslant i\leqslant n$ and every $x$ in $(0,1)$, $$ P(F_i(X_{(i)})\leqslant x)=P(U_1\leqslant x)=x. $$ Moreover, $(F(X_i))_{1\leqslant i\leqslant n}$ is distributed as $(U_i)_{1\leqslant i\leqslant n}$ and $F$ is nondecreasing, so $F(X_{(i)})$ is distributed as $U_{(i)}$. Thus the CDF $G_i$ of $F(X_{(i)})$ is such that, for every $x$ in $(0,1)$, $$ G_i(x)=P(F(X_{(i)})\leqslant x)=P(U_{(i)}\leqslant x)=\sum_{k=i}^n{n\choose k}x^k(1-x)^{n-k}. $$ Recall a most useful result to compute expectations:

For every $(0,1)$-valued random variable $Z$ with CDF $H$, $E[Z]=\displaystyle\int_0^1(1-H)=1-\int_0^1H$.

Hence, $$ E[U_{(i)}]=1-\sum_{k=i}^n{n\choose k}\int_0^1t^k(1-t)^{n-k}\mathrm dt. $$ Recall now that:

For every $k\leqslant n$, $\displaystyle\int_0^1t^k(1-t)^{n-k}\mathrm dt=\frac1{n+1}{n\choose k}^{-1}$.
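This closed form can be sanity-checked exactly with rational arithmetic. The sketch below is not part of the original argument: the helper names `beta_integral` and `closed_form` are hypothetical, and the integral is evaluated by expanding $(1-t)^{n-k}$ with the binomial theorem and integrating term by term.

```python
from fractions import Fraction
from math import comb


def beta_integral(k: int, n: int) -> Fraction:
    """Exact value of the integral of t^k (1-t)^(n-k) over [0, 1].

    Hypothetical helper: expands (1-t)^(n-k) binomially and uses
    the fact that the integral of t^(k+j) over [0, 1] is 1/(k+j+1).
    """
    m = n - k
    return sum(
        Fraction((-1) ** j * comb(m, j), k + j + 1) for j in range(m + 1)
    )


def closed_form(k: int, n: int) -> Fraction:
    """The claimed value 1 / ((n+1) * C(n, k))."""
    return Fraction(1, (n + 1) * comb(n, k))


if __name__ == "__main__":
    # Compare both sides for all 0 <= k <= n with small n.
    all_match = all(
        beta_integral(k, n) == closed_form(k, n)
        for n in range(0, 12)
        for k in range(0, n + 1)
    )
    print("identity holds for n < 12:", all_match)
```

Because `Fraction` is exact, agreement here is not subject to floating-point error, though it only covers the finitely many pairs $(k,n)$ that are tested.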

Hence, $$ E[U_{(i)}]=1-\sum_{k=i}^n\frac1{n+1}=\frac{i}{n+1}. $$ Recall finally the analogue for second moments of our result for expectations:

For every $(0,1)$-valued random variable $Z$ with CDF $H$, $\displaystyle E[Z^2]=\int_0^12x(1-H(x))\mathrm dx$, that is, $\displaystyle E[Z^2]=1-2\int_0^1xH(x)\mathrm dx.$
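One way to justify this identity, sketched here with Tonelli's theorem (which applies because the integrand is nonnegative): since $Z$ takes values in $(0,1)$, $$ Z^2=\int_0^Z2x\,\mathrm dx=\int_0^12x\,\mathbf 1_{\{x<Z\}}\,\mathrm dx,\quad\text{hence}\quad E[Z^2]=\int_0^12x\,P(Z>x)\,\mathrm dx=\int_0^12x(1-H(x))\,\mathrm dx, $$ and the second form follows from $\displaystyle\int_0^12x\,\mathrm dx=1$.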

Hence, $$ E[U_{(i)}^2]=1-2\sum_{k=i}^n{n\choose k}\int_0^1t^{k+1}(1-t)^{n-k}\mathrm dt=1-2\sum_{k=i}^n{n\choose k}\frac1{n+2}{n+1\choose k+1}^{-1}, $$ that is, $$ E[U_{(i)}^2]=1-\frac2{(n+2)(n+1)}\sum_{k=i}^n(k+1)=\frac{i(i+1)}{(n+1)(n+2)}, $$ from which you can probably guess an expression of $E[U_{(i)}^k]$ valid for every nonnegative integer $k$, and from which, independently, one deduces that $$ \mathrm{var}(F(X_{(i)}))=\mathrm{var}(U_{(i)})=\frac{i(i+1)}{(n+1)(n+2)}-\left(\frac{i}{n+1}\right)^2=\frac{i(n+1-i)}{(n+1)^2(n+2)}. $$
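The mean and variance formulas can also be checked empirically. The following is a minimal Monte Carlo sketch, not part of the derivation. It assumes, purely for illustration, that the $X_j$ are Exp(1), so that $F(x)=1-e^{-x}$. The function names, the seed and the sample sizes are hypothetical choices.

```python
import math
import random
import statistics


def simulate_F_of_order_stat(n, i, trials, seed=0):
    """Draw `trials` realisations of F(X_(i)) for an Exp(1) sample of size n.

    Assumption for illustration only: X_j ~ Exp(1), so F(x) = 1 - exp(-x).
    """
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = sorted(rng.expovariate(1.0) for _ in range(n))
        x_i = sample[i - 1]  # i-th order statistic (1-indexed)
        values.append(1.0 - math.exp(-x_i))
    return values


def theoretical_mean(n, i):
    """E[U_(i)] = i / (n + 1)."""
    return i / (n + 1)


def theoretical_var(n, i):
    """var(U_(i)) = i (n + 1 - i) / ((n + 1)^2 (n + 2))."""
    return i * (n + 1 - i) / ((n + 1) ** 2 * (n + 2))


if __name__ == "__main__":
    n, i = 5, 2
    vals = simulate_F_of_order_stat(n, i, trials=20000)
    print("mean:", statistics.fmean(vals), "vs", theoretical_mean(n, i))
    print("var: ", statistics.pvariance(vals), "vs", theoretical_var(n, i))
```

Swapping in another continuous distribution together with its own CDF should give the same limiting values, since the result does not depend on $F$.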

$\endgroup$
  • $\begingroup$ Do you have in mind any reference where the last colored comment appears (the one related to second moments)? $\endgroup$ Commented Apr 5, 2019 at 0:36
$\begingroup$

Hints

  1. If $X_{(i)}$ is the $i^{\text{th}}$ order statistic then it must be the case that $X_j \le X_{(i)}$ for exactly $i$ of the original random variables $X_j$ (ties have probability zero for a continuous distribution).

  2. There are ${n}\choose{i}$ possible ways of selecting which random variables obey the ordering in point 1 above.

Hopefully, the above two hints will get you started down the right track.

$\endgroup$
  • $\begingroup$ A few points: 1. You want the CDF of $X_{(i)}$, which by definition is the $i^{\text{th}}$ order statistic, right? 2. Your question in the post, 'What is the distribution of $F(.)$..?', does not make any sense. $F(.)$ by itself is a CDF. 3. I am not sure what you mean by $F(X_{(i)})$ and $F_i(X_{(i)})$. What is the difference between the two functions, and why do the random variables appear as arguments to these CDFs? I wrote my hints assuming that point 1 is true. $\endgroup$
    – response
    Commented May 29, 2013 at 13:40
  • $\begingroup$ Whoops, sorry, I deleted my other comment instead of editing it. Ad 1.: $X_{(i)}$ is the $i$-th order statistic, right. I know the CDF of $X_{(i)}$. It is given by the formula $$F_{X_{(i)}}(t):=F_i(t)=i\binom{n}{i}F(t)^{i-1}[1-F(t)]^{n-i}f(t).$$ Ad 2.: Why does it not make sense? $F:=F_X$ is the CDF of $X_i$ and thus a measurable function, as is $F_i:=F_{X_{(i)}}$. Hence $F(X_{(i)})$ and $F_i(X_{(i)})$ are again random variables and I want to determine their distribution. Ad 3.: I'm not entirely sure either, and that's part of my problem here.^^ I will edit my question and try to clarify. $\endgroup$
    – Amarus
    Commented May 29, 2013 at 14:03
  • $\begingroup$ Damn, another mistake: I confused the CDF of the order statistics with their PDF... -_- I will edit the original question. $\endgroup$
    – Amarus
    Commented May 29, 2013 at 18:06
