$\begingroup$

I have a fairly basic question that I'm looking for a reference for.

First, a couple of definitions. Let's say $X_1,\ldots,X_n$ are IID samples from a distribution $F$ over $[0,1]$. For any $k\in\{1,\ldots,n\}$, we can define the $k$th of $n$ order statistic $X_{k:n}$ to be the $k$th highest of the $n$ samples. Let $\mu_{k:n}$ denote the expected value of $X_{k:n}$.

For any given $k$ and $n$, I would like to estimate $\mu_{k:n}$ for $F$. Specifically, I would like to find an estimator $\hat \mu_{k:n}$ which takes a profile of $m$ IID samples and minimizes the mean absolute error $E[|\hat \mu_{k:n}-\mu_{k:n}|]$ in the worst case over all $F$ (again, distributed over $[0,1]$). (With $m$ generally being distinct from and larger than $n$.)

What kinds of guarantees are known for this problem (in terms of $k$, $n$, $m$, and maybe $\mu_{k:n}$)? Is there anything that does significantly better than just dividing $m$ into blocks of $n$ samples and computing an empirical mean of the order statistics?
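
For concreteness, the block-based baseline mentioned above might be sketched as follows. This is only an illustration under stated assumptions: the function name `block_estimator` is hypothetical, leftover samples that do not fill a complete block are simply discarded, and Uniform(0,1) data is used purely as an example because its $\mu_{k:n}=(n-k+1)/(n+1)$ is known in closed form.

```python
import random


def block_estimator(samples, k, n):
    """Average the k-th highest value over disjoint blocks of n samples.

    Hypothetical helper: trailing samples that do not fill a block are ignored.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    num_blocks = len(samples) // n
    if num_blocks == 0:
        raise ValueError("need at least n samples")
    total = 0.0
    for b in range(num_blocks):
        # Sort each block in decreasing order so index k-1 is the k-th highest.
        block = sorted(samples[b * n:(b + 1) * n], reverse=True)
        total += block[k - 1]
    return total / num_blocks


# Example with m = 10_000 Uniform(0,1) draws (an assumed test distribution).
rng = random.Random(0)
data = [rng.random() for _ in range(10_000)]
estimate = block_estimator(data, k=2, n=5)  # true value is 4/6 for the uniform
```

Each block contributes one unbiased draw of $X_{k:n}$, so the estimator is unbiased but uses only $\lfloor m/n \rfloor$ effective observations.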

$\endgroup$
  • 1
    $\begingroup$ The pdf of the $k^{th}$ order statistic of a distribution with pdf $f(x)$ and cdf $F(x)$ is $$ f_{(k)}(x)=nf(x)\binom{n-1}{k-1}F(x)^{k-1}\left(1-F(x)\right)^{n-k} $$ whence the expectation can be computed in the usual way. Is this helpful? Or are you looking for something else? $\endgroup$
    – Sycorax
    Commented Oct 21, 2018 at 4:20
  • 1
    $\begingroup$ The $k$-th order statistic is a natural estimate of its expectation. $\endgroup$
    – Xi'an
    Commented Oct 21, 2018 at 4:46
  • $\begingroup$ @Sycorax I'm aware of the formula for order statistics, but I'm not sure how to turn it into an estimator without doing something obvious like taking the empirical CDF and plugging it into the formula for $\mu_{k:n}$. (Which one could presumably analyze with something like DKW, though I'm not sure that's exactly the right tool.) I'm wondering if there's anything better. (And if it wasn't clear before, I'm interested in theoretical guarantees.) $\endgroup$
    – Lemke
    Commented Oct 21, 2018 at 15:29

1 Answer

$\begingroup$

I would like to find an estimator $\hat{\mu}_{k:n}$ which takes a profile of $m$ iid samples and minimizes the mean absolute error $\mathbb{E}[|\hat{\mu}_{k:n}-\mu_{k:n}|]$ in the worst case over all $F$

Minimising the error simultaneously over all possible distributions is impossible, since these include the Dirac mass at an arbitrary $a\in (0,1)$, for which the minimiser is the constant estimator $\hat{\mu}_{k:n}=a$.

If no constraint is imposed on $F$, I think the solution has to rely on an empirical cdf $\hat{F}_m$ based on the sample of size $m$, from which an estimate of $\mu_{k:n}$ can be derived by simulation (i.e., bootstrap in this case).
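
As a rough sketch of what this could look like (the function names `bootstrap_estimate` and `plugin_estimate` are hypothetical, and nothing here is claimed to be optimal): the first function resamples $n$ values with replacement from the data and averages the $k$th highest; the second is an assumed simulation-free variant that computes the same expectation under $\hat{F}_m$ exactly, using the fact that the $k$th highest of $n$ draws is the $r$th smallest with $r=n-k+1$, and that it falls at or below the $j$th sorted sample with probability $P(\mathrm{Bin}(n, j/m)\ge r)$.

```python
import math
import random


def bootstrap_estimate(samples, k, n, num_resamples=2000, seed=0):
    """Monte Carlo estimate of E[k-th highest of n draws] under the empirical cdf."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_resamples):
        # Draw n values with replacement, i.e. i.i.d. from the empirical cdf.
        resample = sorted(rng.choices(samples, k=n), reverse=True)
        total += resample[k - 1]
    return total / num_resamples


def plugin_estimate(samples, k, n):
    """Exact E[k-th highest of n draws] under the empirical cdf (no simulation)."""
    m = len(samples)
    ys = sorted(samples)
    r = n - k + 1  # the k-th highest of n is the r-th smallest

    def cdf_rth_smallest(p):
        # P(at least r of the n draws land at or below a point with cdf value p).
        return sum(
            math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r, n + 1)
        )

    estimate, prev = 0.0, 0.0
    for j, y in enumerate(ys, start=1):
        cur = cdf_rth_smallest(j / m)
        estimate += y * (cur - prev)  # weight = P(order statistic equals ys[j-1])
        prev = cur
    return estimate


# Example on assumed Uniform(0,1) data, where the true value is (n-k+1)/(n+1).
rng = random.Random(1)
data = [rng.random() for _ in range(2_000)]
exact = plugin_estimate(data, k=2, n=5)
approx = bootstrap_estimate(data, k=2, n=5)
```

The two functions target the same quantity, so `approx` should agree with `exact` up to Monte Carlo noise; how either compares with $\mu_{k:n}$ itself depends on how close $\hat{F}_m$ is to $F$.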

$\endgroup$
