
Let $X_1, X_2, \cdots$ be independent and identically distributed random variables supported on the nonnegative integers with finite mean $\mu$. Denote the $i$th order statistic of the sub-collection $X_1, \cdots, X_n$ by $X_{(i)}^n$, so that $X^n_{(1)} \leq X^n_{(2)} \leq \cdots \leq X^n_{(n)}$, with ties broken in some arbitrary manner.

In an application to the appearance of giant components in the configuration model, following https://arxiv.org/abs/1601.03714, we need the following lemma:

Given $\epsilon >0$, there exists $\delta >0$ such that \begin{align} \mathbf P \left( \frac 1n \sum_{i=1}^{\lfloor (1- \delta) n \rfloor } X_{(i)}^n \geq (1-\epsilon)\mu \right) \to 1 \text{ as $n \to \infty$. } \end{align}

Since the Law of Large Numbers gives $n^{-1} \sum_{i=1}^n X^n_{(i)} \to \mu$ (the order statistics are just a rearrangement of $X_1, \cdots, X_n$), the display asserts that the sum of a large enough proportion of the smallest order statistics captures an arbitrarily large proportion of the complete sum.
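As a sanity check (ours, not from the paper above), here is a minimal simulation sketch of the displayed claim; the Poisson(2) law, the specific values of $\epsilon$ and $\delta$, and the trial count are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, eps, delta = 2.0, 0.1, 0.05  # illustrative values only

    for n in [10**3, 10**4, 10**5]:
        trials = 200
        hits = 0
        for _ in range(trials):
            # X_(1) <= ... <= X_(n) for an i.i.d. Poisson(2) sample
            x = np.sort(rng.poisson(mu, size=n))
            # (1/n) * sum of the floor((1 - delta) n) smallest order statistics
            trimmed = x[: int((1 - delta) * n)].sum() / n
            hits += trimmed >= (1 - eps) * mu
        print(n, hits / trials)  # empirical probability; should tend to 1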

This seems true, and it does hold for continuous random variables; see Theorem 2.2 of Bruss and Robertson, "'Wald's Lemma' for sums of order statistics of i.i.d. random variables."

However, we are struggling either to adapt that proof to the discrete setting or to find a counterexample.

  • The fraction of the random variables that are larger than $M$ converges with probability 1 to $P[X>M]$.
    – Michael
    Commented Nov 25, 2021 at 16:50
  • For a random variable $X$, either you can find an increasing sequence $\{M_i\}_{i=1}^{\infty}$ such that $P[X>M_i]>0$ for all $i\in\{1, 2, 3, …\}$ and $P[X>M_i]\rightarrow 0$, or there is some value $M$ with $P[X>M]=0$ and $P[X=M]>0$.
    – Michael
    Commented Nov 25, 2021 at 17:41
  • Does this observation immediately give the desired statement?
    Commented Nov 25, 2021 at 18:04

2 Answers


Let $X$ be a random variable with some CDF $F(x)=P[X\leq x]$ for all $x \in \mathbb{R}$.

Claim 1: Either there is an $M \in \mathbb{R}$ such that $P[X>M]=0$ and $P[X=M]>0$, or there is an increasing sequence of real numbers $\{M_i\}_{i=1}^{\infty}$ such that $P[X>M_i]>0$ for all $i \in \{1, 2, 3, ...\}$ and $\lim_{i\rightarrow\infty} P[X>M_i]=0$.

Proof: Define $$ M = \sup\{t \in \mathbb{R}: P[X>t]>0\}.$$ If $M=\infty$ then we have $P[X>i]>0$ for all positive integers $i$, and we already know $\lim_{i\rightarrow\infty} P[X>i]=0$ from basic properties of a CDF, so we can define $M_i=i$ and we are done.

If $M$ is finite then we know $P[X>t]=0$ for all $t>M$. Since $\{X>M+1/n\}\nearrow \{X>M\}$, by continuity of probability we have $0=P[X>M+1/n]\nearrow P[X>M]$, and so $P[X>M]=0$. If $P[X=M]>0$ then we are done.

The remaining case is when $M$ is finite and $P[X=M]=0$. We already know $P[X>M]=0$. By definition of $M$ we have $P[X>M-1/i]>0$ for all $i \in \{1, 2, 3, ...\}$. Also $$\{M-1/i<X\leq M\} \searrow \{X=M\}$$ Thus by continuity of probability $$ P[M-1/i<X\leq M]\searrow P[X=M]=0$$ But $$P[X>M-1/i] = P[M-1/i<X\leq M] + P[X>M] = P[M-1/i<X\leq M]$$ and so $P[X>M-1/i]\rightarrow 0$. So we define $M_i = M-1/i$ and we are done. $\Box$
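For concreteness (our illustration, not part of the original proof): a Geometric variable with $P[X=k]=(1-q)q^k$ for $k \geq 0$ falls into the second alternative, since $P[X>t]=q^{\lfloor t \rfloor + 1}>0$ for every $t \geq 0$ and one may take $M_i = i$; a Bernoulli($q$) variable falls into the first alternative, with $M=1$, since $P[X>1]=0$ and $P[X=1]=q>0$.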

Claim 2: Assume $\{X_i\}_{i=1}^{\infty}$ are i.i.d. nonnegative random variables with CDF $F(x)$ and with finite mean $E[X]$. Then for any $\epsilon>0$ there is a $\delta>0$ such that
$$ P\left[\frac{1}{n}\sum_{i \in I_n(\delta)} X_i \geq E[X] - \epsilon\right]\rightarrow 1$$ where $I_n(\delta)$ is the set of indices of the $\lfloor(1-\delta)n\rfloor$ smallest values among $\{X_1, ..., X_n\}$ (ties broken arbitrarily).

Proof: Fix $\epsilon>0$. Let $X=X_1$.

Case 1: Suppose there is an increasing sequence $\{M_n\}_{n=1}^{\infty}$ with $P[X>M_n]>0$ and $P[X>M_n]\rightarrow 0$. Then we can find a sufficiently large value $M$ (where $M=M_n$ for some $n$) such that $P[X>M]>0$ and $E[X1\{X\leq M\}]\geq E[X]-\epsilon/2$; such an $M$ exists because $E[X1\{X\leq M_n\}]\rightarrow E[X]$ by monotone convergence. The fraction of the $\{X_i\}$ that are larger than $M$ converges with probability 1 to $P[X>M]$. Define $\delta = P[X>M]/2$. Then for all sufficiently large $n$, more than $\delta n$ of the values $\{X_1, ..., X_n\}$ exceed $M$, so every value that is at most $M$ belongs to $I_n(\delta)$, and $$ \sum_{i \in I_n(\delta)} X_i \geq \sum_{i=1}^n X_i1\{X_i\leq M\}$$ So $$ \frac{1}{n}\sum_{i \in I_n(\delta)} X_i \geq \frac{1}{n}\sum_{i=1}^n X_i1\{X_i\leq M\}$$ The right-hand side converges to $E[X1\{X\leq M\}]$ with probability 1, so with some additional steps (sketched below) we are done.
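One hedged way to supply those additional steps: since the inequality above holds for all sufficiently large $n$ almost surely, the strong law gives, with probability 1, $$\liminf_{n\rightarrow\infty} \frac{1}{n}\sum_{i \in I_n(\delta)} X_i \geq \lim_{n\rightarrow\infty} \frac{1}{n}\sum_{i=1}^n X_i 1\{X_i\leq M\} = E[X1\{X\leq M\}] \geq E[X]-\epsilon/2 > E[X]-\epsilon,$$ so the event in Claim 2 holds for all sufficiently large $n$ almost surely, which implies the stated convergence in probability.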

Case 2: There is an $M$ such that $P[X=M]>0$ and $P[X>M]=0$.

In this case, with $\delta \leq P[X=M]/2$, the largest $\delta$ fraction of $\{X_1, ..., X_n\}$ will all eventually take the value $M$ itself. I believe that with some additional steps (sketched below) we are done. $\Box$
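A hedged way to finish here, using only that $X \leq M$ almost surely: the trimmed-away terms satisfy $$\frac{1}{n}\sum_{i \notin I_n(\delta)} X_i \leq \frac{(n-\lfloor(1-\delta)n\rfloor)M}{n} \leq \delta M + \frac{M}{n},$$ so taking $\delta \leq \min\{P[X=M]/2,\ \epsilon/(4M)\}$ (for $M>0$; if $M=0$ then $X \equiv 0$ and the claim is trivial) and applying the strong law to $\frac{1}{n}\sum_{i=1}^n X_i$ gives $\frac{1}{n}\sum_{i \in I_n(\delta)} X_i \geq E[X]-\epsilon$ with probability tending to 1.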

  • Obviously this proof is not complete but I think I gave my main ideas. I need to go now. Maybe I will try to complete this at some point.
    – Michael
    Commented Nov 25, 2021 at 18:40
  • Thanks! I'll take a closer look and post a complete answer if I arrive at one.
    Commented Nov 25, 2021 at 18:43

Throughout the argument we let $n_\delta = \lfloor (1- \delta) n \rfloor$.

First assume that $\mathbb P(X_1 > x) >0$ for all $x\geq 0$, so that $X_1$ has unbounded support. Set $$M := \min \{ x \colon \mathbb E [X_1 \mathbf 1\{X_1 \leq x\}] \geq (1- \epsilon/2) \mu \},$$ which exists by monotone convergence and the hypothesis that $\mathbb E X_1 = \mu < \infty$. Let $p := \mathbb P(X_1 > M) > 0$. The law of large numbers ensures that $$\frac 1n \sum_{i=1}^n \mathbf1\{X_i \leq M\} \leq 1- \frac p 2 \quad \text{ and } \quad \frac 1n \sum_{i=1}^n X_i \mathbf 1\{X_i \leq M\} \geq (1- \epsilon ) \mu $$ with high probability. Setting $\delta = p/2$, the first bound says that with high probability at most $n_\delta$ of the values are $\leq M$, so every such value appears among the $n_\delta$ smallest order statistics, and hence \begin{align} \frac 1n \sum_{i =1}^{n_\delta} X_{(i)}^n \geq \frac 1n \sum_{i=1}^n X_i \mathbf 1\{ X_i \leq M\} \geq (1- \epsilon) \mu \end{align} with high probability. This is the claimed relation in the unbounded case.

Next, suppose that $M := \min \{ x \colon \mathbb P(X_1 \leq x) = 1\}$ exists and is finite, so that the support of $X_1$ is bounded. Let $p = \mathbb P(X_1 = M) >0$. If $p=1$, then the $X_i$ are deterministic and the desired claim is trivial, so suppose that $0<p<1$ (and that $\mu>0$; otherwise the claim is again trivial) and set $\delta = \epsilon p \mu / M$. Since the mean $\mu$ of $X_1$ is bounded by the maximum value $M$ of the support, we have $\mu /M \leq 1$; together with the assumption $\epsilon <1$ (which costs no generality), this implies $\delta < p$. Thus the law of large numbers ensures that $X_{(i)}^n = M$ for all $i > n_\delta$ with high probability, since with high probability more than $\delta n$ of the $X_i$ equal $M$. It follows that $$\frac 1n \sum_{i=n_\delta +1}^{n} X_{(i)}^n = \frac{n - n_\delta}{n}\, M \leq \left(\delta + \frac 1n \right) M = \epsilon p \mu + \frac Mn \qquad (*) $$ with high probability. Combining $(*)$ with the following consequence of the law of large numbers, \begin{align} \frac 1n \sum_{i=1}^{n_\delta} X_{(i)}^n + \frac 1n \sum_{i =n_\delta+1}^n X_{(i)}^n = \frac 1n \sum_{i=1}^n X_i \geq (1- \epsilon p) \mu \end{align} with high probability, gives $$\frac 1n \sum_{i=1}^{n_\delta} X_{(i)}^n \geq (1- 2\epsilon p) \mu - \frac Mn$$ with high probability. Since $\epsilon>0$ was arbitrary (run the argument with $\epsilon/3$ in place of $\epsilon$), this yields the claimed bound.
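As a quick numerical illustration of the bounded case (our sketch; the Binomial(5, 1/2) law is an arbitrary bounded integer example, with $M=5$ and $p=2^{-5}$):

    import numpy as np

    rng = np.random.default_rng(1)
    M, p, mu, eps = 5, 0.5**5, 2.5, 0.1   # Binomial(5, 1/2): max M, P[X = M], mean
    delta = eps * p * mu / M              # the choice of delta used above
    n = 10**5
    x = np.sort(rng.binomial(5, 0.5, size=n))
    n_delta = int((1 - delta) * n)
    print((x[n_delta:] == M).all())       # top delta-fraction all equal to M (whp)
    print(x[:n_delta].sum() / n >= (1 - 2 * eps * p) * mu - M / n)  # the derived bound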

