
The problem is taken from All of Statistics by Larry Wasserman (Exercise 7.5). I don't have a solution manual for the book, so I am posting the problem here together with my attempted answer:

Let $x$ and $y$ be two distinct points. Find $Cov(\hat F_n(x), \hat F_n(y))$.

Here is my attempted answer:

$\hat F_n(x) = \frac{1}{n} \sum_{i=1}^n I\{ X_i \le x\} $
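As a sanity check of this definition, here is a minimal R sketch (the standard normal sample and the helper name `Fhat` are illustrative assumptions) showing that $\hat F_n(x)$ is simply the proportion of observations at or below $x$, which agrees with R's built-in `ecdf`:

```r
set.seed(1)
X <- rnorm(50)                      # illustrative sample, X_i ~ N(0, 1) (assumption)
Fhat <- function(x) mean(X <= x)    # hypothetical helper: (1/n) * sum_i I{X_i <= x}
Fn <- ecdf(X)                       # empirical CDF from the stats package
pts <- c(-1, 0, 0.5, 2)
vals_manual <- sapply(pts, Fhat)
vals_ecdf <- Fn(pts)
all.equal(vals_manual, vals_ecdf)   # TRUE
```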

$\mathop{\mathbb{E}}(\hat F_n(x)) = F(x) $

$\mathop{\mathbb{E}}(\hat F_n(y)) = F(y) $
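Both expectations follow because each $I\{X_i \le x\}$ is a Bernoulli variable with success probability $F(x)$. A small Monte Carlo sketch can confirm this numerically; it assumes standard normal data, $n = 20$ and $x = 0.3$ (arbitrary choices), so that $F(x)$ is `pnorm(x)`:

```r
set.seed(2)
n <- 20; x <- 0.3; reps <- 1e5
M <- matrix(rnorm(n * reps), nrow = reps)   # each row is one sample X_1, ..., X_n
fhat_x <- rowMeans(M <= x)                  # Fhat_n(x) for every replicate
mc_mean <- mean(fhat_x)
c(monte_carlo = mc_mean, theory = pnorm(x))
```

The two numbers should agree up to Monte Carlo error.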

$Cov(\hat F_n(x), \hat F_n(y)) = \mathop{\mathbb{E}}(\hat F_n(x)\cdot \hat F_n(y)) - \mathop{\mathbb{E}}(\hat F_n(x))\mathop{\mathbb{E}}(\hat F_n(y)) $

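For the first term on the right-hand side (updated based on Xi'an's answer), \begin{align*} \mathop{\mathbb{E}}(\hat F_n(x)\cdot \hat F_n(y)) &= \frac{1}{n^2} \mathop{\mathbb{E}}\Big(\sum_i I\{X_i \le x\} \sum_j I\{X_j \le y\}\Big) \\&= \frac{1}{n^2} \mathop{\mathbb{E}}\Big(\sum_{i \neq j} I\{X_i \le x\}I\{X_j \le y\} + \sum_{i = j} I\{X_i \le x\}I\{X_j \le y\}\Big) \\&= \frac{1}{n^2}(nF(\min\{x,y\})+n(n-1)F(x)F(y)) \\&= \frac{1}{n}(F(\min\{x,y\}) + (n-1)F(x)F(y))\end{align*} where each of the $n(n-1)$ terms with $i \neq j$ has expectation $F(x)F(y)$ by independence, and each of the $n$ terms with $i = j$ has expectation $\mathop{\mathbb{E}}(I\{X_i \le \min\{x,y\}\}) = F(\min\{x,y\})$.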
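The pair counting can also be checked by simulation. This sketch assumes standard normal data with $n = 10$, $x = -0.5$ and $y = 0.8$ (all arbitrary) and compares a Monte Carlo estimate of $\mathop{\mathbb{E}}(\hat F_n(x)\cdot \hat F_n(y))$ with $\frac{1}{n}(F(\min\{x,y\}) + (n-1)F(x)F(y))$, alongside the value obtained by counting only $n$ cross pairs instead of $n(n-1)$:

```r
set.seed(3)
n <- 10; x <- -0.5; y <- 0.8; reps <- 2e5
M <- matrix(rnorm(n * reps), nrow = reps)        # each row is one sample of size n
prod_xy <- rowMeans(M <= x) * rowMeans(M <= y)   # Fhat_n(x) * Fhat_n(y) per replicate
mc <- mean(prod_xy)
theory <- (pnorm(min(x, y)) + (n - 1) * pnorm(x) * pnorm(y)) / n
miscounted <- (pnorm(min(x, y)) + pnorm(x) * pnorm(y)) / n   # n instead of n(n-1) cross pairs
c(monte_carlo = mc, theory = theory, miscounted = miscounted)
```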

Combining this with $\mathop{\mathbb{E}}(\hat F_n(x)) = F(x)$ and $\mathop{\mathbb{E}}(\hat F_n(y)) = F(y)$, we have (updated based on Xi'an's answer):

$$ Cov(\hat F_n(x), \hat F_n(y)) = \frac{1}{n}(F(\min\{x,y\}) - F(x)F(y)) $$
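Since this formula should hold exactly for every $n$, it can be checked without simulation on a small discrete example by enumerating every possible sample. The sketch below assumes, purely for illustration, that the $X_i$ are uniform on $\{1,2,3\}$ with $n = 3$, $x = 1$ and $y = 2$, so there are $3^3 = 27$ equally likely samples:

```r
support <- 1:3                              # X_i uniform on {1, 2, 3} (illustrative assumption)
n <- 3; x <- 1; y <- 2
Fcdf <- function(t) mean(support <= t)      # true CDF of that uniform distribution
# all 3^n equally likely samples, one per row
samples <- as.matrix(expand.grid(rep(list(support), n)))
fx <- rowMeans(samples <= x)                # Fhat_n(x) for each sample
fy <- rowMeans(samples <= y)                # Fhat_n(y) for each sample
exact_cov <- mean(fx * fy) - mean(fx) * mean(fy)   # exact covariance over the 27 samples
formula_cov <- (Fcdf(min(x, y)) - Fcdf(x) * Fcdf(y)) / n
c(exact = exact_cov, formula = formula_cov)        # both are 1/27 up to rounding
```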

I am not sure whether my attempt is correct. Could anyone verify the answer or point out any flaws in my argument?


1 Answer


Note that \begin{align*}\text{Cov}\Big(\frac{1}{n} \sum_{i=1}^n I\{ X_i \le x\},\frac{1}{n} \sum_{j=1}^n I\{ X_j \le y\}\Big) &=\frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n\text{Cov}(I\{ X_i \le x\},I\{ X_j \le y\})\\ &=\frac{1}{n^2}\sum_{i=1}^n\text{Cov}(I\{ X_i \le x\},I\{ X_i \le y\})\\ &\qquad\quad{\text{(since the $X_i$'s are independent, the terms with $i\ne j$ vanish)}}\\ &=\frac{1}{n}\text{Cov}(I\{ X_1 \le x\},I\{ X_1 \le y\})\\ &\qquad\quad{\text{(since the $X_i$'s are identically distributed)}}\end{align*} and \begin{align*} \mathop{\mathbb{E}}[I\{X_1 \le x\}I\{X_1 \le y\}] &= \mathop{\mathbb{E}}[I\{X_1 \le \min\{x, y\}\}] \\&= F(\min\{x,y\}), \end{align*} leading to $$\text{Cov}(\hat F_n(x), \hat F_n(y)) = \frac{1}{n}[F(\min\{x,y\}) - F(x)F(y)]\tag{1}$$

In the original version of the question, the computation read \begin{align*} \mathop{\mathbb{E}}(\hat F_n(x)\cdot \hat F_n(y)) &= \frac{1}{n^2} \mathop{\mathbb{E}}\Big(\sum_i I\{X_i \le x\} \sum_j I\{X_j \le y\}\Big) \\&= \frac{1}{n^2} \mathop{\mathbb{E}}\Big(\underbrace{\sum_{i \neq j}}_{\substack{n(n-1)\\\text{distinct pairs}}} I\{X_i \le x\}I\{X_j \le y\} + \sum_{i = j} I\{X_i \le x\}I\{X_j \le y\}\Big) \\&\overbrace{=}^\text{wrong!} \frac{1}{n^2}(nF(\min\{x,y\})+nF(x)F(y))\end{align*} and the mistake is in miscounting the distinct pairs $i\ne j$: there are $n(n-1)$ of them, not $n$. With this correction, $$\text{Cov}(\hat F_n(x), \hat F_n(y))=\mathop{\mathbb{E}}(\hat F_n(x)\cdot \hat F_n(y))-F(x)F(y)=\frac{F(\min\{x,y\})}{n}+\frac{n-1}{n}F(x)F(y)-F(x)F(y)=\frac{1}{n}[F(\min\{x,y\}) - F(x)F(y)]$$ recovering (1).

Here is an illustration of the fit between theory and an empirical evaluation of $\text{Cov}(I\{ X_1 \le x\},I\{ X_1 \le y\})$ at $x=a$, $y=b$, for standard normal $X_1$:

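    x <- rnorm(1e6)                          # large N(0, 1) sample of X_1
    a <- rnorm(1); b <- rnorm(1)             # one random pair (a, b); drawing it from N(0, 1) is an assumption
    cov(x < a, x < b)                        # empirical Cov(I{X_1 < a}, I{X_1 < b})
    pnorm(min(a, b)) - pnorm(a) * pnorm(b)   # theoretical F(min(a, b)) - F(a)F(b)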

The illustration below is based on $10^3$ random pairs $(a,b)$.

[Figure: empirical versus theoretical covariance for the $10^3$ random pairs $(a,b)$]
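The snippet above evaluates a single pair $(a,b)$. One possible way to repeat it over $10^3$ random pairs is sketched below; how the pairs behind the figure were drawn is not stated, so drawing $a$ and $b$ from $N(0,1)$ is an assumption, and the sample size is reduced to $10^5$ for speed. Instead of a plot, it reports the largest absolute gap between the empirical and theoretical covariances:

```r
set.seed(5)
x <- rnorm(1e5)                               # sample of X_1 values (smaller than 1e6 for speed)
pairs <- matrix(rnorm(2 * 1e3), ncol = 2)     # 10^3 random pairs (a, b); N(0, 1) draws are an assumption
empirical <- apply(pairs, 1, function(ab) cov(x <= ab[1], x <= ab[2]))
theoretical <- apply(pairs, 1, function(ab) pnorm(min(ab)) - pnorm(ab[1]) * pnorm(ab[2]))
max_gap <- max(abs(empirical - theoretical))
max_gap
# plot(theoretical, empirical); abline(0, 1)  # optional: scatter plot against the identity line
```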

  • Thanks for pointing it out. Based on your correction, I have edited my argument.
    – yalex314
    Commented Jan 4, 2019 at 15:50
  • Your solution implies $\mathrm{Cov}(\hat{F}_n(x),\hat{F}_n(y)) = \mathbb{E}(\hat{F}_n(x)\hat{F}_n(y))$. Is that correct? Did you mean $\mathrm{Cov}(\hat{F}_n(x),\hat{F}_n(y))$ instead of $ \mathbb{E}(\hat{F}_n(x)\hat{F}_n(y))$ in the equation following "With this correction,"?
    Commented Feb 5, 2021 at 18:06
