
I am trying to derive the covariance of two sample means and get confused at one point. Given is a sample of size $n$ with paired dependent observations $x_i$ and $y_i$, realizations of the random variables $X$ and $Y$, with sample means $\bar{x}$ and $\bar{y}$. I am trying to derive $\operatorname{cov}(\bar{x},\bar{y})$.

I am relatively sure the result should be

$$\operatorname{cov}(\bar{x},\bar{y})=\frac{1}{n}\operatorname{cov}(X,Y)$$

However, I arrive at

$$\operatorname{cov}(\bar{x},\bar{y})=E(\bar{x}\bar{y})-\mu_x\mu_y = E\left(\frac{1}{n^2}\sum x_i \sum y_i\right) -\mu_x\mu_y =\frac{1}{n^2}\, n^2 E(x_i y_i) -\mu_x\mu_y=\operatorname{cov}(X,Y)$$

I used

$$E\left(\frac{1}{n^2}\sum x_i \sum y_i\right)=\frac{1}{n^2} E\left(x_1y_1+x_2y_1+\cdots + x_ny_n\right)=\frac{1}{n^2} n^2 E(x_iy_i)$$

Somewhere there must be a flaw in my thinking.

  • I think your reasoning is essentially correct: stats.stackexchange.com/questions/59546/…, that is, $\mathrm{cov}(\bar{x},\bar{y}) = \mathrm{cov}(X,Y)$
    – sandris
    Commented Jul 28, 2015 at 16:53
  • So the difference is the assumption about covariances in paired and independent samples. The upper result is the one for paired samples, the lower the one for independent samples, where $E(x_iy_j)=E(x_i)E(y_j)$ for $i \ne j$
    – tomka
    Commented Jul 28, 2015 at 17:00
  • If you are comfortable with deriving the fact that the variance of the sample mean is $1/n$ times the variance, then the result is immediate because covariances are variances. As far as your mistake goes, note that $\text{cov}(x_i,y_j)=0$ for $i\ne j$. It also helps to know that whenever you are working with covariances or variances you may always assume the means are zero, because these are central moments that don't depend on the means at all.
    – whuber
    Commented Jul 28, 2015 at 17:31
  • What I do not yet fully understand is why it holds that $\operatorname{cov}(x_i,y_j)=0$ for $i \ne j$ when I have paired samples, but not when I have independent samples. Can you explain?
    – tomka
    Commented Jul 28, 2015 at 19:45
  • Your use of the term "sample" implicitly means $(x_i,y_i)$ is independent of $(x_j,y_j)$ for $i\ne j$. From this it is immediate that their covariances (if they exist) must be zero.
    – whuber
    Commented Jul 28, 2015 at 20:44

2 Answers


Covariance is a bilinear function meaning that $$ \operatorname{cov}\left(\sum_{i=1}^n a_iC_i, \sum_{j=1}^m b_jD_j\right) = \sum_{i=1}^n \sum_{j=1}^m a_i b_j\operatorname{cov}(C_i,D_j).$$ There is no need to mess with means etc.
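Bilinearity also holds exactly for the *sample* covariance (it is the same kind of sum), which makes it easy to sanity-check numerically. A minimal sketch, assuming NumPy and arbitrary illustrative data and coefficients:

```python
import numpy as np

rng = np.random.default_rng(3)
C1, C2, D = rng.standard_normal((3, 1000))  # three arbitrary data vectors
a, b = 2.0, -0.7                            # arbitrary coefficients

# The sample covariance is itself bilinear, so this identity holds exactly
# (up to floating-point error), not just in expectation.
lhs = np.cov(a * C1 + b * C2, D)[0, 1]
rhs = a * np.cov(C1, D)[0, 1] + b * np.cov(C2, D)[0, 1]
print(lhs, rhs)  # the two values agree to machine precision
```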

Applying this to the question of the covariance of the sample means of $n$ independent paired samples $(X_i, Y_i)$ (note: the pairs are independent bivariate random variables; we are not claiming that $X_i$ and $Y_i$ are independent random variables), we have that \begin{align} \operatorname{cov}\left(\bar{X},\bar{Y}\right) &= \operatorname{cov}\left(\frac{1}{n}\sum_{i=1}^n X_i, \frac 1n\sum_{j=1}^n Y_j\right)\\ &= \frac{1}{n^2}\sum_{i=1}^n \sum_{j=1}^n \operatorname{cov} (X_i, Y_j)\\ &= \frac{1}{n^2}\sum_{i=1}^n \operatorname{cov} (X_i, Y_i) &\scriptstyle{\text{since $X_i$ and $Y_j$ are independent, and thus uncorrelated, for $i \neq j$}}\\ &= \frac 1n\operatorname{cov} (X, Y) \end{align}
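The result $\operatorname{cov}(\bar{X},\bar{Y}) = \operatorname{cov}(X,Y)/n$ can also be checked by simulation; a minimal Monte Carlo sketch, assuming NumPy and an arbitrary bivariate normal with true covariance $0.8$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000
cov_xy = 0.8  # true cov(X, Y), an arbitrary choice
Sigma = [[1.0, cov_xy], [cov_xy, 1.0]]

# Draw `reps` independent samples of n iid pairs (X_i, Y_i),
# then form the sample means within each sample.
pairs = rng.multivariate_normal([0.0, 0.0], Sigma, size=(reps, n))
xbar = pairs[:, :, 0].mean(axis=1)
ybar = pairs[:, :, 1].mean(axis=1)

est = np.cov(xbar, ybar)[0, 1]
print(est, cov_xy / n)  # the estimate should be close to cov(X, Y)/n = 0.08
```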


As noted below in a comment by flow2k, although $\operatorname{cov}(\bar{X},\bar{Y})$ is smaller than $\operatorname{cov}({X},{Y})$ by a factor of $n$, the (Pearson) correlation coefficients are the same: $\rho_{\bar{X},\bar{Y}} = \rho_{X,Y}$ !! Previously I had never given the correlation coefficients any thought at all.
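The invariance of the correlation follows because $\operatorname{var}(\bar{X}) = \operatorname{var}(X)/n$ and $\operatorname{var}(\bar{Y}) = \operatorname{var}(Y)/n$ as well, so the factors of $1/n$ cancel in $\rho$. A quick numerical illustration (assuming NumPy; $\rho = 0.6$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, rho = 25, 100_000, 0.6  # rho is the true correlation of (X, Y)
Sigma = [[1.0, rho], [rho, 1.0]]

pairs = rng.multivariate_normal([0.0, 0.0], Sigma, size=(reps, n))
xbar = pairs[:, :, 0].mean(axis=1)
ybar = pairs[:, :, 1].mean(axis=1)

# cov and both variances each scale by 1/n, so the 1/n factors cancel in rho.
rho_means = np.corrcoef(xbar, ybar)[0, 1]
print(rho_means)  # should be close to rho = 0.6
```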

  • I think there are $n^2$ terms, but $n(n-1)$ of them cancel with $\mu_x\mu_y$ due to independence.
    – tomka
    Commented Jul 28, 2015 at 18:56
  • The quoted section above "Covariance is a bilinear function..." - where is this quoted from?
    – flow2k
    Commented Mar 12, 2023 at 1:12
  • @flow2k The first "quoted" paragraph of my answer is not specifically a quotation, in the sense that I wrote it myself without looking at a textbook or paper while doing so, but the first sentence (possibly in exactly the same words) can be found in many textbooks. The second sentence of the "quoted" paragraph is proudly my own words; textbook writers (or their copyeditors), journal paper writers, and journal editors don't use such informal language. Commented Mar 12, 2023 at 2:54
  • Thanks. I think it's interesting that the correlation coefficient of the sample means remains unchanged.
    – flow2k
    Commented Mar 12, 2023 at 9:02
  • I had never thought about correlation coefficients at all! I will incorporate this information into my answer (with credit to you). Commented Mar 17, 2023 at 3:39

I think the algebra issue is resolved with the following:

\begin{align}{1 \over n^2}E\left(\sum_{i=1}^n x_i \sum_{j=1}^n y_j\right)&={1 \over n^2}E\left(\sum_{i=1}^n x_i y_i +\sum_{i\ne j}x_i y_j\right)\\&={1 \over n^2}\bigl(n(\operatorname{cov}(x_i,y_i)+\mu_X \mu_Y)+n(n-1)\mu_X \mu_Y\bigr)\\&={1 \over n^2}\bigl(n \operatorname{cov}(x_i,y_i)+n^2 \mu_X \mu_Y\bigr)\\&=\operatorname{cov}(x_i,y_i)/n+ \mu_X \mu_Y\end{align}
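This decomposition can be verified numerically with nonzero means as well; a sketch assuming NumPy and arbitrary illustrative values $\mu_X = 2$, $\mu_Y = -1$, $\operatorname{cov}(X,Y) = 0.5$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 8, 200_000
mu_x, mu_y, cov_xy = 2.0, -1.0, 0.5  # arbitrary illustrative values
Sigma = [[1.0, cov_xy], [cov_xy, 1.0]]

pairs = rng.multivariate_normal([mu_x, mu_y], Sigma, size=(reps, n))
sum_x = pairs[:, :, 0].sum(axis=1)
sum_y = pairs[:, :, 1].sum(axis=1)

# E(sum_i x_i * sum_j y_j) / n^2 should equal cov(X, Y)/n + mu_x * mu_y.
lhs = (sum_x * sum_y).mean() / n**2
rhs = cov_xy / n + mu_x * mu_y
print(lhs, rhs)  # both should be close to 0.5/8 - 2.0 = -1.9375
```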

  • I think the answer would need to add that the second equality holds due to the independence of $x_i$ and $y_j$ for $i \ne j$.
    – tomka
    Commented Jul 28, 2015 at 18:54
  • Yes, that is essential.
    – JimB
    Commented Jul 28, 2015 at 19:43
