
Suppose

  • $X_i, i=1,\ldots, n$ are $i.i.d.$ random variables with mean $\mu_X$ and variance $\sigma^2_X$
  • $Y_j, j=1,\ldots, m$ are $i.i.d.$ random variables with mean $\mu_Y$ and variance $\sigma^2_Y$
  • $\forall i, j, X_i\perp \!\!\! \perp Y_j$

then by the CLT we have

  • $\displaystyle\frac{\frac{\sum_{i=1}^nX_i}{n}-\mu_X}{\sqrt{\frac{\sigma^2_X}{n}}}\sim \mathcal{N}(0,1)$ as $n\rightarrow \infty$
  • $\displaystyle \frac{\frac{\sum_{j=1}^mY_j}{m}-\mu_Y}{\sqrt{\frac{\sigma^2_Y}{m}}}\sim \mathcal{N}(0,1)$ as $m\rightarrow \infty$

My question is: do we also have a similar result for the difference of the sample means? $$\displaystyle \frac{\left(\frac{\sum_{i=1}^nX_i}{n}-\frac{\sum_{j=1}^mY_j}{m}\right)-(\mu_X-\mu_Y)}{\sqrt{\frac{\sigma^2_X}{n}+\frac{\sigma^2_Y}{m}}}\sim \mathcal{N}(0,1) \text{ as }\bigstar\rightarrow\infty$$ If such a result holds, what is $\bigstar$? Is it $(n, m)$, $n+m$, $n\times m$, or something else? And how can one prove it?
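(As a quick Monte Carlo sanity check, one could simulate the candidate statistic; the populations, sample sizes, and seed below are arbitrary illustrative choices, not part of the question.)

```python
import numpy as np

# Simulate the candidate statistic many times and check whether its
# empirical mean and variance look like those of a standard normal.
rng = np.random.default_rng(0)
reps = 20_000
n, m = 200, 50                 # deliberately unequal sample sizes
mu_X, var_X = 1.0, 1.0         # Exponential(rate 1): mean 1, variance 1
mu_Y, var_Y = 0.5, 0.25        # Exponential(rate 2): mean 1/2, variance 1/4

X_bar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
Y_bar = rng.exponential(scale=0.5, size=(reps, m)).mean(axis=1)

S = (X_bar - Y_bar - (mu_X - mu_Y)) / np.sqrt(var_X / n + var_Y / m)
print(S.mean(), S.var())       # should be close to 0 and 1 respectively
```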

My intuition is as follows. In the above CLTs, if one is allowed to "sloppily" perform the following manipulations:

  • $\displaystyle{\frac{\sum_{i=1}^nX_i}{n}-\mu_X}\sim \mathcal{N}\left(0,{{\frac{\sigma^2_X}{n}}}\right)$ as $n\rightarrow \infty$
  • $\displaystyle {\frac{\sum_{j=1}^mY_j}{m}-\mu_Y}\sim \mathcal{N}\left(0,{{\frac{\sigma^2_Y}{m}}}\right)$ as $m\rightarrow \infty$

then some sort of continuous-mapping argument can perhaps be used to yield $$\left(\frac{\sum_{i=1}^nX_i}{n}-\mu_X\right)-\left(\frac{\sum_{j=1}^mY_j}{m}-\mu_Y\right)\sim \mathcal{N}\left(0,\frac{\sigma^2_X}{n}+\frac{\sigma^2_Y}{m}\right) \text{ as } \bigstar\rightarrow \infty$$

after which one may again use the sloppy manipulation to move the variance term back into the denominator of the LHS.

But I suppose this argument is not valid, right?

Edit: I forgot to mention that I already know (thanks also to @periwinkle's answer below) that $$\frac{\frac{\sum_{i=1}^nX_i}{n}-\mu_X}{\sqrt{\frac{\sigma^2_X}{n}}}-\frac{\frac{\sum_{j=1}^mY_j}{m}-\mu_Y}{\sqrt{\frac{\sigma^2_Y}{m}}}\sim \mathcal{N}(0,2) \text{ as }n, m\rightarrow\infty$$ This result, however, is not quite what I intended to ask. So, is my original statement simply wrong? Is it valid to apply a normal approximation directly to the difference of the sample means?


1 Answer


The difference of the standardized sample means does indeed converge to a normal distribution, but with variance $2$ instead of $1$.

To show this, characteristic functions are helpful in this setting, in particular Lévy's continuity theorem.

To simplify notation, let

$$ {\bf X}_n = \frac{\frac{\sum_{i=1}^nX_i}{n}-\mu_X}{\sqrt{\frac{\sigma^2_X}{n}}} $$

and

$$ {\bf Y}_m = \frac{\frac{\sum_{j=1}^mY_j}{m}-\mu_Y}{\sqrt{\frac{\sigma^2_Y}{m}}}. $$

By independence of $(X_i)$ and $(Y_j)$, we have ${\bf X}_n \perp \!\!\! \perp {\bf Y}_m$.

The characteristic function of ${\bf X}_n - {\bf Y}_m$ is, by definition,

\begin{align*} \phi_{{\bf X}_n - {\bf Y}_m} (t) &= \mathbb E\left[ e^{it({\bf X}_n - {\bf Y}_m )}\right] \\ &=\mathbb E \left[ e^{it{\bf X}_n} \right ] \mathbb E \left[e^{-it {\bf Y}_m}\right] \ \ \ \text{(by independence)}. \end{align*} The characteristic function of a normal distribution with mean $\mu$ and variance $\sigma^2$ is $\phi_{\mu,\sigma^2}(t) = e^{it\mu}e^{-\frac{\sigma^2 t^2}{2}}$.

By convergence in distribution of ${\bf X}_n$ and ${\bf Y}_m$ towards a $\mathcal{N}(0,1)$ we have

\begin{align*} \phi_{{\bf X}_n - {\bf Y}_m} (t) &\to \left ( e^{-\frac{t^2}{2}} \right)^2 \\ &= e^{-t^2}. \end{align*}

The "$\to$" above states that the convergence takes place when both $n$ and $m$ converge to $+\infty$.

Since $e^{-t^2}$ is the characteristic function of a $\mathcal{N}(0,2)$, we have ${\bf X}_n - {\bf Y}_m \rightsquigarrow \mathcal{N}(0,2)$.
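A quick empirical illustration of this limit (a sketch; the populations, sizes, and evaluation point $t$ below are arbitrary choices):

```python
import numpy as np

# The difference of the standardized sample means should have variance
# near 2, and its empirical characteristic function near exp(-t^2).
rng = np.random.default_rng(1)
reps, n, m = 10_000, 400, 250

# Standardized sample means for Exponential(1) (mean 1, var 1)
# and Uniform(0, 1) (mean 1/2, var 1/12).
Xn = (rng.exponential(1.0, size=(reps, n)).mean(axis=1) - 1.0) / np.sqrt(1.0 / n)
Ym = (rng.uniform(0.0, 1.0, size=(reps, m)).mean(axis=1) - 0.5) / np.sqrt(1.0 / (12 * m))
D = Xn - Ym

print(D.var())                                        # close to 2
t = 1.2
print(np.exp(1j * t * D).mean().real, np.exp(-t**2))  # empirical cf vs exp(-t^2)
```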


For the variable

$$ S=\frac{\frac{1}{n} \sum_{i=1}^n X_i - \frac{1}{m} \sum_{j=1}^m Y_j - (\mu_X - \mu_Y)}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}} $$

we can rewrite it as

$$ S= \frac{\sqrt{\frac{\sigma_X^2}{n}}}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}} {\bf X}_n - \frac{\sqrt{\frac{\sigma_Y^2}{m}}}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}} {\bf Y}_m $$

Since ${\bf X}_n \rightsquigarrow \mathcal{N}(0,1)$ and ${\bf Y}_m \rightsquigarrow \mathcal{N}(0,1)$, the argument above gives $S \rightsquigarrow \mathcal{N}(0,\sigma^2)$, where

\begin{align*} \sigma^2 &=\lim_{n,m \to \infty} \left\{ \operatorname{Var} \left( \frac{\sqrt{\frac{\sigma_X^2}{n}}}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}} {\bf X}_n \right ) + \operatorname{Var} \left( \frac{\sqrt{\frac{\sigma_Y^2}{m}}}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}} {\bf Y}_m \right ) \right \} \\ &=\lim_{n,m \to \infty} \left \{ \frac{\frac{\sigma_X^2}{n}}{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}} \operatorname{Var} ({\bf X}_n) + \frac{\frac{\sigma_Y^2}{m}}{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}} \operatorname{Var} ({\bf Y}_m) \right \} \\ &= \lim_{n,m \to \infty} \left \{ \frac{\frac{\sigma_X^2}{n}}{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}} + \frac{\frac{\sigma_Y^2}{m}}{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}} \right \} \\ &= 1 \end{align*}
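For completeness, the characteristic-function route also works on $S$ directly. Writing $a_{n,m}$ and $b_{n,m}$ for the two coefficients in the expression for $S$ above, we have $a_{n,m}^2 + b_{n,m}^2 = 1$ for every $n, m$, so (as a sketch, using that $\phi_{{\bf X}_n}(s) \to e^{-s^2/2}$ and $\phi_{{\bf Y}_m}(s) \to e^{-s^2/2}$ uniformly on compact sets, which holds because the limit is continuous)

$$\phi_S(t) = \phi_{{\bf X}_n}\!\left(a_{n,m} t\right)\,\phi_{{\bf Y}_m}\!\left(-b_{n,m} t\right) = e^{-\frac{a_{n,m}^2 t^2}{2}}\, e^{-\frac{b_{n,m}^2 t^2}{2}} + o(1) = e^{-\frac{t^2}{2}} + o(1),$$

hence $S \rightsquigarrow \mathcal{N}(0,1)$, with the limit taken as $n$ and $m$ tend to $+\infty$ jointly; that joint limit is the $\bigstar$ of the original question.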

  • Thanks for the answer. I edited my post accordingly. Do you have any ideas on the follow-up questions? – dereklck, Sep 21, 2021 at 3:01
  • @dereklck I edited my answer; I hope this helps. – periwinkle, Sep 21, 2021 at 8:29
