$\begingroup$

$$ r = \frac{{\rm Cov}(X,Y)}{ \sigma_{X} \sigma_{Y}} $$

I do not understand this equation at all. Where does it come from?

From my personal understanding, ${\rm Cov}(X,Y)$ comes from the fact that $X$ and $Y$ are dependent random variables, that is, $E[XY]$ is not the same as $E[X]E[Y]$. Is this analogous to saying that $P(A \cap B) = P(A)P(B|A)$ if $A$ and $B$ are not independent? I'm just confused as to why we want the ratio of $E[XY]-E[X]E[Y]$ to the product of the standard deviations of $X$ and $Y$.

$\endgroup$
  • $\begingroup$ You may read here that this formula reduces to the formula of the cosine similarity, and r is the cosine for centered data. $\endgroup$
    – ttnphns
    Commented Oct 19, 2013 at 7:25
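
To spell out the cosine remark in the comment above, here is a sketch for a sample of $n$ paired observations. The notation $\tilde{x}_i = x_i - \bar{x}$ and $\tilde{y}_i = y_i - \bar{y}$ for the centered data is an assumption introduced here, not taken from the thread. The $1/n$ factors in the sample covariance and the sample standard deviations cancel, which leaves

$$ r = \frac{\sum_i \tilde{x}_i \tilde{y}_i}{\sqrt{\sum_i \tilde{x}_i^2}\,\sqrt{\sum_i \tilde{y}_i^2}} = \frac{\tilde{x} \cdot \tilde{y}}{\|\tilde{x}\|\,\|\tilde{y}\|} = \cos\theta, $$

where $\theta$ is the angle between the centered vectors $\tilde{x}$ and $\tilde{y}$.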

1 Answer

$\begingroup$

One nice thing you get from dividing by the product of standard deviations is that it guarantees that the correlation coefficient will be between -1 and +1.

If you want to determine whether $X$ has a stronger linear relationship with $Y$ or with $Z$, comparing $cov(X,Y)$ with $cov(X,Z)$ directly is not informative, since the scale of each covariance depends on the variances of $Y$ and $Z$, which could be very different.

Dividing by $\sigma_X \sigma_Y$ normalizes the covariance, so you can compare $cor(X,Y)$ with $cor(X,Z)$ in a meaningful way.
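
A small simulation may make the scale issue concrete. This is only a sketch under made-up assumptions: the variables `x`, `y`, `z`, the coefficients, the seed, and the helper names `cov` and `corr` are all hypothetical choices for illustration. Here `z` is related to `x` more weakly than `y` is, but it lives on a much larger scale, so its covariance with `x` comes out larger anyway.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed for reproducibility
n = 10_000

x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)              # fairly tight linear relation with x
z = 1000.0 * (x + 3.0 * rng.normal(size=n))   # much noisier relation, but huge scale


def cov(a, b):
    """Population-style covariance: mean of products of centered values."""
    return np.mean((a - a.mean()) * (b - b.mean()))


def corr(a, b):
    """Covariance divided by the product of the standard deviations."""
    return cov(a, b) / (a.std() * b.std())


cov_xy, cov_xz = cov(x, y), cov(x, z)
r_xy, r_xz = corr(x, y), corr(x, z)

print("cov(x, y) =", cov_xy, " cov(x, z) =", cov_xz)
print("cor(x, y) =", r_xy, " cor(x, z) =", r_xz)
```

With these particular choices the covariance ranks `z` above `y`, while the correlation ranks `y` above `z`, which matches the relative strength of the linear relationships by construction.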

$\endgroup$
  • $\begingroup$ Good point, but how do we know that Cov(X,Y) is less than or equal to the product of the standard deviations of X and Y? $\endgroup$
    – Person
    Commented Oct 18, 2013 at 20:38
  • $\begingroup$ I've never actually proven it myself, but some Googling brought up this page: www2.math.umd.edu/~ddarmon/teaching/stat400/… $\endgroup$
    – Max S.
    Commented Oct 18, 2013 at 20:51
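
For the bound asked about in the comments, one standard argument runs as follows. It is not necessarily the one on the linked page, and it assumes $\sigma_X$ and $\sigma_Y$ are finite and strictly positive. The standardized variables $\tilde{X}$ and $\tilde{Y}$ are notation introduced here:

$$ \tilde{X} = \frac{X - E[X]}{\sigma_X}, \qquad \tilde{Y} = \frac{Y - E[Y]}{\sigma_Y}, \qquad E[\tilde{X}^2] = E[\tilde{Y}^2] = 1, \qquad E[\tilde{X}\tilde{Y}] = \frac{{\rm Cov}(X,Y)}{\sigma_X \sigma_Y} = r. $$

Since a squared quantity has a non-negative expectation,

$$ 0 \le E\left[(\tilde{X} \pm \tilde{Y})^2\right] = E[\tilde{X}^2] \pm 2E[\tilde{X}\tilde{Y}] + E[\tilde{Y}^2] = 2 \pm 2r, $$

so $-1 \le r \le 1$, which is the same as $|{\rm Cov}(X,Y)| \le \sigma_X \sigma_Y$.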
