$\begingroup$

Let $X_1,\dots,X_n$ be real random variables such that $\alpha_1X_1+\dots+\alpha_nX_n=0$ for some unknown $\alpha_1,\dots,\alpha_n$, not all zero. If $n=2$, one can study the strength of the linear relationship by looking at the correlation $\rho_{X_1,X_2}=\frac{\mathbb{E}\left[(X_1-\mu_1)(X_2-\mu_2)\right]}{\sigma_1\sigma_2}$, where $\mu_i$ and $\sigma_i$ are the expected value and standard deviation of $X_i$, respectively. As expected, when I work with a (nice) data sample, I get $\rho_{X_1,X_2}\simeq 1$.

I wanted to generalise this idea to arbitrary $n$ by looking at the moment $\rho_{X_1,\dots,X_n}:=\frac{\mathbb{E}\left[\prod_{i=1}^n (X_i-\mu_i)\right]}{\prod_{i=1}^n\sigma_i}$, but when I work with data samples, I get $\rho_{X_1,\dots,X_n}\simeq 0$. What am I misunderstanding here?

This looks very similar to model selection, and I was considering using the coefficient of determination $R^2$ (the square of the coefficient of multiple correlation), but I don't have a distinguished dependent variable here and, in my experiments, the value of the sample $R^2$ depends strongly on which $X_i$ is chosen to be the "dependent" variable.

$\endgroup$
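To make the comparison concrete, here is a minimal sketch of how the two sample quantities might be computed, assuming expected values are simply replaced by sample means and standard deviations use the divide-by-$N$ convention. The function name `product_moment` and the synthetic Gaussian data are illustrative assumptions, not the data from the question. For jointly Gaussian variables the odd-order central moments vanish, so in this particular example the $n=3$ moment is expected to be near zero even though the variables satisfy an exact linear relation.

```python
import numpy as np

def product_moment(data):
    """Sample version of E[prod_i (X_i - mu_i)] / prod_i sigma_i.

    `data` has shape (N, n): N observations of n variables.
    (Hypothetical helper, written for illustration only.)
    """
    centered = data - data.mean(axis=0)
    sigma = data.std(axis=0)  # divide-by-N convention (ddof=0)
    return np.mean(np.prod(centered, axis=1)) / np.prod(sigma)

rng = np.random.default_rng(0)
N = 100_000
x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N)

# n = 2: exact relation 2*X1 - X2 = 0
pair = np.column_stack([x1, 2 * x1])
# n = 3: exact relation X1 + X2 + X3 = 0
triple = np.column_stack([x1, x2, -(x1 + x2)])

rho2 = product_moment(pair)    # ordinary correlation when n = 2
rho3 = product_moment(triple)  # proposed n-variable moment
print(rho2, rho3)
```

With $n=2$ the helper reduces to the usual sample correlation, so the difference in behaviour comes from the definition for $n>2$ rather than from the plug-in estimation.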
  • $\begingroup$ Please explain the form of generalization you are thinking of. If you are trying to assess whether the $n$ variables are confined close to some $n-1$ (or smaller) dimensional subspace, then the natural (and extremely effective) approach is to examine their smallest singular value. In the $n=2$ case, when you standardize the variables, the smallest singular value of their correlation matrix is $1-|\rho|$, and $1-|\rho| \approx 0$ corresponds to $\rho \approx \pm1.$ Or perhaps you are re-asking this question? $\endgroup$
    – whuber
    Commented Jun 12 at 19:08
  • $\begingroup$ I'm also not quite sure how you plan to bring this to the data. What equation do you use, just replacing the expected values with sample means? $\endgroup$
    – Dave
    Commented Jun 12 at 19:34
  • $\begingroup$ @whuber thank you, that question looks very relevant indeed. Could you elaborate on your comment a bit? What do you mean by the smallest singular value in this case? The smallest singular value of the correlation matrix? Is there a good reference I could read? $\endgroup$
    – 12345
    Commented Jun 13 at 2:11
  • $\begingroup$ @Dave, yes, the intention was to replace expected values with sample means (possibly after some massaging, such as dividing by $n$ vs. $n-1$ in the sample covariance). I am open to other approaches; I just wasn't expecting to see a number close to 0. $\endgroup$
    – 12345
    Commented Jun 13 at 2:14
  • $\begingroup$ As explained in the referenced post, you can use the smallest singular value of the SVD of the standardized variables if you like. Its square (after dividing by the sample size) is the smallest singular value of the correlation matrix. $\endgroup$
    – whuber
    Commented Jun 13 at 13:23
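
The singular-value approach suggested in the comments can be sketched roughly as follows. This is an illustration on synthetic data, not code from the thread: the helper name `smallest_singular_value`, the noise level, and the choice to divide the standardized matrix by $\sqrt{N}$ (so that its squared singular values are the eigenvalues of the sample correlation matrix) are all assumptions made for the example.

```python
import numpy as np

def smallest_singular_value(data):
    """Smallest singular value of the standardized (N, n) data matrix.

    Columns are centered and scaled to unit standard deviation, and the
    matrix is divided by sqrt(N) so that z.T @ z is the correlation matrix.
    (Hypothetical helper, written for illustration only.)
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    z = z / np.sqrt(data.shape[0])
    # numpy returns singular values sorted in descending order.
    return np.linalg.svd(z, compute_uv=False)[-1]

rng = np.random.default_rng(0)
N = 10_000
x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N)

exact = np.column_stack([x1, x2, -(x1 + x2)])            # X1 + X2 + X3 = 0
noisy = exact + 0.01 * rng.standard_normal(exact.shape)  # relation holds approximately
independent = rng.standard_normal((N, 3))                # no linear relation

s_exact = smallest_singular_value(exact)
s_noisy = smallest_singular_value(noisy)
s_indep = smallest_singular_value(independent)

# Cross-check: the squared value should match the smallest eigenvalue
# of the sample correlation matrix.
lam_min = np.linalg.eigvalsh(np.corrcoef(noisy, rowvar=False))[0]
print(s_exact, s_noisy, s_indep, s_noisy**2, lam_min)
```

A value near zero indicates that the variables lie close to a lower-dimensional subspace, and unlike $R^2$ it does not require singling out one $X_i$ as the dependent variable.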
