2
$\begingroup$

Consider two data series, $X = (x_1, x_2, ..., x_n)$ and $Y = (y_1, y_2, ..., y_n)$, both with mean zero. We use linear regression (ordinary least squares) to regress $Y$ against $X$ (without fitting any intercept), as in $Y = aX + ε$ where $ε$ denotes a series of error terms. Suppose $\rho_{XY}$ is the sample correlation.

Then, given that $\rho_{XY} = 0.01$, is the resulting value of $a$ statistically significantly different from $0$ at the 95% level if:

i) $n = 10^2$

ii) $n = 10^3$

iii) $n = 10^4$

I was asked this question in an interview. I am not sure whether there is enough information to calculate the significance from just the value of the sample correlation. Any help, answer, or suggestion would be much appreciated.

$\endgroup$

2 Answers

3
$\begingroup$

There's enough information.

In a simple regression, the correlation coefficient is equal to the standardized regression coefficient. If the standardized regression coefficient is significant, then the regression coefficient is significant. Finally, the only other thing needed to determine the significance of an observed correlation is the sample size $n$. Therefore, enough information was provided.
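To make the role of $n$ explicit, here is a sketch of the test statistic, under the assumptions that both series have sample mean exactly zero and the errors satisfy the usual OLS conditions. The symbols $r$ (the sample correlation, written $\rho_{XY}$ in the question), $\hat a$ and $s$ are notation introduced here for illustration. With no intercept, the residual variance is estimated on $n-1$ degrees of freedom:

$$\hat a = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad \sum (y_i - \hat a x_i)^2 = (1 - r^2)\sum y_i^2, \qquad s^2 = \frac{(1 - r^2)\sum y_i^2}{n-1}$$

$$t = \frac{\hat a}{s/\sqrt{\sum x_i^2}} = \frac{r\sqrt{n-1}}{\sqrt{1-r^2}}$$

For $r = 0.01$ the denominator is essentially 1, so $t \approx r\sqrt{n}$. If an intercept were fitted, the same form would hold with $n-2$ in place of $n-1$, which makes no practical difference at these sample sizes.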

$\endgroup$
9
  • $\begingroup$ Can you please provide more detail. If the correlation coefficient is equal to the standardised regression coefficient, how does the sample size play a role here? because $n$ does not appear in the equation? Can you perhaps give the explicit formula? Thanks $\endgroup$
    – vishmay
    Commented Sep 26, 2016 at 12:36
  • $\begingroup$ Look up a significance test for a correlation coefficient. N plays a role there. (and it most certainly plays one in a test of a regression coefficient or a standardized regression coefficient). $\endgroup$
    – John
    Commented Sep 26, 2016 at 12:38
  • $\begingroup$ It appears I have to use the t-distribution. But my guess is that this should be much simpler than the t-distribution, since I was not given any tables. $\endgroup$
    – vishmay
    Commented Sep 26, 2016 at 12:42
  • $\begingroup$ @FernandoAlonso for large $n$ you can use the normal and they were assuming you had memorised the critical value of 1.96. $\endgroup$
    – mdewey
    Commented Sep 26, 2016 at 12:47
  • $\begingroup$ @mdewey Thanks. I see now. I was unable to find the correct statistic to use involving the sample correlation and the sample size. Can you please point me to the right resource, give the formula, or show how to derive it? Thanks a lot $\endgroup$
    – vishmay
    Commented Sep 26, 2016 at 12:49
1
$\begingroup$

If I was asked in an interview (i.e. verbally rather than on paper), where I'd think the focus would be on demonstrating on-hand understanding of facts that give a quick approximate answer, I'd respond as follows:

When the population correlation is 0, the sample correlation has an asymptotic standard error of $1/\sqrt{n}$ (and should be asymptotically normal), so a correlation of $0.01$ would correspond to an approximate $Z$ value of about $0.1$ at $n=100$, $\sqrt{1/10} \approx 0.32$ at $n=1000$ and $1$ at $n=10000$. Neither the distinction between this asymptotic $Z$ and the regression's $t$-value, nor the accuracy of the asymptotic approximation at, say, $n=100$, nor other such issues will make enough difference to matter here: none of the three is significant.

If the correlation were twice as large it would be significant at $n=10000$ and if it was a bit over six times as large it would be significant at $n=1000$; it would need to be about 20 times as large (i.e. about 0.2) to be significant at $n=100$.
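As a quick check of the arithmetic above, here is a minimal Python sketch. It assumes only the $Z \approx r\sqrt{n}$ approximation and the two-sided 5% normal critical value of 1.96; the function names `approx_z` and `min_significant_r` are made up for this illustration.

```python
import math

Z_CRIT = 1.96  # two-sided 5% critical value of the standard normal


def approx_z(r, n):
    """Approximate Z for a sample correlation r, using SE = 1/sqrt(n) under rho = 0."""
    return r * math.sqrt(n)


def min_significant_r(n, z_crit=Z_CRIT):
    """Smallest |r| that would be declared significant at sample size n."""
    return z_crit / math.sqrt(n)


for n in (100, 1000, 10000):
    z = approx_z(0.01, n)
    print(
        f"n={n}: z={z:.3f}, significant={abs(z) > Z_CRIT}, "
        f"min |r| needed={min_significant_r(n):.4f}"
    )
```

The thresholds it reports (roughly 0.196, 0.062 and 0.0196) correspond to the "about 20 times", "a bit over six times" and "twice" multiples of 0.01 mentioned above.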

Additional accuracy in that calculation is unimportant and we don't need an approximation that works when the correlation isn't zero; we only need the information for the sampling distribution at $\rho=0$ and really only the asymptotic $1/\sqrt{n}$ fact is needed.

If I was solving it with pen and paper and had a few minutes to try to pull the details up out of what's left of my memory (or to try to derive them), I'd consider discussing the relationship of the correlation to the t-test in regression - but it would have no impact on the conclusions.

I'd also point out that they mean "at the 5% level" not "at the 95% level" (politely, of course).

If asked to demonstrate the $1/\sqrt{n}$ fact, it's pretty straightforward: the main thing is $Var(XY)$ for independent zero-mean RVs, which isn't too hard to derive (things are a bit easier if you assume the variances are both 1, and since we're passing to a correlation the scale won't matter).

You can argue the asymptotic distribution of the correlation coefficient by using Slutsky's theorem to focus on the numerator, which is an average, and then arguing from the CLT.
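A sketch of that argument, assuming the pairs $(x_i, y_i)$ are i.i.d., with $X$ and $Y$ independent, zero-mean and scaled to unit variance ($r$ here denotes the sample correlation):

$$E[XY] = E[X]\,E[Y] = 0, \qquad Var(XY) = E[X^2Y^2] - 0^2 = E[X^2]\,E[Y^2] = 1$$

$$r = \frac{\frac{1}{n}\sum x_i y_i}{\sqrt{\frac{1}{n}\sum x_i^2 \cdot \frac{1}{n}\sum y_i^2}}$$

By the CLT, $\sqrt{n}\cdot\frac{1}{n}\sum x_i y_i \to N(0,1)$ in distribution. By the law of large numbers, the denominator converges in probability to $\sqrt{1\cdot 1} = 1$. Slutsky's theorem then gives $\sqrt{n}\,r \to N(0,1)$, i.e. an asymptotic standard error of $1/\sqrt{n}$ for $r$.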

Basic facts like the asymptotic standard error of a sample correlation when the population correlation is zero (which is what is often used to judge an autocorrelation or partial autocorrelation, for example) are just the kind of thing I'd hope an aspiring statistician would have in their head. It's interesting how often you can tell what will be significant and what won't with a few simple facts.

$\endgroup$
1
  • $\begingroup$ Thanks a lot for your answer. In fact this was given as an interview pen - paper test, asking me to do a detailed derivation and working of the results. Can you please post a more detailed derivation of the test statistic to be used? I would be much obliged. $\endgroup$
    – vishmay
    Commented Sep 27, 2016 at 11:28
