Statistics: Regression hypothesis test of slope, given sample correlation

Question

Consider two data series, $X = (x_1, x_2, ..., x_n)$ and $Y = (y_1, y_2, ..., y_n)$, both with mean zero. We use linear regression (ordinary least squares) to regress $Y$ against $X$ (without fitting any intercept), as in $Y = aX + ε$ where $ε$ denotes a series of error terms. Suppose $\rho_{XY}$ is the sample correlation.

Then, given that $\rho_{XY} = 0.01$, is the resulting value of $a$ statistically significantly different from $0$ at the 95% level if:

i) $n = 10^2$

ii) $n = 10^3$

iii) $n = 10^4$

I was asked this question in an interview. Is there enough information to assess significance from the value of the sample correlation alone? Any help, answer, or suggestion would be much appreciated.

John · Accepted Answer · 2016-09-26 12:32:46Z


There's enough information.

The correlation coefficient is equal to the standardized regression coefficient. If the standardized regression coefficient is significant, then the regression coefficient is significant. Finally, the only other thing needed to determine the significance of an observed correlation is the sample size N. Therefore, enough information was provided.
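To make the role of the sample size explicit, here is a sketch of the standard result, assuming the usual OLS model with i.i.d. normal errors. The symbols $\hat a$, $r$, $s^2$ and $t$ are notation introduced here for illustration, with $r$ standing for the sample correlation $\rho_{XY}$. For zero-mean data and no fitted intercept:

$$\hat a = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}}, \qquad s^2 = \frac{(1 - r^2)\sum y_i^2}{n-1}$$

$$t = \frac{\hat a}{s / \sqrt{\sum x_i^2}} = \frac{r\sqrt{n-1}}{\sqrt{1-r^2}}$$

If an intercept were fitted, $n-1$ would become $n-2$. For $r = 0.01$ the factor $\sqrt{1-r^2}$ is essentially 1, so $t \approx r\sqrt{n}$ for large $n$.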


Can you please provide more detail? If the correlation coefficient is equal to the standardised regression coefficient, how does the sample size play a role here, given that $n$ does not appear in the equation? Can you perhaps give the explicit formula? Thanks.
– vishmay
Commented Sep 26, 2016 at 12:36
Look up a significance test for a correlation coefficient. N plays a role there (and it most certainly plays one in a test of a regression coefficient or a standardized regression coefficient).
– John
Commented Sep 26, 2016 at 12:38
It appears I have to use the t-distribution, but my guess is that this should be much simpler, since I was not given any tables.
– vishmay
Commented Sep 26, 2016 at 12:42
@FernandoAlonso For large $n$ you can use the normal distribution, and they were assuming you had memorised the critical value of 1.96.
– mdewey
Commented Sep 26, 2016 at 12:47
@mdewey Thanks, I see now. I was unable to find the correct statistic involving the sample correlation and the sample size. Can you please point me to the right resource, give the formula, or show how to derive it? Thanks a lot.
– vishmay
Commented Sep 26, 2016 at 12:49


Glen_b · Accepted Answer · 2016-09-27 04:32:51Z

If I was asked in an interview (i.e. verbally rather than on paper), where I'd think the focus would be on demonstrating on-hand understanding of facts that give a quick approximate answer, I'd respond as follows:

When the population correlation is 0, the sample correlation has an asymptotic standard error of $1/\sqrt{n}$ (and should be asymptotically normal), so a correlation of $0.01$ corresponds to an approximate $Z$ value of about $0.1$ at $n=100$, $\sqrt{1/10} \approx 0.32$ at $n=1000$ and $1$ at $n=10000$. None of these is close to the critical value of about 2, so the slope is not significant in any of the three cases. Neither the distinction between this asymptotic $Z$ and regression's $t$-value, nor the accuracy of the asymptotic approximation at, say, $n=100$, nor other such issues will make enough difference to matter here.
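The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes the normal approximation with standard error $1/\sqrt{n}$ and the two-sided 5% critical value of 1.96; the helper names `z_value` and `two_sided_p` are made up for illustration.

```python
import math

r = 0.01        # sample correlation given in the question
z_crit = 1.96   # two-sided 5% critical value of the standard normal

def z_value(r, n):
    """Approximate Z statistic, using the asymptotic standard error 1/sqrt(n)."""
    return r * math.sqrt(n)

def two_sided_p(z):
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))

results = {}
for n in (10**2, 10**3, 10**4):
    z = z_value(r, n)
    p = two_sided_p(z)
    results[n] = (z, p, abs(z) > z_crit)
    print(f"n={n:>6}: Z={z:.3f}, p={p:.3f}, significant={abs(z) > z_crit}")
```

Under these assumptions none of the three sample sizes gives a $Z$ beyond 1.96.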

If the correlation were twice as large it would be significant at $n=10000$, and if it were a bit over six times as large it would be significant at $n=1000$; it would need to be about 20 times as large (i.e. about 0.2) to be significant at $n=100$.
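Those multiples follow from solving $|r|\sqrt{n} > 1.96$ for $r$. A small sketch, again assuming the normal approximation; `min_significant_r` is a hypothetical helper name:

```python
import math

r = 0.01  # sample correlation given in the question

def min_significant_r(n, z_crit=1.96):
    """Smallest |r| that reaches the two-sided 5% level when SE(r) is about 1/sqrt(n)."""
    return z_crit / math.sqrt(n)

thresholds = {n: min_significant_r(n) for n in (10**2, 10**3, 10**4)}
for n, r_min in thresholds.items():
    # Report the threshold and how many times larger than 0.01 it is.
    print(f"n={n:>6}: |r| must exceed about {r_min:.4f} ({r_min / r:.1f} times 0.01)")
```

The thresholds come out near 0.196, 0.062 and 0.0196, which matches "about 20 times", "a bit over six times" and "twice".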

Additional accuracy in that calculation is unimportant and we don't need an approximation that works when the correlation isn't zero; we only need the information for the sampling distribution at $\rho=0$ and really only the asymptotic $1/\sqrt{n}$ fact is needed.

If I was solving it with pen and paper and had a few minutes to try to pull the details up out of what's left of my memory (or to try to derive them), I'd consider discussing the relationship of the correlation to the t-test in regression - but it would have no impact on the conclusions.

I'd also point out that they mean "at the 5% level" not "at the 95% level" (politely, of course).

If asked to demonstrate the $1/\sqrt{n}$ fact, it's pretty straightforward: the main step is deriving $\operatorname{Var}(XY)$ for independent zero-mean random variables, which isn't too hard. Things may be a bit easier if you assume the variances are both 1; since we're passing to a correlation, the scale won't matter.
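A sketch of that calculation, assuming $X$ and $Y$ are independent with zero means and finite variances $\sigma_X^2$ and $\sigma_Y^2$ (notation introduced here for illustration), and that the pairs $(X_i, Y_i)$ are i.i.d.:

$$\operatorname{Var}(XY) = E[X^2Y^2] - \left(E[XY]\right)^2 = E[X^2]\,E[Y^2] - \left(E[X]\,E[Y]\right)^2 = \sigma_X^2\sigma_Y^2$$

$$\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^n X_iY_i\right) = \frac{\sigma_X^2\sigma_Y^2}{n}$$

Dividing the average by $\sigma_X\sigma_Y$, which the sample standard deviations estimate consistently, gives a quantity with standard deviation $1/\sqrt{n}$. With both variances set to 1, this is simply $\operatorname{Var}(XY) = 1$.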

You can argue the asymptotic distribution of the correlation coefficient by using Slutsky's theorem to focus on the numerator, which is an average, and then argue from the CLT.
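A quick simulation can serve as a sanity check on the $1/\sqrt{n}$ claim. This is only a sketch under assumed conditions (independent standard normal $X$ and $Y$, $n = 100$, 2000 replications, a fixed seed), not a proof; the function names are made up for illustration.

```python
import math
import random
import statistics

def sample_correlation(xs, ys):
    """Pearson sample correlation, computed from centered sums."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_sd_of_r(n, reps=2000, seed=0):
    """Standard deviation of r across repeated independent N(0, 1) samples."""
    rng = random.Random(seed)
    rs = []
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        rs.append(sample_correlation(xs, ys))
    return statistics.stdev(rs)

n = 100
sd_r = simulate_sd_of_r(n)
print(f"simulated SD of r: {sd_r:.4f}, 1/sqrt(n): {1 / math.sqrt(n):.4f}")
```

The simulated standard deviation should land close to $1/\sqrt{n} = 0.1$, up to Monte Carlo noise.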

Basic facts like the asymptotic standard error of a sample correlation when the population correlation is zero (which is what is often used to judge an autocorrelation or partial autocorrelation, for example) are just the kind of thing I'd hope an aspiring statistician would have in their head. It's interesting how often you can tell what will be significant and what won't with a few simple facts.

Thanks a lot for your answer. In fact, this was given as a pen-and-paper interview test, asking me to give a detailed derivation and working of the results. Can you please post a more detailed derivation of the test statistic to be used? I would be much obliged.
– vishmay
Commented Sep 27, 2016 at 11:28
