
In Aldrich (2005), specifically sections 10 and 11, the author describes the sufficient statistic for the slope parameter $\beta$ in the simple regression of random $Y$ on fixed $X$, drawn from a bivariate normal population with known (conditional) variance $\sigma^2$. There, $X$ is normal with known variance $\alpha$, and the least-squares slope $b$ is normal with variance $\sigma^2/A$, where $A$ is an ancillary statistic computed from the sample: the sum of squared deviations of $X$ about its mean. The joint statistic $(b, A)$ is sufficient for $\beta$, and he shows one would lose information by estimating $b$ "by ignoring the value of $A$ and using the value of $\alpha$ and the sample size $N$."
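For concreteness, here is the setup as I read it (my notation; the last part is only my guess at what "using $\alpha$ and $N$" amounts to):
$$A=\sum_{i=1}^{N}(x_i-\bar{x})^2,\qquad b\mid A\sim\mathcal{N}\!\left(\beta,\;\frac{\sigma^2}{A}\right),$$
whereas ignoring $A$ would mean assessing the precision of $b$ through something like $\sigma^2/\{(N-1)\alpha\}$, since $E[A]=(N-1)\alpha$ when $X$ has variance $\alpha$.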

Although I understand that regression and correlation are not the same thing, I naturally wonder what Aldrich (2005) implies for the most informative estimators of the bivariate normal and of its correlation when $X$ is fixed with known population variance. To wit, the likelihood function for the bivariate normal depends on only two functions of the sample, the sample means and the sample covariance matrix (see here); the latter obviously includes the sample variance of $X$. Likewise, the MLE of the correlation in the bivariate standard normal is the Pearson correlation (see here), which is obviously a function of the sample standard deviation of $X$.
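Writing the Pearson correlation out (the standard formula) makes this dependence explicit:
$$r=\frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}\;\sqrt{\sum_{i=1}^{N}(y_i-\bar{y})^2}},$$
so $r$ is computed from the sample sum of squares in $X$ rather than from the known $\alpha$.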

My questions are: 1) Does Aldrich (2005) imply that one would obtain a better estimate of the bivariate normal parameters by using the sample covariance matrix while ignoring the known, true variance of $X$? 2) Does it imply that the Pearson correlation is a better estimate of the population correlation $\rho$ when computed from the sample variance of $X$, ignoring the known population variance of $X$? In short, is it useless to know an actual parameter value in these cases? A sketch of what I mean by question 2 follows.
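To make question 2 concrete, here is a minimal simulation sketch of the two estimators I have in mind. The "known-$\alpha$ plug-in" is only my own guess at what "using the known population variance of $X$" would look like (replace the sample variance of $X$ by $\alpha$ in the denominator of $r$), not something taken from Aldrich (2005), and the parameter values are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed population: bivariate normal with Var(X) = alpha, Var(Y) = 1, corr(X, Y) = rho
    rho, alpha, N = 0.5, 1.0, 30
    n_reps = 20000

    r_sample = np.empty(n_reps)   # ordinary Pearson r (uses the sample variance of X)
    r_plugin = np.empty(n_reps)   # same numerator, but the known alpha replaces s_x^2

    for i in range(n_reps):
        x = rng.normal(0.0, np.sqrt(alpha), N)
        # scale x to unit variance before mixing so corr(X, Y) = rho for any alpha
        y = rho * x / np.sqrt(alpha) + np.sqrt(1.0 - rho**2) * rng.normal(0.0, 1.0, N)
        xc, yc = x - x.mean(), y - y.mean()
        s_xy = (xc * yc).sum() / (N - 1)
        s_xx = (xc**2).sum() / (N - 1)
        s_yy = (yc**2).sum() / (N - 1)
        r_sample[i] = s_xy / np.sqrt(s_xx * s_yy)
        r_plugin[i] = s_xy / np.sqrt(alpha * s_yy)

    for name, r in [("sample-variance r  ", r_sample), ("known-alpha plug-in", r_plugin)]:
        print(f"{name}: bias = {r.mean() - rho:+.4f}, RMSE = {np.sqrt(((r - rho)**2).mean()):.4f}")

I am not reporting results here; the point is only to pin down what I mean by "ignoring the known population variance for $X$."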

  • If $X$ is fixed and $Y$ sampled for those $X$, then I would have thought the "population" and "sample" variance would be the same. If instead $(X,Y)$ are sampled, then you want to base your analysis on the $X$ actually seen rather than what they might otherwise have been.
    – Henry
    Commented Jul 6 at 12:09
  • @Henry: $X$ is not called "fixed" in regression because it is literally unsampled. Both $X$ and $Y$ are sampled, and both vary from one sample to another. One just "holds $X$ fixed," statistically, to account for ancillary information. I guess another way of phrasing my question would be: do the MLEs for the bivariate normal and for the correlation also benefit from ancillary information, and if so, would that benefit be lost by using the parameter's known value?
    – virtuolie
    Commented Jul 7 at 0:46
  • So you are saying that the $(X,Y)$ are sampled. I would still suggest you want to base your analysis on the $X$ actually seen rather than what they might otherwise have been; you do not have $Y$ observations for those unseen $X$.
    – Henry
    Commented Jul 7 at 2:06
