
I would like to estimate a confidence interval for a linear combination of regression coefficients that come from several linear regression models fitted to different but correlated variables. For instance, suppose $y_1$ and $x_1$ represent cloud cover and temperature anomalies at one level of the atmosphere as measured by a satellite, and $y_2$ and $x_2$ represent cloud cover and temperature anomalies at a lower level of the atmosphere. I expect $x_1$ to be correlated with $x_2$. I also expect $y_1$ to be negatively correlated with $y_2$ because the satellite only views the highest clouds. I use OLS regression to determine the relationship

$y_i = \beta_i x_i + \epsilon_i$ for $i \in \{1,2\}$

The standard errors of the regression slopes are $\sigma_1$ and $\sigma_2$, respectively. Now I want to calculate a linear combination of the regression slopes: $C = a_1 \beta_1 + a_2 \beta_2$, where $a_1$ and $a_2$ are real-valued constants. I think that the standard error for $C$ can be calculated from the relationship

$\sigma_C^2 = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + 2a_1 a_2 \sigma_{1,2}$

where $\sigma_{1,2}$ represents the covariance between $\beta_1$ and $\beta_2$. My main question is how does one estimate $\sigma_{1,2}$ given that $\beta_1$ and $\beta_2$ are calculated from different variables?
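For concreteness, here is the error-propagation step I have in mind as a minimal Python sketch; the numerical values are placeholders, and $\sigma_{1,2}$ is exactly the quantity I don't know how to obtain:

```python
import numpy as np

a1, a2 = 1.0, 1.0        # stipulated real-valued constants
s1, s2 = 0.05, 0.08      # standard errors of beta_1 and beta_2 (placeholder values)
s12 = -0.001             # covariance of beta_1 and beta_2 -- the unknown quantity

var_C = a1**2 * s1**2 + a2**2 * s2**2 + 2 * a1 * a2 * s12
se_C = np.sqrt(var_C)    # standard error of C = a1*beta_1 + a2*beta_2
```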

  • Of possible interest: seemingly unrelated regressions.
    – chl
    Commented Nov 10, 2020 at 18:54
  • Answer: you can't estimate $\sigma_{1,2}$ with the information given. It doesn't even exist unless your measurements are all paired with one another: that is, each time you have $(x_1,y_1)$ you also have $(x_2,y_2).$ Now if that's the case, there's an interesting question here if you would change your inquiry to exploring how to estimate $a_1\beta_1+a_2\beta_2$ rather than trying to combine two separate estimates.
    – whuber
    Commented Nov 10, 2020 at 20:00
  • Thanks for your reply. The measurements are indeed all paired with one another. Every time I have a measurement of $(x_1,y_1)$ I also have a measurement of $(x_2,y_2)$. In terms of changing the inquiry to estimate $a_1 \beta_1 + a_2 \beta_2$ differently, it would be ideal to combine the two separate estimates of $\beta_1$ and $\beta_2$ because decomposing the linear combination this way would help to understand the physical system. But if doing the analysis that way makes it impossible to quantify uncertainty, then I need to figure out another way, of course. Many thanks.
    – user302356
    Commented Nov 10, 2020 at 20:29

1 Answer


You may be able to avoid the problem of estimating $\sigma_{12}$ altogether.

Just to be clear, here is my understanding of your setup. You have a sample of quadruples $(X_1,X_2,Y_1,Y_2).$ Your model of it is that

$$\begin{aligned} Y_1 &= \beta_1 X_1 + \varepsilon_1 \\ Y_2 &= \beta_2 X_2 + \varepsilon_2 \end{aligned}$$

with $(\varepsilon_1,\varepsilon_2)$ independent of $(X_1,X_2).$ The covariance matrix of the errors exists, but is unknown, and can be written in terms of three parameters as

$$\operatorname{Cov}\pmatrix{\varepsilon_1\\\varepsilon_2} = \pmatrix{\sigma_1^2&\sigma_{12}\\\sigma_{12}&\sigma_2^2}.$$

You have stipulated numbers $a_i$ (they are not estimated from the data) and you wish to estimate

$$\gamma = a_1\beta_1 + a_2\beta_2.$$

One direct way to estimate this quantity is to note that your model implies the relation

$$a_1Y_1 + a_2Y_2 = a_1\beta_1 X_1 + a_2\beta_2 X_2 + (a_1\varepsilon_1 + a_2\varepsilon_2).$$

Under the foregoing assumptions this is a standard regression model for data $(X_1, X_2, a_1Y_1+a_2Y_2),$ which you might fit with, say, ordinary least squares (or any other regression technique appropriate for the assumed distribution of $a_1\varepsilon_1 + a_2\varepsilon_2$). That procedure yields estimates $\widehat{a_i\beta_i}$ of the two coefficients, whose sum estimates $\gamma,$ along with an estimated covariance matrix $\widehat\Sigma$ for those estimates, from which you may extract the standard error as

$$\operatorname{se}(\widehat{\gamma}) = \sqrt{\pmatrix{1&1}\widehat\Sigma\pmatrix{1\\1}}.$$
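Here is a minimal sketch of this procedure in Python with statsmodels; the data-generating lines are placeholders standing in for your paired measurements $(x_1, x_2, y_1, y_2)$:

```python
import numpy as np
import statsmodels.api as sm

# --- Placeholder paired data; substitute your actual measurements. ---
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=n)              # x1 and x2 correlated
e = rng.multivariate_normal([0, 0], [[1, -0.5], [-0.5, 1]], size=n)
y1 = 2.0 * x1 + e[:, 0]                               # beta_1 = 2.0
y2 = -0.5 * x2 + e[:, 1]                              # beta_2 = -0.5

a1, a2 = 1.0, 1.0                                     # stipulated constants

# Regress a1*y1 + a2*y2 on (x1, x2); the two coefficients estimate
# a1*beta_1 and a2*beta_2. No intercept column, matching the model above.
X = np.column_stack([x1, x2])
fit = sm.OLS(a1 * y1 + a2 * y2, X).fit()

gamma_hat = fit.params.sum()                          # estimate of gamma = a1*beta_1 + a2*beta_2
ones = np.ones(2)
se_gamma = np.sqrt(ones @ fit.cov_params() @ ones)    # sqrt( (1 1) Sigma_hat (1 1)' )
print(gamma_hat, se_gamma)
```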


It sounds like you might be supposing the $\varepsilon_i$ are negatively correlated: that is, $\sigma_{12}\lt 0.$ If so, and if $a_1a_2 \gt 0,$ the error variance in this model is

$$\operatorname{Var}(a_1\varepsilon_1 + a_2\varepsilon_2) = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + 2a_1a_2\sigma_{12} \lt a_1^2\sigma_1^2 + a_2^2\sigma_2^2,$$

which is a nice thing to have: it means that this linear combination of the $Y_i$ tends to cancel out the errors, giving better estimates of $\gamma$ than if you separately regressed the $Y_i$ against the $X_i.$
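As a quick numerical check of that variance identity (with placeholder values $\sigma_1^2 = 1,$ $\sigma_2^2 = 2,$ $\sigma_{12} = -0.6,$ and $a_1 = a_2 = 1$):

```python
import numpy as np

rng = np.random.default_rng(0)
a1, a2 = 1.0, 1.0                  # a1 * a2 > 0
Sigma = np.array([[1.0, -0.6],     # sigma_1^2, sigma_12 < 0
                  [-0.6, 2.0]])    # sigma_12,  sigma_2^2

e = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)
combined = a1 * e[:, 0] + a2 * e[:, 1]

theory = a1**2 * Sigma[0, 0] + a2**2 * Sigma[1, 1] + 2 * a1 * a2 * Sigma[0, 1]
no_cov = a1**2 * Sigma[0, 0] + a2**2 * Sigma[1, 1]
print(combined.var(), theory, no_cov)   # ~1.8, 1.8, 3.0: the negative covariance helps
```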

