I am trying to understand Lord's paradox, where controlling for baseline status can affect inference. I tried to set up some data following the quotation in Wikipedia:
“A large university is interested in investigating the effects on the students of the diet provided in the university dining halls and any sex differences in these effects. Various types of data are gathered. In particular, the weight of each student at the time of his arrival in September and his weight the following June are recorded.” In Lord's setup, the distribution of male weights is the same in both September and June (same mean and variance), and likewise for the distribution of female weights. Lord posits two statisticians who use different but respected statistical methods and reach opposite conclusions about the effects of the dining-hall diet on students' weights.
So I gave each student a base weight, influenced by gender but with enough spread that the two groups overlapped and the overall distribution was unimodal. I then supposed that the initial weighings recorded the base weights plus some noise, and similarly the final weighings, so that the change was pure noise; the noise had the same distribution for every individual, and the initial and final weights had the same distributions. An attempt to analyse changes by group produced small, non-significant results.
But a regression of final weights against group and initial weights did produce an apparently significant and substantial coefficient for the group indicator. I believe this is what happens in Lord's paradox, though I am not certain whether it is supposed to apply in this situation.
I can give a handwaving explanation of what happened: the second analysis in effect drew a regression line through the mean point of each group (and those means did not change between the initial and final weighings), but because of the noise the correlations were not perfect, so the slopes of those lines were necessarily less steep than the diagonal, which forces the two regression lines to have different intercepts: a regression-to-the-mean effect. Here is a chart illustrating the point:
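A sketch of the R that draws such a chart, using the group, initial and final variables simulated below:

# Scatter of final against initial, one colour per group, with
# per-group least-squares lines and the 45-degree diagonal for comparison
plot(initial, final, pch = 20, cex = 0.5,
     col = ifelse(group == 0, "red", "blue"),
     xlab = "initial weight (kg)", ylab = "final weight (kg)")
abline(lm(final[group == 0] ~ initial[group == 0]), col = "red", lwd = 2)
abline(lm(final[group == 1] ~ initial[group == 1]), col = "blue", lwd = 2)
abline(a = 0, b = 1, lty = 2)   # the diagonal: final equal to initial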
But playing further, it seems it is the noise in the initial weights which causes this: a regression of final weights against group and base weights does not produce the effect, while a regression of base weights against group and initial weights does, with almost the same impact on the group coefficient. (In these last attempts the base weights have lower variance than the initial and final weights, so the handwaving argument about needing perfect correlation for a diagonal line does not apply so directly.) So it seems to be the regression to the mean from the noisy initial weights which produces the apparently paradoxical effect.
Simulating the data (in kilograms with female in group 0 and male in group 1):
set.seed(2021)
N <- 2000
group <- rep(c(0, 1), times=N/2)             # 0 = female, 1 = male
base <- rnorm(N, mean=70 + 20*group, sd=10)  # underlying weight: group means 70 and 90
initial <- base + rnorm(N, mean=0, sd=5)     # September weighing: base plus noise
final <- base + rnorm(N, mean=0, sd=5)       # June weighing: base plus independent noise
change <- final - initial                    # so the true change is pure noise
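As a quick check that the construction behaves as intended, the within-group means and standard deviations of initial and final should agree up to sampling noise:

tapply(initial, group, mean)   # within-group means of the September weights
tapply(final, group, mean)     # should come out close to the same values
tapply(initial, group, sd)     # within-group spreads, sqrt(10^2 + 5^2), about 11.2
tapply(final, group, sd)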
Comparing change with group revealed nothing significant (as expected):
> summary(lm(change ~ group))
Call:
lm(formula = change ~ group)
Residuals:
Min 1Q Median 3Q Max
-32.092 -4.604 0.226 4.826 26.858
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.03355 0.22250 -0.151 0.880
group -0.04256 0.31466 -0.135 0.892
Residual standard error: 7.036 on 1998 degrees of freedom
Multiple R-squared: 9.157e-06, Adjusted R-squared: -0.0004913
F-statistic: 0.0183 on 1 and 1998 DF, p-value: 0.8924
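The residual standard error of about 7.04 also matches the construction: change is the difference of two independent noise terms, each with standard deviation 5, so

sqrt(5^2 + 5^2)   # theoretical sd of change, about 7.07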
while regressing final against group and initial does produce something significant for the coefficient of group:
> summary(lm(final ~ group + initial))
Call:
lm(formula = final ~ group + initial)
Residuals:
Min 1Q Median 3Q Max
-27.308 -4.237 0.023 4.471 23.666
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.46634 0.94881 14.193 <2e-16 ***
group 3.73929 0.39579 9.448 <2e-16 ***
initial 0.80855 0.01312 61.643 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.69 on 1997 degrees of freedom
Multiple R-squared: 0.803, Adjusted R-squared: 0.8028
F-statistic: 4070 on 2 and 1997 DF, p-value: < 2.2e-16
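The sizes here fit the regression-to-the-mean account quantitatively. Within each group, the slope of final on initial should be about var(base)/var(initial), the reliability of initial as a measure of base, and the shortfall from 1 is picked up by the group indicator, since the groups differ by 20 kg in mean:

reliability <- 10^2 / (10^2 + 5^2)   # = 0.8, close to the fitted slope 0.809
(1 - reliability) * 20               # = 4, close to the fitted group coefficient 3.74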
and playing with some other regressions suggests that it is the presence of initial which causes this:
> summary(lm(final ~ group + base))
Call:
lm(formula = final ~ group + base)
Residuals:
Min 1Q Median 3Q Max
-19.6212 -3.2292 -0.1426 3.3610 16.8688
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.47777 0.78703 -0.607 0.544
group 0.06003 0.30853 0.195 0.846
base 1.00665 0.01094 92.021 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.979 on 1997 degrees of freedom
Multiple R-squared: 0.8909, Adjusted R-squared: 0.8908
F-statistic: 8152 on 2 and 1997 DF, p-value: < 2.2e-16
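That null result for group makes sense: base carries no measurement noise, so within each group the slope of final on base is var(base)/var(base) = 1, leaving none of the 20 kg mean difference for the group term. A direct check of the within-group slopes:

coef(lm(final ~ base, subset = group == 0))["base"]   # both close to 1
coef(lm(final ~ base, subset = group == 1))["base"]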
while regressing base against group and initial does:
> summary(lm(base ~ group + initial))
Call:
lm(formula = base ~ group + initial)
Residuals:
Min 1Q Median 3Q Max
-15.3176 -2.8154 0.1182 3.1505 14.0104
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.250421 0.648053 21.99 <2e-16 ***
group 3.766569 0.270328 13.93 <2e-16 ***
initial 0.797561 0.008959 89.02 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.569 on 1997 degrees of freedom
Multiple R-squared: 0.8952, Adjusted R-squared: 0.8951
F-statistic: 8526 on 2 and 1997 DF, p-value: < 2.2e-16
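The near-identical group coefficient in this last regression is no coincidence: within a group, cov(base, initial) = var(base), so the slope of base on initial is the same reliability ratio as before and the group term again absorbs the shortfall:

100 / 125            # cov(base, initial) / var(initial) = 0.8, close to the fitted 0.798
(1 - 100/125) * 20   # = 4, close to the fitted group coefficient 3.77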