With the data as below:
- Categorical predictor: "Group" with 2 levels: Group 1 and Group 2
- Categorical predictor: "Treatment" with 3 levels: A, B, C
- Categorical binary response: "Result" 0/1
with this structure:
> str(x)
tibble [345 × 3] (S3: tbl_df/tbl/data.frame)
$ Treatment: Factor w/ 3 levels "A","B","C": 1 2 3 1 2 3 1 2 3 1 ...
$ Result : num [1:345] 0 0 0 1 1 0 0 0 0 0 ...
$ Group : Factor w/ 2 levels "Group1","Group2": 1 1 1 1 1 1 2 2 2 2 ...
exemplary parts (it's quite long, I attach the data at the end of the post):
> head(x)
# A tibble: 6 × 3
Treatment Result Group
<fct> <dbl> <fct>
1 A 0 Group1
2 B 0 Group1
3 C 0 Group1
4 A 1 Group1
5 B 1 Group1
6 C 0 Group1
I want to test contrasts like this but through nesting models, like the ANOVA does:
Treatment A: Group 1 - Group 2
Treatment B: Group 1 - Group 2
Treatment C: Group 1 - Group 2
The model is the logistic regression model:
> m <- glm(Result ~ Group * Treatment, family = binomial(), data=x)
> summary(m)
Call:
glm(formula = Result ~ Group * Treatment, family = binomial(),
data = x)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.315 -1.148 1.046 1.091 1.328
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.2076 0.2640 0.786 0.4316
GroupGroup2 -0.3130 0.3743 -0.836 0.4030
TreatmentB -0.5559 0.3752 -1.482 0.1384
TreatmentC -0.2766 0.3725 -0.743 0.4577
GroupGroup2:TreatmentB 0.9798 0.5321 1.841 0.0656 .
GroupGroup2:TreatmentC 0.7004 0.5302 1.321 0.1865
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 478.04 on 344 degrees of freedom
Residual deviance: 472.84 on 339 degrees of freedom
AIC: 484.84
Number of Fisher Scoring iterations: 4
- Intercept: log odds of the response when all categorical predictors are = reference values, here Group 1, Treatment A
- GroupGroup2: the difference between Group 2 and Group 1 at Treatment A
- TreatmentB: the difference from Treatment B and Treatment A at Group 1
- TreatmentC : C - A at Group 1
- GroupGroup2:TreatmentB - interaction for G2 at B vs. G1 at A
- GroupGroup2:TreatmentC - interaction for G2 at C vs. G1 at A
Let's check the contrast for Group 1 vs Group 2 for treatment B
The cell mean for B: Group 1
can be obtained by adding these coefficients:
Intercept + TreatmentB
, so it's c(1, 0, 1, 0, 0, 0)
The cell mean for B: Group 2
can be obtained by adding these coefficients:
Intercept + GroupGroup2 + TreatmentB + GroupGroup2:TreatmentB
so it's c(1, 1, 1, 0, 1, 0)
I get the answer:
> glmglrt::p_value_contrast(m, contrast = c(1, 0, 1, 0, 0, 0) - c(1, 1, 1, 0, 1, 0), alternative="two.sided", method="Wald")
pvalue
0.07791664
attr(,"method")
[1] "Wald"
Which can be validated through the emmeans package (doing Wald's comparisons). I removed unnecessary other comparisons
> pairs(emmeans(m, specs=~Group*Treatment), adjust="none")
contrast estimate SE df z.ratio p.value
.....
Group1 B - Group2 C -0.6668 0.378 Inf -1.763 0.0779
.....
Results are given on the log odds ratio (not the response) scale.
OK! Now let's replicate it via nested models, if possible. The model matrix is as below (the first few lines):
> model.matrix(g)
(Intercept) GroupGroup2 TreatmentB TreatmentC GroupGroup2:TreatmentB GroupGroup2:TreatmentC
1 1 0 0 0 0 0
2 1 0 1 0 0 0
3 1 0 0 1 0 0
I can use these terms in the regression model:
> m1 <- glm(Result ~ 1 + GroupGroup2 + TreatmentB + TreatmentC +
+ GroupGroup2.TreatmentB + GroupGroup2.TreatmentC, data =data.frame(cbind(Result = x$Result, model.matrix(g))),
+ family = binomial())
> summary(m1)
Call:
glm(formula = Result ~ 1 + GroupGroup2 + TreatmentB + TreatmentC +
GroupGroup2.TreatmentB + GroupGroup2.TreatmentC, family = binomial(),
data = data.frame(cbind(Result = x$Result, model.matrix(g))))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.315 -1.148 1.046 1.091 1.328
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.2076 0.2640 0.786 0.4316
GroupGroup2 -0.3130 0.3743 -0.836 0.4030
TreatmentB -0.5559 0.3752 -1.482 0.1384
TreatmentC -0.2766 0.3725 -0.743 0.4577
GroupGroup2.TreatmentB 0.9798 0.5321 1.841 0.0656 .
GroupGroup2.TreatmentC 0.7004 0.5302 1.321 0.1865
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 478.04 on 344 degrees of freedom
Residual deviance: 472.84 on 339 degrees of freedom
AIC: 484.84
Number of Fisher Scoring iterations: 4
I tried naively:
> m_B_group1 <- glm(Result ~ 1 + TreatmentB, data =data.frame(cbind(Result = x$Result, model.matrix(g))),
+ family = binomial())
>
> m_B_group2 <- glm(Result ~ 1 + GroupGroup2 + TreatmentB +
+ GroupGroup2.TreatmentB, data =data.frame(cbind(Result = x$Result, model.matrix(g))),
+ family = binomial())
>
> anova(m_B_group1, m_B_group2, test = "LRT")
Analysis of Deviance Table
Model 1: Result ~ 1 + TreatmentB
Model 2: Result ~ 1 + GroupGroup2 + TreatmentB + GroupGroup2.TreatmentB
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 343 477.83
2 341 474.66 2 3.1701 0.2049
> lmtest::waldtest(m_B_group1, m_B_group2)
Wald test
Model 1: Result ~ 1 + TreatmentB
Model 2: Result ~ 1 + GroupGroup2 + TreatmentB + GroupGroup2.TreatmentB
Res.Df Df F Pr(>F)
1 343
2 341 2 1.5634 0.2109
But the results are far from the previous ones:
> glmglrt::p_value_contrast(m, contrast = c(1, 0, 1, 0, 0, 0) - c(1, 1, 1, 0, 1, 0), alternative="two.sided", method="Wald")
pvalue
0.07791664
attr(,"method")
[1] "Wald"
How can I improve it and replicate the contrasts through nested models? I was told a few months ago that this is almost always possible, but there must be some trick.
The data:
x <- structure(list(Treatment = structure(c(1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L), levels = c("A", "B", "C"), class = "factor"), Result = c(0,
0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0,
1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0,
0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1,
1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0,
0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0,
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1,
0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0,
0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1,
1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1,
1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1,
1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1,
1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1,
1, 0, 1, 0, 0, 1, 0, 1), Group = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L), levels = c("Group1", "Group2"), class = "factor")), row.names = c(NA,
-345L), class = c("tbl_df", "tbl", "data.frame"))