0
$\begingroup$

With the data as below:

  • Categorical predictor: "Group" with 2 levels: Group 1 and Group 2
  • Categorical predictor: "Treatment" with 3 levels: A, B, C
  • Categorical binary response: "Result" 0/1

with this structure:

> str(x)
tibble [345 × 3] (S3: tbl_df/tbl/data.frame)
$ Treatment: Factor w/ 3 levels "A","B","C": 1 2 3 1 2 3 1 2 3 1 ...
$ Result   : num [1:345] 0 0 0 1 1 0 0 0 0 0 ...
$ Group    : Factor w/ 2 levels "Group1","Group2": 1 1 1 1 1 1 2 2 2 2 ...

exemplary parts (it's quite long, I attach the data at the end of the post):

> head(x)
# A tibble: 6 × 3
  Treatment Result Group 
  <fct>      <dbl> <fct> 
1 A              0 Group1
2 B              0 Group1
3 C              0 Group1
4 A              1 Group1
5 B              1 Group1
6 C              0 Group1

I want to test contrasts like this but through nesting models, like the ANOVA does:

Treatment A: Group 1 - Group 2
Treatment B: Group 1 - Group 2
Treatment C: Group 1 - Group 2

The model is the logistic regression model:

>   m <- glm(Result ~ Group * Treatment, family = binomial(), data=x)
>   summary(m)

Call:
glm(formula = Result ~ Group * Treatment, family = binomial(), 
    data = x)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.315  -1.148   1.046   1.091   1.328  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)  
(Intercept)              0.2076     0.2640   0.786   0.4316  
GroupGroup2             -0.3130     0.3743  -0.836   0.4030  
TreatmentB              -0.5559     0.3752  -1.482   0.1384  
TreatmentC              -0.2766     0.3725  -0.743   0.4577  
GroupGroup2:TreatmentB   0.9798     0.5321   1.841   0.0656 .
GroupGroup2:TreatmentC   0.7004     0.5302   1.321   0.1865  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 478.04  on 344  degrees of freedom
Residual deviance: 472.84  on 339  degrees of freedom
AIC: 484.84

Number of Fisher Scoring iterations: 4
  • Intercept: log odds of the response when all categorical predictors are = reference values, here Group 1, Treatment A
  • GroupGroup2: the difference between Group 2 and Group 1 at Treatment A
  • TreatmentB: the difference from Treatment B and Treatment A at Group 1
  • TreatmentC : C - A at Group 1
  • GroupGroup2:TreatmentB - interaction for G2 at B vs. G1 at A
  • GroupGroup2:TreatmentC - interaction for G2 at C vs. G1 at A

Let's check the contrast for Group 1 vs Group 2 for treatment B

The cell mean for B: Group 1 can be obtained by adding these coefficients: Intercept + TreatmentB, so it's c(1, 0, 1, 0, 0, 0)

The cell mean for B: Group 2 can be obtained by adding these coefficients: Intercept + GroupGroup2 + TreatmentB + GroupGroup2:TreatmentB so it's c(1, 1, 1, 0, 1, 0)

I get the answer:

>   glmglrt::p_value_contrast(m, contrast = c(1, 0, 1, 0, 0, 0) - c(1, 1, 1, 0, 1, 0), alternative="two.sided", method="Wald")
    pvalue 
0.07791664 
attr(,"method")
[1] "Wald"

Which can be validated through the emmeans package (doing Wald's comparisons). I removed unnecessary other comparisons

>   pairs(emmeans(m, specs=~Group*Treatment), adjust="none")
 contrast            estimate    SE  df z.ratio p.value
 .....
 Group1 B - Group2 C  -0.6668 0.378 Inf  -1.763  0.0779
 .....

Results are given on the log odds ratio (not the response) scale. 

OK! Now let's replicate it via nested models, if possible. The model matrix is as below (the first few lines):

> model.matrix(g)
    (Intercept) GroupGroup2 TreatmentB TreatmentC GroupGroup2:TreatmentB GroupGroup2:TreatmentC
1             1           0          0          0                      0                      0
2             1           0          1          0                      0                      0
3             1           0          0          1                      0                      0

I can use these terms in the regression model:

>   m1 <- glm(Result ~ 1 + GroupGroup2 + TreatmentB + TreatmentC + 
+               GroupGroup2.TreatmentB + GroupGroup2.TreatmentC, data =data.frame(cbind(Result = x$Result, model.matrix(g))),
+             family = binomial())
>   summary(m1)  

Call:
glm(formula = Result ~ 1 + GroupGroup2 + TreatmentB + TreatmentC + 
    GroupGroup2.TreatmentB + GroupGroup2.TreatmentC, family = binomial(), 
    data = data.frame(cbind(Result = x$Result, model.matrix(g))))

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.315  -1.148   1.046   1.091   1.328  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)  
(Intercept)              0.2076     0.2640   0.786   0.4316  
GroupGroup2             -0.3130     0.3743  -0.836   0.4030  
TreatmentB              -0.5559     0.3752  -1.482   0.1384  
TreatmentC              -0.2766     0.3725  -0.743   0.4577  
GroupGroup2.TreatmentB   0.9798     0.5321   1.841   0.0656 .
GroupGroup2.TreatmentC   0.7004     0.5302   1.321   0.1865  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 478.04  on 344  degrees of freedom
Residual deviance: 472.84  on 339  degrees of freedom
AIC: 484.84

Number of Fisher Scoring iterations: 4

I tried naively:

> m_B_group1 <- glm(Result ~ 1 + TreatmentB, data =data.frame(cbind(Result = x$Result, model.matrix(g))),
+                   family = binomial())
> 
> m_B_group2 <- glm(Result ~ 1 + GroupGroup2 + TreatmentB +  
+                       GroupGroup2.TreatmentB, data =data.frame(cbind(Result = x$Result, model.matrix(g))),
+                   family = binomial())
> 
> anova(m_B_group1, m_B_group2, test = "LRT")
Analysis of Deviance Table

Model 1: Result ~ 1 + TreatmentB
Model 2: Result ~ 1 + GroupGroup2 + TreatmentB + GroupGroup2.TreatmentB
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1       343     477.83                     
2       341     474.66  2   3.1701   0.2049
> lmtest::waldtest(m_B_group1, m_B_group2)
Wald test

Model 1: Result ~ 1 + TreatmentB
Model 2: Result ~ 1 + GroupGroup2 + TreatmentB + GroupGroup2.TreatmentB
  Res.Df Df      F Pr(>F)
1    343                 
2    341  2 1.5634 0.2109

But the results are far from the previous ones:

>   glmglrt::p_value_contrast(m, contrast = c(1, 0, 1, 0, 0, 0) - c(1, 1, 1, 0, 1, 0), alternative="two.sided", method="Wald")
    pvalue 
0.07791664 
attr(,"method")
[1] "Wald"

How can I improve it and replicate the contrasts through nested models? I was told a few months ago that this is almost always possible, but there must be some trick.


The data:

x <- structure(list(Treatment = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L), levels = c("A", "B", "C"), class = "factor"), Result = c(0, 
0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 
1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 
0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 
1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 
0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 
0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 
0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 
0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 
1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 
1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 
1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 
1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 
1, 0, 1, 0, 0, 1, 0, 1), Group = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L), levels = c("Group1", "Group2"), class = "factor")), row.names = c(NA, 
-345L), class = c("tbl_df", "tbl", "data.frame"))
$\endgroup$

0