$\begingroup$

This is not about obtaining any dataset: I HAVE the dataset. Nor is it about debugging code; it is about EXPLAINING how to establish the statistical relationship between nested models (LRT ANOVA) and tests of simple effects. R is used only for illustration.

Let's say I have data like this:

  • Categorical predictor: "Group" with 2 levels: Group 1 and Group 2
  • Categorical predictor: "Treatment" with 3 levels: A, B, C
  • Categorical binary response: "Result" 0/1

with this structure:

> str(x)
tibble [345 × 3] (S3: tbl_df/tbl/data.frame)
$ Treatment: Factor w/ 3 levels "A","B","C": 1 2 3 1 2 3 1 2 3 1 ...
$ Result   : num [1:345] 0 0 0 1 1 0 0 0 0 0 ...
$ Group    : Factor w/ 2 levels "Group1","Group2": 1 1 1 1 1 1 2 2 2 2 ...

an excerpt (it's quite long; I attach the full data at the end of the post):

> head(x)
# A tibble: 6 × 3
  Treatment Result Group 
  <fct>      <dbl> <fct> 
1 A              0 Group1
2 B              0 Group1
3 C              0 Group1
4 A              1 Group1
5 B              1 Group1
6 C              0 Group1

I want to test some planned contrasts about the fraction of positive results using logistic regression (glm(family = binomial(link = "logit"))). Namely, I want to compare Group 1 vs Group 2 at each level of Treatment.

I would like to test this with the Likelihood Ratio test or the Wald test, but through model comparisons. To do this, I need two models: one with the "term" of interest and one without it.

But, unlike in the ANOVA procedure, I'm not interested in the impact of the whole variable, but rather in concrete levels. And I have no idea how to specify a model that involves concrete levels rather than entire variables!

In other words, I DO NOT want something like this: anova(glm(Response ~ Group + Treatment, ...), glm(Response ~ Group, ...))

Instead, I want this:

Treatment A: Group 1 - Group 2
Treatment B: Group 1 - Group 2
Treatment C: Group 1 - Group 2
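
(For context, this is exactly the set of simple effects that can be phrased via conditioning in emmeans; a sketch, assuming the interaction model m fitted below, m <- glm(Result ~ Group * Treatment, family = binomial(), data = x):)

```r
library(emmeans)
# Group 1 vs Group 2 separately within each level of Treatment (Wald-based)
pairs(emmeans(m, ~ Group | Treatment), adjust = "none")
```

But that is again Wald-based and relies on a package; my question is about the nested-model / LRT route.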

First, let's fit the model:

>   m <- glm(Result ~ Group * Treatment, family = binomial(), data=x)
>   summary(m)

Call:
glm(formula = Result ~ Group * Treatment, family = binomial(), 
    data = x)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.315  -1.148   1.046   1.091   1.328  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)  
(Intercept)              0.2076     0.2640   0.786   0.4316  
GroupGroup2             -0.3130     0.3743  -0.836   0.4030  
TreatmentB              -0.5559     0.3752  -1.482   0.1384  
TreatmentC              -0.2766     0.3725  -0.743   0.4577  
GroupGroup2:TreatmentB   0.9798     0.5321   1.841   0.0656 .
GroupGroup2:TreatmentC   0.7004     0.5302   1.321   0.1865  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 478.04  on 344  degrees of freedom
Residual deviance: 472.84  on 339  degrees of freedom
AIC: 484.84

Number of Fisher Scoring iterations: 4

I know how to do this for the Wald approach using the glmglrt R package and the model coefficients. Let me check the contrast for Group 1 vs Group 2 at Treatment B:

  • Intercept: log odds of the response when all categorical predictors are at their reference levels, here Group 1, Treatment A
  • GroupGroup2: the difference between Group 2 and Group 1 at Treatment A
  • TreatmentB: the difference between Treatment B and Treatment A at Group 1
  • TreatmentC: the difference between Treatment C and Treatment A at Group 1
  • GroupGroup2:TreatmentB: how the Group 2 − Group 1 difference changes from Treatment A to Treatment B
  • GroupGroup2:TreatmentC: how the Group 2 − Group 1 difference changes from Treatment A to Treatment C

The cell mean (on the link scale) for Treatment B, Group 1 is obtained by adding these coefficients: Intercept + TreatmentB, so its coefficient vector is c(1, 0, 1, 0, 0, 0).

The cell mean for Treatment B, Group 2 is Intercept + GroupGroup2 + TreatmentB + GroupGroup2:TreatmentB, so it's c(1, 1, 1, 0, 1, 0).
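
As a sanity check, the same Wald test can be computed by hand from these two coefficient vectors, using only the fitted model (a sketch, assuming m from above):

```r
b1 <- c(1, 0, 1, 0, 0, 0)                  # cell mean: Treatment B, Group 1
b2 <- c(1, 1, 1, 0, 1, 0)                  # cell mean: Treatment B, Group 2
K  <- b1 - b2                              # contrast vector

est <- sum(K * coef(m))                    # estimated log-odds difference
se  <- sqrt(drop(t(K) %*% vcov(m) %*% K))  # its standard error
2 * pnorm(-abs(est / se))                  # two-sided Wald p-value
```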

I get the answer:

>   glmglrt::p_value_contrast(m, contrast = c(1, 0, 1, 0, 0, 0) - c(1, 1, 1, 0, 1, 0), alternative="two.sided", method="Wald")
    pvalue 
0.07791664 
attr(,"method")
[1] "Wald"

Let me validate it through the emmeans package (which does Wald comparisons). I removed the other, unnecessary comparisons:

>   pairs(emmeans(m, specs=~Group*Treatment), adjust="none")
 contrast            estimate    SE  df z.ratio p.value
 .....
 Group1 B - Group2 B  -0.6668 0.378 Inf  -1.763  0.0779
 .....

Results are given on the log odds ratio (not the response) scale. 

Unfortunately, when I try to do this via the LRT, it fails...

>   glmglrt::p_value_contrast(m, contrast = c(1, 0, 1, 0, 0, 0) - c(1, 1, 1, 0, 1, 0), alternative="two.sided", method="LRT")
pvalue 
    NA 
attr(,"method")
[1] "LRT"
Warning message:
In msg(debuglevel, level_warning, ...) :
  coefficient -GroupGroup2 + -GroupGroup2:TreatmentB : Hypothesis test failed because the H0 model diverged.

Anyway, I'm wondering how to do this without external, third-party libraries. In other words, can this be done just with anova(model1, model2) or in some other "native" way? To ask even more broadly: is it possible to test such contrasts using nested models at all? I'm especially interested in the context of Likelihood Ratio testing!

The glmglrt package somehow does it. I was told that it is possible to construct a nested model for any contrast. So how do we define the nesting here, if not with entire variables but with their levels? How do I pass it to anova(m1, m2) or something similar?
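
My current best guess at what "nesting on levels" could mean: substitute the null hypothesis into the design matrix. Here H0 says the GroupGroup2 and GroupGroup2:TreatmentB coefficients sum to zero, so their two columns would merge into one (a sketch based on my understanding, not verified against what glmglrt actually does):

```r
X <- model.matrix(m)
# H0: beta_GroupGroup2 + beta_GroupGroup2:TreatmentB = 0, i.e. the
# interaction coefficient equals minus the Group coefficient, so the
# two columns collapse into a single merged column:
X0 <- cbind(X[, c("(Intercept)", "TreatmentB", "TreatmentC",
                  "GroupGroup2:TreatmentC")],
            G2constr = X[, "GroupGroup2"] - X[, "GroupGroup2:TreatmentB"])
m0 <- glm(x$Result ~ X0 - 1, family = binomial())
anova(m0, m, test = "LRT")   # 1-df comparison; is this the right nesting?
```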

To moderators: I asked in the context of R, but I am also interested in a more general answer: is it even possible to define nested models (regardless of language) that could "describe" such contrasts? R is only an illustration of what I'm trying to ask, as it is the only language I know.

The data:

x <- structure(list(Treatment = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L), levels = c("A", "B", "C"), class = "factor"), Result = c(0, 
0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 
1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 
0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 
1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 
0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 
0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 
0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 
0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 
1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 
1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 
1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 
1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 
1, 0, 1, 0, 0, 1, 0, 1), Group = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L), levels = c("Group1", "Group2"), class = "factor")), row.names = c(NA, 
-345L), class = c("tbl_df", "tbl", "data.frame"))

EDITED: OK, I think I found "some way", but I'm getting different results and I'm not sure how to reconcile them. Can you help me?

I checked the model matrix. Just a few lines:

> model.matrix(m)
    (Intercept) GroupGroup2 TreatmentB TreatmentC GroupGroup2:TreatmentB GroupGroup2:TreatmentC
1             1           0          0          0                      0                      0
2             1           0          1          0                      0                      0
3             1           0          0          1                      0                      0

I used these columns as terms in a regression model, so I could easily manipulate which terms to compare:

>   m1 <- glm(Result ~ 1 + GroupGroup2 + TreatmentB + TreatmentC + 
+               GroupGroup2.TreatmentB + GroupGroup2.TreatmentC, data = data.frame(cbind(Result = x$Result, model.matrix(m))),
+             family = binomial())
>   summary(m1)  

Call:
glm(formula = Result ~ 1 + GroupGroup2 + TreatmentB + TreatmentC + 
    GroupGroup2.TreatmentB + GroupGroup2.TreatmentC, family = binomial(), 
    data = data.frame(cbind(Result = x$Result, model.matrix(m))))

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.315  -1.148   1.046   1.091   1.328  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)  
(Intercept)              0.2076     0.2640   0.786   0.4316  
GroupGroup2             -0.3130     0.3743  -0.836   0.4030  
TreatmentB              -0.5559     0.3752  -1.482   0.1384  
TreatmentC              -0.2766     0.3725  -0.743   0.4577  
GroupGroup2.TreatmentB   0.9798     0.5321   1.841   0.0656 .
GroupGroup2.TreatmentC   0.7004     0.5302   1.321   0.1865  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 478.04  on 344  degrees of freedom
Residual deviance: 472.84  on 339  degrees of freedom
AIC: 484.84

Number of Fisher Scoring iterations: 4

And here it works for both the LRT and the Wald test. But, unfortunately, the results, although close to each other, are now very different from the previous ones:

> m_B_group1 <- glm(Result ~ 1 + TreatmentB, data = data.frame(cbind(Result = x$Result, model.matrix(m))),
+                   family = binomial())
> 
> m_B_group2 <- glm(Result ~ 1 + GroupGroup2 + TreatmentB +  
+                       GroupGroup2.TreatmentB, data = data.frame(cbind(Result = x$Result, model.matrix(m))),
+                   family = binomial())
> 
> anova(m_B_group1, m_B_group2, test = "LRT")
Analysis of Deviance Table

Model 1: Result ~ 1 + TreatmentB
Model 2: Result ~ 1 + GroupGroup2 + TreatmentB + GroupGroup2.TreatmentB
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1       343     477.83                     
2       341     474.66  2   3.1701   0.2049
> lmtest::waldtest(m_B_group1, m_B_group2)
Wald test

Model 1: Result ~ 1 + TreatmentB
Model 2: Result ~ 1 + GroupGroup2 + TreatmentB + GroupGroup2.TreatmentB
  Res.Df Df      F Pr(>F)
1    343                 
2    341  2 1.5634 0.2109

while previously it was

>   glmglrt::p_value_contrast(m, contrast = c(1, 0, 1, 0, 0, 0) - c(1, 1, 1, 0, 1, 0), alternative="two.sided", method="Wald")
    pvalue 
0.07791664 
attr(,"method")
[1] "Wald"

which agreed with the LS-means calculation of the appropriate contrast. Now it's 0.078 vs. 0.211.

I must have made a mistake somewhere, but I guess it's something trivial I'm missing...

$\endgroup$
  • $\begingroup$ I can't add an answer while the question is closed, but if you modify the dataset such that Group only takes one value for treatment level B (effectively adding the constraint corresponding to your null hypothesis) you could compare the full models estimated on the new and the original dataset. $\endgroup$ Commented Apr 22 at 8:06
