
The likelihood-ratio and score tests are typically used for simple hypotheses such as $\beta_1 = 0$ or joint hypotheses such as $\beta_1 = \beta_2 = 0$. How can we test a linear combination of coefficients with the likelihood-ratio and score tests, such as $H_0: \beta_1 + 2 \beta_2 - 7 \beta_3 = 0$ or $H_0: \beta_1 + 2 \beta_2 - 7 \beta_3 = 3$? More generally, is it possible to test nonlinear combinations of coefficients, such as $H_0: \beta_1 + 2 \exp(\beta_2) = 0$, with the likelihood-ratio and score tests? And how can one implement them in R?

Linear and nonlinear hypotheses like these arise in assessing interaction terms, predicted values, and effect comparisons. They are typically tested with the Wald test, using the delta method to obtain standard errors after a nonlinear transformation. However, likelihood-ratio tests usually give more accurate p values and confidence intervals than Wald tests in generalized linear models.

For example, suppose a binary logistic regression is estimated with the mean structure $\text{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$. What are feasible ways to test $H_0: \beta_1 + 2 \beta_2 - 7 \beta_3 = 0$, $H_0: \beta_1 + 2 \beta_2 - 7 \beta_3 = 3$, and $H_0: \beta_1 + 2 \exp(\beta_2) = 0$? Sample data and model:

data("mtcars")
View(mtcars)
summary(Model <- glm(
  am ~ mpg + disp + hp, 
  family = binomial(), data = mtcars))
"             Estimate Std. Error z value Pr(>|z|)  
(Intercept) -33.81283   24.17533  -1.399   0.1619  
mpg           1.28498    0.89895   1.429   0.1529  
disp         -0.06545    0.04305  -1.520   0.1284  
hp            0.14936    0.07871   1.898   0.0577 .
    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 10.148  on 28  degrees of freedom
AIC: 18.148"

1 Answer


To obtain p values and confidence intervals for linear combinations of coefficients from the likelihood-ratio and score tests, it appears necessary to rearrange the model specification so that each linear combination of interest corresponds directly to a single coefficient in the rearranged model.

To make $\beta_1 + 2 \beta_2 - 7 \beta_3$ appear as a single estimate based on the original model $\text{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$, rearrange the model specification into $\begin{align} \text{logit}(p) &= \beta_0 + (\beta_1 + 2 \beta_2 - 7 \beta_3) x_1 + (- 2 \beta_2 + 7 \beta_3) x_1 + \beta_2 x_2 + \beta_3 x_3 \\ &= \beta_0 + (\beta_1 + 2 \beta_2 - 7 \beta_3) x_1 + \beta_2(-2 x_1 + x_2)+ \beta_3(7 x_1 + x_3). \end{align}$

If one substitutes $z_1 = x_1$, $z_2 = -2 x_1 + x_2$, and $z_3 = 7 x_1 + x_3$ for predictors in a new model $\text{logit}(p) = \gamma_0 + \gamma_1 z_1 + \gamma_2 z_2 + \gamma_3 z_3$, the estimate of $\gamma_1$ directly tests $H_0: \beta_1 + 2 \beta_2 - 7 \beta_3 = 0$ because $\gamma_0 = \beta_0, \gamma_1 = \beta_1 + 2 \beta_2 - 7 \beta_3, \gamma_2 = \beta_2, \gamma_3 = \beta_3$ by design. This rearranged model specification is shown as Model4 below, for which both likelihood-ratio and score tests can be implemented to report p values and confidence intervals.

# Test beta1 + 2 beta2 - 7 beta3 = 0
library(marginaleffects)
hypotheses(Model, hypothesis = "b2 + 2 * b3 - 7 * b4 = 0") # Wald test
"                     Term Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
 b2 + 2 * b3 - 7 * b4 = 0    0.109       0.78 0.139    0.889 0.2 -1.42   1.64"
summary(Model2 <- glm(
  am ~ I(disp - 2 * mpg) + I(hp + 7 * mpg), 
  family = binomial(), data = mtcars))
"                   Estimate Std. Error z value Pr(>|z|)  
(Intercept)       -31.55712   16.78387  -1.880   0.0601 .
I(disp - 2 * mpg)  -0.06966    0.03241  -2.149   0.0316 *
I(hp + 7 * mpg)     0.15133    0.07785   1.944   0.0519 .
    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 10.167  on 29  degrees of freedom
AIC: 16.167"
anova(Model2, Model) # LRT, no CI
"Analysis of Deviance Table
Model 1: am ~ I(disp - 2 * mpg) + I(hp + 7 * mpg)
Model 2: am ~ mpg + disp + hp
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        29     10.167                     
2        28     10.148  1 0.019071   0.8902"
summary(Model3 <- glm(
  am ~ mpg + disp + hp + offset((
    -2 * Model$coefficients[3] + 7 * Model$coefficients[4]) * mpg), 
  family = binomial(), data = mtcars))
"             Estimate Std. Error z value Pr(>|z|)  
(Intercept) -33.81283   24.17533  -1.399   0.1619  
mpg           0.10854    0.89895   0.121   0.9039  
disp         -0.06545    0.04305  -1.520   0.1284  
hp            0.14936    0.07871   1.898   0.0577 .
    Null deviance: 50.633  on 31  degrees of freedom
Residual deviance: 10.148  on 28  degrees of freedom
AIC: 18.148
Identical logLik and identical intercept, disp, hp coefficients to Model;
identical point estimate of beta1 + 2 beta2 - 7 beta3 to the Wald test,
but a different SE, because offset() carries no uncertainty"
summary(Model4 <- glm(
  am ~ mpg + I(disp - 2 * mpg) + I(hp + 7 * mpg), 
  family = binomial(), data = mtcars))
"                   Estimate Std. Error z value Pr(>|z|)  
(Intercept)       -33.81283   24.17533  -1.399   0.1619  
mpg                 0.10854    0.78006   0.139   0.8893  
I(disp - 2 * mpg)  -0.06545    0.04305  -1.520   0.1284  
I(hp + 7 * mpg)     0.14936    0.07871   1.898   0.0577 .
    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 10.148  on 28  degrees of freedom
AIC: 18.148
Identical point estimate and SE as hypotheses() using Wald"
drop1(Model4, test = "LRT") # LRT
"                  Df Deviance    AIC     LRT  Pr(>Chi)    
<none>                 10.148 18.148                      
mpg                1   10.167 16.167  0.0191  0.890164    
I(disp - 2 * mpg)  1   19.233 25.233  9.0844  0.002578 ** 
I(hp + 7 * mpg)    1   28.606 34.606 18.4577 1.737e-05 ***
same p value as anova(Model2, Model)"
confint(Model4, test = "LRT") # LRT CI of the linear combination
"                         2.5 %      97.5 %
(Intercept)       -110.6178796 -3.67330617
mpg                 -1.6495293  1.80305693
I(disp - 2 * mpg)   -0.1988085 -0.01323519
I(hp + 7 * mpg)      0.0483784  0.41193985"
drop1(Model4, test = "Rao") # score test
"                  Df Deviance    AIC Rao score Pr(>Chi)    
<none>                 10.148 18.148                       
mpg                1   10.167 16.167    0.0194  0.88920    
I(disp - 2 * mpg)  1   19.233 25.233    5.6283  0.01767 *  
I(hp + 7 * mpg)    1   28.606 34.606   15.7814 7.11e-05 ***"
confint(Model4, test = "Rao") # score test CI
"                         2.5 %       97.5 %
(Intercept)       -83.44852761  0.199569732
mpg                -1.33719984  1.371042258
I(disp - 2 * mpg)  -0.14645346 -0.004624354
I(hp + 7 * mpg)     0.02532632  0.300456929"
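
A quick numerical check (not part of the original output): because Model4 is an exact linear reparameterization of Model, its mpg coefficient should reproduce the linear combination computed from the original fit, up to numerical tolerance.

# Check gamma1 = beta1 + 2 beta2 - 7 beta3
b <- coef(Model)
all.equal(unname(coef(Model4)["mpg"]),
          unname(b["mpg"] + 2 * b["disp"] - 7 * b["hp"]))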

If a linear hypothesis has a nonzero constant on the right-hand side, such as $H_0: \beta_1 + 2 \beta_2 - 7 \beta_3 = 3$, or equivalently $\beta_1 + 2 \beta_2 - 7 \beta_3 - 3 = 0$, then an offset term is needed to absorb the constant; offset terms typically appear in count models as exposure measures. The original model needs rearranging into $\begin{align} \text{logit}(p) &= \beta_0 + (\beta_1 + 2 \beta_2 - 7 \beta_3 - 3) x_1 + 3 x_1 + \beta_2(-2 x_1 + x_2) + \beta_3(7 x_1 + x_3). \end{align}$ Because the term $3 x_1$ requires no estimation and carries no uncertainty, its coefficient is fixed at exactly one, as in $1 \cdot 3 x_1$; fixing a coefficient at one is precisely what an offset term does in generalized linear model estimation. In Model5 below, no point estimate or standard error changes from Model4, except that the coefficient on $x_1$ now estimates $\gamma_1 - 3$, so its point estimate and confidence limits shift down by exactly three.

# Test beta1 + 2 beta2 - 7 beta3 = 3
summary(Model5 <- glm(
  am ~ mpg + offset(3 * mpg) + I(disp - 2 * mpg) + I(hp + 7 * mpg), 
  family = binomial(), data = mtcars))
"                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -33.81283   24.17533  -1.399  0.16192    
mpg                -2.89146    0.78006  -3.707  0.00021 ***
I(disp - 2 * mpg)  -0.06545    0.04305  -1.520  0.12842    
I(hp + 7 * mpg)     0.14936    0.07871   1.898  0.05775 .  
    Null deviance: 115.509  on 31  degrees of freedom
Residual deviance:  10.148  on 28  degrees of freedom
AIC: 18.148"
drop1(Model5, test = "LRT")
"                  Df Deviance    AIC    LRT  Pr(>Chi)    
<none>                  10.15  18.15                     
mpg                1   360.44 366.44 350.29 < 2.2e-16 ***
I(disp - 2 * mpg)  1    19.23  25.23   9.08  0.002578 ** 
I(hp + 7 * mpg)    1    28.61  34.61  18.46 1.737e-05 ***"
confint(Model5, test = "LRT")
"                         2.5 %      97.5 %
(Intercept)       -110.6178796 -3.67330617
mpg                 -4.6495293 -1.19694307
I(disp - 2 * mpg)   -0.1988085 -0.01323519
I(hp + 7 * mpg)      0.0483784  0.41193985"
drop1(Model5, test = "Rao")
"                  Df Deviance    AIC  Rao score  Pr(>Chi)    
<none>                  10.15  18.15                         
mpg                1   360.44 366.44 7.6925e+14 < 2.2e-16 ***
I(disp - 2 * mpg)  1    19.23  25.23 6.0000e+00   0.01767 *  
I(hp + 7 * mpg)    1    28.61  34.61 1.6000e+01  7.11e-05 ***"
confint(Model5, test = "Rao")
"                         2.5 %       97.5 %
(Intercept)       -83.44852761  0.199569732
mpg                -4.33719984 -1.628957742
I(disp - 2 * mpg)  -0.14645346 -0.004624354
I(hp + 7 * mpg)     0.02532632  0.300456929"
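
Likewise, a quick check (not part of the original output): the offset shifts the mpg estimate by exactly the constant.

# Check that Model5's mpg estimate is Model4's minus 3
coef(Model5)["mpg"] - coef(Model4)["mpg"]  # -3 up to numerical tolerance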

If the hypothesis involves coefficients of categorical variables, specifying the levels within the model formula might be necessary. For example, to test $H_0: \beta_1 + \beta_3 = 0$ in a model $\text{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$ where $x_2$ is binary, rearrange the original model into $\text{logit}(p) = \gamma_0 + \gamma_1 z_1 + \gamma_2 z_2 + \gamma_3 z_3$, where $z_1 = x_1$, $z_2 = x_2$, and $z_3 = x_1 x_2 - x_1$, so that $\gamma_0 = \beta_0, \gamma_1 = \beta_1 + \beta_3, \gamma_2 = \beta_2, \gamma_3 = \beta_3$ by design. Thus, the original linear combination amounts to testing $H_0: \gamma_1 = 0$.

# Test beta1 + beta3 = 0 with interaction
summary(Model6 <- glm(
  am ~ mpg * factor(vs), 
  family = binomial(), data = mtcars))
"                Estimate Std. Error z value Pr(>|z|)  
(Intercept)     -9.42876    4.43018  -2.128   0.0333 *
mpg              0.50789    0.25140   2.020   0.0434 *
factor(vs)1     -4.24443    8.77150  -0.484   0.6285  
mpg:factor(vs)1  0.07015    0.41543   0.169   0.8659  
    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 24.914  on 28  degrees of freedom
AIC: 32.914"
hypotheses(Model6, hypothesis = "b2 + b4 = 0") # Wald test
"        Term Estimate Std. Error    z Pr(>|z|)   S   2.5 % 97.5 %
 b2 + b4 = 0    0.578      0.331 1.75   0.0805 3.6 -0.0702   1.23"
summary(Model7 <- glm(
  am ~ mpg + factor(vs) + I(mpg * (vs == 1) - mpg), 
  family = binomial(), data = mtcars))
"                         Estimate Std. Error z value Pr(>|z|)  
(Intercept)              -9.42876    4.43018  -2.128   0.0333 *
mpg                       0.57804    0.33072   1.748   0.0805 .
factor(vs)1              -4.24443    8.77150  -0.484   0.6285  
I(mpg * (vs == 1) - mpg)  0.07015    0.41543   0.169   0.8659  
    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 24.914  on 28  degrees of freedom
AIC: 32.914"
drop1(Model7, test = "LRT") # LRT
"                         Df Deviance    AIC    LRT Pr(>Chi)   
<none>                        24.914 32.914                   
mpg                       1   34.872 40.872 9.9576 0.001602 **
factor(vs)                1   25.179 31.179 0.2651 0.606618   
I(mpg * (vs == 1) - mpg)  1   24.944 30.944 0.0295 0.863677 "
confint(Model7, test = "LRT") # LRT CI of the linear combination
"                               2.5 %    97.5 %
(Intercept)              -20.4926175 -2.562472
mpg                        0.1558230  1.637180
factor(vs)1              -29.2315384 11.039945
I(mpg * (vs == 1) - mpg)  -0.7074663  1.198649"
drop1(Model7, test = "Rao") # score test
"                         Df Deviance    AIC Rao score Pr(>Chi)   
<none>                        24.914 32.914                      
mpg                       1   34.872 40.872    7.5812 0.005898 **
factor(vs)                1   25.179 31.179    0.2466 0.619508   
I(mpg * (vs == 1) - mpg)  1   24.944 30.944    0.0287 0.865504"
confint(Model7, test = "Rao") # score test CI
"                                2.5 %     97.5 %
(Intercept)              -17.82195389 -1.7156080
mpg                        0.09745762  1.2163391
factor(vs)1              -20.08550283  9.6172422
I(mpg * (vs == 1) - mpg)  -0.61012491  0.8047485"
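
As a check (not part of the original output), the mpg coefficient in Model7 should reproduce $\beta_1 + \beta_3$ computed from Model6.

# Check gamma1 = beta1 + beta3
all.equal(unname(coef(Model7)["mpg"]),
          unname(coef(Model6)["mpg"] + coef(Model6)["mpg:factor(vs)1"]))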

A linear rearrangement like those above is not available for a nonlinear combination of coefficients such as $\beta_1 + 2 \exp(\beta_2)$, and I do not know of a canned implementation of the likelihood-ratio or score test for such hypotheses.
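
That said, standard likelihood theory does cover smooth nonlinear constraints: the statistic $2(\hat\ell - \hat\ell_0)$, with $\hat\ell_0$ the log-likelihood maximized under the constraint, is still asymptotically $\chi^2_1$. A minimal sketch of this route (my own construction, not a canned routine) fits the null model for $H_0: \beta_1 + 2 \exp(\beta_2) = 0$, i.e. $\beta_1 = -2 \exp(\beta_2)$, by direct maximum likelihood with optim() and compares it with Model:

# LRT of beta1 + 2 exp(beta2) = 0 via constrained maximum likelihood (sketch)
X <- model.matrix(Model)  # columns: (Intercept), mpg, disp, hp
y <- mtcars$am
negll <- function(theta) {  # theta = (beta0, beta2, beta3); beta1 is implied
  beta <- c(theta[1], -2 * exp(theta[2]), theta[2], theta[3])
  eta <- drop(X %*% beta)
  -sum(dbinom(y, size = 1, prob = plogis(eta), log = TRUE))
}
fit0 <- optim(coef(Model)[c(1, 3, 4)], negll, method = "BFGS")
lrt <- 2 * (as.numeric(logLik(Model)) + fit0$value)  # fit0$value = -logLik0
pchisq(lrt, df = 1, lower.tail = FALSE)  # asymptotic p value on 1 df

A score test would additionally require the score vector and information matrix evaluated at the constrained fit, which this sketch does not attempt; and with a data set this small, any of these asymptotic p values should be read with caution.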

  • With Bayesian models this is all extremely easy. Given convergence of the posterior samples and taking several thousand of them (samples from the multivariate distribution of all model parameters), one merely computes the proportion of posterior draws in which the complex nonlinear contrast exceeds a specified value such as zero. This is not point hypothesis testing but is far more meaningful. Commented Jun 5 at 11:56
  • @FrankHarrell thanks for pointing that out. Any suggestions on a good textbook with both theory and applications of Bayesian models? – DrJerryTAO Commented Jun 6 at 2:55
  • https://hbiostat.org/bayes, Richard McElreath's Statistical Rethinking, and much more. But start with my list of resources in the hbiostat link. Commented Jun 6 at 12:46
