
I built a GAM with two categorical variables and two smooth terms, following this structure:

model <- ik ~ population_id_cat + s_status + s(n_locs, bs = "re") + s(animal_id, bs = "re")

The model summary for the parametric coefficients is:

Parametric coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      1.90509    0.07798  24.432  < 2e-16 ***
population_id_catB              -1.08790    0.13199  -8.242 2.27e-16 ***
population_id_catC               0.06718    0.08880   0.757 0.449368    
population_id_catD              -0.27599    0.07689  -3.589 0.000336 ***
population_id_catE              -0.33914    0.13555  -2.502 0.012391 *  
population_id_catF              -0.71104    0.07586  -9.373  < 2e-16 ***
population_id_catG              -0.39243    0.07276  -5.393 7.32e-08 ***
population_id_catH               0.31115    0.18431   1.688 0.091449 .  
population_id_catI               0.06530    0.10203   0.640 0.522174    
s_status_2a_m                   -0.16331    0.05263  -3.103 0.001930 ** 
s_status_2fam                   -0.36656    0.05200  -7.050 2.10e-12 ***

When I plot these coefficients with the mgcViz::pterm() function, the plotted intervals do not seem to match the Std. Error (though their significance does match the p-values provided).

For example, the plot for the parametric coefficients mentioned above:

gviz_model <- getViz(model)
gviz_model_1 <- pterm(gviz_model, 1)
plot(gviz_model_1) + l_ciBar(colour = "blue") + l_fitPoints(colour = "red") + l_rug(alpha = 0.3)

[plot of the parametric coefficients produced by pterm() with l_ciBar()]

Question 1:

What values are actually plotted here? The points seem to match the "Estimate" column from the summary, but the bars do not match the "Std. Error" (e.g., look at population H: an interval of Estimate ± Std. Error would lie entirely above 0 and therefore appear significant, which contradicts the p-value).

If I plot the Estimate ± Std. Error in a ggplot (which is my goal, because I want to plot these parametric values for two models in the same plot), I get this instead:

gviz_model_pop <- termplot(gviz_model, se = TRUE, plot = FALSE)$population_id_cat
ggplot(gviz_model_pop, aes(x = x, y = y)) +
     geom_errorbar(aes(ymin = y - se, ymax = y + se), width = 0.15,
                   linewidth = 1, position = position_dodge(0.2)) +
     geom_point(position = position_dodge(0.2), size = 3.5, shape = 16)

[plot of Estimate ± Std. Error built from termplot() output in ggplot]

Similar trends, but the intervals are definitely not the same (and that changes the apparent significance of each category).

Question 2:

If I want to plot these parametric coefficients for two models in the same plot, do I need to use the output from pterm() instead of extracting the estimates/standard errors as I did above? How can I do this, preferably with a ggplot output?

Question 3:

If I want to plot these parametric coefficients including the intercept (to make the plot more interpretable), as mentioned here in "Transformed standard errors (2)", how do I do it? Do I need separate calculations or is there a function for the parametric coefficients too? Preferably with a ggplot output.

Thank you very much in advance for any help!


1 Answer

Q1

The blue bars are confidence intervals, (most likely) formed as Estimate ± 2 × Std. Error.
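You can check this directly from the fitted model by pulling the coefficients and their standard errors and building the ± 2 × SE intervals yourself. A minimal sketch (the question's data aren't available, so this fits a small illustrative gam()):

```r
library(mgcv)

# illustrative model (the question's data are not available)
set.seed(1)
d <- gamSim(1, n = 200, dist = "normal", scale = 2)
d$fac <- factor(sample(c("A", "B", "C"), nrow(d), replace = TRUE))
m <- gam(y ~ fac + s(x2), data = d)

est <- coef(m)              # all coefficients, parametric terms first
se  <- sqrt(diag(vcov(m)))  # their standard errors

# keep only the parametric terms (drop the smooth coefficients)
para <- !grepl("^s\\(", names(est))
ci <- data.frame(
  term  = names(est)[para],
  est   = est[para],
  lower = est[para] - 2 * se[para],
  upper = est[para] + 2 * se[para]
)
ci  # these intervals should match the blue bars drawn by l_ciBar()
```

Comparing `ci` against the plot lets you confirm which multiplier the bars use.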

Q2

I think this would be easier with parametric_effects() from my {gratia} package; pterm() seems to wrap up the model object with some extra information, which doesn't appear to include the values actually computed for the plot.

Using the example from ?pterm, we have

library("gratia")

set.seed(3)
dat <- gamSim(1,n=1500,dist="normal",scale=20)
dat$fac <- as.factor( sample(c("A1", "A2", "A3"), nrow(dat), replace = TRUE) )
dat$logi <- as.logical( sample(c(TRUE, FALSE), nrow(dat), replace = TRUE) )
bs <- "cr"; k <- 12
b <- gam(y ~ x0 + x1 + I(x1^2) + s(x2,bs=bs,k=k) + fac + x3:fac + I(x1*x2) + logi,
         data=dat)

p <- parametric_effects(b)

Which gives

> p <- parametric_effects(b)                                                  
Interaction terms are not currently supported.
> p                                                                           
# A tibble: 6,005 × 6
   term  type       value level partial    se
   <chr> <chr>   <I<dbl>> <fct>   <dbl> <dbl>
 1 x0    numeric    0.168 NA     -0.310 0.309
 2 x0    numeric    0.808 NA     -1.49  1.48 
 3 x0    numeric    0.385 NA     -0.711 0.708
 4 x0    numeric    0.328 NA     -0.605 0.602
 5 x0    numeric    0.602 NA     -1.11  1.11 
 6 x0    numeric    0.604 NA     -1.12  1.11 
 7 x0    numeric    0.125 NA     -0.230 0.229
 8 x0    numeric    0.295 NA     -0.544 0.541
 9 x0    numeric    0.578 NA     -1.07  1.06 
10 x0    numeric    0.631 NA     -1.17  1.16 
# ℹ 5,995 more rows
# ℹ Use `print(n = ...)` to see more rows

The confidence interval is easy to add:

p <- p |>
  dplyr::mutate(lower_ci = partial - (1.96 * se), upper_ci = partial + (1.96 * se))

I would also append a model column to indicate which model this is:

p <- p |>
  dplyr::mutate(model = rep("Model 1", nrow(p)))

If you repeat that for the other model, to create p2 say, you should be able to bind the two objects together with

p_both <- dplyr::bind_rows(p, p2)

You can also select which terms you want to extract the info for using the terms argument to parametric_effects().
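Putting the pieces together, here is a sketch of the full two-model workflow with the ggplot at the end. The second model is hypothetical, and the column names (term, level, partial, se) follow the output shown above; newer {gratia} releases may rename them (e.g. .partial, .se):

```r
library(mgcv)
library(gratia)
library(dplyr)
library(ggplot2)

set.seed(3)
dat <- gamSim(1, n = 1500, dist = "normal", scale = 20)
dat$fac <- factor(sample(c("A1", "A2", "A3"), nrow(dat), replace = TRUE))

b1 <- gam(y ~ x0 + fac + s(x2), data = dat)
b2 <- gam(y ~ x0 + fac + s(x2) + s(x1), data = dat)  # hypothetical second model

# parametric effects for both models, tagged and bound together
effs <- bind_rows(
  parametric_effects(b1) |> mutate(model = "Model 1"),
  parametric_effects(b2) |> mutate(model = "Model 2")
) |>
  mutate(lower_ci = partial - 1.96 * se,
         upper_ci = partial + 1.96 * se)

# one row per factor level per model, then dodge the two models side by side
effs |>
  filter(term == "fac") |>
  distinct(model, level, partial, lower_ci, upper_ci) |>
  ggplot(aes(x = level, y = partial, colour = model)) +
  geom_errorbar(aes(ymin = lower_ci, ymax = upper_ci),
                width = 0.15, position = position_dodge(0.4)) +
  geom_point(size = 3, position = position_dodge(0.4))
```

The distinct() step is needed because parametric_effects() returns one row per observation; for a factor term the partial effect is constant within each level.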

Q3

The best way to do this is to just predict from the model including only the constant term and the parametric term of your choice. In the example from ?pterm that I used above, there is a factor variable fac which we'll use to illustrate the point.

# get a data slice where only `fac` varies
# hold all other variables at representative values
ds <- data_slice(b, fac = evenly(fac)) |>
  dplyr::mutate(logi = as.logical(logi)) # <-- work around bug

# use fitted_values() to predict from the model using only the effects
# of `fac` and the constant term
fv <- fitted_values(b, data = ds, terms = c("(Intercept)", "fac"))
fv

There's currently a bug in data_slice(): it converts logical variables to numeric. We work around that by coercing them back to logical. We only need this to keep predict.gam() happy; none of the variables other than fac are used in the predictions computed in fv.

Having done the above fv is:

> fv                                                                          
# A tibble: 3 × 10
  fac      x0    x1    x2 logi     x3 fitted    se  lower upper
  <fct> <dbl> <dbl> <dbl> <lgl> <dbl>  <dbl> <dbl>  <dbl> <dbl>
1 A1    0.511 0.521 0.493 FALSE 0.476   5.75  2.55  0.753 10.8 
2 A2    0.511 0.521 0.493 FALSE 0.476   4.60  2.54 -0.367  9.57
3 A3    0.511 0.521 0.493 FALSE 0.476   2.94  2.63 -2.21   8.09

Here, as fac only has three levels, we get the three values needed for the plot. The fitted column now has the intercept added to each of the parametric effects; hence A1, being the reference level, has the same value as the intercept:

> coef(b)[1]                                                                  
(Intercept) 
   5.753834

while the other values in fitted are (Intercept) plus the respective coefficient for the other levels. The lower and upper columns contain the confidence interval as it would be plotted in the output from mgcViz. Note that fitted_values() returns, by default, predicted values on the response scale. If you have a non-Gaussian family, you can keep the values on the link scale using scale = "link" in the call to fitted_values().
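The fv tibble then drops straight into ggplot. A sketch, using a simplified model without the logi variable so the data_slice() workaround isn't needed (column names follow the fv output shown above; newer {gratia} releases may prefix them, e.g. .fitted, .lower_ci):

```r
library(mgcv)
library(gratia)
library(ggplot2)

set.seed(3)
dat <- gamSim(1, n = 1500, dist = "normal", scale = 20)
dat$fac <- factor(sample(c("A1", "A2", "A3"), nrow(dat), replace = TRUE))
b <- gam(y ~ x0 + s(x2) + fac, data = dat)

# slice where only `fac` varies, then predict using intercept + fac only
ds <- data_slice(b, fac = evenly(fac))
fv <- fitted_values(b, data = ds, terms = c("(Intercept)", "fac"))

# intercept-included effects with their confidence intervals
ggplot(fv, aes(x = fac, y = fitted)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.15, colour = "blue") +
  geom_point(size = 3, colour = "red")
```

This reproduces the mgcViz-style plot but on the intercept-shifted scale, and the same recipe works for a second model so the two can be combined with dplyr::bind_rows() as in Q2.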

Comments:

  • Yes, fv contains fitted values from the model on the response scale (given the code shown). The interpretation isn't that these estimated effects are independent of the other terms. Technically they are for the reference level of logi (FALSE), because that is what the constant term represents: A1 and FALSE. You would get slightly different answers if you did this for logi == TRUE, but importantly (apart from differences in the Std. Error of the estimated effects) each of the estimates is shifted by the same value (the intercept), so relative to one another the effects are comparable. There isn't much you can do to avoid this; you either set logi == FALSE (which is implied when you use only the intercept) or you set logi = TRUE and include the effect of logi in the terms argument. There are other things you could do, such as generating estimates for both values of logi and averaging those estimates, or differencing those estimates, etc. Doing all that is not easy with {gratia} (there are no canned functions for those steps), but the {emmeans} or {marginaleffects} packages can do it and would be my go-to for this. (May 11, 2023)
  • Also, the estimates will never be independent of the logi effect (or any other estimated effect); the coefficients associated with fac (in the example) are those effects given that logi and the other terms are included in the model. If you dropped logi from the model, the estimated effects of fac would change. (May 11, 2023)
  • With random effects, it depends on what you are trying to do/show. Typically, when you are showing the effects of all, or selected, "fixed" terms (scare quotes because smooths are random effects too, but you wouldn't treat them as random draws), you would exclude the random effects or average over them. The former is easy (just add the random effect to the exclude argument, or, as I did here, simply don't include that term in the terms argument); the latter not so much when a non-identity link function is being used. (May 11, 2023)
  • Regardless, the code shown would provide a value for the random effect(s) when creating ds and then ignore the contribution from the random effect term when creating fv. I'm not sure what you mean by "change the logi parameter"; do you mean you want both values TRUE and FALSE in the data slice/predicted values? If so, you should be able to do data_slice(b, fac = evenly(fac), logi = c(FALSE, TRUE)), and you can skip the mutate() step as the bug doesn't affect that particular call. Now ds, and hence fv, will have 6 rows for the combinations of fac and logi. (May 11, 2023)