Identical SE values for different groups using emtrends

Question

I'm using emtrends to extract slopes and do pairwise comparisons between the groups of my independent variable, controlled for the other variables in the model. However, when plotting the slopes, I noticed all SE values produced by emtrends were identical. I can't work out why though?

Here's a worked example:

demo_data <- structure(list(responsevar = c(-0.0618190679498364, 0.15899745702364, 
0.929534141895136, 1.1177284975211, -0.00786079907502823, 0.094923852612621, 
0.607581364096467, 0.440737663183793, 0.082136198176222, 0.157661900467845, 
0.830998187171696, 1.48543531298402, 1.7023101500879, 2.05170334713859, 
0.0100560258999499, 1.17122182596676, 0.966616026418047, 1.28196801361743, 
1.83426518745315, -0.0618190679498364, 0.483300296887319, 0.891072080154904, 
0.242089446870316, -0.00786079907502823, 0.100605419887576, 0.506128950660782, 
0.209931589686233, 0.115366122953099, 0.157661900467845, 0.531293815992106, 
1.25374892200693, 1.50296621263127, 1.44204603838667, 0.0100560258999499, 
-0.100982522461554, 0.701793094621886, 0.957971325343055, 2.20514483544439, 
-0.0618190679498364, 0.356048307744416, 0.651509920934051, 0.108383785292986, 
-0.00786079907502823, 0.207654065049067, 0.295812434350708, 0.0367762200675729, 
-0.119369737876224, 0.157661900467845, 0.855810615090611, 0.794374313954266, 
0.714569129850953, 0.77324072979401, 0.0100560258999499, 0.141294856642278, 
0.429316434045948, 0.26206854485803, 0.418774008647674, -0.0618190679498364, 
-0.246924373738116, 0.319374236827093, -0.49671671929437, -0.00786079907502823, 
0.228222865853934, 0.0936004246573529, -0.385729213582175, -0.338446278126348, 
0.157661900467845, 0.437319684073364, 0.43789561209487, 0.269572832872745, 
0.143710919264518, 0.0100560258999499, 0.584417295260851, -1.00583616078814, 
0.0349028968964146, -0.138691000504007, -0.0618190679498364, 
0.475098387917471, 0.151489745992231, 0.606705131746448, -0.00786079907502823, 
0.25051692004141, 0.33430090269068, 0.182426365767506, 0.30696939648497, 
0.157661900467845, 0.794025180049588, 0.967192000312653, 1.52027190896946, 
1.89339167130825, 0.0100560258999499, -0.130451253242324, 1.12487260844998, 
2.53082516067062, 2.23414816378354), indepvar1 = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), levels = c("unedited_vehicle", 
"unedited_MSH3aso_0.022uM", "unedited_MSH3aso_0.26uM", "unedited_MSH3aso_3uM", 
"unedited_SCRaso_3uM"), class = "factor"), 
indepvar2 = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), levels = c("MSH3aso_dose_titration-N1--unedited", 
"MSH3aso_dose_titration-N2--unedited", "MSH3aso_dose_titration-N3--unedited", 
"MSH3aso_dose_titration-N4--unedited", "FAN1ko_MSH3aso-N1-2H3-FAN1ko", 
"FAN1ko_MSH3aso-N2-2H3-FAN1ko", "FAN1ko_MSH3aso-N4-2H3-FAN1ko", 
"FAN1ko_MSH3aso-N3-2H3-FAN1ko", "CRISPRwt_MSH3aso-N1--CRISPRwt", 
"CRISPRwt_MSH3aso-N2--CRISPRwt", "CRISPRwt_MSH3aso-N3--CRISPRwt", 
"MSH3ko_CRISPRwt-NI1908-Cl37-MSH3ko", "MSH3ko_CRISPRwt-NI1908-Cl27-MSH3ko", 
"MSH3ko_CRISPRwt-NI1708-Cl37 -MSH3ko", "MSH3ko_CRISPRwt-NI0408-WT_JH-unedited", 
"MSH3ko_CRISPRwt-NI1708-Cl26-MSH3ko", "MSH3ko_CRISPRwt-NI1808-Cl26-MSH3ko"
), class = "factor"), time = c(0, 3, 6, 9, 0, 3, 6, 12, 15, 0, 
3, 9, 12, 15, 0, 3, 9, 12, 15, 0, 3, 6, 9, 0, 3, 6, 12, 15, 0, 
3, 9, 12, 15, 0, 3, 9, 12, 15, 0, 3, 6, 9, 0, 3, 6, 12, 15, 0, 
3, 9, 12, 15, 0, 3, 9, 12, 15, 0, 3, 6, 9, 0, 3, 6, 12, 15, 0, 
3, 9, 12, 15, 0, 3, 9, 12, 15, 0, 3, 6, 9, 0, 3, 6, 12, 15, 0, 
3, 9, 12, 15, 0, 3, 9, 12, 15)), row.names = c(NA, -95L), 
class = c("tbl_df", 
"tbl", "data.frame"))

my_lm <- lm(responsevar ~ indepvar1 * time + indepvar2, 
            data = demo_data)
emtrends(my_lm, ~ indepvar1, var = "time")

> emtrends(my_lm, ~ indepvar1, var = "time")
 indepvar1                time.trend     SE df lower.CL upper.CL
 unedited_vehicle             0.0801 0.0168 82   0.0467   0.1134
 unedited_MSH3aso_0.022uM     0.0741 0.0168 82   0.0407   0.1074
 unedited_MSH3aso_0.26uM      0.0120 0.0168 82  -0.0213   0.0454
 unedited_MSH3aso_3uM        -0.0232 0.0168 82  -0.0565   0.0101
 unedited_SCRaso_3uM          0.0994 0.0168 82   0.0661   0.1327

Results are averaged over the levels of: indepvar2 
Confidence level used: 0.95

As you can see, SE is 0.0168 for all groups in indepvar1.

Google, tried AI, can't find a solution

$\begingroup$ related: stats.stackexchange.com/questions/606074/… $\endgroup$
– Zhanxiong
Commented May 19 at 0:36 — Zhanxiong, Commented May 19 at 0:36

dipetkov · Accepted Answer · 2024-05-19 09:15:08Z

The standard errors for the indepvar1 time trends are the same because the design is balanced with respect to the indepvar1:time product terms. The balance manifests as patterns of symmetry in the design matrix $\mathbf{X}$ and those are also reflected in the variance covariance matrix of the regression coefficients, $\operatorname{Var}(\boldsymbol{\beta}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. See also How are the standard errors of coefficients calculated in a regression?.

demo_data |>
  count(indepvar1, time) |>
  pivot_wider(
    names_from = time,
    values_from = n
  )
#> indepvar1                   `0`   `3`   `6`   `9`  `12`  `15`
#> unedited_vehicle             4     4     2     3     3     3
#> unedited_MSH3aso_0.022uM     4     4     2     3     3     3
#> unedited_MSH3aso_0.26uM      4     4     2     3     3     3
#> unedited_MSH3aso_3uM         4     4     2     3     3     3
#> unedited_SCRaso_3uM          4     4     2     3     3     3

Notice how each indepvar1 category has the same number of observations per time point, even though that number varies across time points. So each indepvar1 category has a block of rows in $\mathbf{X}$ with the same structure, which creates balance/symmetry in $\mathbf{X}$.

This is the main reason the trend std. errors are the same. It also matters that there is balance between indepvar1:time and indepvar2.

demo_data |>
  count(indepvar1, time, indepvar2) |>
  pivot_wider(
    names_from = time,
    values_from = n
  )
#> indepvar1                indepvar2          `0`   `3`   `6`   `9`  `12`  `15`
#> unedited_vehicle         MSH3aso_dose_ti…    1     1     1     1    NA    NA
#> unedited_vehicle         MSH3aso_dose_ti…    1     1     1    NA     1     1
#> unedited_vehicle         MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_vehicle         MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_MSH3aso_0.022uM MSH3aso_dose_ti…    1     1     1     1    NA    NA
#> unedited_MSH3aso_0.022uM MSH3aso_dose_ti…    1     1     1    NA     1     1
#> unedited_MSH3aso_0.022uM MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_MSH3aso_0.022uM MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_MSH3aso_0.26uM  MSH3aso_dose_ti…    1     1     1     1    NA    NA
#> unedited_MSH3aso_0.26uM  MSH3aso_dose_ti…    1     1     1    NA     1     1
#> unedited_MSH3aso_0.26uM  MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_MSH3aso_0.26uM  MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_MSH3aso_3uM     MSH3aso_dose_ti…    1     1     1     1    NA    NA
#> unedited_MSH3aso_3uM     MSH3aso_dose_ti…    1     1     1    NA     1     1
#> unedited_MSH3aso_3uM     MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_MSH3aso_3uM     MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_SCRaso_3uM      MSH3aso_dose_ti…    1     1     1     1    NA    NA
#> unedited_SCRaso_3uM      MSH3aso_dose_ti…    1     1     1    NA     1     1
#> unedited_SCRaso_3uM      MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_SCRaso_3uM      MSH3aso_dose_ti…    1     1    NA     1     1     1

Notice how the pattern of NAs repeats 5 times (once for each indepvar1 category) with one observation per combination otherwise. A NA indicates there is no observation for the corresponding combination of the three predictors.

An easy way to "break" the balance between indepvar1:time and indepvar2 is to shuffle indepvar2. Effectively this redistributes the NAs in an imbalanced way.

set.seed(123)

demo_data <- demo_data |>
  mutate(
    indepvar2 = sample(indepvar2, replace = FALSE)
  )

my_lm <- lm(
  responsevar ~ indepvar1 * time + indepvar2,
  data = demo_data
)

emtrends(my_lm, ~indepvar1, var = "time")
#>  indepvar1                time.trend     SE df lower.CL upper.CL
#>  unedited_vehicle             0.0848 0.0212 82   0.0426   0.1271
#>  unedited_MSH3aso_0.022uM     0.0782 0.0208 82   0.0368   0.1196
#>  unedited_MSH3aso_0.26uM      0.0220 0.0214 82  -0.0205   0.0646
#>  unedited_MSH3aso_3uM        -0.0206 0.0205 82  -0.0613   0.0201
#>  unedited_SCRaso_3uM          0.1062 0.0221 82   0.0623   0.1501
#> 
#> Results are averaged over the levels of: indepvar2 
#> Confidence level used: 0.95

Now the std. errors of the time trends vary a bit between categories. Adding a continuous covariate would (most likely) have a similar effect; it's harder to get exact balance with continuous variables.

Thanks for the thorough answer, though shuffling the one of the main variables in my data will presumably affect the results, including the trend estimate. Is there a way of getting the correct SE without changing the data? Thanks! — Mike, Commented Jul 4 at 15:53
Not sure what you mean. You are already getting the correct std. errors. My answer explains why the correct std. errors (for your dataset) are all equal. — dipetkov, Commented Jul 4 at 16:21

Stack Exchange Network

Identical SE values for different groups using emtrends

1 Answer 1

Not the answer you're looking for? Browse other questions tagged
regression
r
multiple-comparisons
or ask your own question.

Linked

Hot Network Questions

Identical SE values for different groups using emtrends

1 Answer 1

Not the answer you're looking for? Browse other questions tagged regressionrmultiple-comparisons or ask your own question.

Linked

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
regression
r
multiple-comparisons
or ask your own question.