3
$\begingroup$

I'm using emtrends to extract slopes and do pairwise comparisons between the groups of my independent variable, controlled for the other variables in the model. However, when plotting the slopes, I noticed all SE values produced by emtrends were identical. I can't work out why though?

Here's a worked example:

demo_data <- structure(list(responsevar = c(-0.0618190679498364, 0.15899745702364, 
0.929534141895136, 1.1177284975211, -0.00786079907502823, 0.094923852612621, 
0.607581364096467, 0.440737663183793, 0.082136198176222, 0.157661900467845, 
0.830998187171696, 1.48543531298402, 1.7023101500879, 2.05170334713859, 
0.0100560258999499, 1.17122182596676, 0.966616026418047, 1.28196801361743, 
1.83426518745315, -0.0618190679498364, 0.483300296887319, 0.891072080154904, 
0.242089446870316, -0.00786079907502823, 0.100605419887576, 0.506128950660782, 
0.209931589686233, 0.115366122953099, 0.157661900467845, 0.531293815992106, 
1.25374892200693, 1.50296621263127, 1.44204603838667, 0.0100560258999499, 
-0.100982522461554, 0.701793094621886, 0.957971325343055, 2.20514483544439, 
-0.0618190679498364, 0.356048307744416, 0.651509920934051, 0.108383785292986, 
-0.00786079907502823, 0.207654065049067, 0.295812434350708, 0.0367762200675729, 
-0.119369737876224, 0.157661900467845, 0.855810615090611, 0.794374313954266, 
0.714569129850953, 0.77324072979401, 0.0100560258999499, 0.141294856642278, 
0.429316434045948, 0.26206854485803, 0.418774008647674, -0.0618190679498364, 
-0.246924373738116, 0.319374236827093, -0.49671671929437, -0.00786079907502823, 
0.228222865853934, 0.0936004246573529, -0.385729213582175, -0.338446278126348, 
0.157661900467845, 0.437319684073364, 0.43789561209487, 0.269572832872745, 
0.143710919264518, 0.0100560258999499, 0.584417295260851, -1.00583616078814, 
0.0349028968964146, -0.138691000504007, -0.0618190679498364, 
0.475098387917471, 0.151489745992231, 0.606705131746448, -0.00786079907502823, 
0.25051692004141, 0.33430090269068, 0.182426365767506, 0.30696939648497, 
0.157661900467845, 0.794025180049588, 0.967192000312653, 1.52027190896946, 
1.89339167130825, 0.0100560258999499, -0.130451253242324, 1.12487260844998, 
2.53082516067062, 2.23414816378354), indepvar1 = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), levels = c("unedited_vehicle", 
"unedited_MSH3aso_0.022uM", "unedited_MSH3aso_0.26uM", "unedited_MSH3aso_3uM", 
"unedited_SCRaso_3uM"), class = "factor"), 
indepvar2 = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), levels = c("MSH3aso_dose_titration-N1--unedited", 
"MSH3aso_dose_titration-N2--unedited", "MSH3aso_dose_titration-N3--unedited", 
"MSH3aso_dose_titration-N4--unedited", "FAN1ko_MSH3aso-N1-2H3-FAN1ko", 
"FAN1ko_MSH3aso-N2-2H3-FAN1ko", "FAN1ko_MSH3aso-N4-2H3-FAN1ko", 
"FAN1ko_MSH3aso-N3-2H3-FAN1ko", "CRISPRwt_MSH3aso-N1--CRISPRwt", 
"CRISPRwt_MSH3aso-N2--CRISPRwt", "CRISPRwt_MSH3aso-N3--CRISPRwt", 
"MSH3ko_CRISPRwt-NI1908-Cl37-MSH3ko", "MSH3ko_CRISPRwt-NI1908-Cl27-MSH3ko", 
"MSH3ko_CRISPRwt-NI1708-Cl37 -MSH3ko", "MSH3ko_CRISPRwt-NI0408-WT_JH-unedited", 
"MSH3ko_CRISPRwt-NI1708-Cl26-MSH3ko", "MSH3ko_CRISPRwt-NI1808-Cl26-MSH3ko"
), class = "factor"), time = c(0, 3, 6, 9, 0, 3, 6, 12, 15, 0, 
3, 9, 12, 15, 0, 3, 9, 12, 15, 0, 3, 6, 9, 0, 3, 6, 12, 15, 0, 
3, 9, 12, 15, 0, 3, 9, 12, 15, 0, 3, 6, 9, 0, 3, 6, 12, 15, 0, 
3, 9, 12, 15, 0, 3, 9, 12, 15, 0, 3, 6, 9, 0, 3, 6, 12, 15, 0, 
3, 9, 12, 15, 0, 3, 9, 12, 15, 0, 3, 6, 9, 0, 3, 6, 12, 15, 0, 
3, 9, 12, 15, 0, 3, 9, 12, 15)), row.names = c(NA, -95L), 
class = c("tbl_df", 
"tbl", "data.frame"))
my_lm <- lm(responsevar ~ indepvar1 * time + indepvar2, 
            data = demo_data)
emtrends(my_lm, ~ indepvar1, var = "time")

> emtrends(my_lm, ~ indepvar1, var = "time")
 indepvar1                time.trend     SE df lower.CL upper.CL
 unedited_vehicle             0.0801 0.0168 82   0.0467   0.1134
 unedited_MSH3aso_0.022uM     0.0741 0.0168 82   0.0407   0.1074
 unedited_MSH3aso_0.26uM      0.0120 0.0168 82  -0.0213   0.0454
 unedited_MSH3aso_3uM        -0.0232 0.0168 82  -0.0565   0.0101
 unedited_SCRaso_3uM          0.0994 0.0168 82   0.0661   0.1327

Results are averaged over the levels of: indepvar2 
Confidence level used: 0.95 

As you can see, SE is 0.0168 for all groups in indepvar1.

Google, tried AI, can't find a solution

$\endgroup$
1

1 Answer 1

3
$\begingroup$

The standard errors for the indepvar1 time trends are the same because the design is balanced with respect to the indepvar1:time product terms. The balance manifests as patterns of symmetry in the design matrix $\mathbf{X}$ and those are also reflected in the variance covariance matrix of the regression coefficients, $\operatorname{Var}(\boldsymbol{\beta}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. See also How are the standard errors of coefficients calculated in a regression?.

demo_data |>
  count(indepvar1, time) |>
  pivot_wider(
    names_from = time,
    values_from = n
  )
#> indepvar1                   `0`   `3`   `6`   `9`  `12`  `15`
#> unedited_vehicle             4     4     2     3     3     3
#> unedited_MSH3aso_0.022uM     4     4     2     3     3     3
#> unedited_MSH3aso_0.26uM      4     4     2     3     3     3
#> unedited_MSH3aso_3uM         4     4     2     3     3     3
#> unedited_SCRaso_3uM          4     4     2     3     3     3

Notice how each indepvar1 category has the same number of observations per time point, even though that number varies across time points. So each indepvar1 category has a block of rows in $\mathbf{X}$ with the same structure, which creates balance/symmetry in $\mathbf{X}$.

This is the main reason the trend std. errors are the same. It also matters that there is balance between indepvar1:time and indepvar2.

demo_data |>
  count(indepvar1, time, indepvar2) |>
  pivot_wider(
    names_from = time,
    values_from = n
  )
#> indepvar1                indepvar2          `0`   `3`   `6`   `9`  `12`  `15`
#> unedited_vehicle         MSH3aso_dose_ti…    1     1     1     1    NA    NA
#> unedited_vehicle         MSH3aso_dose_ti…    1     1     1    NA     1     1
#> unedited_vehicle         MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_vehicle         MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_MSH3aso_0.022uM MSH3aso_dose_ti…    1     1     1     1    NA    NA
#> unedited_MSH3aso_0.022uM MSH3aso_dose_ti…    1     1     1    NA     1     1
#> unedited_MSH3aso_0.022uM MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_MSH3aso_0.022uM MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_MSH3aso_0.26uM  MSH3aso_dose_ti…    1     1     1     1    NA    NA
#> unedited_MSH3aso_0.26uM  MSH3aso_dose_ti…    1     1     1    NA     1     1
#> unedited_MSH3aso_0.26uM  MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_MSH3aso_0.26uM  MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_MSH3aso_3uM     MSH3aso_dose_ti…    1     1     1     1    NA    NA
#> unedited_MSH3aso_3uM     MSH3aso_dose_ti…    1     1     1    NA     1     1
#> unedited_MSH3aso_3uM     MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_MSH3aso_3uM     MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_SCRaso_3uM      MSH3aso_dose_ti…    1     1     1     1    NA    NA
#> unedited_SCRaso_3uM      MSH3aso_dose_ti…    1     1     1    NA     1     1
#> unedited_SCRaso_3uM      MSH3aso_dose_ti…    1     1    NA     1     1     1
#> unedited_SCRaso_3uM      MSH3aso_dose_ti…    1     1    NA     1     1     1

Notice how the pattern of NAs repeats 5 times (once for each indepvar1 category) with one observation per combination otherwise. A NA indicates there is no observation for the corresponding combination of the three predictors.

An easy way to "break" the balance between indepvar1:time and indepvar2 is to shuffle indepvar2. Effectively this redistributes the NAs in an imbalanced way.

set.seed(123)

demo_data <- demo_data |>
  mutate(
    indepvar2 = sample(indepvar2, replace = FALSE)
  )

my_lm <- lm(
  responsevar ~ indepvar1 * time + indepvar2,
  data = demo_data
)

emtrends(my_lm, ~indepvar1, var = "time")
#>  indepvar1                time.trend     SE df lower.CL upper.CL
#>  unedited_vehicle             0.0848 0.0212 82   0.0426   0.1271
#>  unedited_MSH3aso_0.022uM     0.0782 0.0208 82   0.0368   0.1196
#>  unedited_MSH3aso_0.26uM      0.0220 0.0214 82  -0.0205   0.0646
#>  unedited_MSH3aso_3uM        -0.0206 0.0205 82  -0.0613   0.0201
#>  unedited_SCRaso_3uM          0.1062 0.0221 82   0.0623   0.1501
#> 
#> Results are averaged over the levels of: indepvar2 
#> Confidence level used: 0.95

Now the std. errors of the time trends vary a bit between categories. Adding a continuous covariate would (most likely) have a similar effect; it's harder to get exact balance with continuous variables.

$\endgroup$
2
  • $\begingroup$ Thanks for the thorough answer, though shuffling the one of the main variables in my data will presumably affect the results, including the trend estimate. Is there a way of getting the correct SE without changing the data? Thanks! $\endgroup$
    – Mike
    Commented Jul 4 at 15:53
  • $\begingroup$ Not sure what you mean. You are already getting the correct std. errors. My answer explains why the correct std. errors (for your dataset) are all equal. $\endgroup$
    – dipetkov
    Commented Jul 4 at 16:21

Not the answer you're looking for? Browse other questions tagged or ask your own question.