Skip to main content
Notice removed Authoritative reference needed by Javier Hernando
Bounty Ended with Shawn Hemelstrand's answer chosen by Javier Hernando
Removed thank you, added tag, fixed spelling errors and added formatting
Source Link
Shawn Hemelstrand
  • 15.9k
  • 6
  • 29
  • 79

Firstly, I would like to admit that even though it is not the first time I am working with linear mixed models, the mathematical foundations escape me. I am running a linear mixed-effects model using the nlme package, specifically the lmelme function. Firstly, I'm using this function instead of lme4lme4 because of my previous experience with it. As you know, dput()dput() of long-format databases is almost impossible, so I will describe the data in as much detail as possible (if necessary I will provide through some platform).

Regarding these linear combinations, I have some doubts that have arisen. At least hta_s19 and diab_s19 only have one category in the matrix created. I'm not sure if this is related with the additional category for unknown information codified as "9" which actually has 0 indvidualsindividuals on it (Note that the variable name has switched to hta_s19, I guess this is related to value "9")

  • If I continue dropping from the matrix the tagged columns, I would be discarding time:group interaction term, which is in fact the term I am looking for.
  • I don't know wetherwhether the problem is related to convergence or collinearity, or both of them (not sure this affirmation is nonsense)
  • In case I shoudshould discard columns and use the matrix, I wasn't able to follow this . I wouldn't know from the matrix of covariance how to reformulate the model due to the different notation of the variables

Thank you in advance!

Firstly, I would like to admit that even though it is not the first time I am working with linear mixed models, the mathematical foundations escape me. I am running a linear mixed-effects model using the nlme package, specifically the lme function. Firstly, I'm using this function instead of lme4 because of my previous experience with it. As you know, dput() of long-format databases is almost impossible, so I will describe the data in as much detail as possible (if necessary I will provide through some platform).

Regarding these linear combinations, I have some doubts that have arisen. At least hta_s19 and diab_s19 only have one category in the matrix created. I'm not sure if this is related with the additional category for unknown information codified as "9" which actually has 0 indviduals on it (Note that the variable name has switched to hta_s19, I guess this is related to value "9")

  • If I continue dropping from the matrix the tagged columns, I would be discarding time:group interaction term, which is in fact the term I am looking for.
  • I don't know wether the problem is related to convergence or collinearity, or both of them (not sure this affirmation is nonsense)
  • In case I shoud discard columns and use the matrix, I wasn't able to follow this . I wouldn't know from the matrix of covariance how to reformulate the model due to the different notation of the variables

Thank you in advance!

Firstly, I would like to admit that even though it is not the first time I am working with linear mixed models, the mathematical foundations escape me. I am running a linear mixed-effects model using the nlme package, specifically the lme function. Firstly, I'm using this function instead of lme4 because of my previous experience with it. As you know, dput() of long-format databases is almost impossible, so I will describe the data in as much detail as possible (if necessary I will provide through some platform).

Regarding these linear combinations, I have some doubts that have arisen. At least hta_s19 and diab_s19 only have one category in the matrix created. I'm not sure if this is related with the additional category for unknown information codified as "9" which actually has 0 individuals on it (Note that the variable name has switched to hta_s19, I guess this is related to value "9")

  • If I continue dropping from the matrix the tagged columns, I would be discarding time:group interaction term, which is in fact the term I am looking for.
  • I don't know whether the problem is related to convergence or collinearity, or both of them (not sure this affirmation is nonsense)
  • In case I should discard columns and use the matrix, I wasn't able to follow this . I wouldn't know from the matrix of covariance how to reformulate the model due to the different notation of the variables
Notice added Authoritative reference needed by Javier Hernando
Bounty Started worth 50 reputation by Javier Hernando
Source Link

Error in mixed-models. Which to detect? Collinearity? Singularity in backsolve at level 0, block 1

Firstly, I would like to admit that even though it is not the first time I am working with linear mixed models, the mathematical foundations escape me. I am running a linear mixed-effects model using the nlme package, specifically the lme function. Firstly, I'm using this function instead of lme4 because of my previous experience with it. As you know, dput() of long-format databases is almost impossible, so I will describe the data in as much detail as possible (if necessary I will provide through some platform).

I will try to explain step by step what I have done. I will work with an example, actually the first component of my model, which has yielded an error.

  1. The model: This model could be overfitted, and this is the first issue I am facing. However, the problem for me is how to identify this excess of covariates.

lme(cd86 ~ time:grupo_int_v00 + time + grupo_int_v00 + edad_s1 + 
    sexo_s1 + imc + hta_s1 + diab_s1 + dislip + geaf_tot, random= ~ 1|paciente, control = lmeControl(opt = "optim"), method="REML", data=tmp, na.action = na.omit)
  1. The data = tmp: Long-format data, with 2 temporal points, time = 0,1. There are multiple categorical variables in the model, two of which include an additional category for unknown information coded as value 9 (in diab_s1 and hta_s1).
tibble [8,874 × 13] (S3: tbl_df/tbl/data.frame)
 $ cd86         : num [1:8874] NA NA NA NA NA ...
 $ grupo_int_v00: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ time         : num [1:8874] 0 0 0 0 0 0 0 0 0 0 ...
 $ paciente     : num [1:8874] 6291 6291 6291 6291 6291 ...
 $ edad_s1      : num [1:8874] 68 68 68 68 68 68 68 68 68 68 ...
 $ sexo_s1      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ peso         : num [1:8874] 92.6 92.6 92.6 92.6 92.6 92.6 92.6 92.6 92.6 92.6 ...
 $ imc          : num [1:8874] 30.2 30.2 30.2 30.2 30.2 ...
 $ hta_s1       : Factor w/ 3 levels "0","1","9": 2 2 2 2 2 2 2 2 2 2 ...
 $ diab_s1      : Factor w/ 3 levels "0","1","9": 1 1 1 1 1 1 1 1 1 1 ...
 $ dislip       : num [1:8874] 1 1 1 1 1 1 1 1 1 1 ...
 $ geaf_tot     : num [1:8874] 280 280 280 280 280 ...
 $ alcoholg_v00 : num [1:8874] 31.5 31.5 31.5 31.5 31.5 ...

 sapply(vars_modelo, function(x) sum(!is.na(tmp[[x]])))
         cd86          time grupo_int_v00       edad_s1 
          169          8874          8874          8874 
      sexo_s1           imc        hta_s1       diab_s1 
         8874          8874          8874          8874 
       dislip      geaf_tot      paciente 
         8874          4437          8874 


  1. Multicollinearity? Following previous threads here by Ben Bolker, I went for creating the matrix and test the "linear combos"
X <- model.matrix(~ time:grupo_int_v00 + time + grupo_int_v00 + edad_s1 + sexo_s1 + imc + hta_s1 + diab_s1 + dislip + geaf_tot, data = tmp)

lc <- caret::findLinearCombos(X)

colnames(X)[lc$linearCombos[[1]]]
[1] "time"
> colnames(X)[lc$linearCombos[[2]]]
[1] "hta_s19"
> colnames(X)[lc$linearCombos[[3]]]
[1] "diab_s19"
> colnames(X)[lc$linearCombos[[4]]]
[1] "time:grupo_int_v001"

colnames(X)
 [1] "(Intercept)"         "time"               
 [3] "grupo_int_v001"      "edad_s1"            
 [5] "sexo_s11"            "imc"                
 [7] "hta_s11"             "hta_s19"            
 [9] "diab_s11"            "diab_s19"           
[11] "dislip"              "geaf_tot"           
[13] "time:grupo_int_v001

Regarding these linear combinations, I have some doubts that have arisen. At least hta_s19 and diab_s19 only have one category in the matrix created. I'm not sure if this is related with the additional category for unknown information codified as "9" which actually has 0 indviduals on it (Note that the variable name has switched to hta_s19, I guess this is related to value "9")

unique(X[,"hta_s19"])
[1] 0

table(tmp$hta_s1)

   0    1    9 
1122 7752    0 


unique(X[,"diab_s19"])
[1] 0


table(tmp$diab_s1)

   0    1    9 
5916 2958    0 

What I don't fully comprehend is the significance of the other two linear combinations (time and "time:grupo_int_v001"). Does it imply that only time = 0 is detected in the matrix, suggesting a specific occurrence at this time?

Additionally, considering the matrix construction's aim to detect collinearity, are these variables indeed collinear?

 unique(X[,"time"])
[1] 0

# Freq in the original databsase
table(tmp$time)

   0    1 
4437 4437

unique(X[,"time:grupo_int_v001"])
[1] 0

table(tmp$grupo_int_v00)

   0    1 
4386 4488

I would like to clarify within this volume of information.

  • If I continue dropping from the matrix the tagged columns, I would be discarding time:group interaction term, which is in fact the term I am looking for.
  • I don't know wether the problem is related to convergence or collinearity, or both of them (not sure this affirmation is nonsense)
  • In case I shoud discard columns and use the matrix, I wasn't able to follow this . I wouldn't know from the matrix of covariance how to reformulate the model due to the different notation of the variables
#variables to remove from the matrix

variables_discard <- c("time", "hta_s19", "diab_s19", "time:grupo_int_v001")


# Remove columns
X1 <- with(tmp,
           data.frame(cd86,
                      as.data.frame(X),
                      paciente))

X1 <- X1[, -which(colnames(X1) %in% variables_discard)]
X1 <- as.data.frame(X1)


lme(reformulate(colnames(X1)[2:15], response = "cd86"), random = ~1|paciente,control = lmeControl(opt = "optim"), method="REML", data=X1, na.action = na.omit)

#Error in terms.formula(formula, data = data) : 
  invalid model formula in ExtractVars

Thank you in advance!