5
$\begingroup$

I am performing an analysis on data in which there is a hypothesized relationship between X (predictor) and Y (the response) across a broad area, but random (no relationship) within groups of observation due to local mixing in the predictor variable. An interesting analogy could be bowls of chowder in a dining hall, where each bowl has a different number of scallops varying between 10 and 100 (big bowls). A bowl of chowder contains an unknowable number of scallops (really big bowls).

It is a thick chowder, so you can only "sample" it with a large spoon that will take a number of scallops proportional to the total number of scallops in that bowl, on average. You could also taste the chowder on the spoon - with a higher density of scallops leading to a stronger taste. Also, the taste is mixed evenly throughout the bowl. We want to determine the relationship between the number of scallops on a spoon and the taste of the chowder on the spoon (we assume that the taste is determined by the bowl, not the scallops on the spoon). We take several spoons from each bowl (ignore depletion - big bowls). We would like to apply a mixed-model with taste intensity as the predictor (OK flavour concentration), the number of scallops on the spoon the response, and the bowl as the grouping variable for the random effect. Ideally, we would like to define the relationship between taste and number across the whole dining hall. We could easily pool the tastes and counts from each bowl and get the overall relationship - but this does not account for the variation in counts within the bowls. When applying a mixed model the random effect appears to account for all of the count variation and the predictor (taste) is not significant. Is there a better way than "pooling" to get at this relationship within a mixed-model framework?

A similar question was asked before (What happens when fixed and random effects overlap?) but not answered. I have included a simple simulation to illustrate the issue at the end of this note (sadly I have not included any chowder). Thank you for your considerations.

Cheers, Darren

#
# Fixed and Random effects competition for variance
#
library(mgcv)
# Simulate some data with 8 groups
slope = 1
sY = 1      # variability within grps
sYbar = 0.2  # variability of grp relationship

# The underlying relationship
grpX <- c(1,1,2,2,4,4,6,6)          # mean X per group
grpYbar <- slope*grpX + rnorm(length(grpX))*sYbar
plot(grpX,grpYbar)
cor(grpX,grpYbar)^2

# Perform 500 simulations
#
# Simulate 5 responses per group
grp <- c(1,2,3,4,5,6,7,8)
replicate <- c(1,2,3,4,5)
# Values to save in vectors
Xp <- as.numeric(NULL)
Sp <- as.numeric(NULL)
Bx <- as.numeric(NULL)
#
for (dups in 1:500) {
#
simData <- as.data.frame(list(grp=as.numeric(NULL),
                              replicate=as.numeric(NULL),
                              X=as.numeric(NULL),
                              Y=as.numeric(NULL)))
for (i in grp) {
  for (j in replicate) {
    g <- grp[i]
    r <- replicate[j]
    X <- grpX[i] + rnorm(1)
    Y <- grpYbar[i] + rnorm(1)*sY
    simData <- rbind(simData,list(grp=g,
                                  replicate=r,
                                  X=X,
                                  Y=Y))
  }
}
#
s1 <- gam(Y ~ X + s(grp, bs = 're'),
          data=simData,method="REML")
#
Xp <- c(Xp,summary(s1)$p.table[2,4])
Sp <- c(Sp,summary(s1)$s.table[1,4])
Bx <- c(Bx,summary(s1)$p.table[2,1])
#
}

# Plot significances
par(mfrow=c(2,2))
hist(Xp,main = "Slope p-value")
abline(v=0.05,lty=2,col="red")
hist(Sp,main = "Random Effects p-value")
abline(v=0.05,lty=2,col="red")
plot(Xp,Sp)
hist(Bx)

# Significant slope
sum(Xp<0.05)
# significant RE
sum(Sp<0.05)
# Significant slope(row) by RE(col)
Xsig <- Xp<0.05
Ssig <- Sp<0.05
sigTableData <- as.data.frame(list(Xsig=Xsig,
                                   Ssig=Ssig))
table(sigTableData)
$\endgroup$
3
  • $\begingroup$ "We want to determine the relationship between the number of scallops on a spoon and the taste of the chowder on the spoon (we assume that the taste is determined by the bowl, not the scallops on the spoon)" The comment between brackets confuses me. You want to determine a relationship between the spoon scallops and taste, but there is no relationship? $\endgroup$ Commented Jun 12 at 10:51
  • $\begingroup$ Later in the text it is suddenly "we would like to define the relationship between taste and number across the whole dining hall." So what is it? A relationship for the spoons, the bowls or the dining hall? $\endgroup$ Commented Jun 12 at 10:52
  • $\begingroup$ Sorry about the confusion. The spoon is the sampling device, and the bowl is the environment sampled. The could be a relationship between the taste and expected number in a bowl, but a different relationship between the taste and the expected number across all of the bowls in the dining hall - the ecological fallacy aspect of the example. Genser et al. 2015 (see below) does a better job of explaining. Briefly, we want to test for a relationship between scallop count and taste at the spoon both within a bowl (where true slope=0 ) and among bowls in the dining hall (where true slope>0). $\endgroup$
    – Darren G
    Commented Jun 13 at 19:23

2 Answers 2

3
$\begingroup$

Usually in mixed effects modelling a factor can enter the model as either fixed or random, and there are lots of threads here on CrossValidated about that. For example:

What is the difference between fixed effect, random effect in mixed effect models?
R's lmer cheat sheet
How can I test whether a random effect is significant?

However,there can be occasions where we model a factor as fixed and random.

In their seminal book, "Mixed-Effects Models in S and S-PLUS", along with the nlme package, Pinheiro, and Bates (2000) explore this with the Oats dataset (available from the nlme package)

Yield in bushels/acre of three different varieties of oats at four different concentrations of nitrogen (hundred weight/acre). The experimental units were arranged into six blocks, each with three whole-plots subdivided into four subplots. One variety of oats was used in each whole-plot with all four concentrations of nitrogen, one concentration in each of the four subplots. The panels correspond to the blocks.

They fit the variety factor, as both a fixed effect and a random effect (where variety is nested within Block, in a split-plot design.

yield ~ nitro * Variety, data = Oats, random = ~ 1 Block/Variety

After the model is introduced they then go on to talk about the Variety factor:

In this model there is a random effect for Variety %in% Block as well as a fixed effect for Variety. These terms model different characteristics of the response. The random effects term, as a nested random effect, is allowing for different intercepts at the level of plots within blocks. The fact that each plot is planted with one variety means that we can use the Variety factor to indicate the plot as long as we have Variety nested within Block. As seen in Figure 1.20 the yields in one of the plots within a block may be greater than those on another plot in the same block for all levels of nitro. For example, in block III the plot that was planted with the Marvellous variety had greater yields than the other two plots at each level of nitro. The random effect at the level of Variety %in% Block allows shifts like this that may be related to the fertility of the soil in that plot, for example.

On the other hand, the fixed-effects term for Variety is used to model a systematic difference in the yields that would be due to the variety of oats planted in the plot.

In this particular example, since the p-values for the fixed effect of Variety were large, they ended up removing the fixed effect in order to achieve a more parsimonious model. However I find their discussion above to be highly illustrative of the possibility of fitting a factor as fixed and randon

$\endgroup$
2
  • $\begingroup$ The sentence "each plot is planted with one variety means that we can use the Variety factor to indicate the plot" made me wonder: If I create a Plot column and fit the model yield ~ nitro * Variety + (1 | Block) + (1 | Plot), then I get the same model. (Only the random effects have different names). Now I'm not sure that this is a really convincing example of an effect being both fixed and random. $\endgroup$
    – dipetkov
    Commented Jun 16 at 15:54
  • $\begingroup$ @dipetkov that's an interesting insight - thanks for that :) $\endgroup$ Commented Jun 16 at 18:09
1
$\begingroup$

As I originally described it, a nested model as presented by Robert Long would be perfect. Thank you for presenting it so clearly and in such detail.

With a little more reading and remembering, I see that this physical example is a case of the ecological fallacy (Gelman and Park 2001 https://doi.org/10.1111/1467-985X.00190). The number of scallops is related to the taste with a slope of zero within the bowls and a positive slope among the means of the bowls. Thus, the relationship between spoonfuls and taste differs between individuals (within bowls) and across the dining hall (among bowls).

A good example of a model to deal with cases like this can be found in the Appendix of Genser et al. 2015 (https://doi.org/10.1186/s12940-015-0047-2). Their model could be presented as:

Scallops ~ Taste_bowl + I(Taste_spoon - Taste_bowl) + (1|bowl)

where Taste_bowl is the average taste in the bowl, and I(Taste_spoon - Taste_bowl) is the centered (to avoid collinearity) Taste intensity within the bowl. The random effects (1|bowl) would allow for intercept variations among bowls. I(Taste_spoon - Taste_bowl) would allow for variation within the bowls in case the taste was not evenly mixed. This model goes beyond my example, but is a bit more flexible for relationships within the bowl. This would be an ecological regression or "Within- and between-group regression" in the sense described in my citations.

$\endgroup$
2
  • 2
    $\begingroup$ As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. $\endgroup$
    – Community Bot
    Commented Jun 9 at 3:06
  • 1
    $\begingroup$ Rephrased and added citations to improve clarity. $\endgroup$
    – Darren G
    Commented Jun 10 at 18:45

Not the answer you're looking for? Browse other questions tagged or ask your own question.