I am analysing data in which there is a hypothesized relationship between X (predictor) and Y (response) across a broad area, but no relationship within groups of observations because of local mixing in the predictor variable. A useful analogy is bowls of chowder in a dining hall, where each bowl has a different number of scallops, varying between 10 and 100 (big bowls), and the number of scallops in any given bowl is unknowable (really big bowls).
The chowder is thick, so you can only "sample" it with a large spoon, which picks up a number of scallops that is, on average, proportional to the total number of scallops in that bowl. You can also taste the chowder on the spoon, with a higher density of scallops producing a stronger taste; the taste is mixed evenly throughout the bowl. We want to determine the relationship between the number of scallops on a spoon and the taste of the chowder on that spoon (we assume the taste is determined by the bowl, not by the scallops that happen to land on the spoon). We take several spoonfuls from each bowl (ignore depletion - big bowls).

We would like to fit a mixed model with taste intensity (OK, flavour concentration) as the predictor, the number of scallops on the spoon as the response, and the bowl as the grouping variable for the random effect. Ideally, we would like to estimate the relationship between taste and count across the whole dining hall. We could easily pool the tastes and counts from every bowl and get the overall relationship, but this does not account for the variation in counts within the bowls. When we fit the mixed model, the random effect appears to soak up all of the variation in counts and the predictor (taste) is not significant. Is there a better way than "pooling" to get at this relationship within a mixed-model framework?
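To make the two fits concrete, here is a minimal sketch of "pooling" versus the mixed model I have in mind (the chowder data frame, its column names, and the simulation settings are invented purely for illustration; the real simulation for the actual issue is at the end of this note):

# Illustrative only: invented chowder data, just to show the two model calls
library(mgcv)
set.seed(42)
n_bowls  <- 8
n_spoons <- 5
bowl_count <- round(runif(n_bowls, 10, 100))   # true scallops per bowl
bowl_taste <- bowl_count / 100                 # taste tracks scallop density
chowder <- data.frame(
  bowl  = factor(rep(seq_len(n_bowls), each = n_spoons)),
  taste = rep(bowl_taste, each = n_spoons),
  count = rpois(n_bowls * n_spoons, rep(bowl_count, each = n_spoons) / 10)
)
# "Pooling": one regression over all spoonfuls, ignoring the bowls
pooled <- lm(count ~ taste, data = chowder)
# Mixed model: bowl as a random intercept (the grouping variable must be a factor)
mixed <- gam(count ~ taste + s(bowl, bs = "re"),
             data = chowder, method = "REML")
summary(pooled)
summary(mixed)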
A similar question was asked before (What happens when fixed and random effects overlap?) but was not answered. I have included a simple simulation at the end of this note to illustrate the issue (sadly, I have not included any chowder). Thank you for your consideration.
Cheers, Darren
#
# Fixed and Random effects competition for variance
#
library(mgcv)
set.seed(1)   # for reproducibility (value is arbitrary)
# Simulate some data with 8 groups
slope <- 1
sY    <- 1    # variability within groups
sYbar <- 0.2  # variability of the group-level relationship
# The underlying relationship
grpX <- c(1,1,2,2,4,4,6,6) # mean X per group
grpYbar <- slope*grpX + rnorm(length(grpX))*sYbar
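# Note: the group means grpYbar are drawn once here and reused in all 500 simulations below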
plot(grpX,grpYbar)
cor(grpX,grpYbar)^2
# Perform 500 simulations, each with 5 responses per group
grp <- 1:8
replicate <- 1:5
# Vectors to store the results of each simulation
Xp <- numeric(0)  # p-value of the fixed-effect slope
Sp <- numeric(0)  # p-value of the random effect
Bx <- numeric(0)  # estimated fixed-effect slope
#
for (dups in 1:500) {
#
simData <- data.frame(grp = numeric(0),
                      replicate = numeric(0),
                      X = numeric(0),
                      Y = numeric(0))
for (i in grp) {
for (j in replicate) {
g <- grp[i]
r <- replicate[j]
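# X varies around the group mean, but Y depends only on the group mean,
# so there is no within-group relationship between X and Y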
X <- grpX[i] + rnorm(1)
Y <- grpYbar[i] + rnorm(1)*sY
simData <- rbind(simData,list(grp=g,
replicate=r,
X=X,
Y=Y))
}
}
#
# grp must be a factor for s(grp, bs = "re") to act as a per-group random intercept
simData$grp <- factor(simData$grp)
s1 <- gam(Y ~ X + s(grp, bs = "re"),
          data = simData, method = "REML")
#
Xp <- c(Xp,summary(s1)$p.table[2,4])
Sp <- c(Sp,summary(s1)$s.table[1,4])
Bx <- c(Bx,summary(s1)$p.table[2,1])
#
}
# Plot significances
par(mfrow=c(2,2))
hist(Xp,main = "Slope p-value")
abline(v=0.05,lty=2,col="red")
hist(Sp,main = "Random Effects p-value")
abline(v=0.05,lty=2,col="red")
plot(Xp, Sp, xlab = "Slope p-value", ylab = "Random effect p-value")
hist(Bx, main = "Estimated slope")
# Significant slope
sum(Xp<0.05)
# significant RE
sum(Sp<0.05)
# Significant slope(row) by RE(col)
Xsig <- Xp<0.05
Ssig <- Sp<0.05
sigTableData <- as.data.frame(list(Xsig=Xsig,
Ssig=Ssig))
table(sigTableData)