I am having a strange problem. I have successfully run this code on my laptop, but when I try to run it on another machine I first get this warning, which I expect:

    Distribution not specified, assuming bernoulli ...

but then I get this error:

    Error in object$var.levels[[i]] : subscript out of bounds

The code:

    library(gbm)
    gbm.tmp <- gbm(subxy$presence ~ btyme + stsmi + styma + bathy,
                   data=subxy,
                   var.monotone=rep(0, length= 4), n.trees=2000, interaction.depth=3,
                   n.minobsinnode=10, shrinkage=0.01, bag.fraction=0.5, train.fraction=1,
                   verbose=F, cv.folds=10)

Can anybody help? The data structures are exactly the same, same code, same R. I am not even using a subscript here.
EDIT: traceback()

    6: predict.gbm(model, newdata = my.data, n.trees = best.iter.cv)
    5: predict(model, newdata = my.data, n.trees = best.iter.cv)
    4: predict(model, newdata = my.data, n.trees = best.iter.cv)
    3: gbmCrossValPredictions(cv.models, cv.folds, cv.group, best.iter.cv,
           distribution, data[i.train, ], y)
    2: gbmCrossVal(cv.folds, nTrain, n.cores, class.stratify.cv, data,
           x, y, offset, distribution, w, var.monotone, n.trees, interaction.depth,
           n.minobsinnode, shrinkage, bag.fraction, var.names, response.name,
           group)
    1: gbm(subxy$presence ~ btyme + stsmi + styma + bathy, data = subxy,
           var.monotone = rep(0, length = 4), n.trees = 2000, interaction.depth = 3,
           n.minobsinnode = 10, shrinkage = 0.01, bag.fraction = 0.5,
           train.fraction = 1, verbose = F, cv.folds = 10)

Could it have something to do with the fact that I moved the saved R workspace to another machine?
EDIT 2: OK, so I have updated the gbm package on the machine where the code was working, and now I get the same error there. So at this point I am thinking that either the older gbm package did not have this check in place, or the newer version has some problem. I don't understand gbm well enough to say.
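
Since the behaviour seems to depend on the installed gbm version, it may help to record the R and package versions on each machine so they can be compared side by side. This is only a minimal sketch: `describe_setup` is a hypothetical helper name, not part of gbm, and it reports `NA` if the package is not installed.

```r
# Hypothetical helper: collect version information that may differ between machines.
describe_setup <- function(pkg = "gbm") {
  installed <- requireNamespace(pkg, quietly = TRUE)
  list(
    r_version   = R.version.string,
    # Report NA rather than failing when the package is missing.
    pkg_version = if (installed) as.character(utils::packageVersion(pkg)) else NA_character_
  )
}

setup <- describe_setup()
str(setup)
```

Running this on both machines and comparing the output would show whether the two installations really match.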

(1) Don't use `$` in the formula; just do `presence ~ ...`. (2) One thing to check is that both machines have R set up the same way; for instance, check `stringsAsFactors`.

Is `subxy` a built-in data frame? If it's your own data, then please can you provide some sample data that reproduces the problem. A `traceback()` of where the error occurs would also be useful.

The distribution assumed by `gbm` is "bernoulli", so if you have an outcome with greater than two levels, wouldn't you expect it to throw an error?
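
The suggestions in these comments could be checked roughly as follows. This is a sketch under stated assumptions, not a confirmed fix: the `subxy` built here is simulated stand-in data that only reuses the column names from the question, and `gbm_formula`, `col_classes` and `n_outcome_values` are illustrative names.

```r
# Simulated stand-in for the asker's data (assumption: numeric predictors, 0/1 response).
set.seed(1)
n <- 200
subxy <- data.frame(
  presence = rbinom(n, 1, 0.5),
  btyme = rnorm(n), stsmi = rnorm(n), styma = rnorm(n), bathy = rnorm(n)
)

# (1) Name the response by column only; `data = subxy` tells R where to find it.
gbm_formula <- presence ~ btyme + stsmi + styma + bathy

# Every variable in the formula should be a column of the data frame.
stopifnot(all(all.vars(gbm_formula) %in% names(subxy)))

# The formula resolves against the data frame alone.
mf <- model.frame(gbm_formula, data = subxy)

# (2) Compare column classes across machines; a character/factor mismatch shows up here.
col_classes <- vapply(subxy, function(col) class(col)[1], character(1))
print(col_classes)

# A bernoulli outcome should have exactly two distinct values.
n_outcome_values <- length(unique(subxy$presence))
print(n_outcome_values)
```

The same `gbm_formula` can then be passed as the first argument to `gbm()` in place of `subxy$presence ~ ...`, with the remaining arguments left as in the question.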