$\begingroup$

I'm using mice in R to impute missing values. If I understand correctly, mice uses fully conditional specification: it fits a conditional model for each incomplete variable and draws new values from the resulting posterior predictive distribution to fill the gaps.

Since my data are split into a train and test set, I don't think I can just impute the entire data set, as this would leak information from the test set. However, it seems wasteful to start the entire imputation procedure all over again, especially since the test set is smaller.

Is there a way to re-use the learned model on the test set?

$\endgroup$
  • $\begingroup$ I think you should use the imputation model fitted on the training dataset to impute the test set. $\endgroup$ Commented Mar 8, 2018 at 8:01
  • $\begingroup$ Thank you for your comment, but my question is how to do that. $\endgroup$ Commented Mar 8, 2018 at 8:07

1 Answer

$\begingroup$

I believe you should fit the imputation model on your training data and then store that model so it can be applied to your test set. The problem is that you cannot do this with the mice package. You may want to look at ?caret::preProcess and ?recipes::step_knnimpute or ?recipes::step_bagimpute. Both of these packages can do what you want, but they have fewer features than mice.
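
As a rough sketch of the caret route, something like the following should work. The toy data frame, its column names, and the train/test split are made up for illustration, and I've used method = "medianImpute" only because it needs no extra packages; "knnImpute" and "bagImpute" are other values of the same argument (they may need additional packages installed, and "knnImpute" also centers and scales the data).

```r
library(caret)

set.seed(1)

# Toy data (illustrative only): three numeric columns, NAs at fixed rows
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
dat$x1[c(5, 20, 75, 90)]  <- NA
dat$x2[c(10, 40, 80, 95)] <- NA

# Hypothetical split: first 70 rows for training, the rest for testing
train <- dat[1:70, ]
test  <- dat[71:100, ]

# Learn the imputation from the training rows only
imp_model <- preProcess(train, method = "medianImpute")

# Apply the stored model to both sets; the test rows never influence the fit
train_imp <- predict(imp_model, newdata = train)
test_imp  <- predict(imp_model, newdata = test)
```

The imp_model object can be saved with saveRDS() and reused later on new data. If you go with recipes instead, the equivalent pattern is prep() on the training data followed by bake() on the test data; I believe more recent recipes releases name these steps step_impute_knn and step_impute_bag.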

$\endgroup$
