Imputation using MICE: Use the train data to impute the missing test data

Question

I'm using mice in R to impute missing values. If I understand correctly, mice specifies a fully conditional model to draw new values from some posterior distribution to fill the gaps.

Since my data are split into a train and test set, I don't think I can just impute the entire data set, as this would leak information from the test set. However, it seems wasteful to start the entire imputation procedure all over again, especially since the test set is smaller.

Is there a way to re-use the learned model on the test set?

I think you should use the imputation model of the training dataset to impute the test. — Chamberlain Mbah, Commented Mar 8, 2018 at 8:01
Thank you for your comment, but my question is how to do that. — Frans Rodenburg, Commented Mar 8, 2018 at 8:07

SemiQuant · Accepted Answer · 2019-02-19 15:29:45Z

I believe you should fit the imputation model on your training data and then store that model so it can be applied to your test set. The problem is that you cannot do this with the mice package, at least at the time of writing. You may want to look at ?caret::preProcess and ?recipes::step_knnimpute or ?recipes::step_bagimpute. Both of these packages can do what you want, but they have fewer features than mice.
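To make the "fit on train, store the model, apply to test" idea concrete, here is a minimal base R sketch. It is not what mice, caret, or recipes do internally: it uses a single linear regression as a stand-in imputation model, and the toy data, the variable names `x` and `y`, and the helper `impute_y` are all assumptions made up for illustration. It is also deterministic single imputation, so it does not reproduce the between-imputation variability you get from multiple imputation.

```r
# Minimal sketch: learn an imputation model on the training set only,
# then reuse the stored model to fill gaps in the test set.
set.seed(1)

# Hypothetical toy data: y depends linearly on x, with some y values missing.
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.5)
dat$y[sample(n, 40)] <- NA

# Split into train and test before any imputation happens.
train_idx <- sample(n, 150)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit the imputation model on training rows only.
# lm() drops rows with missing y by default (na.action = na.omit).
imp_model <- lm(y ~ x, data = train)

# Hypothetical helper: fill missing y values using a stored model.
impute_y <- function(data, model) {
  miss <- is.na(data$y)
  data$y[miss] <- predict(model, newdata = data[miss, , drop = FALSE])
  data
}

# The same stored model is applied to both sets; the test set never
# influences the model's coefficients.
train_imp <- impute_y(train, imp_model)
test_imp  <- impute_y(test, imp_model)
```

The packages mentioned above follow the same two-step pattern, with the fitted preprocessing object playing the role of `imp_model` here.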

