
I am working in Python on a random forest regression for predicting a target variable. I have trained and tested it on real data, obtaining satisfactory results. Now I would like to explore different possible scenarios to understand how the target variable would change as the other variables change. Can I test the RF model on synthetic data if I trained it on real data?

I have attempted to generate this simulated data by multiplying some variables of the real test dataset by factors I chose myself; for example, increasing variables A and C by 10%.

Is this approach of mixing real data for training and simulated data for testing acceptable?
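
To make this concrete, here is roughly what I am doing. This is a minimal sketch: the random data stands in for my real dataset, and the column names "A", "B", "C", the model settings, and the 10% factor are just illustrative.

```python
# Minimal sketch of the scenario approach; random data stands in for the
# real dataset, and the column names "A", "B", "C" are illustrative.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["A", "B", "C"])
y = 2 * X["A"] + X["B"] - 0.5 * X["C"] + rng.normal(scale=0.1, size=500)

# Train and test on the (stand-in for) real data
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(X_train, y_train)

# "What-if" scenario: increase variables A and C by 10% in the test data
X_scenario = X_test.copy()
X_scenario[["A", "C"]] *= 1.10

print("Mean prediction, baseline:     ", rf.predict(X_test).mean())
print("Mean prediction, A and C +10%: ", rf.predict(X_scenario).mean())
```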

  • This sounds like a great idea: to get a sense of how the model would behave under various alternative scenarios, since future data might not look like past data. I think we could describe this as a "sensitivity analysis". Commented Jul 4 at 14:14
  • Thank you so much, I did not want to use an approach that did not statistically make sense! Commented Jul 8 at 9:23

1 Answer


This is generally good practice for understanding many machine learning models, and some complex statistical models. It's especially valuable as a way to evaluate how the model extrapolates. In the case of standard random forests, though, extrapolation is very crude and this approach probably won't tell you much (see Decision Trees and Regression - Can predicted values be outside range of training data?). However, if you are not extrapolating, it's still a good way to examine how predictors jointly affect the response; essentially, this is a way of recreating a partial dependence plot in multiple dimensions.
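
If you want the partial dependence view directly, scikit-learn can compute it for you. A minimal sketch, continuing from the fitted model `rf` and training frame `X_train` in the question's example (the feature names "A" and "C" are illustrative, not part of any real dataset):

```python
# Sketch of a two-way partial dependence plot with scikit-learn; `rf` is a
# fitted RandomForestRegressor and X_train a DataFrame containing columns
# "A" and "C" (illustrative names from the sketch in the question).
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# Passing a tuple of two features produces a 2-D partial dependence plot,
# i.e. the model's average prediction over a grid of (A, C) values.
PartialDependenceDisplay.from_estimator(rf, X_train, features=[("A", "C")])
plt.show()
```

Your manual scenario approach generalizes this to more than two variables at once, which is exactly where it becomes useful.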

