Does random forest need input variables to be scaled or centered?

Question

My input variables are on very different scales: some take small decimal values while others are in the hundreds. Is it essential to center (subtract the mean) or scale (divide by the standard deviation) these input variables, to make the data dimensionless, when using random forest?

Firebug · Accepted Answer · 2017-07-18 11:54:35Z

No.

Random Forests are based on tree partitioning algorithms.

As such, there's no analogue to the coefficients one obtains in general regression strategies, which would depend on the units of the independent variables. Instead, one obtains a collection of partition rules, each basically a decision given a threshold, and these partitions shouldn't change with scaling. In other words, the trees only see the ranks of the feature values.

Basically, any strictly monotonic transformation of your features shouldn't change the forest's partitions or predictions at all (in the most common implementations); only the numeric values of the thresholds shift.
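To illustrate the point about monotonic transformations, here is a minimal sketch using only the Python standard library. It is not how any particular random forest library is implemented: `gini` and `best_split` are hypothetical helpers that search for the best single split (a decision stump) on one feature, and the toy data are made up. It assumes increasing transformations (standardization and a log); a decreasing one would simply swap the left and right sides.

```python
import math
import statistics


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(xs, ys):
    """Find the best single threshold split on one feature.

    Candidate thresholds are midpoints between consecutive distinct
    sorted values. Returns (indices sent to the left, threshold).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # no threshold can separate equal values
        left, right = order[:k], order[k:]
        # Weighted impurity depends only on which labels fall on each side.
        score = (len(left) * gini([ys[i] for i in left])
                 + len(right) * gini([ys[i] for i in right])) / len(xs)
        if best is None or score < best[0]:
            best = (score, frozenset(left), (lo + hi) / 2.0)
    return best[1], best[2]


# Made-up feature mixing decimals and hundreds, with binary labels.
x_raw = [0.5, 120.0, 3.2, 450.0, 0.9, 75.0, 300.0, 1.7]
y = [0, 1, 0, 1, 0, 1, 1, 0]

# Two increasing transformations: standardization and a log transform.
mean, sd = statistics.mean(x_raw), statistics.stdev(x_raw)
x_std = [(v - mean) / sd for v in x_raw]
x_log = [math.log(v) for v in x_raw]

left_raw, thr_raw = best_split(x_raw, y)
left_std, thr_std = best_split(x_std, y)
left_log, thr_log = best_split(x_log, y)

print("same partition:", left_raw == left_std == left_log)
print("thresholds:", thr_raw, thr_std, thr_log)
```

The split score is computed from the labels on each side only, so the same observations end up on the same side in all three cases, even though the threshold itself is expressed in different units. A full random forest adds bootstrapping and random feature selection on top of this, so the comparison only holds exactly when the same random seed is used.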

Also, decision trees are usually robust to numerical instabilities that sometimes impair convergence and precision in other algorithms.

JWB1987 · Answer · 2019-07-03 20:56:05Z

Overall I agree with Firebug, but there could be some value in standardizing your variables if you're interested in predictor importance scores. RF will tend to favour highly variable continuous predictors because there are more opportunities to partition the data. A better way to deal with this issue, however, is to use approaches that are more robust to this bias (i.e., conditional forests with sampling without replacement). See Strobl et al. (2007): https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-25

Welcome to the site. We are trying to build a permanent repository of high-quality statistical information in the form of questions & answers. Thus, we're wary of link-only answers, due to linkrot. Can you post a full citation & a summary of the information at the link, in case it goes dead?
– gung - Reinstate Monica
Commented Jul 3, 2019 at 20:58
