
I am self-learning machine learning right now, and I am confused about what I should do first.

  1. Should I impute missing values before encoding the categorical variables?
  2. Also, I am learning from Kaggle, and the notebooks always split the data into train and test sets before doing any feature engineering. What is the reason behind that? Can I do the feature engineering on the entire dataset instead?
  3. When should I perform cross-validation? Before splitting the data?

I also hope to understand the reasoning behind each of these decisions, because I don't want to just memorize rules. This is an extremely complex topic and it has been difficult to learn on my own.

  • Note that data splitting is typically a bad idea unless n > 20,000. Commented Jul 31, 2022 at 12:13
  • @FrankHarrell Do you mean that one should not split the dataset into train and test sets before doing any feature engineering, unless n > 20,000? If so, why? Commented Jul 31, 2022 at 18:34
  • I meant that data splitting is an enormously wasteful statistical procedure, giving unstable results unless the true signal:noise ratio is very high (outcomes are easy to predict) or n > 20,000. Details here. What is your sample size and distribution of Y? Most often resampling (100 repeats of 10-fold CV or 400 bootstrap reps) is more efficient than data splitting and also exposes the silliness of feature selection. Commented Jul 31, 2022 at 20:17
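
For concreteness, here is a minimal sketch of the resampling approach the last comment describes (100 repeats of 10-fold CV), assuming scikit-learn; the dataset, model, and metric are illustrative placeholders, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Toy classification data standing in for a real problem.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 100 repeats of 10-fold CV: every row is used for both fitting and
# evaluation, unlike a single train/test split that sets rows aside for good.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=100, random_state=0)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy"
)

# The spread across the 1000 folds shows how unstable any single split is.
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```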

1 Answer


Most of the time, imputing missing values applies to numeric features and has nothing to do with encoding, which applies to categorical data. So dealing with missing values before encoding is a sensible choice. Note that categorical features can have missing values too (often filled with the most frequent category, or kept as an explicit "missing" level), and that imputation should also happen before encoding so the encoder only ever sees complete columns.
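
A minimal sketch of that ordering, assuming scikit-learn; the DataFrame and column names below are made-up placeholders. Fitting the preprocessing on the training rows only, and merely applying it to the test rows, is also the reason Kaggle notebooks split before feature engineering: it keeps test-set information out of the imputation statistics and encoder categories.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data with missing values in numeric and categorical columns.
df = pd.DataFrame({
    "age":    [25, np.nan, 40, 31, 52, np.nan],
    "income": [50_000, 62_000, np.nan, 48_000, 75_000, 55_000],
    "city":   ["NY", "LA", np.nan, "NY", "LA", "NY"],
    "gender": ["F", np.nan, "M", "M", "F", "F"],
})

preprocess = ColumnTransformer([
    # Numeric features: impute missing values with the training-set median.
    ("num", SimpleImputer(strategy="median"), ["age", "income"]),
    # Categorical features: impute with the most frequent category *before*
    # encoding, so the encoder only ever sees complete columns.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), ["city", "gender"]),
])

# Fit on the training rows only, then apply the *same* fitted transform to
# the test rows; this prevents test-set information from leaking into the
# imputation statistics and one-hot categories.
X_train, X_test = train_test_split(df, test_size=0.33, random_state=0)
X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)
```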

