
One way of handling a high-cardinality categorical column is frequency encoding. However, if you use a cross-validated analysis plan, then the column has to be re-encoded within each fold, using only that fold's training rows.
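To make "re-encode at each step" concrete, here is a minimal base-R sketch of fold-wise frequency encoding. The helper `freq_encode()`, the simulated `dat`, and the fold assignment are hypothetical illustrations, not part of any package. The point is that the level frequencies are estimated from each fold's training rows only and then applied to the held-out rows, with levels unseen in training mapped to 0 (one possible convention among several).

```r
# Hypothetical helper: relative frequency of each level, estimated on
# `train_col` only, then looked up for the values in `new_col`.
freq_encode <- function(train_col, new_col) {
  freqs <- table(train_col) / length(train_col)
  enc <- as.numeric(freqs[as.character(new_col)])
  enc[is.na(enc)] <- 0  # levels not seen in the training fold get 0
  enc
}

# Hypothetical simulated data with a high-cardinality column
set.seed(42)
n <- 200
dat <- data.frame(
  watchlist = sample(paste0("id", 1:50), n, replace = TRUE),
  target = factor(sample(c("yes", "no"), n, replace = TRUE))
)

k <- 5
folds <- sample(rep(seq_len(k), length.out = n))
held_out_freq <- numeric(n)

for (f in seq_len(k)) {
  is_train <- folds != f
  trn <- dat[is_train, ]
  tst <- dat[!is_train, ]
  # Re-estimate the encoding from this fold's training rows only
  trn$watchlist_freq <- freq_encode(trn$watchlist, trn$watchlist)
  tst$watchlist_freq <- freq_encode(trn$watchlist, tst$watchlist)
  # ... fit the model on `trn` and evaluate it on `tst` here ...
  held_out_freq[!is_train] <- tst$watchlist_freq
}
```

For example, `freq_encode(c("a", "a", "b"), c("a", "b", "c"))` gives `2/3, 1/3, 0`.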

It has been suggested that step_lencode_mixed() from the R package embed (version 1.1.2) could be used. The examples I have seen look like this:

    classify.knn <- train(target ~ ., data = data.trn, method = "knn",
                          trControl = ctrl,
                          preProcess = c("center", "scale"),
                          tuneGrid = data.frame(k = seq(5, 100, by = 15))) %>%
      step_lencode_mixed(watchlist, outcome = vars(target))

My concern is that step_lencode_mixed() (watchlist is the column I wish to frequency-encode) is invoked after the call to train(), rather than inside the resampling loop.

Is this correct?
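For comparison, a sketch of the other placement: the encoding step goes into a recipe, and the recipe (not a formula) is passed to caret's train(). My understanding is that caret then estimates the recipe's steps on each resample's training portion, which is the per-fold re-encoding described above. Two caveats, stated as assumptions: step_lencode_mixed() produces a mixed-model (likelihood/effect) encoding rather than raw frequency counts, and it only supports numeric or two-level factor outcomes. The simulated data.trn below is hypothetical, all other predictors are assumed numeric, and centering/scaling is moved from preProcess into recipe steps. step_lencode_mixed() also needs lme4 installed.

```r
library(caret)
library(recipes)
library(dplyr)   # for %>% and vars()
library(embed)   # step_lencode_mixed(); requires lme4

# Hypothetical simulated stand-in for data.trn (two-level outcome)
set.seed(1)
n <- 300
data.trn <- data.frame(
  watchlist = factor(sample(paste0("w", 1:40), n, replace = TRUE)),
  x1 = rnorm(n),
  x2 = rnorm(n),
  target = factor(sample(c("yes", "no"), n, replace = TRUE))
)

rec <- recipe(target ~ ., data = data.trn) %>%
  # Encoding is estimated when the recipe is prepped on each resample
  step_lencode_mixed(watchlist, outcome = vars(target)) %>%
  # Takes the place of preProcess = c("center", "scale");
  # the encoded watchlist column is numeric by this point
  step_center(all_numeric()) %>%
  step_scale(all_numeric())

ctrl <- trainControl(method = "cv", number = 5)

classify.knn <- train(rec, data = data.trn, method = "knn",
                      trControl = ctrl,
                      tuneGrid = data.frame(k = seq(5, 100, by = 15)))
```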
