
Say that I used nested cross-validation to do SVM classification on an fMRI dataset, with hyperparameter tuning (choosing between a linear and an RBF kernel). The classification accuracies on my outer cross-validation folds are good and consistent, and the models selected in the inner cross-validations are all relatively similar, so model selection was fairly stable across folds.

Now I want to run a permutation test on the classification to see whether the overall classification accuracy is significantly greater than chance.

My question is this: is each newly permuted dataset supposed to go through the same nested cross-validation procedure as the true dataset? If so, I would imagine the hyperparameters selected in the inner loops are likely to differ for every permuted dataset (and to differ from those selected for the true dataset). The end result would be a null distribution made up of models with different hyperparameters. This strikes me as odd and possibly incorrect.

Alternatively, is each newly shuffled dataset supposed to undergo a single-layer (non-nested) cross-validation in which every fold uses the same hyperparameters selected during the true-dataset analysis? This seems more natural to me, but I can't stop thinking that my null distribution might be biased, since I'm using hyperparameters specifically tuned to the true dataset.
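
To make the first option concrete, here is a rough sketch of what I have in mind, written with scikit-learn; the data, parameter grid, and fold counts below are placeholders rather than details of my actual analysis:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
    from sklearn.datasets import make_classification

    # Placeholder data standing in for the fMRI features and labels.
    X, y = make_classification(n_samples=80, n_features=200, random_state=0)
    param_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}

    def nested_cv_accuracy(X, y):
        """Inner CV tunes the SVM; outer CV estimates its accuracy."""
        tuned_svm = GridSearchCV(SVC(), param_grid, cv=StratifiedKFold(5))
        return cross_val_score(tuned_svm, X, y, cv=StratifiedKFold(5)).mean()

    rng = np.random.default_rng(0)
    observed_accuracy = nested_cv_accuracy(X, y)

    # Option 1: re-run the *entire* procedure, hyperparameter tuning included,
    # on each permuted label vector to build the null distribution.
    null_accuracies = [nested_cv_accuracy(X, rng.permutation(y))
                       for _ in range(100)]      # thousands in a real analysis

Each call to nested_cv_accuracy on permuted labels is free to pick its own hyperparameters, which is exactly what worries me.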


2 Answers


Since I see the training as comprising the auto-tuning of the hyperparameters (i.e. my training function is train_tuned(training_data) rather than train(training_data, hyperparameters)),

  • cross-validation is done to evaluate the models produced by train_tuned, and likewise
  • the permutation test should evaluate train_tuned.

  • If the permutation test results in a wide variation of hyperparameters across the tuned models, that's IMHO fine and no different from the permutation test's "fake" models having a wide variation of coefficients/"normal" parameters (or, in your case, unstable support vectors).
    Consider the permutation test of a linear model: we'd expect the coefficients to vary randomly around zero.

  • If train_tuned basically returns a failure (i.e. you already see that the internal optimization doesn't find a stable solution), that would be the best possible outcome!

  • You may want to record and show the distribution of hyperparameters under permutation together with the hyperparameters you got for your real data.

  • You also ask whether each shuffled dataset should instead undergo a single-layer cross-validation in which every fold uses the same hyperparameters selected during the true-dataset analysis, which seems more natural to you but might bias the null distribution.

    I'd also be concerned about using the hyperparameters tuned to the real data. OTOH, I don't think it hurts to do a permutation test on train(permuted_train_data, fixed_hyperparameters) in addition to the permutation test on train_tuned(permuted_train_data): it gives you more information about the behaviour of your models. Just make sure you are very clear about what exactly you do and why. A sketch combining both tests is given after this list.
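
As a rough sketch of how this could look (scikit-learn assumed; the data and grid are placeholders, and for brevity the tuning is shown once on the full data rather than inside every outer fold):

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.datasets import make_classification

    # Placeholder data standing in for the fMRI features and labels.
    X, y = make_classification(n_samples=80, n_features=200, random_state=0)
    grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}
    rng = np.random.default_rng(0)

    def train_tuned(X, y):
        """Training that includes the auto-tuning of the hyperparameters (inner CV)."""
        return GridSearchCV(SVC(), grid, cv=5).fit(X, y)

    # Distribution of the hyperparameters chosen under permutation, to report
    # next to the hyperparameters obtained on the real data.
    real_params = train_tuned(X, y).best_params_
    null_params = [train_tuned(X, rng.permutation(y)).best_params_
                   for _ in range(100)]          # thousands in a real analysis

    # Optional additional test: train(permuted_train_data, fixed_hyperparameters),
    # reusing the hyperparameters tuned on the real data.
    fixed_model = SVC(**real_params)
    null_fixed_acc = [cross_val_score(fixed_model, X, rng.permutation(y), cv=5).mean()
                      for _ in range(100)]

Comparing the distribution of null_params with real_params (and null_fixed_acc with the accuracies from the tuned-model permutation test) gives exactly the kind of additional information I mean.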


It depends on your aim.

But in most cases, one should include the hyperparameter tuning to provide an honest representation of the null distribution. Basically, a permutation test should answer: if I repeated the entire study on a dataset with no association between the variables and the outcome, what performance would I get? We then compare our observed result with the permutation distribution (ideally built from thousands of permutations) to see whether it is better.

During hyperparameter tuning we always select the best model, and because of this "winner's bias" the permuted-data performance ends up better than chance, especially in underpowered studies. Since we don't know in advance how much better than chance, we must include the hyperparameter tuning in the permutation procedure.
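
For concreteness, the final comparison then looks like this (the numbers below are illustrative stand-ins, not results from your study):

    import numpy as np

    # Stand-in values: the observed accuracy from the real (tuning-included)
    # pipeline, and one accuracy per permutation run through that same pipeline.
    observed_accuracy = 0.72
    null_accuracies = np.random.default_rng(0).normal(loc=0.5, scale=0.05, size=5000)

    # Add-one correction so the estimated p-value can never be exactly zero.
    p_value = (1 + np.sum(null_accuracies >= observed_accuracy)) / (len(null_accuracies) + 1)
    print(f"permutation p-value: {p_value:.4f}")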

