I have read this blog post, which states that a 5x2-fold test, a 10x10-fold test, or McNemar's test should be used for assessing the statistical significance of differences between two models, and advises against nonparametric paired tests (because k-fold cross-validation produces dependent samples, violating the i.i.d. assumption): https://machinelearningmastery.com/statistical-significance-tests-for-comparing-machine-learning-algorithms/
The problem is that I already have results from a 5-fold cross-validation and need to obtain statistical significance values for them. Re-running the validation would take a significant amount of time, because the hyperparameters are optimised through Bayesian optimisation over several iterations. That is why I need a test that can be applied at this stage, rather than rerunning everything with a 5x2 or 10x10 scheme. The models are multi-class classifiers, and I need to compare their losses (a metric I defined) for each fold; these are float values, not binary outcomes, so McNemar's test does not apply. Is there any test that is valid in this case? Please refer to papers to support your suggestions (this is necessary for an academic paper).
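For concreteness, here is a minimal sketch (with hypothetical loss values) of the kind of naive paired comparison I could run on my existing fold results using SciPy; whether such a test is statistically valid here, given the dependence between folds, is exactly what I am asking about:

```python
from scipy import stats

# Hypothetical per-fold losses (my custom loss, lower is better) from the
# SAME 5-fold split, so the samples are paired by fold.
loss_model_a = [0.42, 0.38, 0.45, 0.40, 0.43]
loss_model_b = [0.39, 0.35, 0.44, 0.37, 0.41]

# Naive paired tests on the 5 fold-level losses. The blog post argues the
# folds are not independent, so these p-values may be misleading -- that
# concern is the reason for this question.
t_stat, t_p = stats.ttest_rel(loss_model_a, loss_model_b)
w_stat, w_p = stats.wilcoxon(loss_model_a, loss_model_b)
print(t_p, w_p)
```

With only 5 paired observations, the Wilcoxon signed-rank test in particular has very little power, which is another reason I am unsure these are appropriate.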