
Questions tagged [cross-validation]

Refers to general procedures that attempt to determine the generalizability of a statistical result. Cross-validation arises frequently in the context of assessing how well a particular model fit predicts future observations. Methods for cross-validation usually involve withholding a random subset of the data during model fitting, quantifying how accurately the withheld data are predicted, and repeating this process to get a measure of prediction accuracy.

0 votes · 1 answer · 20 views

Sklearn EstimatorCV vs GridSearchCV

sklearn has the following description for EstimatorCV estimators (https://scikit-learn.org/stable/glossary.html#term-cross-validation-estimator): an estimator that has built-in cross-validation ...
asked by wannabedatascientist
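
For readers comparing the two APIs, here is a minimal sketch contrasting LogisticRegressionCV, an EstimatorCV that tunes its own regularization strength while fitting, with an equivalent explicit GridSearchCV. The synthetic dataset and the grid of C values are illustrative assumptions, not taken from the question.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # EstimatorCV variant: cross-validates its own regularization strength
    # internally, then refits on the full data with the selected C.
    clf_cv = LogisticRegressionCV(Cs=10, cv=5, max_iter=1000).fit(X, y)
    print("LogisticRegressionCV chose C =", clf_cv.C_)

    # Equivalent explicit search: works for any estimator and parameter,
    # but fits a separate model per candidate and per fold.
    grid = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": np.logspace(-4, 4, 10)},
        cv=5,
    )
    grid.fit(X, y)
    print("GridSearchCV chose C =", grid.best_params_["C"])
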
0 votes · 0 answers · 23 views

How to choose thresholds to discretize target for binary classification

My group is using logistic regression to investigate the most predictive features in a dataset. Our target variable is actually a continuous variable that we discretized using two cutoff thresholds (...
asked by OstensiblyPutative
0 votes · 0 answers · 15 views

How to Combine Cross-Validation Error and Ensemble Prediction Variance in Machine Learning?

I am working on a machine learning project where I use an ensemble model (Random Forest) and I want to accurately represent the prediction uncertainty. Specifically, I want to combine the cross-...
asked by x H
0 votes · 1 answer · 15 views

Averaging model performance across n-fold cross validation: MSE or R^2?

I'm comparing the performance of several models on the same data using cross-validation (holding out 1/n of the data as a test set, fitting the model on the remaining data, testing on the test set). I ...
asked by Leo Selker
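
One way to frame the question is to collect both metrics per fold and average each; a minimal sketch, assuming synthetic regression data and a Ridge model rather than the asker's models:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_validate

    X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

    # Per-fold MSE and R^2 in one pass; averaging either across folds is then explicit.
    scores = cross_validate(
        Ridge(),
        X, y,
        cv=5,
        scoring={"mse": "neg_mean_squared_error", "r2": "r2"},
    )
    print("mean fold MSE:", -scores["test_mse"].mean())
    print("mean fold R^2:", scores["test_r2"].mean())
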
2 votes · 1 answer · 29 views

Does it make sense that the performance of XGBoost varies dramatically between two machines with all hyperparameters held fixed?

I am hyperparameter tuning an XGBoost model and I am finding that, depending on whether I train the model locally on my machine or on AWS SageMaker, I get quite different results. Running cross-validation ...
asked by Luca Guarro
0 votes · 1 answer · 82 views

Test error is much higher than training error after grid search and cross-validation

I'm currently working on a machine learning project. It's a supervised learning problem. My goal is to predict, from given data about an animal (keeping, size, weight, ...), its ingredients (energy, vitamins, etc.). ...
asked by Marco Cotrotzo
1 vote · 1 answer · 19 views

Scoring function in cross-validation often left at the default

I'm a PhD student applying ML in microbiology. In research papers, the usual performance measure reported on classification models is ROC-AUC. But when I look at implementations, the scoring function ...
asked by alepfu
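
The default .score of a scikit-learn classifier is accuracy, so ROC-AUC has to be requested explicitly; a minimal sketch on an assumed synthetic, imbalanced dataset:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

    # Default scoring for a classifier is accuracy.
    acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
    # ROC-AUC must be asked for via the scoring argument.
    auc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5,
                          scoring="roc_auc")
    print("accuracy:", acc.mean(), "ROC-AUC:", auc.mean())
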
0 votes · 1 answer · 34 views

How do I identify overfitting when using GridSearchCV?

For context, I'm using scikit-learn's GridSearchCV to find the best hyperparameters of a decision tree. I believe I understand train, validation, and test sets and overfitting concepts when applied ...
asked by Lisana Daniel
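
One common diagnostic is to compare per-candidate training and validation scores; a minimal sketch, assuming a synthetic dataset and an illustrative max_depth grid:

    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # return_train_score=True exposes training scores in cv_results_, so a large gap
    # between mean_train_score and mean_test_score flags candidates that overfit.
    grid = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid={"max_depth": [2, 4, 8, None]},
        cv=5,
        return_train_score=True,
    )
    grid.fit(X, y)

    results = pd.DataFrame(grid.cv_results_)
    print(results[["param_max_depth", "mean_train_score", "mean_test_score"]])
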
0 votes · 0 answers · 18 views

How to use cross-validation to select/evaluate a model with a probability score as the output?

Initially I was evaluating my models using cross_val with out-of-the-box metrics such as precision, recall, F1 score, etc., or with my own metrics defined in ...
asked by szheng
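
Built-in proper scoring rules for probabilistic outputs can be passed to cross-validation directly; a minimal sketch, assuming a synthetic dataset and a logistic regression stand-in for the asker's models:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_validate

    X, y = make_classification(n_samples=500, random_state=0)

    # Log loss and the Brier score evaluate predicted probabilities rather than labels.
    scores = cross_validate(
        LogisticRegression(max_iter=1000),
        X, y,
        cv=5,
        scoring={"log_loss": "neg_log_loss", "brier": "neg_brier_score"},
    )
    print("mean log loss:", -scores["test_log_loss"].mean())
    print("mean Brier score:", -scores["test_brier"].mean())
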
0 votes · 0 answers · 14 views

Correct cross-validation implementation (regression)

I am very new to machine learning and I am starting to work my way up. I have made an implementation of cross-validation which will be used with ensemble models later. I have made a pipeline in ...
asked by Guhan
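
One common layout is to wrap the preprocessing and the model in a single Pipeline and cross-validate the whole pipeline; a minimal sketch, assuming synthetic regression data and an illustrative scaler-plus-boosting pipeline rather than the asker's ensemble:

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

    # Preprocessing inside the pipeline is refit on each fold's training part only.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("model", GradientBoostingRegressor(random_state=0)),
    ])
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    rmse = -cross_val_score(pipe, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    print("per-fold RMSE:", rmse)
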
1 vote · 1 answer · 14 views

Model evaluation approach allowing manual experimentation without data leakage

In supervised machine learning, are there any evaluation approaches besides using a fixed holdout test dataset which allow me, as a scientist, to manually compare preprocessing approaches without ...
asked by thomas8wp
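
One answer-style sketch, under the assumption that the preprocessing can be expressed as Pipeline steps: compare variants by cross-validation on a development split only, and touch the held-out test set once at the very end:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=600, random_state=0)
    X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Manual experimentation happens on the development split via CV,
    # with preprocessing inside the pipeline so each fold fits it on its own training part.
    for name, pipe in [
        ("no scaling", make_pipeline(LogisticRegression(max_iter=1000))),
        ("with scaling", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ]:
        print(name, cross_val_score(pipe, X_dev, y_dev, cv=5).mean())

    # X_test, y_test are evaluated only once, after the comparison is settled.
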
0 votes · 1 answer · 21 views

Cross validation

I do not get why, in "For cross validation should I use training set, or whole dataset?", the responses say that cross-validation must be done exclusively on the training set. Don't the methods (for example ...
asked by Curious student
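
A minimal sketch of the usual recommendation, on an assumed synthetic dataset: split off the test set first, cross-validate only on the training portion, and use the test set for a single final check:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score, train_test_split

    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    model = LogisticRegression(max_iter=1000)
    # Cross-validation sees only the training split.
    print("CV accuracy (training set only):", cross_val_score(model, X_train, y_train, cv=5).mean())

    # The untouched test set gives one final, unbiased estimate.
    model.fit(X_train, y_train)
    print("final test accuracy:", model.score(X_test, y_test))
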
1 vote · 0 answers · 7 views

Is GroupKFold needed if some samples have some of their feature values equal?

I am given a dataset $D$ of 10k enzyme-substrate complexes having a lock-key relationship, with each sample (complex) being characterized by enzyme features $x_e$ and substrate features $x_s$. That is,...
asked by ado sar
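
If each complex carries an enzyme identifier, GroupKFold can keep all complexes of the same enzyme in one fold; a minimal sketch with randomly generated stand-in features, targets, and group ids, none of which come from the dataset $D$:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GroupKFold, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))           # stand-in for concatenated (x_e, x_s) features
    y = rng.normal(size=200)                # stand-in target
    groups = rng.integers(0, 40, size=200)  # hypothetical enzyme id per complex

    # No enzyme appears in both the training and validation part of any split.
    cv = GroupKFold(n_splits=5)
    scores = cross_val_score(RandomForestRegressor(random_state=0), X, y,
                             groups=groups, cv=cv)
    print("grouped CV R^2 per fold:", scores)
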
0 votes · 0 answers · 13 views

How does hyperparameter tuning work for constructing/choosing a final model using nested cross-validation?

I want to determine if XGBoost is better than random forest or logistic regression for building a binary classification model. The model will be a composite model, with a feature selection model to ...
asked by reuben george
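
A minimal sketch of nested cross-validation, assuming a synthetic dataset and an illustrative random-forest grid rather than the asker's composite model: the inner GridSearchCV tunes, the outer loop scores the whole tuning procedure, and the final model is the tuned search refit on all the data:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, cross_val_score

    X, y = make_classification(n_samples=400, n_features=20, random_state=0)

    # Inner loop: hyperparameter search.
    inner_search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [100, 300], "max_depth": [3, None]},
        cv=3,
    )
    # Outer loop: unbiased estimate of the tuned procedure's performance.
    outer_scores = cross_val_score(inner_search, X, y, cv=5)
    print("nested CV accuracy:", outer_scores.mean())

    # Final model: run the tuning once more on all the data and keep the result.
    final_model = inner_search.fit(X, y)
    print("chosen hyperparameters:", final_model.best_params_)
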
0 votes · 0 answers · 23 views

If I do cross-validation, do I need to refit the model?

I am building a dual process. I have an initial dataset on which I train (fit) a model, then I do cross-validation to get results. So far everything is normal, but in addition to that, I create a new ...
asked by Curious student
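
A minimal sketch, on an assumed synthetic dataset, of the usual pattern: cross-validation only estimates performance on throwaway copies of the model, and the model that is actually kept is refit once afterwards:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_validate

    X, y = make_classification(n_samples=500, random_state=0)

    model = LogisticRegression(max_iter=1000)
    # Fits five throwaway copies and returns their scores; `model` itself stays unfitted.
    cv_results = cross_validate(model, X, y, cv=5)
    print("estimated accuracy:", cv_results["test_score"].mean())

    # The model to keep is refit on the full training data.
    model.fit(X, y)
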
