
All Questions

1 vote
1 answer
49 views

Estimate number of covariates in Cox regression model

My question about overfitting is fairly general, but in this particular case it is all about survival models. I am working on a case-cohort study, estimating the HR in a cohort where heart attack correspond ...
Javier Hernando
1 vote
0 answers
90 views

Model calibration in overfitted models

Why, in shrinkage due to an overfitted prediction model, do we tend to overestimate risk for "high risk" subjects and underestimate risk for "low risk" subjects? Intuitively I ...
vixxovs
4 votes
2 answers
195 views

When does model selection begin to overfit?

Suppose you have a small dataset (perhaps 1000 labels), and you are using cross-validation to train different models and to choose the best one (according to their cross-validation scores). It seems ...
MWB
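The selection effect this question asks about can be demonstrated with a toy simulation (a sketch, not taken from the question — the labels and the random-guessing "models" are entirely made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_val, n_models = 1000, 200

# Labels are pure noise: no model can genuinely beat 50% accuracy.
y = rng.integers(0, 2, size=n_val)

# Each "model" just guesses at random on the same validation set.
scores = np.array([(rng.integers(0, 2, size=n_val) == y).mean()
                   for _ in range(n_models)])

best = float(scores.max())      # the score reported after model selection
typical = float(scores.mean())  # the honest expected accuracy (~0.5)
```

The more models the cross-validation scores are compared across, the further the winner's score drifts above its true value — which is the point at which the selection itself starts to overfit.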
0 votes
2 answers
98 views

How to estimate the probability that the LOOCV error of one model is better than the LOOCV error of the correct model?

Let's consider a simple regression problem in which we have only one real-valued feature and one real-valued target. We try to fit the data using a polynomial function. We also try to use the given ...
Roman
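The setup in this question is easy to reproduce: fit polynomials of several degrees and compare their LOOCV errors. A minimal sketch with synthetic data whose true model is linear (the data and the `loocv_mse` helper are illustrative assumptions, not from the question):

```python
import numpy as np

def loocv_mse(x, y, degree):
    """Leave-one-out CV error of a polynomial fit of the given degree."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        coefs = np.polyfit(x[mask], y[mask], degree)
        pred = np.polyval(coefs, x[i])
        errs.append((y[i] - pred) ** 2)
    return float(np.mean(errs))

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=x.size)  # true model: degree 1

scores = {d: loocv_mse(x, y, d) for d in range(1, 8)}
best_degree = min(scores, key=scores.get)
```

Because the LOOCV estimate is itself a noisy random variable, `best_degree` will sometimes differ from the true degree 1; the probability of that event is exactly what the question asks how to estimate.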
2 votes
1 answer
34 views

Model selection in presence of overfitting - better test or closer train

Suppose I have a tree-based model (Random Forest for the sake of the example) and I play with a regularization parameter (tree depth) to fight overfitting. Eventually I can come up with two models - ...
Dimgold
0 votes
1 answer
193 views

Which of these two models works better?

I have this time series I want to perform polynomial regression on, to estimate the trend. To start, I tried using only a second-order polynomial; these are the results (AIC = 30.37105). We can see how ...
Marco Rudelli
3 votes
1 answer
1k views

Selecting p,q,d for ARIMA and overfitting. Shouldn't the parameters be tuned on a training set?

I have seen multiple tutorials [example link] for ARIMA where they select the p,q,d parameters for it based on the whole time series. Then, after deciding on the model parameters they want to use, ...
MattSt
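The train/test concern in this question can be illustrated without a full ARIMA fit. The sketch below uses a plain least-squares AR(p) model as a stand-in and selects p on a held-out tail only — all data are simulated, and `fit_ar`/`val_mse` are hypothetical helpers, not from any tutorial:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate an AR(2) process so the "right" order is known in advance.
n = 500
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

train = y[:400]  # order selection uses only the first 400 points

def fit_ar(series, p):
    """Least-squares AR(p) fit (a simple stand-in for a full ARIMA fit)."""
    m = len(series)
    # Column k holds lag-(k+1) values aligned with targets series[p:].
    X = np.column_stack([series[p - 1 - k : m - 1 - k] for k in range(p)])
    coefs, *_ = np.linalg.lstsq(X, series[p:], rcond=None)
    return coefs

def val_mse(coefs, p):
    """One-step-ahead MSE on the held-out tail (t = 400..n-1)."""
    errs = [(y[t] - coefs @ y[t - p : t][::-1]) ** 2 for t in range(400, n)]
    return float(np.mean(errs))

scores = {p: val_mse(fit_ar(train, p), p) for p in range(1, 6)}
best_p = min(scores, key=scores.get)
```

Selecting the order on the whole series — as the tutorials the asker saw do — would let the test portion influence the choice of p, which is the leakage the question worries about.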
20 votes
4 answers
4k views

Why does the Akaike Information Criterion (AIC) sometimes favor an overfitted model?

As an exercise to develop practical experience working with model selection criteria, I computed fits of the highway mpg vs. engine displacement data from the tidyverse mpg example data set using ...
stachyra
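One way to see the effect the asker describes is to compute Gaussian AIC, n·log(RSS/n) + 2k, for polynomial fits of increasing degree — here on synthetic linear data rather than the tidyverse mpg data used in the question:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
x = np.linspace(0, 1, n)
y = 2 * x + rng.normal(scale=0.5, size=n)  # true relationship is linear

def aic(degree):
    """Gaussian AIC up to a shared constant; k counts fitted coefficients."""
    coefs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coefs, x)) ** 2)
    k = degree + 1
    return float(n * np.log(rss / n) + 2 * k)

aics = {d: aic(d) for d in range(1, 7)}
best = min(aics, key=aics.get)
```

AIC's 2k penalty is mild, so with small n and noisy data the minimizer can land on a degree above 1 even though the data were generated from a straight line — the occasional preference for an overfitted model that the question observes.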
3 votes
2 answers
345 views

What is accepted practice for avoiding optimistic bias when selecting a model family after hyperparameter tuning?

This is an extension of a previous question: How to avoid overfitting bias when both hyperparameter tuning and model selecting? ...which provided some options for the question at hand, but now I would ...
Josh
10 votes
2 answers
4k views

How to avoid overfitting bias when both hyperparameter tuning and model selecting?

Say I have 4 or more algorithm types (logistic, random forest, neural net, svm, etc) each of which I want to try out on my dataset, and each of which I need to tune hyperparameters on. I would ...
Josh
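The standard answer to this question is nested cross-validation: tune each family with an inner search, then score the whole tune-and-fit procedure with an outer loop so the reported score is not contaminated by the tuning. A minimal scikit-learn sketch (two families instead of four, synthetic data, and arbitrary parameter grids chosen only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

# Each family wraps its own inner grid search; the outer CV then scores
# the complete "tune, then refit" procedure for that family.
candidates = {
    "logistic": GridSearchCV(LogisticRegression(max_iter=1000),
                             {"C": [0.1, 1.0, 10.0]}, cv=3),
    "forest": GridSearchCV(RandomForestClassifier(random_state=0),
                           {"max_depth": [2, 5, None]}, cv=3),
}

outer_scores = {name: cross_val_score(est, X, y, cv=5).mean()
                for name, est in candidates.items()}
best_family = max(outer_scores, key=outer_scores.get)
```

Note that picking the winning family by its outer score still introduces a small selection bias into that score — the point pursued in the follow-up question above.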
5 votes
3 answers
722 views

How to choose between an overfit model and a non-overfit model?

I often encounter this situation in modeling. Suppose I build two classification models. Below is their performance: Model 1: training accuracy: 0.80, test accuracy: 0.50 Model 2: training accuracy: 0....
etang
1 vote
0 answers
279 views

How can one use Grid Search without overfitting the model?

I checked several questions, like Overfitting during model selection - AutoML vs Grid search and Hyperparameter tuning using grid search/randomised search, but I don't think any of them answer my ...
dmmmmd
2 votes
1 answer
247 views

Can I still use an overfitted model with high test accuracy?

Below is the training-statistics output from training a Keras/TF model. You can see val_accuracy peaks at Epoch 4 with 0.6633. After that, training accuracy continues to go up but val_accuracy becomes ...
etang
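Whatever one decides about using the overfitted final model, the usual practical move is to keep the checkpoint from the best validation epoch rather than the last one. A sketch in plain Python — the per-epoch numbers below are invented, loosely echoing the 0.6633 peak mentioned in the question:

```python
# Validation accuracy per epoch, as recorded in a Keras History object
# (these values are made up for illustration).
val_accuracy = [0.55, 0.60, 0.64, 0.6633, 0.62, 0.58, 0.57]

# Restore the checkpoint from the best validation epoch: later epochs
# only improve training accuracy, i.e. they overfit.
best_epoch = max(range(len(val_accuracy)), key=val_accuracy.__getitem__)
print(best_epoch + 1, val_accuracy[best_epoch])  # prints: 4 0.6633
```

In Keras this is what the EarlyStopping and ModelCheckpoint callbacks automate when monitoring val_accuracy.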
1 vote
2 answers
260 views

When is it okay to make changes to your model after validating?

Let’s say I’m building a model to predict cancer relapse for a scientific paper. I use my training set to build many models and validate the best one on my test set to get an AUC of 0.65. I then go ...
Daniel Freeman
3 votes
1 answer
95 views

Overfitting through model selection

I'm asking this question as I found little explanation of this phenomenon elsewhere. I am wondering how best to deal with overfitting that comes from the model selection itself. Say I want to ...
Adrian Constantin Penz
