Questions tagged [overfitting]
Modeling error (especially sampling error) instead of replicable and informative relationships among variables improves model fit statistics, but reduces parsimony, and worsens explanatory and predictive validity.
987
questions
0
votes
0
answers
16
views
Augmenting data for LSTM
The problem:
I have a dataset with monthly economic indicators alongside monthly stock price, containing 434 total observations.
I have attempted to fit an LSTM onto the data, but it seems to ...
1
vote
0
answers
89
views
AUC > 0.5 under null model following feature selection
I've been going over the output of a Monte Carlo model that simulates disease risk as a function of genotype. Under a null model of no disease risk, we have 1000 case and 1000 control individuals. ...
0
votes
1
answer
37
views
Manual selection of parameters and features and bad results by gridsearch
For a very small dataset that I have, when I set the parameters with the help of gridsearch, the test and training results are not acceptable at all and differ hugely from each other. I have to manually ...
1
vote
0
answers
32
views
What to do when you realize you've overfit?
This is hypothetical, and I would like to hear what people do when they get to the test set and realize they've overfit. Of course, preventing overfitting in the first place is ideal.
You're working on ...
1
vote
0
answers
30
views
Significant performance drop between train and validation set
I am trying both Lgbm and RandomForest for a classification, and I observe the same problem. I am using various metaparams to prevent overfitting, such as max_depth, num_trees (keeping it small for ...
0
votes
0
answers
25
views
Avoiding Information Leakage in Backtesting with CPCV-Tuned Hyperparameters
I'm using Combinatorial Purged Cross-Validation to tune hyperparameters for a binary classification model applied in a month-end trading strategy. I have 6 months of data and used CPCV with 15 splits ...
0
votes
0
answers
47
views
Path analysis with perfect fit
I'm trying to determine if I can display two regression models and the covariance between the dependent variables in one unified model using path analysis with lavaan in R. In the following (scaled) ...
2
votes
0
answers
33
views
Regression with small sample size - LASSO or remove variables?
I'm trying to run a regression, but I only have 14 observations, each being a different city in the US. My dependent variable is the total number of trips per capita, and my explanatory variables are ...
11
votes
1
answer
3k
views
Getting 99-100% accuracy on my training/validation data, but the model performs badly on completely new data
I have a large ASL (American Sign Language) dataset. I split this data 70:15:15 into train, validation, and test sets.
I then trained a CNN model on it, where I trained using the 70%, and evaluated ...
1
vote
1
answer
49
views
Estimate number of covariates in Cox regression model
My question about overfitting is fairly general, but in this particular case it concerns survival models. I am working on a case-cohort study, estimating the HR in a cohort where heart attack correspond ...
1
vote
0
answers
7
views
Image classification metrics
I have been working on an image classification task using CNNs and getting some puzzling results.
My training, validation and test loss keep going down with epochs and are comparable. So this might ...
0
votes
1
answer
22
views
Does the intuitive sense of overfitting in this mechanism design context exemplify bias-variance tradeoff?
Suppose the (we can say unanimous) preference of each individual in a society is to select roads for travel by placing 95% weight on the objective of minimizing travel time, and the remaining 5% ...
1
vote
1
answer
35
views
Accuracy "overfits" but loss doesn't?
I'm perplexed as to why my loss doesn't go up when the accuracy goes down (after about 40 epochs). Isn't it possible to tell overfitting from the loss curve alone? (I'm of course referring the ...
0
votes
1
answer
70
views
Is my model overfitting or is my training process wrong?
I'm predicting multiclass probabilities using CatBoost Classifier.
I have a balanced dataset with roughly 4000 rows, 13 features, and 4 target class labels. The dataset has some outliers which I decided not ...
0
votes
1
answer
48
views
Learning Curve to Know Underfitting or Overfitting
I want to know whether the model I am using is overfitting or underfitting. I am using SVM and Random Forest algorithms. How can I figure this out?