
All Questions

0 votes
0 answers
14 views

How to split data when training and tuning the meta learner in stacking?

I have a simple yet tricky conceptual question about the data splitting of a meta learning process. Assume I have a simple X_train, ...
Yann • 43
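For the stacking question above, one common approach is to generate out-of-fold predictions from the base learners on the training data and fit the meta-learner on those, so the meta-learner never sees predictions made on data a base model was trained on. A minimal sketch, assuming scikit-learn and a synthetic dataset (the base models and names like `X_train` are illustrative, not the asker's setup):

```python
# Minimal sketch: fit a stacking meta-learner on out-of-fold predictions.
# Assumes scikit-learn; dataset and model choices are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [RandomForestClassifier(random_state=0), LogisticRegression(max_iter=1000)]

# Out-of-fold probabilities on the training set become the meta-learner's features.
meta_features = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])
meta_learner = LogisticRegression().fit(meta_features, y_train)

# Refit the base models on the full training set before scoring the test set.
test_features = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])
print("meta-learner test accuracy:", meta_learner.score(test_features, y_test))
```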
0 votes
0 answers
16 views

Select the most general machine learning model

For example, let's say that model A had an average train auc of 0.82 and a test auc of 0.79 through cross-validation. The difference between the two scores is 0.03. Let's say that model B has a train ...
JAE • 89
0 votes
0 answers
33 views

I screwed up model selection but ended up with a very good model; am I OK?

In a recent experiment, I made an oversight: I divided my data into training and testing sets and conducted cross-validation for model selection and hyperparameter tuning after having applied Boruta (...
Alek Fröhlich
1 vote
0 answers
13 views

Model choice based on test/train/validation split [duplicate]

My question is very simple, but no matter where I look it up, it seems that I get another answer. Take a simple classification task. Let's say I trained a kNN, LDA and logistic regression on it for ...
Marlon Brando
0 votes
0 answers
26 views

How to fit a dataset like this, and what are the recommended evaluation metrics for it?

The dataset seems non-linear; is there any recommended way to fit it? Since it's a non-linear regression problem, what's the correct way to evaluate the model's predictions? Is the MSE ...
Wuuu • 1
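For the curve-fitting question above, a minimal sketch assuming SciPy's `curve_fit`, a hypothetical exponential model, and synthetic data; MSE and R² are computed on a held-out portion of the data:

```python
# Minimal sketch: fit a non-linear model and evaluate with MSE / R^2 on held-out data.
# The exponential form and the synthetic data are assumptions for illustration.
import numpy as np
from scipy.optimize import curve_fit
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 200)
y = 2.0 * np.exp(-0.8 * x) + rng.normal(scale=0.05, size=x.size)  # synthetic data

def model(x, a, b):
    return a * np.exp(-b * x)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)
params, _ = curve_fit(model, x_tr, y_tr, p0=(1.0, 1.0))
pred = model(x_te, *params)

print("MSE:", mean_squared_error(y_te, pred))
print("R^2:", r2_score(y_te, pred))
```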
11 votes
7 answers
3k views

Why do we use linear models when tree-based models often work better?

In supervised machine learning, and specifically on Kaggle, tree-based models often outperform linear models. And even among the tree-based models, it is usually XGBoost that ...
letdatado • 367
1 vote
0 answers
37 views

Hyperparameter selection after nested cross-validation and making comparisons with DeLong's test

I have already read all the associated questions on the topic but couldn't find a clear answer. I initially split my data into training (80%) and hold-out testing (20%). Then, I am performing nested ...
user22409235
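For the nested cross-validation question above, a minimal sketch assuming scikit-learn: an initial 80/20 hold-out split, an inner loop that tunes hyperparameters, and an outer loop that estimates generalisation performance. The dataset, estimator, and parameter grid are illustrative:

```python
# Minimal sketch of nested cross-validation after an 80/20 hold-out split.
# Dataset, estimator, and parameter grid are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.2, random_state=0)

inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)  # inner loop tunes C
outer_scores = cross_val_score(inner, X_train, y_train, cv=5, scoring="roc_auc")
print("nested CV AUC: %.3f +/- %.3f" % (outer_scores.mean(), outer_scores.std()))

# Refit the whole tuning procedure on all training data, then evaluate once on the hold-out set.
inner.fit(X_train, y_train)
print("hold-out accuracy:", inner.score(X_holdout, y_holdout))
```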
1 vote
1 answer
201 views

Which regression model would you choose?

Which regression model would you choose to model the following flood damage data? The variables are x1=water height, x2=dike height and x3=flood damage. The following plot shows how the flood damages ...
Sjafnargata
1 vote
1 answer
37 views

train / validation / test split problem

Suppose that I have created train/validation/test splits for model building. I optimized the hyperparameters using the validation set and chose the parameter values which gave the highest accuracy. To ...
Sanyo Mn • 1,262
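For the train/validation/test question above, a minimal sketch assuming scikit-learn: hyperparameters are chosen on the validation set, then the selected configuration is refit on train + validation before a single evaluation on the untouched test set. The dataset and the candidate values of `C` are illustrative:

```python
# Minimal sketch: select hyperparameters on a validation set, then refit on
# train + validation before one final evaluation on the untouched test set.
# Dataset and candidate values of C are assumptions for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, random_state=0)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

best_C, best_acc = None, -np.inf
for C in [0.01, 0.1, 1, 10]:
    acc = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train).score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc

# Refit the chosen configuration on train + validation, then test once.
final = LogisticRegression(C=best_C, max_iter=1000).fit(
    np.vstack([X_train, X_val]), np.concatenate([y_train, y_val])
)
print("test accuracy:", final.score(X_test, y_test))
```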
1 vote
0 answers
30 views

Best Strategy for Model Training & Selection (Spoiler: Should I Re-Train?)

After a discussion with some colleagues, I've realized we have different views on the go-to strategy for model training. Strategy A: Train-Validation-Test Split and Final Model Selection ...
rusiano • 566
0 votes
0 answers
41 views

How to test for significance of differences between metrics for two models? (Machine learning model selection)

Problem - I want to test whether the difference in a metric (say AUC) between two models is significant. I have one vector of binary class predictions from a custom function and one from sklearn....
mckerm1t
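For the AUC-difference question above, a minimal paired-bootstrap sketch (DeLong's test is the analytic alternative): both models are scored on the same resampled cases, and the distribution of the AUC difference gives a confidence interval. The labels and score vectors here are synthetic stand-ins:

```python
# Minimal sketch: paired bootstrap for the difference in AUC between two models
# scored on the same test set. Labels and scores are synthetic stand-ins.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=300)
scores_a = y_true * 0.6 + rng.normal(size=300)   # model A: stronger synthetic scores
scores_b = y_true * 0.4 + rng.normal(size=300)   # model B: weaker synthetic scores

observed = roc_auc_score(y_true, scores_a) - roc_auc_score(y_true, scores_b)

diffs = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), size=len(y_true))  # resample cases jointly
    if len(np.unique(y_true[idx])) < 2:
        continue  # need both classes present to compute AUC
    diffs.append(roc_auc_score(y_true[idx], scores_a[idx])
                 - roc_auc_score(y_true[idx], scores_b[idx]))

ci = np.percentile(np.asarray(diffs), [2.5, 97.5])
print("observed AUC difference:", observed, "95% bootstrap CI:", ci)
```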
4 votes
2 answers
195 views

When does model selection begin to overfit?

Suppose you have a small dataset (perhaps 1000 labels), and you are using cross-validation to train different models and to choose the best one (according to their cross-validation scores). It seems ...
MWB • 1,337
2 votes
0 answers
85 views

Is AIC scale invariant for problems concerning the number of data points in regression?

I am trying to use Akaike Information Criterion with the small sample correction (AICc) as method for determining how many data points to use in a linear approximation of a non-linear function; the ...
Glen Mackey
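For the AICc question above, a minimal sketch of the small-sample-corrected criterion for a Gaussian linear model, AICc = AIC + 2k(k+1)/(n-k-1) with AIC = 2k + n ln(RSS/n), applied to straight-line fits over different numbers of points. The non-linear function and window sizes are illustrative:

```python
# Minimal sketch: AICc of a straight-line fit to the first n points of a
# non-linear curve, compared across choices of n. Data are synthetic.
import numpy as np

def aicc_linear_fit(x, y):
    """AICc of an ordinary least-squares line fit to (x, y), Gaussian errors assumed."""
    n = len(x)
    k = 3  # slope, intercept, and the error variance
    coeffs = np.polyfit(x, y, deg=1)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    aic = 2 * k + n * np.log(rss / n)
    return aic + 2 * k * (k + 1) / (n - k - 1)

x = np.linspace(0, 2, 100)
y = np.exp(x)  # the non-linear function being approximated locally

for n in (10, 20, 40, 80):
    print(n, aicc_linear_fit(x[:n], y[:n]))
```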
0 votes
0 answers
53 views

ISLR Chapter 6: Choosing the Optimal Model

I had a question regarding the "choosing the optimal model" section of chapter 6 of ISLR (pg. 232). The book states that "In order to select the best model with respect to test error, ...
The Blankest Slate
0 votes
1 answer
383 views

How to select a model based on ROC AUC, sensitivity and specificity?

I'm running several machine learning algorithms on a dataset with 80% negatives and 20% positive cases (classification). Below I attach the results of comparing performance on 500 bootstrap resamples ...
amr95 • 13
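For the metric-comparison question above, a minimal sketch that computes ROC AUC, sensitivity, and specificity over bootstrap resamples of one model's test-set predictions; the 80/20 class imbalance, the scores, and the decision threshold are synthetic stand-ins:

```python
# Minimal sketch: bootstrap distributions of AUC, sensitivity, and specificity
# for one classifier's test-set scores. The 80/20 class imbalance is simulated.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)
y_true = (rng.random(500) < 0.2).astype(int)      # ~20% positives
scores = y_true * 0.8 + rng.normal(size=500)      # synthetic model scores
y_pred = (scores > 0.5).astype(int)               # assumed decision threshold

aucs, sens, spec = [], [], []
for _ in range(500):
    idx = rng.integers(0, 500, size=500)
    if len(np.unique(y_true[idx])) < 2:
        continue  # need both classes present
    tn, fp, fn, tp = confusion_matrix(y_true[idx], y_pred[idx]).ravel()
    aucs.append(roc_auc_score(y_true[idx], scores[idx]))
    sens.append(tp / (tp + fn))
    spec.append(tn / (tn + fp))

print("AUC %.3f  sensitivity %.3f  specificity %.3f"
      % (np.mean(aucs), np.mean(sens), np.mean(spec)))
```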
