Questions tagged [model-selection]

Model selection is the problem of judging which model from a set of candidates performs best. Popular methods include $R^2$, the AIC and BIC criteria, test sets, and cross-validation. To some extent, feature selection is a subproblem of model selection.

255 votes
8 answers
128k views

Algorithms for automatic model selection

I would like to implement an algorithm for automatic model selection. I am thinking of doing stepwise regression but anything will do (it has to be based on linear regressions though). My problem ...
S4M
  • 2,716
292 votes
3 answers
34k views

How to know that your machine learning problem is hopeless?

Imagine a standard machine-learning scenario: You are confronted with a large multivariate dataset and you have a pretty blurry understanding of it. What you need to do is to make predictions ...
Tim
  • 140k
138 votes
4 answers
72k views

Nested cross validation for model selection

How can one use nested cross validation for model selection? From what I read online, nested CV works as follows: There is the inner CV loop, where we may conduct a grid search (e.g. running K-fold ...
Amelio Vazquez-Reina
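
For readers who want something concrete for the nested cross-validation question above, here is a minimal sketch of the inner/outer loop idea. It is not taken from the question or its answers. The 1-D k-nearest-neighbours regressor, the synthetic sine data, and every function name below are assumptions chosen only to keep the example dependency-free.

```python
import math
import random
import statistics


def make_data(n=60, seed=0):
    """Hypothetical toy data: y = sin(x) plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in xs]


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle range(n) and deal the indices into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def split(data, fold):
    """Return (train, test), where test holds the points indexed by fold."""
    held_out = set(fold)
    train = [p for i, p in enumerate(data) if i not in held_out]
    test = [data[i] for i in fold]
    return train, test


def knn_predict(train, x, k):
    """Mean response of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean([y for _, y in nearest])


def mse(train, test, k):
    """Mean squared error on test of a k-NN model fitted on train."""
    return statistics.fmean([(knn_predict(train, x, k) - y) ** 2 for x, y in test])


def cv_error(data, k, n_folds):
    """Plain k-fold CV error for one hyperparameter value."""
    folds = k_fold_indices(len(data), n_folds)
    return statistics.fmean([mse(*split(data, fold), k) for fold in folds])


def nested_cv(data, candidates, outer_folds=5, inner_folds=4):
    """Outer loop estimates performance; inner loop picks the hyperparameter."""
    outer_scores, chosen = [], []
    for fold in k_fold_indices(len(data), outer_folds, seed=1):
        outer_train, outer_test = split(data, fold)
        # Selection uses only the outer-training portion, never the outer test fold.
        best = min(candidates, key=lambda k: cv_error(outer_train, k, inner_folds))
        chosen.append(best)
        outer_scores.append(mse(outer_train, outer_test, best))
    return statistics.fmean(outer_scores), chosen


data = make_data()
score, chosen = nested_cv(data, candidates=[1, 3, 5, 9])
print("estimated MSE of the selection procedure:", round(score, 3))
print("hyperparameter chosen in each outer fold:", chosen)
```

The outer score here estimates how well the whole procedure (tuning included) generalizes. The values chosen in each outer fold may differ, which is expected in this setup.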
201 votes
6 answers
74k views

Training on the full dataset after cross-validation?

TL;DR: Is it ever a good idea to train an ML model on all the data available before shipping it to production? Put another way, is it ever OK to train on all data available and not check if the model ...
Amelio Vazquez-Reina
291 votes
13 answers
258k views

Is there any reason to prefer the AIC or BIC over the other?

The AIC and BIC are both methods of assessing model fit penalized for the number of estimated parameters. As I understand it, BIC penalizes models more for free parameters than does AIC. Beyond a ...
russellpierce
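
To make the penalty difference in the AIC/BIC question above concrete, here is a small sketch using the standard definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The two models and their log-likelihoods are made-up numbers for illustration only, and the function names are assumptions, not any library's API.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits on the same n = 100 observations (values are invented).
n_obs = 100
simple = {"log_likelihood": -120.0, "n_params": 3}
complex_ = {"log_likelihood": -116.0, "n_params": 6}

for name, m in [("simple", simple), ("complex", complex_)]:
    a = aic(m["log_likelihood"], m["n_params"])
    b = bic(m["log_likelihood"], m["n_params"], n_obs)
    print(f"{name}: AIC = {a:.2f}, BIC = {b:.2f}")
```

With these invented numbers AIC favours the larger model while BIC favours the smaller one. BIC charges ln(n) per parameter instead of 2, so its penalty is the heavier one whenever n is at least 8.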
20 votes
2 answers
9k views

What are chunk tests?

In answer to a question on model selection in the presence of multicollinearity, Frank Harrell suggested: Put all variables in the model but do not test for the effect of one variable adjusted for ...
fmark
  • 4,987
294 votes
8 answers
220k views

How to choose a predictive model after k-fold cross-validation?

I am wondering how to choose a predictive model after doing K-fold cross-validation. This may be awkwardly phrased, so let me explain in more detail: whenever I run K-fold cross-validation, I use K ...
Berk U.
  • 5,075
21 votes
5 answers
16k views

Can I ignore coefficients for non-significant levels of factors in a linear model?

After seeking clarification about linear model coefficients, I have a follow-up question concerning non-significant (high p-value) coefficients of factor levels. Example: If my linear ...
Trees4theForest
819 votes
10 answers
1.1m views

How to choose the number of hidden layers and nodes in a feedforward neural network?

Is there a standard and accepted method for selecting the number of layers, and the number of nodes in each layer, in a feed-forward neural network? I'm interested in automated ways of building neural ...
Rob Hyndman
  • 57.5k
77 votes
4 answers
58k views

Linear model with log-transformed response vs. generalized linear model with log link

In this paper titled "CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA" the authors write: In a generalized linear model, the mean is transformed, by the link function, instead of ...
miura
  • 3,734
34 votes
3 answers
24k views

Prerequisites for AIC model comparison

What exactly are the prerequisites that need to be fulfilled for AIC model comparison to work? I just came across this question when I did a comparison like this: ...
Tomas
  • 6,211
10 votes
2 answers
750 views

Selecting ARIMA orders by ACF/PACF vs. by information criteria

We keep on getting questions here about selecting ARIMA model orders based on ACF/PACF plots. This is the older methodology proposed by Box and Jenkins. More modern tools like the ...
Stephan Kolassa
90 votes
5 answers
26k views

What are modern, easily used alternatives to stepwise regression?

I have a dataset with around 30 independent variables and would like to construct a generalized linear model (GLM) to explore the relationship between them and the dependent variable. I am aware that ...
fmark
  • 4,987
100 votes
2 answers
14k views

How much do we know about p-hacking "in the wild"?

The phrase p-hacking (also: "data dredging", "snooping" or "fishing") refers to various kinds of statistical malpractice in which results become artificially statistically significant. There are many ...
Silverfish
  • 23.8k
85 votes
6 answers
13k views

Variable selection for predictive modeling really needed in 2016?

This question was asked on CV some years ago; it seems worth a repost in light of 1) order-of-magnitude better computing technology (e.g. parallel computing, HPC, etc.) and 2) newer techniques, e.g. [...
horaceT
  • 3,372
