Questions tagged [model-selection]

Model selection is the problem of judging which model from a set of candidates performs best. Popular methods include $R^2$, the AIC and BIC criteria, test sets, and cross-validation. To some extent, feature selection is a subproblem of model selection.
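
As a reference point for two of the criteria named in the description, the standard definitions of AIC and BIC can be sketched as follows. The symbols are assumptions that do not appear on this page: $k$ is the number of estimated parameters, $n$ the sample size, and $\hat{L}$ the maximized likelihood of the fitted model.

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
$$

Lower values are preferred under both criteria. The per-parameter penalty in BIC, $\ln n$, exceeds the AIC penalty of 2 once $n \geq 8$, so on all but very small samples BIC tends to favor smaller models.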

2,003 questions

819 votes

10 answers

1.1m views

How to choose the number of hidden layers and nodes in a feedforward neural network?

Is there a standard and accepted method for selecting the number of layers, and the number of nodes in each layer, in a feed-forward neural network? I'm interested in automated ways of building neural ...

Rob Hyndman

57.5k

asked Jul 20, 2010 at 0:15

294 votes

8 answers

220k views

How to choose a predictive model after k-fold cross-validation?

I am wondering how to choose a predictive model after doing K-fold cross-validation. This may be awkwardly phrased, so let me explain in more detail: whenever I run K-fold cross-validation, I use K ...

Berk U.

5,075

asked Mar 15, 2013 at 2:21

292 votes

3 answers

34k views

How to know that your machine learning problem is hopeless?

Imagine a standard machine-learning scenario: You are confronted with a large multivariate dataset and you have a pretty blurry understanding of it. What you need to do is to make predictions ...

Tim

140k

asked Jul 5, 2016 at 8:22

291 votes

13 answers

258k views

Is there any reason to prefer the AIC or BIC over the other?

The AIC and BIC are both methods of assessing model fit penalized for the number of estimated parameters. As I understand it, BIC penalizes models more for free parameters than does AIC. Beyond a ...

russellpierce

18.9k

asked Jul 23, 2010 at 20:49

255 votes

8 answers

128k views

Algorithms for automatic model selection

I would like to implement an algorithm for automatic model selection. I am thinking of doing stepwise regression but anything will do (it has to be based on linear regressions though). My problem ...

S4M

2,716

asked Jan 9, 2012 at 18:22

201 votes

6 answers

74k views

Training on the full dataset after cross-validation?

TL;DR: Is it ever a good idea to train an ML model on all the data available before shipping it to production? Put another way, is it ever OK to train on all data available and not check if the model ...

Amelio Vazquez-Reina

19.5k

asked Jun 5, 2011 at 16:50

138 votes

4 answers

72k views

Nested cross validation for model selection

How can one use nested cross validation for model selection? From what I read online, nested CV works as follows: There is the inner CV loop, where we may conduct a grid search (e.g. running K-fold ...

Amelio Vazquez-Reina

19.5k

asked Jul 22, 2013 at 15:53

100 votes

2 answers

14k views

How much do we know about p-hacking "in the wild"?

The phrase p-hacking (also: "data dredging", "snooping" or "fishing") refers to various kinds of statistical malpractice in which results become artificially statistically significant. There are many ...

Silverfish

23.8k

asked Mar 9, 2016 at 13:14

90 votes

5 answers

26k views

What are modern, easily used alternatives to stepwise regression?

I have a dataset with around 30 independent variables and would like to construct a generalized linear model (GLM) to explore the relationship between them and the dependent variable. I am aware that ...

fmark

4,987

asked Jul 31, 2011 at 23:45

88 votes

14 answers

7k views

Why haven't robust (and resistant) statistics replaced classical techniques?

When solving business problems using data, it's common that at least one key assumption that underpins classical statistics is invalid. Most of the time, no one bothers to check those assumptions so ...

doug

10.6k

asked Aug 3, 2010 at 7:49

85 votes

6 answers

13k views

Variable selection for predictive modeling really needed in 2016?

This question was asked on CV some years ago; it seems worth a repost in light of 1) order-of-magnitude better computing technology (e.g. parallel computing, HPC, etc.) and 2) newer techniques, e.g. [...

horaceT

3,372

asked May 28, 2016 at 20:13

77 votes

4 answers

58k views

Linear model with log-transformed response vs. generalized linear model with log link

In this paper titled "CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA" the authors write: In a generalized linear model, the mean is transformed, by the link function, instead of ...

miura

3,734

asked Jan 16, 2013 at 10:01

67 votes

2 answers

36k views

Why only three partitions? (training, validation, test)

When you are trying to fit models to a large dataset, the common advice is to partition the data into three parts: the training, validation, and test dataset. This is because the models usually have ...

charles.y.zheng

7,986

asked Apr 8, 2011 at 14:45

65 votes

2 answers

5k views

A more definitive discussion of variable selection

Background I'm doing clinical research in medicine and have taken several statistics courses. I've never published a paper using linear/logistic regression and would like to do variable selection ...

sharper_image

asked Jul 14, 2016 at 16:30

59 votes

1 answer

28k views

What are posterior predictive checks and what makes them useful?

I understand what the posterior predictive distribution is, and I have been reading about posterior predictive checks, although it isn't clear to me what it does yet. What exactly is the posterior ...

Amelio Vazquez-Reina

19.5k

asked Sep 11, 2014 at 18:37
