Questions tagged [model-selection]
Model selection is a problem of judging which model from some set performs best. Popular methods include $R^2$, AIC and BIC criteria, test sets, and cross-validation. To some extent, feature selection is a subproblem of model selection.
695
questions with no upvoted or accepted answers
10
votes
2
answers
771
views
How to train a model when instead of a target we have a range where it is?
Often in machine learning we have a situation where the target is numeric (real or integer). Each target comes with an associated input vector. The goal is to learn the mapping from the input vectors to ...
9
votes
0
answers
93
views
Any Insights on the adoption and use of the Healthy Akaike Information Criterion (hAIC)?
Recently, I came across the Healthy Akaike Information Criterion (hAIC), introduced by Demidenko in his 2004 book "Mixed Models: Theory and Applications with R." Despite its (potential) ...
7
votes
0
answers
951
views
The extrapolation problem: model selection, performance metrics, and improvement
Machine learning models are fit to a response variable within a given range. This leads to weak and sometimes disastrous performance when it comes to instances with an actual response variable outside ...
7
votes
0
answers
291
views
Reference Request: Information Geometry for Ridge Regression
I am reading the book "Regression Estimators" by Gruber (2010), where he uses this technique to compare ridge regressors; however, he concentrates on deriving the mathematical results without ...
7
votes
0
answers
187
views
Graphical nominal model
Suppose I have a set of $k$ matrices.
$$
\mathbb D = A_1,A_2,...,A_k
$$
Each column of $A$ is a categorical vector.
$$
A = v_1,v_2,...,v_n
$$
I want to find the mapping
$$
f: A \...
7
votes
0
answers
818
views
How do you handle the situation where the residual variance is very high compared to the other variance parameter estimates?
Context
An experiment in agronomy whose aim is to investigate the possible effect of a treatment, with 13 possible levels, on the height of trees.
Model
$
Y_{ijk} = \mu_{\cdot \cdot \cdot} + \...
6
votes
1
answer
244
views
Match model selection strategies with modelling objectives
I am confused trying to match different model selection strategies with different modelling objectives. (Unfortunately, my confusion is reflected in the length of the post. Please be patient.)
Model ...
6
votes
1
answer
1k
views
Model selection between parametric and nonparametric methods
I have a real data set ($n=50$). I would like to fit some parametric models to this data set and then compare the maximum log-likelihood values with my spline model which is a nonparametric model. ...
6
votes
0
answers
693
views
AIC with Mantel's tests
Mantel's tests are commonly used to compare genetic distances (say, between a number of individuals) with true or hypothesized landscape distances between those same individuals. For example, “does ...
5
votes
0
answers
53
views
How does Lindley compare a Bayes factor and a p-value?
I was reading this paper by Dennis Lindley ("Analysis of a Wine Tasting", J. Wine Econ. 2006). Statistically, the paper is a straightforward analysis of a $10\times 11$ two-way table. To test whether ...
5
votes
0
answers
3k
views
Maximum lag length when working with daily time series data
When working with (financial) time series data in R, one may use a vector autoregressive (VAR) model. One important issue when working with VARs is determining their lag length.
In R, the command <...
5
votes
0
answers
522
views
How to compare multivariate forecasting methods?
Let $X$ be a multivariate time series of $N$ variables and $T$ observations.
Let us split $X$ into two separate datasets:
$X_{train}$: a training set with $N$ variables and $T_{train}$ observations
$X_{...
5
votes
0
answers
316
views
When using lmer, is a random intercept being estimated more than once if specified in separate grouping factors?
I know there are a slew of lmer specification questions already floating around. Please let me know if this is a duplicate, or if it is deemed off-topic, and I'll delete it.
I am using a forward ...
4
votes
0
answers
27
views
Why do model selection criteria (xICs, etc.) not explicitly incorporate a loss function?
Model Selection and Multimodel Inference by Burnham and Anderson notes that TIC, AIC, AICc and QAICc are based on the K-L distance between a given model and the true model. Also, BIC is in a sense based on ...
4
votes
0
answers
33
views
Can cross-validation be involved in model-building rather than validation?
I have a general idea in mind that would go like this:
randomly split the data into training/testing
build a model on the training data by choosing from among candidate predictors
evaluate it on the ...
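The steps in the excerpt above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the asker's actual procedure: the excerpt is truncated, so evaluating on the held-out test portion is assumed; the candidate models are restricted to single-predictor least-squares fits; and the names `fit_simple`, `mse`, and `select_and_evaluate`, along with the synthetic data, are hypothetical.

```python
import random


def fit_simple(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(x, y, slope, intercept):
    """Mean squared error of the fitted line on (x, y)."""
    return sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y)) / len(x)


def select_and_evaluate(X, y, test_fraction=0.3, seed=0):
    """Split, choose one predictor on the training part, score on the test part."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)  # step 1: random train/test split
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]

    best = None
    for j in range(len(X[0])):  # step 2: pick among candidate predictors
        x_train = [X[i][j] for i in train]
        y_train = [y[i] for i in train]
        slope, intercept = fit_simple(x_train, y_train)
        err = mse(x_train, y_train, slope, intercept)
        if best is None or err < best[0]:
            best = (err, j, slope, intercept)

    _, j, slope, intercept = best
    x_test = [X[i][j] for i in test]  # step 3 (assumed): evaluate on held-out data
    y_test = [y[i] for i in test]
    return j, mse(x_test, y_test, slope, intercept)


# Synthetic example (assumption): only the predictor at index 1 drives the response.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * row[1] + data_rng.gauss(0, 0.5) for row in X]

chosen, test_mse = select_and_evaluate(X, y)
print("chosen predictor index:", chosen)
print("held-out MSE:", round(test_mse, 3))
```

Because the test rows play no part in choosing the predictor, the held-out error is an honest estimate for this one split. Repeating the split with different seeds and reusing those results to pick predictors would move the procedure from validation toward model building, which is the distinction the question raises.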