Questions tagged [model-selection]

Model selection is the problem of judging which model from some set of candidates performs best. Popular methods include $R^2$, the AIC and BIC criteria, test sets, and cross-validation. To some extent, feature selection is a subproblem of model selection.
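
As an illustration of the criteria named in the tag description, here is a minimal sketch that ranks polynomial regressions by AIC and BIC. It assumes a least-squares fit with Gaussian errors; the simulated data, the helper name `gaussian_ic`, and the range of candidate degrees are illustrative choices, not part of any question listed here.

```python
import numpy as np

def gaussian_ic(y, y_hat, n_coefs):
    """Return (AIC, BIC) for a least-squares fit, assuming Gaussian errors."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    # Maximised Gaussian log-likelihood with the variance estimated as rss / n.
    log_lik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    k = n_coefs + 1  # regression coefficients plus the error variance
    return 2 * k - 2 * log_lik, k * np.log(n) - 2 * log_lik

# Simulated data (hypothetical): a quadratic signal plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.3, size=x.size)

# Score each candidate polynomial degree.
scores = {}
for degree in range(6):
    coefs = np.polyfit(x, y, degree)
    scores[degree] = gaussian_ic(y, np.polyval(coefs, x), degree + 1)

best_aic = min(scores, key=lambda d: scores[d][0])
best_bic = min(scores, key=lambda d: scores[d][1])
print("degree chosen by AIC:", best_aic)
print("degree chosen by BIC:", best_bic)
```

Lower values are better for both criteria. Because the BIC penalty per parameter is $\ln n$ rather than 2, BIC never selects a larger model than AIC once $n > e^2 \approx 7.4$.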

695 questions with no upvoted or accepted answers
10 votes
2 answers
771 views

How to train a model when, instead of a target, we have a range that contains it?

Often in machine learning we have a situation where the target is numeric (real or integer). Each target comes with an associated input vector. The goal is to learn the mapping from the input vectors to ...
Roman's user avatar
  • 612
9 votes
0 answers
93 views

Any Insights on the adoption and use of the Healthy Akaike Information Criterion (hAIC)?

Recently, I came across the Healthy Akaike Information Criterion (hAIC), introduced by Demidenko in his 2004 book "Mixed Models: Theory and Applications with R." Despite its (potential) ...
Robert Long's user avatar
  • 64.1k
7 votes
0 answers
951 views

The extrapolation problem: model selection, performance metrics, and improvement

Machine learning models are fit to a response variable within a given range. This leads to weak and sometimes disastrous performance when it comes to instances with an actual response variable outside ...
Kinformationist's user avatar
7 votes
0 answers
291 views

Reference Request: Information Geometry for Ridge Regression

I am reading the book "Regression Estimators" by Gruber (2010), where he uses this technique to compare ridge regressors; however, he concentrates on deriving the mathematical results without ...
Baz's user avatar
  • 1,763
7 votes
0 answers
187 views

Graphical nominal model

Suppose I have a set of $k$ matrices. $$ \mathbb D = A_1,A_2,...,A_k $$ Each column of $A$ is a categorical vector. $$ A = v_1,v_2,...,v_n $$ I want to find the mapping $$ f: A \...
Jessica Collins's user avatar
7 votes
0 answers
818 views

How do you handle the situation where the residual variance is very high compared to the other variance parameter estimates?

Context: An experiment in agronomy whose aim is to investigate the possible effect of a treatment, with 13 possible levels, on the height of trees. Model: $ Y_{ijk} = \mu_{\cdot \cdot \cdot} + \...
ocram's user avatar
  • 22.2k
6 votes
1 answer
244 views

Match model selection strategies with modelling objectives

I am confused trying to match different model selection strategies with different modelling objectives. (Unfortunately, my confusion is reflected in the length of the post. Please be patient.) Model ...
Richard Hardy's user avatar
6 votes
1 answer
1k views

Model selection between parametric and nonparametric methods

I have a real data set ($n=50$). I would like to fit some parametric models to this data set and then compare the maximum log-likelihood values with my spline model which is a nonparametric model. ...
shany's user avatar
  • 79
6 votes
0 answers
693 views

AIC with Mantel's tests

Mantel's tests are commonly used to compare genetic distances (say, between a number of individuals) with true or hypothesized landscape distances between those same individuals. For example, “does ...
Robert Long's user avatar
5 votes
0 answers
53 views

How does Lindley compare a Bayes factor and a p-value?

I was reading this paper by Dennis Lindley ("Analysis of a Wine Tasting", J. Wine Econ. 2006). Statistically, the paper is a straightforward analysis of a $10\times 11$ two-way table. To test whether ...
Robin Ryder's user avatar
  • 2,096
5 votes
0 answers
3k views

Maximum lag length when working with daily time series data

When working with (financial) time series data in R, one may use a vector autoregressive (VAR) model. One important issue when working with VARs is determining their lag length. In R, the command <...
Kuma's user avatar
  • 467
5 votes
0 answers
522 views

How to compare multivariate forecasting methods?

Let $X$ be a multivariate time series of $N$ variables and $T$ observations. Let us split $X$ into two separate datasets: $X_{train}$, a training set with $N$ variables and $T_{train}$ observations, and $X_{...
Jaewon's user avatar
  • 51
5 votes
0 answers
316 views

When using lmer, is a random intercept estimated more than once if specified in separate grouping factors?

I know there are a slew of lmer specification questions already floating around. Please let me know if this is a duplicate, or if it is deemed off-topic, and I'll delete it. I am using a forward ...
russellpierce's user avatar
4 votes
0 answers
27 views

Why do model selection criteria (xICs, etc) not explicitly incorporate a loss function?

Model Selection and Multimodel Inference by Burnham and Anderson notes that TIC, AIC, AICc and QAICc are based on the K-L distance between a given model and the true model. Also, BIC is in a sense based on ...
Mohan's user avatar
  • 939
4 votes
0 answers
33 views

Can cross-validation be involved in model-building rather than validation?

I have a general idea in mind that would go like this: randomly split the data into training/testing; build a model on the training data by choosing from among candidate predictors; evaluate it on the ...
Dave's user avatar
  • 2,651
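
The excerpt above is truncated, so the following is only a rough sketch of one common way to use cross-validation during model building: choosing among candidate models by their k-fold error. The simulated data, the helper name `kfold_mse`, and the use of polynomial degree as the "candidate" are assumptions for illustration, not the asker's actual procedure.

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Mean squared prediction error of a polynomial fit, averaged over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only, then predict the held-out fold.
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

# Simulated data (hypothetical): a smooth nonlinear signal plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 150)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.size)

# Use the cross-validated error to choose among candidate models.
cv_scores = {d: kfold_mse(x, y, d) for d in range(1, 8)}
chosen = min(cv_scores, key=cv_scores.get)
print("degree chosen by 5-fold CV:", chosen)
```

Because the same folds were used to pick the winner, its cross-validated error is an optimistic estimate of out-of-sample error; an honest estimate needs data that played no part in the selection, such as a separate test set or an outer cross-validation loop.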