AIC versus cross validation in time series: the small sample case

I am interested in model selection in a time series setting. For concreteness, suppose I want to select an ARMA model from a pool of ARMA models with different lag orders. The ultimate intent is forecasting.

Model selection can be done by

  1. cross validation,
  2. use of information criteria (AIC, BIC),

among other methods.

Rob J. Hyndman provides a way to do cross validation for time series. For relatively small samples, the sample size used in cross validation may be qualitatively different from the original sample size. For example, if the original sample is 200 observations, one could start cross validation with the first 101 observations and expand the window to 102, 103, ..., 200 observations, obtaining 100 cross-validation results. Clearly, a model that is reasonably parsimonious for 200 observations may be too large for 100 observations, so its validation error will be large. Cross validation is therefore likely to systematically favour overly parsimonious models. This is an undesirable effect caused by the mismatch in sample sizes.
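To make the scheme concrete, here is a minimal sketch of how the expanding-window splits could be indexed (the function name and the convention that the first fit uses 100 observations and forecasts the 101st are my own assumptions; this yields the 100 one-step evaluations described above):

```python
def expanding_window_splits(n_obs, min_train):
    """One-step-ahead expanding-window splits for time-series CV.

    Yields (train_indices, test_index): the model is fit on the first
    train_end observations and then forecasts the next one.
    Illustrative sketch only, not Hyndman's exact implementation.
    """
    for train_end in range(min_train, n_obs):
        yield list(range(train_end)), train_end

# 200 observations, initial window of 100 -> 100 forecast evaluations
splits = list(expanding_window_splits(200, 100))
print(len(splits))        # 100 evaluations
print(len(splits[0][0]))  # first fit uses 100 observations
print(splits[-1][1])      # last forecast targets the final observation (index 199)
```

Note that every training set here is between half and all of the original sample, which is exactly the size mismatch described above.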

An alternative to cross validation is using information criteria for model selection. Since I care about forecasting, I would use AIC. Even though AIC is asymptotically equivalent to minimizing the out-of-sample one-step forecast MSE for time series models (according to this post by Rob J. Hyndman), I doubt this is relevant here, since the sample sizes I care about are not that large.
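For reference, AIC is computed as $\mathrm{AIC} = 2k - 2\ln L$; a small sketch under Gaussian errors, using the concentrated log-likelihood of the one-step residuals (the function name is mine, and whether the error variance is counted in $k$ varies by software convention):

```python
import math

def gaussian_aic(residuals, n_params):
    """AIC = 2k - 2 ln L for a model with Gaussian errors, using the
    concentrated (profiled-out variance) log-likelihood.
    n_params is the number of estimated coefficients k."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n  # ML estimate of error variance
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return 2 * n_params - 2 * loglik
```

Lower AIC is preferred; the comparison is only meaningful across models fit to the same sample.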

Question: should I choose AIC over time series cross validation for small/medium samples?

A few related questions can be found here, here and here.
