8
$\begingroup$

I'm performing all-possible-model selection in SAS for time series forecasting: essentially fitting 40 models to the data and shortlisting the n best models based on selection criteria.

What criteria should I use to shortlist?

From what I've read, SBC and AIC select the most parsimonious models, i.e. the ones with the fewest parameters (due to the penalty terms they apply). But if I'm fitting a time series model, I only have one independent variable, namely TIME.

I've also read that RMSE is highly susceptible to outliers.

$\endgroup$

2 Answers

9
$\begingroup$

The short answer is that there is no silver bullet. The few selection criteria you have named are also far from all there are (as I am sure you are aware).

So let us start with the ones most commonly used in time series applications: the Bayesian (Schwarz) information criterion (BIC), the Akaike information criterion (AIC), and the Hannan-Quinn criterion (HQC). These criteria are typically used to select the lag length of your model, i.e. how many past periods affect the present period.

These three criteria estimate the Kullback-Leibler divergence between your data and the model, and they asymptotically select a true model. Notice that I said 'a' true model: including superfluous lags asymptotically makes no difference, since their coefficients will be estimated to be zero. It is noteworthy that AIC asymptotically selects a true model that strictly overfits, i.e. a model larger than the smallest true model; in machine-learning terminology, it is prone to overfitting. BIC and HQC, on the other hand, asymptotically select the smallest true model. Their drawback is that they tend to underselect in finite samples, which is why AIC is often preferred in applications.
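To make the lag-length selection concrete, here is a minimal numpy sketch (not SAS; the simulated AR(2) series, the OLS fit, and the Gaussian-likelihood formulas are illustrative assumptions): each criterion is `-2 log L` plus its own penalty on the parameter count, and we pick the lag length that minimizes it.

```python
import numpy as np

def ar_criteria(y, p, pmax):
    """Fit an AR(p) model by OLS, conditioning on the first `pmax` observations
    so every lag length is evaluated on the same sample, and return
    (AIC, BIC, HQC) under a Gaussian likelihood with k = p + 1 parameters."""
    y = np.asarray(y, dtype=float)
    Y = y[pmax:]
    n = len(Y)
    X = np.column_stack([np.ones(n)] + [y[pmax - j : len(y) - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    sigma2 = resid @ resid / n                        # ML estimate of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = p + 1                                         # lag coefficients plus intercept
    return (-2 * loglik + 2 * k,                      # AIC
            -2 * loglik + k * np.log(n),              # BIC
            -2 * loglik + 2 * k * np.log(np.log(n)))  # HQC

# Simulate an AR(2) process and tabulate the criteria for p = 1..6.
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

crit = {p: ar_criteria(y, p, pmax=6) for p in range(1, 7)}
best_aic = min(crit, key=lambda p: crit[p][0])
best_bic = min(crit, key=lambda p: crit[p][1])
```

With 500 observations both criteria should recover a lag length of at least 2 here, since lag 1 clearly underfits; AIC may occasionally pick a slightly larger model, illustrating its tendency to overfit.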

The main problem with (unpenalized) RMSE is that extending the lag length (i.e., including more lags as explanatory variables) can never worsen, and in practice almost always improves, the in-sample RMSE. This is because the fit cannot get worse when you add explanatory variables, and RMSE is a direct measure of fit.
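This monotone improvement is easy to demonstrate: even on pure white noise, where no lag carries any information, the in-sample RMSE of an OLS-fitted AR(p) can only go down as p grows. A minimal numpy sketch (the white-noise series and lag range are made up for illustration):

```python
import numpy as np

def insample_rmse(y, p, pmax=10):
    """In-sample RMSE of an AR(p) OLS fit, conditioning on the first
    `pmax` observations so all lag lengths use the same sample."""
    Y = y[pmax:]
    n = len(Y)
    X = np.column_stack([np.ones(n)] + [y[pmax - j : len(y) - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return float(np.sqrt(np.mean((Y - X @ beta) ** 2)))

rng = np.random.default_rng(1)
y = rng.normal(size=300)                 # white noise: no lag is truly useful
rmses = [insample_rmse(y, p) for p in range(0, 11)]
# rmses is weakly decreasing: adding a lag can never worsen the in-sample fit
```

Because the AR(p) models are nested, each extra lag can only shrink the residual sum of squares, so ranking by raw RMSE would always point to the largest model.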

I don't know your exact application, but I feel many practitioners would compare the optima chosen by AIC, BIC, and HQC and justify their chosen lag length that way.

$\endgroup$
2
$\begingroup$

Whether it makes sense to create a short-list of just n models, or to select a single model, depends a lot on how strongly the data favor the "best" model (according to your chosen criterion).

If a lot of models are "close together", then all of them are plausible models, and selecting a single one, or a subset, out of many candidates is pretty problematic. For that reason model averaging is often a good idea, and for prediction tasks weights for each model based on $\text{prior weight}_i \times \exp\{ -0.5 (\text{AIC}_i - \min_j \text{AIC}_j) \}$ (normalized to sum to one) are popular. If, on the other hand, one model or a small number of models get nearly all the weight (e.g. the best model is ahead of the next model by, say, 10 to 15 AIC units, and not too many models were considered), then it is probably reasonable to concentrate on those. AIC-based weights are popular for prediction because of the link between AIC and maximum likelihood estimation. For a very enthusiastic view of this type of approach, see Burnham and Anderson's "Model Selection and Multimodel Inference".
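The weighting scheme is just a few lines of arithmetic; a sketch in numpy (the three AIC values are invented to show the effect of a large gap):

```python
import numpy as np

def akaike_weights(aics, prior=None):
    """Normalized model weights: prior_i * exp(-0.5 * (AIC_i - min_j AIC_j))."""
    aics = np.asarray(aics, dtype=float)
    prior = np.ones_like(aics) if prior is None else np.asarray(prior, dtype=float)
    w = prior * np.exp(-0.5 * (aics - aics.min()))  # subtract the min for numerical stability
    return w / w.sum()

# Three candidate models; the one trailing by 12 AIC units is effectively ruled out.
w = akaike_weights([100.0, 102.0, 112.0])
```

Subtracting the minimum AIC before exponentiating leaves the normalized weights unchanged but avoids underflow when the raw AIC values are large.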

As mentioned in the other answer, naive RMSE is not a justifiable option (although one could obtain a version adjusted for overfitting via, e.g., cross-validation). Other criteria (e.g. BIC) are also potentially interesting, although my personal bias is towards AIC-type approaches.
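The cross-validated alternative mentioned above can be sketched as a one-step-ahead, expanding-window evaluation; unlike in-sample RMSE, this out-of-sample version is penalized by the estimation noise that superfluous lags introduce. (A pure-numpy illustration under assumed choices of AR/OLS model, window start, and lag lengths, not the poster's actual setup.)

```python
import numpy as np

def cv_rmse(y, p, start=100):
    """One-step-ahead RMSE of an AR(p) OLS fit on an expanding window:
    at each t >= start, fit on y[:t] and forecast y[t]."""
    errs = []
    for t in range(start, len(y)):
        hist = y[:t]
        Y = hist[p:]
        n = len(Y)
        X = np.column_stack([np.ones(n)] + [hist[p - j : len(hist) - j] for j in range(1, p + 1)])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        x_new = np.concatenate([[1.0], hist[-1 : -p - 1 : -1]])  # [1, y_{t-1}, ..., y_{t-p}]
        errs.append(y[t] - x_new @ beta)
    return float(np.sqrt(np.mean(np.square(errs))))

rng = np.random.default_rng(2)
y = rng.normal(size=300)                 # white noise: extra lags only add noise
scores = {p: cv_rmse(y, p) for p in (0, 2, 8)}
```

Because each forecast uses only data observed before it, this respects the time ordering of the series, which ordinary k-fold cross-validation would not.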

$\endgroup$
