I will use an elastic net to estimate a regression model which will later be used for forecasting.
I have a grid of $\alpha$ values within [0,1] representing the proportion of $L_1$ versus $L_2$ penalty.
I also have a grid of $\lambda$ values for the amount of penalization.
There are at least two alternatives for selecting the optimal combination $(\alpha,\lambda)$:

1. Perform leave-one-out cross-validation (LOOCV) and pick the combination $(\alpha,\lambda)$ that delivers the lowest MSE on the validation sets (perhaps applying the one-standard-error rule for parsimony).
2. Use the whole sample and pick the combination $(\alpha,\lambda)$ that delivers the lowest AIC.
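The first alternative can be sketched with scikit-learn; note that in scikit-learn's naming, `l1_ratio` plays the role of $\alpha$ (the $L_1$ share) and `alpha` plays the role of $\lambda$ (the overall penalty). The data and grid values here are illustrative, not from the question:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, LeaveOneOut

# Simulated data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=40)

# Grid over (alpha, lambda); in scikit-learn, l1_ratio = alpha, alpha = lambda
grid = {"l1_ratio": [0.1, 0.5, 0.9], "alpha": [0.01, 0.1, 1.0]}

# LOOCV: each validation set is a single observation, scored by squared error
search = GridSearchCV(
    ElasticNet(max_iter=10_000),
    grid,
    cv=LeaveOneOut(),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)  # combination with the lowest LOOCV MSE
```

With $n$ observations and a $g$-point grid this requires $n \times g$ model fits, which is why the speed argument for alternative 2 below can matter.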
In the second alternative, the degrees of freedom in the AIC would be the effective degrees of freedom of the elastic net. (This should be obtainable, since the effective degrees of freedom are known for both the LASSO and ridge regression.)
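A sketch of the second alternative, assuming Gaussian errors. The degrees-of-freedom estimate below is a ridge-type trace over the active set, which is one proposal for elastic-net effective degrees of freedom; treat both the df formula and the penalty rescaling as assumptions, not the definitive choice:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Simulated data, purely for illustration
rng = np.random.default_rng(1)
n, p = 60, 6
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=n)

def enet_aic(X, y, lam, alpha):
    """Gaussian AIC for an elastic net fit, using effective degrees of freedom."""
    fit = ElasticNet(alpha=lam, l1_ratio=alpha, max_iter=10_000).fit(X, y)
    resid = y - fit.predict(X)
    active = np.flatnonzero(fit.coef_)
    if active.size == 0:
        df = 1.0  # intercept only
    else:
        XA = X[:, active]
        # Ridge part of the penalty, rescaled by n to match scikit-learn's
        # 1/(2n) loss scaling (an assumption about the parameterization)
        lam2 = lam * (1 - alpha) * len(y)
        H = XA @ np.linalg.solve(XA.T @ XA + lam2 * np.eye(active.size), XA.T)
        df = np.trace(H) + 1  # +1 for the intercept
    sigma2 = np.mean(resid**2)
    return len(y) * np.log(sigma2) + 2 * df

grid = [(lam, a) for lam in (0.01, 0.1, 1.0) for a in (0.1, 0.5, 0.9)]
best = min(grid, key=lambda g: enet_aic(X, y, *g))
print(best)  # (lambda, alpha) pair with the lowest AIC
```

This needs only one fit per grid point on the full sample, versus $n$ fits per grid point under LOOCV.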
Question: Which of 1. and 2. is better and why?
Some thoughts:
- In the context of feature selection, LOOCV is known to be asymptotically equivalent to AIC-based selection. So asymptotically I would expect both 1. and 2. to yield the same result. But what about finite samples?
- Alternative 2. could be preferred due to speed.
- Alternative 2. requires specifying the error distribution.
- Is it fine to use effective degrees of freedom when calculating AIC?
A commenter suggests fixing $\alpha=0.5$ when fitting an elastic net model and using cross-validation only to select $\lambda$: searching over a two-dimensional grid can lead to overfitting even if you're using cross-validation. That said, there is an interval-search algorithm implemented in the c060 package that will select the optimal parameter combination.
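That suggestion (fix $\alpha$, cross-validate only over $\lambda$) can be sketched with scikit-learn's `ElasticNetCV`, where again `l1_ratio` corresponds to $\alpha$ and the `alphas` grid to the $\lambda$ values; the data and grid are illustrative:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Simulated data, purely for illustration
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, 0.0, -0.5, 0.0]) + rng.normal(scale=0.5, size=50)

# Fix alpha = 0.5 (l1_ratio) and cross-validate only over lambda (alphas)
lambdas = np.logspace(-3, 1, 30)
model = ElasticNetCV(l1_ratio=0.5, alphas=lambdas, cv=10)
model.fit(X, y)
print(model.alpha_)  # lambda selected by cross-validation
```

This reduces the search to one dimension; the c060 package's interval-search approach is the alternative if one does want to tune both parameters jointly.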