Suppose you have a linear model that you believe has too many variables -- a cubic in 10 lags, for example. You believe, without being certain, that the true relationship is probably quadratic, and maybe linear, that only four to six of the lags matter, and that, as a result, your current equation is probably overfitting badly. Your goal is a good predictive model.
Relaxed LASSO, in at least some of its variants, uses the LASSO for variable selection and then switches to ridge regression for shrinkage. This generally forecasts better than pure LASSO or elastic net. I am not certain how its out-of-sample performance compares with that of pure ridge regression; I believe it is somewhat worse, though not by much, once the tuning parameter has been chosen optimally by some version of cross-validation in each case.
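To make the two-stage procedure concrete, here is a minimal sketch of LASSO selection followed by ridge shrinkage, assuming scikit-learn and simulated data. The AR(2) series, the helper name `lag_matrix`, the cubic without cross terms, the penalty grid, and the hold-out size are all my assumptions for illustration; this is not the reference implementation of any particular relaxed LASSO variant.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Simulated series (an assumption for illustration): AR(2) with Gaussian noise.
n, n_lags = 400, 10
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()


def lag_matrix(series, n_lags):
    """Hypothetical helper: column k-1 of X holds lag k of the series."""
    m = len(series)
    X = np.column_stack([series[n_lags - k: m - k] for k in range(1, n_lags + 1)])
    return X, series[n_lags:]


lags, target = lag_matrix(y, n_lags)
# Cubic in 10 lags, without cross terms (an assumption): 30 candidate regressors.
raw = np.hstack([lags, lags**2, lags**3])

# Chronological split; the scaler is fitted on the training part only.
raw_tr, raw_te = raw[:-100], raw[-100:]
y_tr, y_te = target[:-100], target[-100:]
scaler = StandardScaler().fit(raw_tr)
X_tr, X_te = scaler.transform(raw_tr), scaler.transform(raw_te)

cv = TimeSeriesSplit(n_splits=5)   # respects time order within the training set
alphas = np.logspace(-3, 3, 25)    # ridge penalty grid (an arbitrary choice)

# Stage 1: LASSO, used only to pick the variables with non-zero coefficients.
lasso = LassoCV(cv=cv).fit(X_tr, y_tr)
selected = np.flatnonzero(lasso.coef_)
if selected.size == 0:  # guard: fall back to all regressors if LASSO keeps none
    selected = np.arange(X_tr.shape[1])

# Stage 2: ridge regression on the selected variables only.
ridge_sel = RidgeCV(alphas=alphas, cv=cv).fit(X_tr[:, selected], y_tr)
mse_two_stage = float(np.mean((y_te - ridge_sel.predict(X_te[:, selected])) ** 2))

# Benchmark: pure ridge regression on all 30 regressors.
ridge_all = RidgeCV(alphas=alphas, cv=cv).fit(X_tr, y_tr)
mse_ridge_all = float(np.mean((y_te - ridge_all.predict(X_te)) ** 2))

print("selected columns:", selected)
print("test MSE, LASSO then ridge:", mse_two_stage)
print("test MSE, ridge on everything:", mse_ridge_all)
```

A single simulated draw like this says nothing general about which approach forecasts better; it only pins down what I mean by the procedure.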
Here is my question. When comparing models that differ only in the variables they include, variable selection (as by LASSO, keeping the variables with non-zero coefficients) and model selection (as by choosing the set of variables that minimises the AIC) are pragmatically doing the same job, however conceptually distinct they may be. Now suppose that in each of these two cases (selection via LASSO or via AIC) we then run ridge regression on the resulting model, as in relaxed LASSO, and that the variables selected by AIC and by LASSO are not identical. Do we know which case, i.e., which selection of variables, is likely to forecast better out of sample, once the tuning parameter has been set correctly for each?
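The comparison I am asking about could be set up along the following lines. This is only a sketch under stated assumptions: the data are a simulated AR(2) series, the cubic has no cross terms, and, because an exhaustive AIC search over 30 candidates is infeasible, the AIC selection is replaced by a greedy forward search. The function names `aic`, `forward_aic`, and `ridge_test_mse` are hypothetical helpers written for this example.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Simulated AR(2) series (an assumption for illustration only).
n, p = 400, 10
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

# Lags 1..p, then squares and cubes of each lag (no cross terms): 30 candidates.
lags = np.column_stack([y[p - k: n - k] for k in range(1, p + 1)])
raw, target = np.hstack([lags, lags**2, lags**3]), y[p:]

# Chronological split; the scaler is fitted on the training part only.
raw_tr, raw_te = raw[:-100], raw[-100:]
y_tr, y_te = target[:-100], target[-100:]
scaler = StandardScaler().fit(raw_tr)
X_tr, X_te = scaler.transform(raw_tr), scaler.transform(raw_te)
cv = TimeSeriesSplit(n_splits=5)


def aic(X, y, cols):
    """Gaussian AIC (up to an additive constant) of OLS with intercept on cols."""
    Z = np.column_stack([np.ones(len(y)), X[:, np.asarray(cols, dtype=int)]])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return len(y) * np.log(resid @ resid / len(y)) + 2 * Z.shape[1]


def forward_aic(X, y):
    """Greedy forward search (a stand-in for exhaustive AIC minimisation)."""
    chosen, best = [], aic(X, y, [])
    while len(chosen) < X.shape[1]:
        scores = {j: aic(X, y, chosen + [j])
                  for j in range(X.shape[1]) if j not in chosen}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:  # no candidate lowers the AIC: stop
            break
        chosen.append(j_best)
        best = scores[j_best]
    return np.array(sorted(chosen), dtype=int)


def ridge_test_mse(cols):
    """Fit cross-validated ridge on the given columns; return hold-out MSE."""
    if len(cols) == 0:  # nothing selected: predict the training mean
        return float(np.mean((y_te - y_tr.mean()) ** 2))
    model = RidgeCV(alphas=np.logspace(-3, 3, 25), cv=cv).fit(X_tr[:, cols], y_tr)
    return float(np.mean((y_te - model.predict(X_te[:, cols])) ** 2))


# The two selections, each followed by the same ridge step.
lasso_cols = np.flatnonzero(LassoCV(cv=cv).fit(X_tr, y_tr).coef_)
aic_cols = forward_aic(X_tr, y_tr)
mse_lasso, mse_aic = ridge_test_mse(lasso_cols), ridge_test_mse(aic_cols)

print("LASSO selection:", lasso_cols, "-> ridge test MSE:", mse_lasso)
print("AIC selection:  ", aic_cols, "-> ridge test MSE:", mse_aic)
```

Repeating this over many simulated series and data-generating processes would give an empirical answer for those particular settings, but what I am really after is whether there is a known general result.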