Is the Cross Validation Error more "Informative" compared to AIC, BIC and the Likelihood Ratio Test?
As far as I understand:
The Likelihood Ratio Test is used to determine: given some data, is one fitted statistical model (i.e. one specific set of model parameters) more "likely" to have produced the observations than some alternative statistical model? (In practice, this alternative often takes the form of a nested version of the first model, in which some of the parameters are constrained to 0.) This is often formulated as a "hypothesis test", with the Likelihood Ratio Test statistic said to asymptotically follow a Chi-Square distribution (the chi-square approximation of the likelihood ratio, i.e. Wilks' theorem).
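As a small sketch of this idea (on simulated data, not any particular dataset): fit a normal model with a free mean against the nested null model with the mean constrained to 0, and compare twice the log-likelihood gap to a chi-square with 1 degree of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=200)  # true mean is nonzero

# Log-likelihood under the full model (mean estimated from the data)
# vs. the null model (mean constrained to 0); variance is the MLE in both.
ll_full = stats.norm.logpdf(x, loc=x.mean(), scale=x.std()).sum()
ll_null = stats.norm.logpdf(x, loc=0.0, scale=np.sqrt(np.mean(x**2))).sum()

# LRT statistic: 2 * (ll_full - ll_null), asymptotically chi-square
# with df = the difference in the number of free parameters (here 1).
lrt_stat = 2 * (ll_full - ll_null)
p_value = stats.chi2.sf(lrt_stat, df=1)
```

Here a small p-value leads us to reject the constrained (null) model in favor of the fuller one.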
The AIC and BIC both weigh the performance of a fitted statistical model against the "complexity" of that model (its number of parameters). Both loosely convey the same idea as "Occam's Razor" (a concept in philosophy): given the choice between a simpler model (a model with fewer parameters) and a complex model (a model with more parameters), provided both models give the same performance, the simpler model is preferable. This also ties into the idea of overfitting (traditionally, it was thought that models with good in-sample performance but many parameters are likely to overfit and poorly predict new data, i.e. the Bias-Variance Tradeoff). "Better" models are said to have lower values of AIC and BIC - yet there is no absolute threshold on "how low", only a relative comparison (e.g. Model 1 AIC = 234,841 and Model 2 AIC = 100,089: is Model 2 significantly better than Model 1, or are both models nowhere near acceptable?)
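To make the complexity penalty concrete, here is a sketch (again on simulated data) that computes AIC and BIC by hand for two ordinary least squares models with Gaussian errors - a simple one matching the true data-generating process and a needlessly complex one:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)  # true model is linear

def gaussian_ic(y, X):
    """AIC and BIC for ordinary least squares with Gaussian errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = np.mean(resid**2)          # MLE of the error variance
    k = X.shape[1] + 1                  # coefficients + the variance parameter
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    aic = 2 * k - 2 * loglik            # penalty of 2 per parameter
    bic = k * np.log(n) - 2 * loglik    # penalty of log(n) per parameter
    return aic, bic

X_simple = np.column_stack([np.ones(n), x])               # intercept + slope
X_complex = np.column_stack([np.ones(n), x, x**2, x**3])  # adds needless terms
aic1, bic1 = gaussian_ic(y, X_simple)
aic2, bic2 = gaussian_ic(y, X_complex)
```

The complex model always attains a slightly higher likelihood (it nests the simple one), so the comparison is decided by the penalty terms; note that BIC penalizes each extra parameter more heavily than AIC once n > e².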
On the other hand, Cross Validation (e.g. K-Fold Cross Validation, Leave One Out Cross Validation) is said to estimate how badly a statistical model is overfitting the available data - if the statistical model overfits the available data, (on a heuristic level) it is thought likely to predict new data poorly. Cross Validation repeatedly fits the same statistical model on a subset of the available data and records the model error on the held-out remainder; the errors over all held-out folds are then averaged into a single performance estimate (e.g. MSE, F-Score, Accuracy) - the Cross Validation Error. Thus, we can obtain similar insights about our statistical model from Cross Validation as we can from the Likelihood Ratio Test and the AIC/BIC.
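For reference, the K-Fold procedure just described is a one-liner in scikit-learn (sketched here on simulated data): each fold is held out once, the model is fit on the rest, and the held-out errors are averaged.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(150, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=150)

# 5-fold CV: fit on 4/5 of the data, score on the held-out 1/5,
# then average the per-fold errors into one cross-validation error.
scores = cross_val_score(LinearRegression(), X, y,
                         cv=5, scoring="neg_mean_squared_error")
cv_mse = -scores.mean()
```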
This leads me to my question: Is the Cross Validation Error more "Informative" compared to AIC, BIC and the Likelihood Test?
Here are my general thoughts :
1) When you have large datasets and statistical models with many parameters (e.g. Deep Neural Networks), the Cross Validation procedure can be very computationally expensive (e.g. with repeated folds and hyperparameter searches, thousands of models might have to be fit). 50 years ago, when computers were far weaker, it might not have been feasible to perform Cross Validation on statistical models - whereas the Likelihood Ratio Test, AIC and BIC are much less computationally expensive, since they require fitting each model only once. Thus, originally, researchers might have favored the Likelihood Ratio Test, AIC and BIC over Cross Validation.
2) AIC and BIC can only be interpreted in relative terms, e.g. Model 1 AIC = 234,841 and Model 2 AIC = 100,089: is Model 2 significantly better than Model 1, or are both models nowhere near acceptable? On the other hand, you can perform Cross Validation on a simple model vs. a complex model (e.g. a regression model with 3 parameters vs. 5 parameters) and measure the Cross Validation Error of both models - an error that is interpretable in the original units of the problem. In essence, this should allow you to compare model complexity vs. model performance - similar to the information that the AIC and BIC provide.
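A quick sketch of point 2, on simulated data with a quadratic ground truth: the CV error of each candidate polynomial degree is directly comparable across models of different complexity, and each number is itself an interpretable estimate of out-of-sample MSE.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(200, 1))
# Quadratic ground truth, so degree 1 underfits and degree 8 overfits.
y = 1.0 + 2.0 * X[:, 0] - X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

def cv_mse(degree):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=10,
                             scoring="neg_mean_squared_error")
    return -scores.mean()

errors = {d: cv_mse(d) for d in [1, 2, 8]}
```

Unlike a pair of raw AIC values, each entry of `errors` answers "how wrong, on average, is this model on held-out data" in the units of y².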
3) When it comes to Inference based Models, it becomes conceptually difficult to implement Cross Validation.
For example, suppose instead of fitting a regression model to your data - you decide to fit an entire probability distribution to your data. Instead of the model parameters being regression coefficients beta-0, beta-1, beta-2, etc., the model parameters are now the means, variances and covariances of the different variables (e.g. a multivariate normal distribution) :
Probability distributions are more informative than regression models: suppose you want to predict the age of a giraffe using weight and height.
A regression model would only allow you to predict age for different combinations of weight and height; and provide confidence intervals on the parameter estimates of weight and height.
A probability distribution (i.e. an inference-based model) would also allow you to predict the age for different combinations of weight and height - but in addition would allow you to answer more in-depth questions, such as "what is the most probable weight of a giraffe that is 20 years old and 15 ft tall?" (the expectation of the conditional distribution, obtained in closed form or via MCMC sampling) or "what is the probability of observing a giraffe that weighs less than 500 lbs?" (the marginal probability distribution).
I suppose in theory, a Cross Validation procedure could be created for measuring the error of probability distribution models (fit a probability distribution on 70% of the data, and for each measurement in the test set (30%), see how close the expected value of the conditional distribution is to the true measurement ... then repeat "k" times). But in practice, the Likelihood Ratio Test is used more often to assess the fit of a probability distribution to some data.
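The hold-out scheme just described is quite workable, at least for a multivariate normal - here is a sketch on hypothetical simulated giraffe data (all numbers invented for illustration). For an MVN the conditional expectation is available in closed form, so no MCMC is needed:

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical giraffe data: columns are [age, weight, height].
n = 500
age = rng.uniform(1, 25, size=n)
weight = 100 + 60 * age + rng.normal(scale=50, size=n)
height = 6 + 0.4 * age + rng.normal(scale=1.0, size=n)
data = np.column_stack([age, weight, height])

train, test = data[:400], data[400:]

# "Inference-based model": fit a multivariate normal on the training set.
mu = train.mean(axis=0)
cov = np.cov(train, rowvar=False)

# For a multivariate normal, the conditional expectation
# E[age | weight, height] is closed-form: mu_1 + S12 @ inv(S22) @ (x2 - mu_2).
S12 = cov[0, 1:]
S22 = cov[1:, 1:]
pred_age = mu[0] + (test[:, 1:] - mu[1:]) @ np.linalg.solve(S22, S12)
holdout_mse = np.mean((pred_age - test[:, 0]) ** 2)
```

Wrapping this in a loop over k random train/test splits gives exactly the Cross Validation Error described above, but for a fitted distribution rather than a regression model.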
Are my conclusions somewhat correct? Are there instances where Cross Validation proves to be more informative than AIC, BIC and the Likelihood Ratio Test (and vice-versa)?
Thanks!
Note: I have never come across a performance metric (e.g. AIC, BIC, Likelihood Ratio Test) that allows you to determine the error of statistical models such as Gaussian Process Regression. I always imagined that manually creating a Cross Validation loop would perhaps be the only way to measure the error/overfit of Gaussian Process models.
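For what it's worth, a manual loop isn't strictly necessary: a Gaussian Process regressor plugs straight into the generic CV machinery, and (at least in scikit-learn) a fitted GP also exposes its log marginal likelihood, which can play a role loosely similar to an information criterion when comparing kernels. A sketch on simulated data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(80, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=80)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

# Cross-validation error of the GP, exactly as for any other estimator.
scores = cross_val_score(gp, X, y, cv=5, scoring="neg_mean_squared_error")
gp_cv_mse = -scores.mean()

# Fitting once also exposes the (maximized) log marginal likelihood,
# which can be compared across candidate kernels.
gp.fit(X, y)
log_evidence = gp.log_marginal_likelihood_value_
```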