Recently, I came across the Healthy Akaike Information Criterion (hAIC), introduced by Demidenko in his 2004 book "Mixed Models: Theory and Applications" (the second edition, with R code, appeared later). Despite its (potential) advantages, I have found very few references to it in the literature.
Some background: The Healthy Akaike Information Criterion (hAIC) was developed by Demidenko to address the limitations of the traditional Akaike Information Criterion (AIC) in the presence of high multicollinearity among explanatory variables. While AIC is calculated as:
$$ AIC = -2\log L_{\max} + 2k $$
where $\log L_{\max}$ is the maximised log-likelihood and $k$ is the number of parameters, hAIC modifies this by incorporating a penalty term that accounts for the *length* of the parameter vector, which Demidenko claims makes hAIC particularly useful for ill-posed problems and/or highly correlated predictors.
The hAIC formula is:
$$ HAIC = H + AIC $$
where
$$ H = k \left[ \log\left(\frac{\|\beta_{\text{ls}}\|^2}{k}\right) - 1 \right] $$
Here, $\beta_{\text{ls}}$ denotes the least squares estimates of the parameters, and the norm $\|\beta_{\text{ls}}\|$ is the Euclidean length of the parameter vector:
$$ \|\beta_{\text{ls}}\| = \sqrt{\sum_{i=1}^k \beta_{\text{ls},i}^2} $$

This penalty term is designed to penalise models with large parameter estimates, which are indicative of multicollinearity. Excessive multicollinearity can lead to ill-posed problems where the model matrix is nearly singular, resulting in large variances in the parameter estimates. By incorporating the norm of the parameter vector, hAIC penalises models with excessively large coefficients, favouring smaller, more stable estimates. (How we can distinguish between large parameter estimates due to multicollinearity and genuinely large parameter effects has not been addressed in what I have read so far.)

Traditional AIC considers only the number of parameters, not their magnitudes. hAIC, by contrast, integrates both the number and the magnitude of the parameters, aiming for a more comprehensive approach to model selection that also reflects the stability and reliability of the model.
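To make the definitions concrete, here is a minimal sketch in Python/NumPy of computing AIC and hAIC for an ordinary least squares fit. The function name `aic_haic`, the use of the ML error-variance estimate, and the treatment of the intercept are my own assumptions for illustration, not something taken from Demidenko:

```python
import numpy as np

def aic_haic(X, y):
    """AIC and hAIC for a Gaussian linear model fit by least squares.

    A sketch based on the formulas quoted above; how the intercept and
    the error variance enter k is an assumption on my part.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n                        # ML estimate of error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    aic = -2 * loglik + 2 * k
    H = k * (np.log(beta @ beta / k) - 1)             # H = k[log(||beta||^2 / k) - 1]
    return aic, aic + H

# Toy example with two nearly collinear predictors:
rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)   # x2 almost duplicates x1
y = x1 + rng.normal(size=200)
X = np.column_stack([x1, x2])
aic, haic = aic_haic(X, y)
```

With near-collinear columns the least squares coefficients can blow up, inflating $\|\beta_{\text{ls}}\|$ and hence the $H$ term, which is exactly the behaviour the criterion is meant to punish.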
The only other published use I have found is by Harezlak et al. (2007), in a paper titled "Penalized solutions to functional regression problems." In that study, hAIC was used to improve model stability and interpretability by combining error variance estimates, degrees of freedom, and the norms of basis function coefficients.
So, I am wondering if anyone else here has used hAIC, or seen it used, or even heard of it before? If so, what are your thoughts on and experiences with it? If not, what are your thoughts on what I have presented here?
For added context, here are some pros and cons of AIC, hAIC and BIC that I thought of:
Penalty Structure:
- AIC: Penalty term is $2k$, focusing on the number of parameters.
- BIC: Penalty term is $k\log(n)$, increasing with the number of observations and providing a stronger penalty for additional parameters.
- hAIC: Penalty term includes both the number of parameters and the norm of the parameter vector, addressing the magnitude of parameter estimates as well.
Model Selection:
- AIC: Often selects models that are more complex due to a lower penalty per parameter.
- BIC: Tends to favour simpler models, especially as the sample size increases, due to the stronger penalty.
- hAIC: Aims to balance the trade-off between model fit and complexity while specifically addressing issues of multicollinearity and parameter instability.
Application Scenarios:
- AIC: Suitable for model selection when the primary concern is prediction accuracy and the number of observations is not excessively large.
- BIC: Preferred in contexts where the true model is believed to be among the candidate models and the sample size is large.
- hAIC: Particularly useful in scenarios with high multicollinearity or ill-posed problems, where traditional AIC might fail to provide stable and reliable model selection.
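As a quick numerical illustration of the different penalty structures (using the formulas listed above; the helper names are mine), note that BIC's per-parameter penalty $\log(n)$ overtakes AIC's constant $2$ once $n > e^2 \approx 7.4$, while hAIC's extra term grows with the squared norm of the coefficient vector:

```python
import numpy as np

# Penalty terms only, per the formulas above (fit term omitted).
def aic_penalty(k):
    return 2 * k

def bic_penalty(k, n):
    return k * np.log(n)

def haic_penalty(k, beta):
    beta = np.asarray(beta, dtype=float)
    return 2 * k + k * (np.log(beta @ beta / k) - 1)

print(aic_penalty(3))                          # 6
print(round(bic_penalty(3, 100), 2))           # 13.82
print(round(haic_penalty(2, [10.0, -10.0]), 2))  # 11.21
```

So for $n = 100$, BIC already penalises each parameter roughly twice as hard as AIC, and hAIC adds a further surcharge whenever $\|\beta\|^2/k$ exceeds $e$.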