Recently, I came across the Healthy Akaike Information Criterion (hAIC), introduced by Demidenko in his 2004 book "Mixed Models: Theory and Applications" (the second edition, from 2013, is titled "Mixed Models: Theory and Applications with R"). Despite its (potential) advantages, I have found very few references to it in the literature.

Some background: The Healthy Akaike Information Criterion (hAIC) was developed by Demidenko to address the limitations of the traditional Akaike Information Criterion (AIC) in the presence of high multicollinearity among explanatory variables. While AIC is calculated as:

$$ \mathrm{AIC} = -2\log\hat{L} + 2k $$

where $\log\hat{L}$ is the maximised log-likelihood and $k$ is the number of parameters, hAIC modifies this by incorporating a penalty term that accounts for the length of the parameter vector, which Demidenko claims makes hAIC particularly useful in scenarios with ill-posed problems and/or highly correlated predictors.

The hAIC formula is
$$ \mathrm{hAIC} = \mathrm{AIC} + H, \qquad H = k \left[ \log\left(\frac{\|\beta_{\text{ls}}\|^2}{k}\right) - 1 \right], $$
where $\beta_{\text{ls}}$ is the vector of least-squares parameter estimates and $\|\beta_{\text{ls}}\|$ is its Euclidean length:
$$ \|\beta_{\text{ls}}\| = \sqrt{\sum_{i=1}^k \beta_{\text{ls},i}^2}. $$

The penalty term $H$ is designed to penalise models with large parameter estimates, which can be indicative of multicollinearity. Excessive multicollinearity leads to ill-posed problems in which the model matrix is nearly singular, producing large variances in the parameter estimates; by incorporating the norm of the parameter vector, hAIC penalises models with excessively large coefficients, favouring smaller, more stable estimates. (How we can distinguish between large parameter estimates due to multicollinearity and genuinely large parameter effects has not been addressed in what I have read so far.)

Traditional AIC considers only the number of parameters, not their magnitudes. hAIC integrates both the number and the magnitude of the parameters, aiming for a more comprehensive approach to model selection that takes the overall stability and reliability of the model into account.
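To make the calculation concrete, here is a minimal sketch in Python/NumPy, assuming a Gaussian linear model fitted by ordinary least squares and using the formulas exactly as stated above. The function name `haic` is mine, and I take $k$ to be the number of regression coefficients; whether Demidenko also counts the error variance in $k$ is a convention I have not verified against the book.

```python
import numpy as np

def haic(X, y):
    """AIC and hAIC for a Gaussian linear model fitted by least squares.

    Uses the formulas as stated above:
        AIC  = -2 * logLik + 2k
        H    = k * (log(||beta_ls||^2 / k) - 1)
        hAIC = AIC + H
    Assumption: k = number of regression coefficients (error variance
    not counted); check this convention against Demidenko's book.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares estimates
    rss = np.sum((y - X @ beta) ** 2)
    # Maximised Gaussian log-likelihood, with sigma^2 profiled out as RSS/n
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    aic = -2 * loglik + 2 * k
    h = k * (np.log(beta @ beta / k) - 1)          # norm-based penalty H
    return aic, aic + h
```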

The only published research that I found at all is Harezlak et al. (2007), a paper titled "Penalized solutions to functional regression problems" (Computational Statistics & Data Analysis). In this study, hAIC was used to enhance model stability and interpretability by combining error variance estimates, degrees of freedom, and the norms of basis function coefficients.

So, I am wondering if anyone else here has used hAIC, seen it used, or even heard of it before? If so, what are your thoughts on and experiences with it? If not, what are your thoughts on what I have presented here?

For added context, here are some pros and cons of AIC, hAIC and BIC that I thought of:

Penalty Structure:

  • AIC: Penalty term is $2k$, focusing on the number of parameters.
  • BIC: Penalty term is $k\log(n)$, increasing with the number of observations and providing a stronger penalty for additional parameters.
  • hAIC: Penalty term includes both the number of parameters and the norm of the parameter vector, addressing the magnitude of parameter estimates as well.

Model Selection:

  • AIC: Often selects models that are more complex due to a lower penalty per parameter.
  • BIC: Tends to favour simpler models, especially as the sample size increases, due to the stronger penalty.
  • hAIC: Aims to balance the trade-off between model fit and complexity while specifically addressing issues of multicollinearity and parameter instability.

Application Scenarios:

  • AIC: Suitable for model selection when the primary concern is prediction accuracy and the number of observations is not excessively large.
  • BIC: Preferred in contexts where the true model is believed to be among the candidate models and the sample size is large.
  • hAIC: Particularly useful in scenarios with high multicollinearity or ill-posed problems, where traditional AIC might fail to provide stable and reliable model selection (a small simulated comparison is sketched after this list).
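
To illustrate that last point, here is a small simulated comparison, again only a sketch under the same assumptions as above (Gaussian linear model, $k$ = number of coefficients, hAIC as stated in the question). The design duplicates a predictor almost exactly, so the least-squares coefficients become large and unstable; the construction is meant to show AIC barely separating the two candidate models while the norm penalty $H$ inflates hAIC for the nearly singular fit:

```python
import numpy as np

def criteria(X, y):
    """AIC, BIC and hAIC (formulas as in the question) for an OLS fit."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    aic = -2 * loglik + 2 * k
    bic = -2 * loglik + k * np.log(n)
    h = k * (np.log(beta @ beta / k) - 1)
    return aic, bic, aic + h

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-3, size=n)            # near-duplicate of x1
y = 1 + 2 * x1 + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x1])              # well-posed model
X2 = np.column_stack([np.ones(n), x1, x2])          # nearly singular design
for label, X in [("x1 only", X1), ("x1 + x2", X2)]:
    aic, bic, haic = criteria(X, y)
    print(f"{label}:  AIC={aic:8.1f}  BIC={bic:8.1f}  hAIC={haic:8.1f}")
```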
Comments:

  • This is a very helpful writeup of the hAIC and the differences between it and the other ICs we are familiar with in the mixed-effects world. I have never heard of it before, but it looks like it could be useful. The main problem will be most readers' and users' unfamiliarity with it, so you would have to explain it every time you employ it. That's not a reason to avoid it, just a fact of life until it becomes more widely employed. – Erik Ruzek, Jun 8 at 20:10
  • Regarding application scenarios and sample size, why would you say BIC is for large samples but AIC for not-so-large ones? The optimality properties of both criteria are derived asymptotically, thus for large samples. Also, I would challenge the idea that BIC is preferred in contexts where the true model is believed to be among the candidates. The choice between AIC and BIC should be determined by what properties you want the selected model to have (best in prediction vs. closest to the DGP), not by what the candidates are. (I would not mind using BIC even if the true model is not among the candidates.) – Jun 9 at 7:37
  • @RichardHardy you make a good point. Claeskens & Hjort (2008) discuss the properties of BIC, emphasizing its consistency. They explain that BIC's penalty term ($k \log n$) increases with sample size, which helps in asymptotically selecting the true model if it is among the candidates. This makes BIC preferable in large-sample contexts where model simplicity and accuracy in identifying the true model structure are important. Claeskens, G., & Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge University Press. ideas.repec.org/b/cup/cbooks/9780521852258.html – Jun 10 at 11:51
  • Unless I missed something, it seems that the scale of the predictors will affect the hAIC. For example, will the hAIC of a model be the same if we work with X or scale(X)? – Jun 11 at 8:43
  • Yes, but do you want a criterion to give you two different values for essentially the same model? If you work with the original predictors or the standardized ones, the AIC/BIC will be the same, at least in simple models, in which standardization of the predictors is a simple reparameterization. – Jun 11 at 9:03
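
The point in the two comments above is easy to check numerically, under the same sketch assumptions and formulas as in the question: rescaling a predictor leaves the fit, and hence AIC, unchanged, but it rescales the corresponding coefficient and therefore changes the norm penalty and hAIC.

```python
import numpy as np

def aic_haic(X, y):
    """AIC and hAIC (formulas as in the question) for an OLS fit."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    aic = -2 * loglik + 2 * k
    return aic, aic + k * (np.log(beta @ beta / k) - 1)

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)

X_raw = np.column_stack([np.ones(n), x])
X_scaled = np.column_stack([np.ones(n), 100 * x])  # same model, rescaled x
print(aic_haic(X_raw, y))     # AIC identical in both cases...
print(aic_haic(X_scaled, y))  # ...but hAIC differs: the slope shrinks by 1/100
```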
