$\begingroup$

I wrote a function that implements the clustering algorithm from the research article "Clustering compositional data using Dirichlet mixture model" (2022). I am now trying to decide which model to choose for my data (Dirichlet distributed, sample size of 250, true cluster assignments unknown).

I thought of calculating the BIC, but I am unsure what the number of parameters is in this case. My guess is k + k * p, where k is the number of clusters and p is the dimension of the model. The first k comes from the estimated mixture proportions (one per cluster). The k * p comes from the fact that a single Dirichlet model has p parameters, and we are fitting k of them.

Is this count right? Are AIC and BIC even relevant measures in this setting? Is there another metric better suited to comparing the fitted models so that I can choose one of them?
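
To make the question concrete, here is a minimal sketch of the BIC computation I have in mind. All function names are hypothetical and not taken from the article, and it assumes the mixture weights and the Dirichlet parameters have already been estimated. The `constrained_weights` flag is an assumption added for comparison: mixture proportions sum to one, so the usual convention counts only k - 1 free weights rather than the k counted above.

```python
import numpy as np
from scipy.special import gammaln, logsumexp


def dirichlet_logpdf(X, alpha):
    """Log-density of each row of X (n x p, rows on the simplex) under Dirichlet(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    log_norm = gammaln(alpha.sum()) - gammaln(alpha).sum()
    return log_norm + ((alpha - 1.0) * np.log(X)).sum(axis=1)


def mixture_loglik(X, weights, alphas):
    """Total log-likelihood of a k-component Dirichlet mixture."""
    # log_comp[i, j] = log(weight_j) + log f(x_i | alpha_j)
    log_comp = np.column_stack(
        [np.log(w) + dirichlet_logpdf(X, a) for w, a in zip(weights, alphas)]
    )
    # Sum over observations of the log of the weighted component densities.
    return logsumexp(log_comp, axis=1).sum()


def n_free_params(k, p, constrained_weights=True):
    """Parameter count: k * p Dirichlet parameters plus the mixture weights.

    constrained_weights=True counts k - 1 weights (they sum to one);
    False reproduces the k + k * p count from the question.
    """
    weight_params = k - 1 if constrained_weights else k
    return weight_params + k * p


def bic(loglik, n_params, n):
    """BIC = -2 * log-likelihood + (number of parameters) * log(n); lower is better."""
    return -2.0 * loglik + n_params * np.log(n)
```

The idea would be to fit the model for each candidate k, evaluate `bic(mixture_loglik(X, weights, alphas), n_free_params(k, p), n)` with n = 250, and keep the k with the lowest value.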

$\endgroup$
