4
$\begingroup$

I am working with models that use Dirichlet distributions. However, I want to account for correlations between components. If this question is a duplicate, I'd also appreciate any pointers in the right direction. I have seen the question "Distributions on the simplex with correlated components", but it didn't really answer what I am looking for.

So, say I have a Dirichlet model to estimate the colors of a jar of beans (red, blue, green, yellow) based on a sample: $n$ is the number of beans sampled, $k=4$ is the number of dimensions/components in the Dirichlet simplex (i.e., the number of colors), $x=(x_1,...,x_n)$ are the sampled beans, each $x_i$ being a $k$-dimensional indicator vector of that bean's color, and $p_k=\frac{1}{n}\sum_{i=1}^{n}x_i$ is the proportion of bean colors. Suppose we have the following concentration vector $\alpha_k$ to model the proportion of bean colors, such as: $$p_k \sim Dirichlet(\alpha_k)$$

Suppose we have a very simple concentration vector, where $\alpha_k=(1,1,1,1)$, meaning that there is an equal chance for the beans to be each color (25%). However, suppose we know with a very high degree of certainty that 25% of the beans are red and 25% are blue, but we are not so certain of the proportion of green and yellow beans. (They also have an estimated proportion of 25% each, but there is a good chance they are 20% and 30%, or 35% and 15%.) In this sense, the probability functions for green and yellow are correlated: an error in one dimension affects one other dimension more than the rest.

My question is how can correlated components be included in a Dirichlet distribution, as in the case described above? My problem is that a basic Dirichlet distribution assumes that the probability functions are independent (one bean not being one color has an equal probability of being any other color).

At first I thought that multinomial Dirichlet distributions could account for this because they include a covariance matrix. But I believe the resulting distributions still assume independence between components, as pointed out in this video (1:30). I also found the article "A generalization of the Dirichlet distribution" by Ongaro and Migliorati (2012), which proposes flexible Dirichlet distributions, as well as the FlexDir CRAN package. Could this generalization address my question?

Edit: One clarification as an addendum to the question. The entries of the concentration vector $\alpha$ can be increased to reduce the variance of the probability distribution. This would reflect higher "confidence" in the estimates. However, the problem is that this has to be done proportionally for all components of the vector, or else the mean estimates of the Dirichlet distribution will change. For example, $\alpha_k=(4,4,4,4)$ would reflect high confidence in the estimates, but for all dimensions equally. Meanwhile, $\alpha_k=(4,4,1,1)$ would make the estimates for red and blue more confident, but it would also change the proportions so that the mean estimated probabilities of green and yellow are 10% each.
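
The trade-off described in this edit can be checked numerically. The following is a minimal sketch using the standard closed-form moments of a Dirichlet distribution; the helper name `dirichlet_moments` and the color ordering (red, blue, green, yellow) are assumptions made for illustration only.

```python
import numpy as np

def dirichlet_moments(alpha):
    """Return mean, standard deviation and correlation matrix of Dirichlet(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    mean = alpha / a0
    # Closed form: Cov = (diag(mean) - mean mean^T) / (a0 + 1)
    cov = (np.diag(mean) - np.outer(mean, mean)) / (a0 + 1.0)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    return mean, std, corr

# Assumed ordering of components: red, blue, green, yellow.
for alpha in [(1, 1, 1, 1), (4, 4, 4, 4), (4, 4, 1, 1)]:
    mean, std, corr = dirichlet_moments(alpha)
    print(alpha, "mean:", mean.round(3), "std:", std.round(3),
          "corr(green, yellow):", round(corr[2, 3], 3))
```

In a plain Dirichlet the mean, the variances and the (always negative) pairwise correlations are all fixed by the single vector $\alpha$, which is why the precision of red and blue cannot be raised without also shifting the means.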

$\endgroup$
  • 2
    $\begingroup$ Why not model $\vec \alpha := \exp (\vec W)$ where $\vec W$ is a random multivariate normal vector? The exponential function would be applied elementwise. $\endgroup$
    – Galen
    Commented Jul 31, 2023 at 9:15

2 Answers

7
$\begingroup$

My question is how can correlated components be included in a Dirichlet distribution, as in the case described above?

One approach is to assume that the elementwise log of the Dirichlet parameter vector is a random vector following a multivariate normal distribution.

$$X \sim \text{Categorical}(\vec p)$$ $$\vec p \sim \text{Dirichlet}(\vec \alpha)$$ $$\log \vec \alpha \sim \text{MvNormal}(\vec \mu, \Sigma)$$
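
To get a feel for what this hierarchy implies, one can simulate from it. The sketch below is only an illustration under assumed hyperparameters: the values of `mu` and `Sigma`, the color ordering (red, blue, green, yellow) and the helper `sample_p` are not part of the model above and were chosen to mimic the bean example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed, illustrative hyperparameters (order: red, blue, green, yellow).
mu = np.log(np.full(4, 25.0))  # baseline alpha of 25 per color
Sigma = np.diag([0.01, 0.01, 0.25, 0.25])
Sigma[2, 3] = Sigma[3, 2] = -0.225  # correlation of -0.9 between green and yellow

def sample_p(n_draws):
    """Draw p from: log(alpha) ~ MvNormal(mu, Sigma), p ~ Dirichlet(alpha)."""
    log_alpha = rng.multivariate_normal(mu, Sigma, size=n_draws)
    alpha = np.exp(log_alpha)  # elementwise exponential
    return np.array([rng.dirichlet(a) for a in alpha])

p = sample_p(5000)
print("mean of p:", p.mean(axis=0).round(3))
print("std of p: ", p.std(axis=0).round(3))
print("corr(green, yellow):", np.corrcoef(p[:, 2], p[:, 3])[0, 1].round(3))
```

Comparing the printed standard deviations and correlation against those of a plain Dirichlet is one way to judge whether a given choice of $\vec \mu$ and $\Sigma$ encodes the intended prior knowledge.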


If you wanted to put prior information on the parameters of the normal distribution you could try something similar to this:

$$X \sim \text{Categorical}(\vec p)$$ $$\vec p \sim \text{Dirichlet}(\vec \alpha)$$ $$\log \vec \alpha \sim \text{MvNormal}(\vec \mu, \Sigma)$$ $$\vec \mu \sim \mathcal{N}(\vec 0, \vec 1)$$ $$\Sigma := \left( \vec \sigma \otimes \vec \sigma \right) \odot \mathbf{R}$$ $$\vec \sigma \sim \text{Exponential}(\vec 1)$$ $$\mathbf{R} \sim \text{LKJCorr}(0.5)$$
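
The covariance construction $\Sigma := (\vec \sigma \otimes \vec \sigma) \odot \mathbf{R}$ is just the outer product of the scales multiplied elementwise by the correlation matrix. Here is a rough sketch of one prior draw; NumPy has no LKJ sampler, so the matrix `R` below is a hand-picked, hypothetical stand-in for a draw from $\text{LKJCorr}(0.5)$, and `build_covariance` is an illustrative helper name.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4  # number of colors

def build_covariance(sigma, R):
    """Sigma = (sigma outer sigma) * R, equal to diag(sigma) @ R @ diag(sigma)."""
    return np.outer(sigma, sigma) * R

# One draw from the priors on mu and sigma.
mu = rng.normal(loc=0.0, scale=1.0, size=k)  # mu ~ Normal(0, 1)
sigma = rng.exponential(scale=1.0, size=k)   # sigma ~ Exponential(1)

# Hypothetical correlation matrix standing in for R ~ LKJCorr(0.5);
# it is fixed by hand because NumPy does not provide an LKJ sampler.
R = np.eye(k)
R[2, 3] = R[3, 2] = -0.9

Sigma = build_covariance(sigma, R)
log_alpha = rng.multivariate_normal(mu, Sigma)
alpha = np.exp(log_alpha)
print("Sigma:\n", Sigma.round(3))
print("alpha:", alpha.round(3))
```

In a probabilistic programming framework the priors would be declared rather than sampled by hand, but the way $\Sigma$ is assembled from $\vec \sigma$ and $\mathbf{R}$ stays the same.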

And there are yet other approaches that use Wishart, or inverse Wishart, to put priors on the covariance.


A simplification of the previous model that you could try is to replace $\vec \sigma$ with a single scalar parameter $\sigma$.

$$X \sim \text{Categorical}(\vec p)$$ $$\vec p \sim \text{Dirichlet}(\vec \alpha)$$ $$\log \vec \alpha \sim \text{MvNormal}(\vec \mu, \Sigma)$$ $$\vec \mu \sim \mathcal{N}(\vec 0, \vec 1)$$ $$\Sigma := \sigma^2 \mathbf{R}$$ $$\sigma \sim \text{Exponential}(1)$$ $$\mathbf{R} \sim \text{LKJCorr}(0.5)$$


And indeed there are many simplifications and complications to the model that you could try. A common, and I think good, piece of advice is to start simple and work your way towards complicated. Often you'll find that not all of a model's complications turn out to be improvements.

$\endgroup$
  • $\begingroup$ Would this affect the mean estimated probability $p_k$ for each component? I have added an edit in the question to clarify this potential issue. $\endgroup$ Commented Jul 31, 2023 at 9:44
  • $\begingroup$ @MarcoPastorMayo The mean is $\mathbb{E}[\vec \alpha] = \exp(\vec \mu + \tfrac{1}{2}\operatorname{diag}(\Sigma))$ (the median is $\exp \vec \mu$), so it can affect the mean estimated probabilities $\vec p$. If you instead need a fixed mean you can fix whichever components of $\vec \mu$ you like. The statistical dependence is determined by the variance/covariance matrix $\Sigma$. $\endgroup$
    – Galen
    Commented Jul 31, 2023 at 9:49
  • 2
    $\begingroup$ @MarcoPastorMayo As Galen pointed out, what you need is not a distribution but a whole probabilistic model (+1). Moreover, you seem to be saying that some of the components are unknown but you have some prior knowledge about them, which sounds like a Bayesian statistical model, possibly similar to what is described in the answer. $\endgroup$
    – Tim
    Commented Jul 31, 2023 at 10:06
  • $\begingroup$ @Galen I think I understand what you mean. The $\alpha$ values would themselves be probability distributions rather than having a fixed value, correct? Could you give an example of this process? Would this then use multinomial Dirichlet distributions, flexible Dirichlet distributions, or neither? $\endgroup$ Commented Jul 31, 2023 at 10:21
  • 1
    $\begingroup$ @MartinModrák Alas my answer is a quickly-conceived first-approximation. Sometimes these models train smooth as butter and sometimes very poorly. I've found Gelman et al 2020's Bayesian workflow to be a useful guide for iterative improvement. $\endgroup$
    – Galen
    Commented Jul 31, 2023 at 18:15
5
$\begingroup$

I think the model in the answer by Galen is reasonable, if you insist on having a Dirichlet distribution somewhere in your model.

If, however, you don't insist on a particular form and just want to model the correlations across different realizations of $X$, I think a simpler model could be enough. For example, one could have:

$$X \sim \text{Categorical}(\vec p)$$ $$\vec p = \text{softmax} \begin{pmatrix}0 \\ \vec q \end{pmatrix}$$ $$\vec q \sim \text{MVN}(\vec\mu, \Sigma)$$

Here $\vec q$ has one dimension fewer than $\vec p$ (one per category of $X$, except the reference). The undesirable property is that the model is now not (I believe) invariant to the choice of a reference category (the one where you put the extra zero)...
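
As a rough illustration of this softmax model, the sketch below simulates $\vec p$ for the bean example. The choice of red as the reference category and the values of `mu` and `Sigma` are assumptions picked for the example; they are not prescribed by the model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Numerically stable softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Assumed, illustrative values. Red is the reference category (logit fixed at 0);
# q holds the logits of blue, green and yellow relative to red.
mu = np.zeros(3)
Sigma = np.array([
    [0.01,  0.0,    0.0],
    [0.0,   0.25,  -0.225],
    [0.0,  -0.225,  0.25],
])

q = rng.multivariate_normal(mu, Sigma, size=5000)
logits = np.concatenate([np.zeros((q.shape[0], 1)), q], axis=1)  # prepend the zero
p = softmax(logits)

print("mean of p:", p.mean(axis=0).round(3))
print("std of p: ", p.std(axis=0).round(3))
print("corr(green, yellow):", np.corrcoef(p[:, 2], p[:, 3])[0, 1].round(3))
```

Moving the zero to a different color while keeping the same `mu` and `Sigma` is a quick way to see how the choice of reference category changes the implied distribution of $\vec p$.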

$\endgroup$
  • 3
    $\begingroup$ This is a simple alternative (+1). My advice to the OP is to try both and use a model comparison strategy to adjudicate which is the better model for their data (and estimand). $\endgroup$
    – Galen
    Commented Jul 31, 2023 at 18:20