
In a test I had to derive the posterior of the multinomial distribution with the conjugate Dirichlet prior. I used the common relation $$p(\mu \mid X; \alpha) \propto p(X \mid \mu)\, p(\mu \mid \alpha).$$ I did, however, assume that $X$ is a single random variable, not a data set. This led me to conclude that the posterior can be written as a Dirichlet with parameter $\alpha^* = \alpha + x$, where $\alpha$ and $x$ are of dimension $K$ (the number of classes). On the Wikipedia entry for conjugate priors, which gives the posterior parameterizations, all distributions are stated for a sample of $n$ data points, hence $\alpha^* = \alpha + \sum_{i=1}^{n} x_i$. Is my solution still correct if I want to show that the posterior is a Dirichlet, and what is its parameter? More generally, I am unsure whether posterior distributions are only defined for $n$ data points $X$ (as the Wikipedia entry implies) or can be derived for a single $X$ as well.
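For concreteness, a sketch of the single-observation calculation I have in mind (treating $x = (x_1, \dots, x_K)$ as the observed count vector):
$$p(\mu \mid x; \alpha) \propto \prod_{k=1}^{K} \mu_k^{x_k} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1} = \prod_{k=1}^{K} \mu_k^{\alpha_k + x_k - 1},$$
which is the kernel of a $\operatorname{Dir}(\alpha + x)$ density.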

  • Setting other things aside: if something is defined for $n$ points, what exactly is the problem with $n=1$? Your question is not really clear, but you can apply Bayes' theorem to a single point or to multiple points, the same as you could use least-squares estimation to find the best parameter given a single point (but you won't learn anything revealing...).
    – Tim
    Commented Apr 25, 2017 at 21:20
  • @Tim It seems that for showing it is a conjugate prior, it is enough to do this with $n=1$?
    – tomka
    Commented Apr 25, 2017 at 21:33
  • Why shouldn't it?
    – Tim
    Commented Apr 25, 2017 at 21:53
  • Check stats.stackexchange.com/questions/237037/…
    – Tim
    Commented Apr 26, 2017 at 7:19
  • You can do it all-at-once or sequentially; it will be the same.
    – Tim
    Commented Apr 26, 2017 at 8:08

1 Answer


To show that the Dirichlet is a conjugate prior for the multinomial, it is indeed sufficient to use a single observation $X$. For estimation purposes, however, an all-at-once procedure would factor the likelihood over $n$ independent samples, yielding the Wikipedia parameterization, while a sequential procedure would apply the single-observation updating step repeatedly, once per sample. In the sequential updating, the Dirichlet hyper-parameter changes from the initial $\alpha$ by adding the observed counts $x_i$ one at a time. The two procedures are equivalent, so the parameter of the posterior you write down simply depends on whether you condition on one observation or on all $n$.
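A minimal numerical sketch of this equivalence (the array names and toy data here are my own illustration, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3
alpha = np.array([1.0, 2.0, 0.5])  # Dirichlet prior hyper-parameters (illustrative values)

# Five multinomial observations, each a length-K count vector.
X = rng.multinomial(n=10, pvals=[0.2, 0.5, 0.3], size=5)

# All-at-once update: add the summed counts to the prior.
alpha_batch = alpha + X.sum(axis=0)

# Sequential update: add one observation's counts at a time.
alpha_seq = alpha.copy()
for x in X:
    alpha_seq = alpha_seq + x

# Both routes give the same posterior Dirichlet parameters.
assert np.allclose(alpha_batch, alpha_seq)
```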

