
When I do ML estimation, I always get confused about whether I should compute $\max_\theta\prod_i P(x_i \mid \theta)$ or $\max_\theta P(\theta \mid D)$, where $D=\{x_1,x_2,\dots,x_N\}$ and $\theta$ are the parameters of the distribution. I know that in the first case I am maximizing the likelihood, and in the latter case I am maximizing the posterior. However, it appears to me that this is just semantics. For instance, in the latter case, I can assume that $\theta$ comes from different Gaussian distributions, each with mean $x_i$ and a known standard deviation $\sigma$, which gives the following:

$$\hat\theta=\arg\max_\theta P(\theta \mid D) = \arg\max_\theta \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(\theta - x_i)^2}{2\sigma^2}\right)$$
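For concreteness, setting the derivative of the log of this product with respect to $\theta$ to zero shows that the maximizer is just the sample mean (a short derivation under the Gaussian assumption above):

$$\frac{\partial}{\partial\theta}\sum_{i=1}^{N}\left(-\frac{(\theta-x_i)^2}{2\sigma^2}\right)=0 \quad\Longrightarrow\quad \hat\theta=\frac{1}{N}\sum_{i=1}^{N}x_i$$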

Technically, this will give the same result as if I do maximum likelihood. Now, when predicting for a new instance $x_j$, I can calculate the probability that $\hat\theta$ comes from a Gaussian distribution with mean $x_j$. What is wrong with that?


1 Answer


The acronym MLE refers to maximum likelihood estimation and involves maximising the likelihood function, not the posterior distribution. The latter is called maximum a posteriori (MAP) estimation. (The shorter acronym ML refers in this context to "maximum likelihood".) This is not a mere semantic difference, since the posterior incorporates information from a prior distribution, and so the two estimators will generally be different.
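To make the difference concrete, here is a minimal sketch in Python (not from the original answer; the prior values and variable names are illustrative). It estimates the mean of a Gaussian with known $\sigma$ twice: once by maximum likelihood and once by MAP under an assumed conjugate Gaussian prior $\theta \sim N(\mu_0, \tau^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 1.0           # known observation standard deviation
mu0, tau = 0.0, 0.5   # assumed prior: theta ~ N(mu0, tau^2)
theta_true = 2.0
x = rng.normal(theta_true, sigma, size=10)  # simulated data D

# MLE: maximises prod_i P(x_i | theta); for a Gaussian this is the sample mean.
theta_mle = x.mean()

# MAP: maximises P(theta | D), which is proportional to
# P(theta) * prod_i P(x_i | theta). With a Gaussian prior the posterior is
# Gaussian, so the MAP estimate is the precision-weighted average of the
# prior mean and the sample mean.
n = len(x)
posterior_precision = n / sigma**2 + 1 / tau**2
theta_map = (n * x.mean() / sigma**2 + mu0 / tau**2) / posterior_precision

print(f"MLE: {theta_mle:.3f}")  # tracks the sample mean
print(f"MAP: {theta_map:.3f}")  # shrunk toward the prior mean mu0
```

The two estimates differ whenever the prior is informative; as $\tau^2 \to \infty$ (a flat prior) the MAP estimate converges to the MLE.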

The example you give of a purported posterior distribution is not constructed correctly. In particular, you have not specified a prior distribution for the parameters (one that does not depend on the data), and you have not applied Bayes' rule. Instead you seem to have used the likelihood function for the data as the purported posterior distribution (leading you to the conclusion that there is no difference between the two).
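For reference, a correctly constructed posterior follows Bayes' rule, combining the likelihood with a prior $P(\theta)$:

$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)} \propto P(\theta)\prod_{i=1}^{N} P(x_i \mid \theta)$$

Only when the prior is flat does maximising this expression reduce to maximising the likelihood alone.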

