
Confusion about the optimized parameters when doing maximum likelihood



When I do ML estimation, I always get confused about whether I should compute $\max_\theta\prod_i P(x_i \mid \theta)$ or $\max_\theta\prod_i P(\theta \mid x_i)$, where $D=\{x_1,x_2,\dots,x_N\}$ is the data and $\theta$ denotes the parameters of the distribution. I know that in the first case I am maximizing the likelihood, and in the latter case I am maximizing the posterior. However, it seems to me that this is just semantics. For instance, in the latter case I can assume that the data come from different Gaussian distributions, each with mean $x_i$ and a known standard deviation $\sigma$, which gives the following:

$$\hat\theta=\arg\max_\theta\prod_{i=1}^N P(\theta \mid x_i) = \arg\max_\theta \prod_{i=1}^N \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{|\theta - x_i|^2}{2\sigma^2}\right)$$
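To make this concrete, here is a quick numerical sketch (with made-up data and an assumed known $\sigma = 1$, not part of my actual problem) showing that maximizing this product gives the same $\hat\theta$ as ordinary maximum likelihood for a Gaussian mean, namely the sample mean:

```python
# Minimal sketch: maximize the product of Gaussian densities above (via its
# negative log) and compare the optimizer's theta-hat with the sample mean,
# which is the usual ML estimate of a Gaussian mean.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
sigma = 1.0                                      # assumed known standard deviation
x = rng.normal(loc=3.0, scale=sigma, size=100)   # hypothetical data D

def neg_log_objective(theta):
    # negative log of  prod_i N(x_i | theta, sigma^2)
    return np.sum((theta - x) ** 2) / (2 * sigma ** 2) \
        + len(x) * 0.5 * np.log(2 * np.pi * sigma ** 2)

res = minimize_scalar(neg_log_objective)
print(res.x, x.mean())   # the two estimates agree up to optimizer tolerance
```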

Technically, this gives the same result as maximum likelihood. Now, when predicting for a new instance $x_j$, I can calculate the probability that $\hat\theta$ comes from a Gaussian distribution with mean $x_j$. What is wrong with that?
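For concreteness, here is what I mean by that prediction step, continuing the sketch above (the numbers are stand-ins): by the symmetry of the Gaussian density in $\theta - x$, evaluating the density of $\hat\theta$ under a Gaussian with mean $x_j$ gives the same value as evaluating $x_j$ under a Gaussian with mean $\hat\theta$.

```python
# Sketch of the prediction step described above, with stand-in values.
from scipy.stats import norm

sigma = 1.0        # assumed known standard deviation (as above)
theta_hat = 3.02   # stand-in for the estimate from the previous sketch
x_j = 2.5          # hypothetical new instance

print(norm.pdf(theta_hat, loc=x_j, scale=sigma))  # density of theta_hat under mean x_j
print(norm.pdf(x_j, loc=theta_hat, scale=sigma))  # density of x_j under mean theta_hat
```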