
Assuming a normal likelihood function is used for a maximum likelihood estimate of $\mu$, how can I prove that the maximum likelihood estimate of $\mu$ actually maximizes the likelihood?

The following is what I have done to get a maximum likelihood estimate of $\mu$ for a multivariate normal distribution:

$$\tag{1}\text{Likelihood function}$$ $$f(X_1, X_2, \dots, X_n\mid\mu, \Sigma) = \prod_{j=1}^n \left\{\dfrac 1 {(2\pi)^{p/2} |\Sigma|^{1/2}} e ^{-(x_j-\mu)^{\text{T}}\Sigma^{-1}(x_j-\mu)/2} \right\}$$

$$\tag{2} \text{Log-likelihood}$$ $$\ln f(X_1, X_2, \dots, X_n\mid \mu, \Sigma) = \ln \prod_{j=1}^n \left\{\frac 1 {(2\pi)^{p/2}|\Sigma|^{1/2}} e ^{-(x_j-\mu)^\text{T}\Sigma^{-1}(x_j-\mu)/2} \right\}$$

$$\tag{3} \text{Expanded log-likelihood } \ell(\mu,\Sigma)$$ $$\ell(\mu, \Sigma) = -\frac{np}{2} \ln 2\pi - \frac n 2 \ln |\Sigma| - \frac 1 2 \sum_{i=1}^n (x_i-\mu)^\text{T} \Sigma^{-1}(x_i-\mu) $$

$$\frac{\partial \ell(\mu, \Sigma)}{\partial \mu}=-\frac 1 2 \sum_{i=1}^n 2\Sigma^{-1}(\mu-x_i) = \Sigma^{-1} \sum_{i=1}^n(x_i-\mu)=0 $$

Then I did some algebra to get $\mu$:

$$\Sigma^{-1}\sum_{i=1}^n(x_i-\mu) = 0$$

Multiplying both sides by $\Sigma$ and expanding the sum:

$$\sum_{i=1}^nx_i - n\mu = 0$$

$$n\mu=\sum_{i=1}^nx_i$$

$$\mu^*_\text{MLE}=\frac 1 n \sum_{i=1}^ n x_i$$
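
As a quick numerical sanity check (a sketch added here, not part of the derivation above; the data-generating mean, $\Sigma$, and seed are all made up), the log-likelihood evaluated at the sample mean should beat any perturbed value of $\mu$:

```python
# Sanity check: the log-likelihood (up to constants not involving mu)
# is largest at the sample mean. Data, Sigma, and seed are made up.
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 200
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])
X = rng.multivariate_normal(mean=[1.0, -2.0, 0.5], cov=Sigma, size=n)
Sigma_inv = np.linalg.inv(Sigma)

def log_likelihood(mu):
    """Log-likelihood under N(mu, Sigma), dropping the additive constant
    -(np/2) ln(2 pi) - (n/2) ln|Sigma| that does not involve mu."""
    d = X - mu                          # n x p matrix of residuals
    return -0.5 * np.einsum('ij,jk,ik->', d, Sigma_inv, d)

mu_hat = X.mean(axis=0)                 # the candidate MLE: the sample mean

# Every random perturbation of mu_hat should lower the log-likelihood.
for _ in range(1000):
    mu = mu_hat + rng.normal(scale=0.5, size=p)
    assert log_likelihood(mu) <= log_likelihood(mu_hat)
print("sample mean beat 1000 random perturbations")
```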

  • Please show your workings up to the point where you get stuck. (Commented Mar 13, 2017 at 12:43)
  • Added what I have done. – user122358 (Commented Mar 13, 2017 at 13:20)
  • Thanks! I forgot to say to add the self-study tag - I've done it. It's perhaps not clear, though, exactly where your doubt lies. (Commented Mar 13, 2017 at 13:50)
  • Seems like you've shown $\frac{1}{n} \sum_i x_i$ is the MLE. What more do you want? The only thing perhaps missing is a recognition that the normal distribution density function is logarithmically concave in $\mu$, hence finding where $\frac{d \ln f}{d \mu} = 0$ is a necessary and sufficient condition for the maximum of the log-likelihood. Furthermore, it maximizes the likelihood since the logarithm is a monotonic transformation. (See the sketch after these comments.) (Commented Mar 13, 2017 at 15:27)
  • When you write \text{ln} instead of \ln, you don't automatically get proper spacing in things like $a\ln b$ or $a\ln(b)$. And \text{ln l} is very strange notation, so I replaced it. You seem to alternate between the synonymous usages $\ln$ and $\log$. (Commented Mar 13, 2017 at 16:02)
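
Following up on the log-concavity comment above, here is a minimal sketch (my own addition, with a made-up $\Sigma$) of the second-order condition: the Hessian of the log-likelihood in $\mu$ is $-n\Sigma^{-1}$, which is negative definite, so the stationary point found in the question is indeed a maximum.

```python
# Second-order check: the Hessian of the log-likelihood in mu is
# -n * Sigma^{-1}, which is negative definite. Sigma here is made up.
import numpy as np

n = 50
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
hessian = -n * np.linalg.inv(Sigma)

# Negative definite <=> all eigenvalues are strictly negative.
eigvals = np.linalg.eigvalsh(hessian)
assert np.all(eigvals < 0)
print("Hessian eigenvalues:", eigvals)  # all negative
```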

1 Answer

$$ \prod_{j=1}^n \left\{\dfrac 1 {(2\pi)^{p/2} |\Sigma|^{1/2}} e ^{-(x_j-\mu)^{\text{T}} \Sigma^{-1}(x_j-\mu)/2} \right\} = \frac 1 {\left( (2\pi)^{p/2} |\Sigma|^{1/2} \right)^n} e^{-\frac 1 2 \sum_{j=1}^n (x_j-\mu)^\text{T} \Sigma^{-1} (x_j-\mu)} $$ This expression depends on $\mu$ only through $$ \sum_{j=1}^n (x_j-\mu)^\text{T} \Sigma^{-1}(x_j-\mu), \tag{a} \label{sum} $$ and the exponent is $-1/2$ times this sum, so maximizing the likelihood is just the problem of finding the value of $\mu$ that minimizes the sum (\ref{sum}).

Let $\bar x = \frac 1 n \sum_{j=1}^n x_j.$ Then, writing $x_j - \mu = (x_j-\bar x)+(\bar x - \mu)$, \begin{align} & \sum_{j=1}^n \Big((x_j-\bar x)+(\bar x - \mu) \Big)^\text{T} \Sigma^{-1} \Big((x_j-\bar x)+(\bar x - \mu) \Big) \\[10pt] = {} & \sum_{j=1}^n (x_j-\bar x)^\text{T} \Sigma^{-1} (x_j-\bar x) + \sum_{j=1}^n (x_j-\bar x)^\text{T} \Sigma^{-1} (\bar x - \mu) \\ & {} + \sum_{j=1}^n (\bar x - \mu)^\text{T} \Sigma^{-1} (x_j - \bar x) + \sum_{j=1}^n (\bar x-\mu)^\text{T} \Sigma^{-1} (\bar x - \mu) \\[10pt] = {} & \sum_{j=1}^n (x_j-\bar x)^\text{T} \Sigma^{-1} (x_j-\bar x) + 0 + 0 + \sum_{j=1}^n (\bar x-\mu)^\text{T} \Sigma^{-1} (\bar x - \mu) \\ & \qquad \text{(the two cross terms vanish because } \textstyle\sum_{j=1}^n (x_j-\bar x) = 0\text{)} \\[10pt] = {} & \text{constant} + \sum_{j=1}^n (\bar x-\mu)^\text{T} \Sigma^{-1} (\bar x - \mu) \\ & (\text{where “constant” means not depending on } \mu) \\[10pt] = {} & \text{constant} + n (\bar x - \mu)^\text{T} \Sigma^{-1} (\bar x - \mu) \\ & \qquad \text{ because all the $n$ terms in the sum are the same.} \end{align}

The quadratic form $n(\bar x - \mu)^\text{T} \Sigma^{-1} (\bar x - \mu)$ is $0$ if $\mu=\bar x$ and strictly positive for any other $\mu$, because $\Sigma^{-1}$ is positive definite. Hence $\mu = \bar x$ minimizes the sum (\ref{sum}) and therefore maximizes the likelihood.
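
A small numerical check of the decomposition above (my addition, not the answerer's; $\Sigma$ and the data are arbitrary, since the identity is purely algebraic and does not require normality):

```python
# Verify: sum_j (x_j - mu)^T S^{-1} (x_j - mu)
#       = sum_j (x_j - xbar)^T S^{-1} (x_j - xbar)
#       + n * (xbar - mu)^T S^{-1} (xbar - mu)
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 100
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)     # an arbitrary positive definite Sigma
Sigma_inv = np.linalg.inv(Sigma)
X = rng.normal(size=(n, p))         # any data set works for this identity
xbar = X.mean(axis=0)

def quad_sum(mu):
    """Sum over j of (x_j - mu)^T Sigma^{-1} (x_j - mu)."""
    d = X - mu
    return np.einsum('ij,jk,ik->', d, Sigma_inv, d)

mu = rng.normal(size=p)             # an arbitrary candidate mean
lhs = quad_sum(mu)
rhs = quad_sum(xbar) + n * (xbar - mu) @ Sigma_inv @ (xbar - mu)
assert np.isclose(lhs, rhs)
print(lhs, rhs)                     # equal up to floating point error
```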
