
Suppose that we have the simple linear regression model of the form:

$$Y_i = \beta X_i +\varepsilon_i$$

With the following set of 'classical assumptions' holding:

  1. $E(\varepsilon_i)=0$

  2. $Var(\varepsilon_i) = {E}(\varepsilon_i^2)-{E}(\varepsilon_i)^2= {E}(\varepsilon_i^2) = \sigma^2$

  3. $Cov(\varepsilon_i, \varepsilon_j)=0$ for all $i\neq j$

  4. $\varepsilon_i$ is normal

  5. $X_i$ are constants, rather than random variables.

I want to find the maximum likelihood estimator for $\sigma^2$ assuming it is unknown, and the maximum likelihood estimator for $\beta$ assuming that $\sigma^2$ is unknown.

As background, assuming that $\sigma^2$ is known, I have the following derivation showing that the OLS estimator is the maximum likelihood estimator.

From this, we can see that the OLS estimator for $\beta$ is given by solving the following:

$$\underset{\beta}{\text{minimize}} \quad f(\beta) =\sum \hat{\varepsilon}_i^{\,2} = \sum (Y_i - \beta X_i)^2$$

We can easily see that, as $\frac{df}{d\beta} = \sum (-2Y_iX_i + 2\beta X_i^2)$ and $\frac{d^2f}{d\beta^2} = 2\sum X_i^2 > 0$, the OLS estimator for $\beta$ is given by:

$$\hat{\beta} = \dfrac{\sum X_iY_i}{\sum X_i^2}$$
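As a quick sanity check on this closed form (not part of the derivation), here is a small numerical sketch with made-up data: a grid minimiser of the sum of squares should agree with $\hat{\beta} = \sum X_iY_i/\sum X_i^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data from Y_i = beta*X_i + eps_i with a made-up true beta of 2.
X = np.linspace(1.0, 10.0, 50)
Y = 2.0 * X + rng.normal(0.0, 1.0, size=X.size)

# Closed-form OLS estimator: beta_hat = sum(X*Y) / sum(X^2).
beta_hat = np.sum(X * Y) / np.sum(X ** 2)

# Brute-force check: evaluate the sum of squared residuals on a fine grid of beta values.
grid = np.linspace(beta_hat - 1.0, beta_hat + 1.0, 20001)
sse = ((Y[None, :] - grid[:, None] * X[None, :]) ** 2).sum(axis=1)

print(beta_hat, grid[np.argmin(sse)])  # agree to grid resolution
```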

We can easily show that this is also the maximum likelihood estimator. We start by looking at the likelihood function, defined as the joint probability density function of the $Y_i$'s, and recalling that the assumption that $\varepsilon_i$ is normal implies that the $Y_i$ are normal. We have:

$$L(\beta) = \prod \frac{1}{\sqrt{2\pi\sigma^2}}\exp(\frac{-1}{2\sigma^2}(Y_i - \beta X_i)^2)$$

$$ \therefore L(\beta) = \frac{1}{(2\pi \sigma^2)^{\frac{N}{2}}} \exp \left(\frac{-1}{2\sigma^2}\sum (Y_i - \beta X_i)^2\right)$$

If we then consider the log-likelihood function, we get:

$$l(\beta) = \ln \left [ \frac{1}{(2\pi \sigma^2)^{\frac{N}{2}}} \right ] - \frac{1}{2\sigma^2}\sum (Y_i - \beta X_i)^2$$

Maximising $l(\beta)$ is again a fairly straightforward problem, and it once again shows that $$\hat{\beta} = \frac{\sum Y_iX_i}{\sum X_i^2}$$
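To illustrate this (again with made-up data, and $\sigma^2$ fixed arbitrarily at 1), maximising the log-likelihood numerically over $\beta$ returns the same value as the least-squares formula:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

# Illustrative data with a made-up true beta of 2.
X = np.linspace(1.0, 10.0, 50)
Y = 2.0 * X + rng.normal(0.0, 1.0, size=X.size)

sigma2 = 1.0  # sigma^2 treated as known; its value does not affect the argmax over beta

def neg_log_likelihood(beta):
    # -l(beta), dropping the additive constant (N/2)*ln(2*pi*sigma2)
    return np.sum((Y - beta * X) ** 2) / (2.0 * sigma2)

beta_mle = minimize_scalar(neg_log_likelihood).x
beta_ols = np.sum(X * Y) / np.sum(X ** 2)

print(beta_mle, beta_ols)  # numerically identical
```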

This is where I get a little confused. Am I to take the likelihood function as: $$L(\beta,\sigma^2) = \prod \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(\frac{-1}{2\sigma^2}(Y_i - \beta X_i)^2\right)$$

and then maximise it with respect to both $\beta$ and $\sigma^2$? Moreover, it's not clear to me where I actually used the assumption that $\sigma^2$ is known. This is obviously relevant when considering confidence intervals, hypothesis tests, etc., but it is not clear to me how $\sigma^2$ being known or unknown affects the derivations above.

Thanks for any help,

Hmmm16

  • Your expressions for the likelihood and log-likelihood for $\beta$ seem to presume implicitly that $\sigma^2$ is known; otherwise you would call them the (log-)likelihood for $\beta$ and $\sigma^2$.
    – Henry
    Commented May 28 at 18:24

2 Answers


You assumed $\operatorname{cov}(\varepsilon_i,\varepsilon_j)=0,$ which is weaker than actually assuming independence. If you assume independence, then the likelihood function is
\begin{align} L(\beta) & = \prod_{i=1}^n \frac1{\sigma\sqrt{2\pi}} \exp\left( -\frac1{2\sigma^2} \left( Y_i-\beta X_i \right)^2 \right) \\[10pt] & \propto \prod_{i=1}^n \exp\left( -\frac1{2\sigma^2} \left( Y_i-\beta X_i \right)^2 \right) \\[10pt] & = \exp\left( -\frac1{2\sigma^2} \sum_{i=1}^n(Y_i-\beta X_i)^2 \right). \end{align}

That is one point where you assume $\sigma$ is known: the above is a function only of $\beta$ and not of $(\beta,\sigma).$

Here is another place where you assume $\sigma$ is known: the symbol $\propto$ must be taken to mean proportional as a function of $\beta,$ not proportional as a function of $(\beta,\sigma).$ "Proportional" means it's just the other function times a constant, and "constant" must be taken to mean not changing as $\beta$ changes, NOT "not changing as $(\beta,\sigma)$ changes."

Now we say: this gets bigger as $\sum_{i=1}^n (Y_i-\beta X_i)^2$ gets smaller. Therefore the problem is to find the value of $\beta$ that makes that sum of squares as small as possible, i.e. the least-squares estimator is in this case the maximum-likelihood estimator.

The function $L$ is called the "likelihood function", not the "maximum likelihood function."

Generally in parametric statistics, when a parameter is called "known," that means you're considering a parametrized family of probability distributions all having the same value of that parameter.
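A small numerical illustration of this (with made-up data): whichever fixed value $\sigma$ takes, the $\beta$ that maximises the likelihood is the same, which is exactly why treating $\sigma$ as known does not change the estimator of $\beta$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data from Y_i = beta*X_i + eps_i with beta = 2.
X = np.linspace(1.0, 10.0, 40)
Y = 2.0 * X + rng.normal(0.0, 1.0, size=X.size)

betas = np.linspace(1.5, 2.5, 10001)
sse = np.array([np.sum((Y - b * X) ** 2) for b in betas])

for sigma in (0.5, 1.0, 3.0):
    # log L(beta) for this fixed sigma, dropping terms that do not involve beta
    log_lik = -sse / (2.0 * sigma ** 2)
    print(sigma, betas[np.argmax(log_lik)])  # same maximiser for every sigma
```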


This may actually be fairly straightforward, so I'm going to provide an answer to my own question and people can shout if it's wrong.

We can start by setting out the likelihood function and the log-likelihood for $\beta$ and $\sigma^2$:

$$L(\beta,\sigma^2) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left (\frac{-1}{2\sigma^2}(Y_i - \beta X_i)^2\right )$$

$$l(\beta,\sigma^2) = \ln \left ( \left (\frac{1}{\sqrt{2\pi\sigma^2}}\right )^N\right ) - \frac{1}{2\sigma^2}\sum_{i=1}^N (Y_i - \beta X_i)^2 $$

To find the estimators $\hat\beta$ and $\hat\sigma^2$ we need to maximise the function $l(\beta,\sigma^2)$. We start by solving $\frac{\partial l(\beta,\sigma^2)}{\partial \beta}=0$, which gives:

$$ \frac{\partial l(\beta,\sigma^2)}{\partial \beta} =\frac{-1}{2\sigma^2}\sum_{i=1}^N (-2Y_iX_i +2\beta X_i^2)=0 $$

$$\therefore \sum_{i=1}^N (-2Y_iX_i +2\beta X_i^2)=0 $$

$$\therefore \hat\beta=\dfrac{\sum_{i=1}^N Y_iX_i}{\sum_{i=1}^N X_i^2} $$
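As an aside, the first-order condition above can also be checked symbolically; this is only a sketch using sympy, with three placeholder observations standing in for the general case.

```python
import sympy as sp

# Tiny symbolic check of the first-order condition with N = 3 placeholder observations.
beta, sigma2 = sp.symbols('beta sigma2', positive=True)
Y = sp.symbols('Y1:4')  # (Y1, Y2, Y3)
X = sp.symbols('X1:4')  # (X1, X2, X3)
N = 3

# Log-likelihood l(beta, sigma2) written out for the three observations.
l = -sp.Rational(N, 2) * sp.log(2 * sp.pi * sigma2) \
    - sum((y - beta * x) ** 2 for x, y in zip(X, Y)) / (2 * sigma2)

# Solving dl/dbeta = 0 should give beta_hat = sum(X*Y) / sum(X^2).
beta_hat = sp.solve(sp.diff(l, beta), beta)[0]
closed_form = sum(x * y for x, y in zip(X, Y)) / sum(x ** 2 for x in X)
print(sp.simplify(beta_hat - closed_form))  # prints 0
```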

Having found the maximum likelihood estimator for $\beta$, we now look to find the maximum likelihood estimator for $\sigma^2$ using the log-likelihood function:

$$l(\hat\beta,\sigma^2) = \ln \left ( \left (\frac{1}{\sqrt{2\pi\sigma^2}}\right )^N\right ) - \frac{1}{2\sigma^2}\sum_{i=1}^N (Y_i - \hat\beta X_i)^2 $$

To find the $\sigma^2$ that maximises this, which we'll denote $\hat\sigma^2$, we solve $\dfrac{\partial l(\hat\beta,\sigma^2)}{\partial \sigma^2} = 0$. To do this we can see that:

$$\dfrac{\partial l(\hat\beta,\sigma^2)}{\partial \sigma^2} = \left ( (2\pi\sigma^2)^{\frac{N}{2}} \right ) \left ( \frac{-N}{2}\right )\left ( (2\pi)^{\frac{-N}{2}} (\sigma^2)^{\frac{-N}{2}-1} \right ) +\frac{1}{2}(\sigma^2)^{-2}\sum_{i=1}^N (Y_i - \hat\beta X_i)^2 =0 $$

$$\therefore\frac{-N}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^N (Y_i - \hat\beta X_i)^2=0$$

$$\therefore -N\sigma^2 + \sum_{i=1}^N (Y_i - \hat\beta X_i)^2=0$$

$$\therefore \hat\sigma^2 = \frac{\sum_{i=1}^N (Y_i - \hat\beta X_i)^2}{N}$$

$$\therefore \hat\sigma^2 = \frac{\sum_{i=1}^N \hat\varepsilon_i^2}{N}$$
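As a final sanity check (with made-up data), maximising $l(\beta,\sigma^2)$ numerically over both parameters should reproduce these closed forms:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Made-up data: Y_i = beta*X_i + eps_i with beta = 2 and sigma^2 = 1.
X = np.linspace(1.0, 10.0, 60)
Y = 2.0 * X + rng.normal(0.0, 1.0, size=X.size)
N = X.size

def neg_log_likelihood(params):
    beta, sigma2 = params
    return 0.5 * N * np.log(2.0 * np.pi * sigma2) + np.sum((Y - beta * X) ** 2) / (2.0 * sigma2)

res = minimize(neg_log_likelihood, x0=[1.0, 2.0], bounds=[(None, None), (1e-8, None)])

beta_hat = np.sum(X * Y) / np.sum(X ** 2)
sigma2_hat = np.sum((Y - beta_hat * X) ** 2) / N

print(res.x)                  # numerical MLE of (beta, sigma^2)
print(beta_hat, sigma2_hat)   # closed-form values; they agree to optimiser tolerance
```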

I'll wait to see if anyone comments on anything incorrect in there before I accept this as the answer.

  • This is correct as far as I can tell in a somewhat hasty viewing, but I think it makes the matter somewhat more complicated and less clear than it could be. Commented May 31 at 19:54
