18
$\begingroup$

I just rigorously learned that OLS is a special case of MLE. It surprises me that popular and supposedly "reliable" sources such as ResearchGate and this one do not mention this important connection between MLE and OLS!

I am not sure whether there are any simple regression or estimation methods that do not fall under MLE.

$\endgroup$
  • 7
    $\begingroup$ From what I've seen, ResearchGate Q&A is not particularly reliable. I've never heard of "differencebetween". Beware of what you read on the internet (or, for that matter, in some textbooks, though the frequency of good sources is better there) $\endgroup$
    – Glen_b
    Commented Dec 27, 2019 at 0:12
  • 6
    $\begingroup$ ... And yes, I fully recognize my statement included this site. So it should -- you should definitely be a skeptical consumer of information and advice in every instance. But at least StackExchange makes it easier to find and fix errors over time, for example by encouraging ongoing curation of answers, and consolidating questions into canonical threads. It doesn't eliminate problems, but it does improve the average quality noticeably. $\endgroup$
    – Glen_b
    Commented Dec 27, 2019 at 0:51

5 Answers

28
$\begingroup$

Least squares is indeed maximum likelihood if the errors are iid normal, but if they aren't iid normal, least squares is not maximum likelihood. For example if my errors were logistic, least squares wouldn't be a terrible idea but it wouldn't be maximum likelihood.
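
To make the normal-error case concrete, here is the standard calculation for the simple-regression model $y_i = \alpha + \beta x_i + \varepsilon_i$ with $\varepsilon_i$ iid $N(0,\sigma^2)$:

$$\log\mathcal{L}(\alpha,\beta,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n \left(y_i - \alpha - \beta x_i\right)^2,$$

so for any fixed $\sigma^2$, maximizing the likelihood over $(\alpha,\beta)$ is exactly minimizing $\sum_i (y_i - \alpha - \beta x_i)^2$, the least-squares criterion. Swap the normal density for any other (say, the logistic) and the second term changes, so the two solutions no longer coincide in general.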

Lots of estimators are not maximum likelihood estimators; while maximum likelihood estimators typically have a number of useful and attractive properties they're not the only game in town (and indeed not even always a great idea).

A few examples of other estimation methods would include

  • method of moments (this involves equating enough sample and population moments to solve for parameter estimates; sometimes this turns out to be maximum likelihood but usually it doesn't)

    For example, equating first and second moments to estimate the parameters of a gamma distribution or a uniform distribution; not maximum likelihood in either case (a short sketch of the gamma case follows this list).

  • method of quantiles (equating enough sample and population quantiles to solve for parameter estimates; occasionally this is maximum likelihood but usually it isn't),

  • minimizing some other measure of lack of fit than $-\log\mathcal{L}$ (e.g. minimum chi-square, minimum K-S distance).
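
Here is a minimal sketch of the gamma case mentioned above (the true parameter values and simulated data are just for illustration):

```r
# Method-of-moments sketch for a gamma distribution with shape k and
# scale theta, where mean = k*theta and variance = k*theta^2.
set.seed(1)
x <- rgamma(1000, shape = 3, scale = 2)   # simulated data for illustration

m <- mean(x)            # first sample moment
v <- mean((x - m)^2)    # second central sample moment

theta_hat <- v / m      # solves  m = k*theta,  v = k*theta^2
k_hat     <- m^2 / v
c(shape = k_hat, scale = theta_hat)
```

The MLE of the shape instead solves $\log k - \psi(k) = \log \bar{x} - \overline{\log x}$ (with $\psi$ the digamma function), which has no closed form, so the two estimators generally differ.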

With fitting linear-regression-type models, you could for example look at robust regression methods (some of which do correspond to ML estimation for some particular error distribution, but many of which do not).
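
For instance, here is a minimal sketch of one common robust method, Huber M-estimation via MASS::rlm (the data are simulated, with one gross outlier added for illustration):

```r
library(MASS)                      # rlm: robust linear model fitting
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.5)
y[20] <- 40                        # contaminate one observation

coef(lm(y ~ x))                    # least squares: pulled by the outlier
coef(rlm(y ~ x))                   # Huber M-estimate: far less affected
```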

In the case of simple linear regression, I show an example of two methods of fitting lines that are not maximum likelihood here; there, the slope is estimated by setting to 0 some measure of correlation (other than the usual Pearson correlation) between the residuals and the predictor.

Another example would be Tukey's resistant line (also known as Tukey's three-group line; e.g. see ?line in R). There are many other possibilities, though many of them don't generalize readily to the multiple regression situation.
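
A minimal sketch of the resistant line in base R (simulated heavy-tailed data, chosen just for illustration):

```r
set.seed(1)
x <- 1:30
y <- 1 + 0.3 * x + rt(30, df = 2)   # heavy-tailed errors

coef(line(x, y))    # Tukey's resistant line, based on outer-third medians
coef(lm(y ~ x))     # least squares, for comparison
```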

$\endgroup$
  • 1
    $\begingroup$ Thank you very much for this detailed answer! I wonder whether every regression method has some sort of likelihood function, loosely speaking, even if it is not a maximum likelihood procedure. $\endgroup$
    – High GPA
    Commented Dec 27, 2019 at 1:09
  • 1
    $\begingroup$ I really don't know what you're asking there, sorry. Parameters in otherwise-fully-specified distributions have likelihood functions. $\endgroup$
    – Glen_b
    Commented Dec 27, 2019 at 1:10
  • $\begingroup$ Sorry for being unclear. I would just like to confirm that, even if a regression method (e.g. OLS) is not MLE, we can still calculate the likelihood function from the model we obtained. Is that true? Thank you for helping, and I hope this is clearer $\endgroup$
    – High GPA
    Commented Dec 27, 2019 at 1:12
  • 5
    $\begingroup$ If you have a distributional model (e.g. for the conditional distribution in a regression-like situation) and the data, you can calculate a likelihood function, without reference to any specific estimates. You can then use that function to calculate a likelihood value for the fitted parameter values you obtained (however you happened to obtain them), just as you can with any specific set of values for the parameters -- but to what end? $\endgroup$
    – Glen_b
    Commented Dec 27, 2019 at 1:43
  • 1
    $\begingroup$ "For example if my errors were logistic, least squares wouldn't be a terrible idea but it wouldn't be maximum likelihood." — Maybe I'm misunderstanding what you mean by logistic error, but if it's what I think you mean, then logistic error is maximum likelihood Bernoulli estimation. Most of the GLMs can be viewed as MLE for some exponential family distribution. $\endgroup$
    – Neil G
    Commented Dec 29, 2019 at 19:30
4
$\begingroup$

MLEs are asymptotically minimax (under regularity conditions), but not all minimax estimators are MLEs. Some examples of minimax estimators that do not maximize a (full) likelihood are ROC regression, conditional logistic regression, Cox proportional hazards models, nearest neighbors, and quasilikelihood; the list goes on and on. Hodges' "superefficient" estimator beats maximum likelihood in asymptotic efficiency at a single parameter value when estimating the mean of a normal sample, but it is NOT minimax.
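
As one concrete illustration, quasilikelihood specifies only a mean-variance relationship rather than a full distribution, so there is no likelihood to maximize. A minimal sketch in R, using simulated overdispersed counts:

```r
set.seed(2)
x <- runif(100)
y <- rnbinom(100, mu = exp(1 + 2 * x), size = 1)   # overdispersed counts

# family = quasipoisson fixes only E(y) and Var(y) up to a dispersion
# parameter; the fit solves estimating equations, not a full likelihood.
fit <- glm(y ~ x, family = quasipoisson)
summary(fit)$dispersion    # estimated dispersion (> 1 here)
```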

$\endgroup$
  • $\begingroup$ I'm not sure I'd say that Cox PH models are not an MLE; the Cox PH solution is the set of parameter values that maximizes the partial likelihood function. $\endgroup$
    – Cliff AB
    Commented Dec 28, 2019 at 19:29
  • $\begingroup$ @CliffAB Stigler's "The Epic Story of Maximum Likelihood" gives an excellent historical account of the dialogue on this matter. The truth is that when we exclude ancillary parameters from estimation (like the dispersion in quasi-MLE or, for Cox, the event times conditional on risk-set order statistics), we cannot be guaranteed the same optimality or regularity criteria that MLEs have. Asymptotic efficiency of the MLE is based on a Taylor expansion using the information matrix. When we knock out a few dimensions of ancillary parameters, weird things happen. $\endgroup$
    – AdamO
    Commented Dec 29, 2019 at 14:02
  • $\begingroup$ Oh interesting, I didn't realize there have been huge discussions about whether to consider the Cox PH model an MLE! $\endgroup$
    – Cliff AB
    Commented Dec 29, 2019 at 16:05
4
$\begingroup$

Bayesian approaches do not involve maximizing a likelihood function, but rather integrating over a posterior distribution. Note that the underlying model may be exactly identical (e.g., linear regression, generalized linear regression), but we also need to provide a prior distribution, which captures our uncertainty in the parameters before seeing the data. The posterior distribution is simply the prior times the likelihood, normalized.

I believe that many statisticians these days would agree that a Bayesian approach is generally superior to an MLE approach for parameter estimation. However, when one has a lot of data, it may not be so much better that it's worth both the extra computational cost (integrating is harder than optimizing!) and the extra effort of coming up with a prior distribution. In fact, one can show that, asymptotically, the MLE plus a normal approximation approaches the posterior distribution under certain conditions (the Bernstein–von Mises theorem).
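
A minimal sketch of that asymptotic agreement, assuming a normal likelihood with known variance and a weakly informative normal prior (all parameter choices below are made up for illustration), so the posterior is available in closed form:

```r
set.seed(1)
sigma <- 1                            # known data standard deviation
y <- rnorm(50, mean = 2, sd = sigma)  # simulated data

mu0  <- 0                             # prior mean (an assumed choice)
tau0 <- 5                             # prior standard deviation

n <- length(y)
post_prec <- 1 / tau0^2 + n / sigma^2                    # posterior precision
post_mean <- (mu0 / tau0^2 + sum(y) / sigma^2) / post_prec

c(mle = mean(y), posterior_mean = post_mean)
# With n = 50 and a diffuse prior, the two already nearly coincide.
```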

$\endgroup$
3
$\begingroup$

$$ Y_i = \alpha + \beta x_i + \varepsilon_i $$

  • $\alpha,\beta$ are non-random and not observable.
  • $\varepsilon_i$ are random and not observable.
  • $x_i$ are non-random and are observable.
  • $Y_i$ are consequently random, and are observable.

Suppose you have the Gauss–Markov assumptions:

  • The errors $\varepsilon_i$ have expected value zero.
  • The errors all have the same (finite) variance but not necessarily the same distribution (in particular, they are not assumed to be normal).
  • The errors are uncorrelated but not necessarily independent.

One cannot do MLE because there's no parametrized family of distributions. But one can still do ordinary least squares.

And among all linear combinations of the $y$-values with non-random observable coefficients that are unbiased estimators of $\alpha$ and $\beta$, the least-squares estimators have the smallest variance; this is the Gauss–Markov theorem.
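
In closed form, $\hat\beta = \sum_i (x_i - \bar{x})(y_i - \bar{y}) \big/ \sum_i (x_i - \bar{x})^2$ and $\hat\alpha = \bar{y} - \hat\beta\,\bar{x}.$ A minimal sketch in R (the normal error draw below is only a simulation convenience; Gauss–Markov requires nothing beyond zero mean, equal variance, and uncorrelatedness):

```r
set.seed(1)
x <- runif(50)
y <- 1 + 2 * x + rnorm(50, sd = 0.5)   # errors: zero mean, equal variance

beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

c(alpha = alpha_hat, beta = beta_hat)
coef(lm(y ~ x))    # agrees with the built-in least-squares fit
```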

$\endgroup$
2
$\begingroup$

As an answer to the question "What regression/estimation is not an MLE?": a simple and robust alternative to least squares (LS) is least absolute deviations (LAD).

To quote a source:

"The least absolute deviations method (LAD) is one of the principal alternatives to the least-squares method when one seeks to estimate regression parameters. The goal of the LAD regression is to provide a robust estimator."

Interestingly, per a reference: "The least absolute deviations estimate also arises as the maximum likelihood estimate if the errors have a Laplace distribution." Here is a link that discusses some interesting applications of the Laplace distribution (such as its use as a Bayesian prior, and in modeling extreme events).

Historically, the LAD procedure was introduced by Roger Joseph Boscovich in 1757, about 50 years before the least-squares method; he employed it to reconcile inconsistent measurements relating to the shape of the earth.

An illustrative difference appears in the very simple case of $Y = \text{constant}$: LS returns the sample mean, while LAD selects the sample median! So in contexts where one or two extreme values arise, for whatever reason (like heteroscedasticity), the LS slope estimate can shift substantially away from the true value, especially when there is one very low and/or very high observation; this is a noted weakness (a numerical sketch follows the quote below). Wikipedia on robust regression makes a supporting comment:

"In particular, least squares estimates for regression models are highly sensitive to outliers."

With respect to applications, this can be particularly important, for example, in chemistry-based data analysis to estimate a reaction's so-called rate law (which is based on the slope estimate).

$\endgroup$
  • 5
    $\begingroup$ LAD is the MLE for conditional responses with Laplace distributions. $\endgroup$
    – whuber
    Commented Dec 27, 2019 at 18:06
