15
$\begingroup$

If somebody said

"That method uses the MLE, the point estimate for the parameter which maximizes $\mathrm{P}(x|\theta)$; therefore it is frequentist, and further it is not Bayesian."

would you agree?

  • Update on the background: I recently read a paper that claims to be frequentist. I don't agree with their claim; at best I feel it's ambiguous. The paper does not explicitly mention the MLE (or the MAP, for that matter). They just take a point estimate, and they simply proceed as if this point estimate were true. They do no analysis of the sampling distribution of this estimator, or anything like that; the model is quite complex, so such analysis is probably not possible. They do not use the word 'posterior' at any point either. They just take this point estimate at face value and proceed to their main topic of interest - inferring missing data. I don't think there is anything in their approach which reveals what their philosophy is. They may have intended to be frequentist (because they feel obliged to wear their philosophy on their sleeve), but their actual approach is quite simple/convenient/lazy/ambiguous. I'm inclined now to say that the research doesn't really have any philosophy behind it; instead I think their attitude was more pragmatic or convenient:

    "I have observed data, $x$, and I wish to estimate some missing data, $z$. There is a parameter $\theta$ which controls the relationship between $z$ and $x$. I don't really care about $\theta$ except as a means to an end. If I have an estimate for $\theta$ it will make it easier to predict $z$ from $x$. I will choose a point estimate of $\theta$ because it's convenient, in particular I will choose the $\hat{\theta}$ that maximizes $\mathrm{P}(x|\theta)$."
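That pragmatic recipe can be sketched in a few lines. This is purely illustrative (not the paper's model): a toy normal model, with made-up numbers, in which both $x$ and the missing $z$ depend on $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0  # hypothetical "true" parameter, used only to simulate data

# Observed data x; the missing z is assumed to come from the same model.
x = rng.normal(theta_true, 1.0, size=100)

# Step 1: point estimate of theta, chosen to maximize P(x | theta).
# For a normal model with known unit variance, the MLE is the sample mean.
theta_hat = x.mean()

# Step 2: proceed as if theta_hat were true, and predict the missing data.
z_pred = theta_hat  # point prediction for a missing draw from N(theta, 1)

print(theta_hat, z_pred)
```

Note that nothing in this recipe commits the analyst to a philosophy: there is no sampling-distribution analysis and no posterior, just a convenient plug-in estimate.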

The idea of an unbiased estimator is clearly a Frequentist concept. This is because it doesn't condition on the data, and it describes a nice property (unbiasedness) which would hold for all values of the parameter.

In Bayesian methods, the roles of the data and the parameter are sort of reversed. In particular, we now condition on the observed data and proceed to make inferences about the value of the parameter. This requires a prior.

So far so good, but where does the MLE (Maximum Likelihood Estimate) fit into all this? I get the impression that many people feel it is Frequentist (or, more precisely, that it is not Bayesian). But I feel it is Bayesian, because it involves taking the observed data and then finding the parameter which maximizes $\mathrm{P}(x|\theta)$. The MLE is implicitly using a uniform prior, conditioning on the data, and maximizing $\mathrm{P}(\theta|x)$. Is it fair to say that the MLE looks both Frequentist and Bayesian? Or does every simple tool have to fall into exactly one of those two categories?
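The claim that the MLE coincides with the MAP under a uniform prior is easy to check numerically. A minimal sketch with made-up Bernoulli data: the likelihood and the (unnormalized) posterior under a flat prior peak at the same point.

```python
import numpy as np

# Made-up data: 7 successes in 10 Bernoulli trials.
k, n = 7, 10
p_grid = np.linspace(0.001, 0.999, 999)

likelihood = p_grid**k * (1 - p_grid)**(n - k)  # P(x | p), up to a constant
prior = np.ones_like(p_grid)                    # flat prior on p
posterior = likelihood * prior                  # unnormalized P(p | x)

mle = p_grid[np.argmax(likelihood)]
map_est = p_grid[np.argmax(posterior)]
print(mle, map_est)  # the two maximizers coincide
```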

The MLE is consistent, but I feel that consistency can be presented as a Bayesian idea. Given arbitrarily large samples, the estimate converges on the correct answer. The statement "in the limit, the estimate will be equal to the true value" holds true for all values of the parameter. The interesting thing is that this statement also holds true if you condition on the observed data, making it Bayesian. This interesting aside holds for the MLE, but not for an unbiased estimator.
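Consistency is easy to see by simulation. A small sketch (hypothetical Bernoulli data, where the MLE of $p$ is the sample mean) shows the estimation error shrinking as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(1)
p_true = 0.3  # hypothetical true Bernoulli parameter

# The MLE of p is the sample mean; watch its error as n grows.
errors = {}
for n in (100, 10_000, 1_000_000):
    sample = rng.binomial(1, p_true, size=n)
    errors[n] = abs(sample.mean() - p_true)

print(errors)
```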

This is why I feel that the MLE is the 'most Bayesian' of the methods that might be described as Frequentist.

Anyway, most Frequentist properties (such as unbiasedness) apply in all cases, including finite sample sizes. The fact that consistency only holds in the impossible scenario (infinite sample within one experiment) suggests that consistency isn't such a useful property.

Given a realistic (i.e. finite) sample, is there a Frequentist property that holds true of the MLE? If not, the MLE isn't really Frequentist.

$\endgroup$
5
  • 7
    $\begingroup$ The MLE cannot be considered Bayesian starting from the interpretation of parameters in the two paradigms. From a Bayesian perspective, a parameter is a random variable, while in the classical setting it is a fixed value to be estimated. The MLE coincides with the MAP (and possibly other Bayesian point estimators) in many cases, but the interpretation is completely different. $\endgroup$
    – user10525
    Commented Jun 26, 2012 at 11:45
  • 1
    $\begingroup$ I do not understand this question. (I may be alone in this.) Exactly what do you mean by "frequentist"? "Not Bayesian" won't do, because that comprises a huge range of philosophies and methods. What makes something a "frequentist property"? Is there any connection at all between your "frequentist" and, say, an Abraham Wald or Jack Kiefer who justifies statistical procedures with decision theoretic principles? (Kiefer, in particular, had a rather critical opinion of MLE on this basis.) $\endgroup$
    – whuber
    Commented Jun 28, 2012 at 14:52
  • 3
    $\begingroup$ @whuber: You are not alone. The one vote to close is mine and was made a day or two ago. This question lacks some clarity and focus and borders on not constructive due to its discursive and somewhat-polemic framing, in my view. $\endgroup$
    – cardinal
    Commented Jun 28, 2012 at 16:47
  • $\begingroup$ Flagged, @Macro. I admitted my question is rubbish and I claimed that the discussions weren't always great either (and that I'm no saint!). I suppose my underlying question is: "If somebody claims to be 'a frequentist', do we take that at face value or do we judge it by the methods they actually use?" I now feel that is a vague and boring question, and isn't really suitable for this site. $\endgroup$ Commented Jun 28, 2012 at 17:53
  • 1
    $\begingroup$ The moderators are reluctant to close this thread because it has collected many replies (including one that had been accepted!) and comments, which suggests the community may disagree with your new assessment of this thread, Aaron. $\endgroup$
    – whuber
    Commented Jun 29, 2012 at 15:53

6 Answers

11
$\begingroup$

Or does every simple tool have to fall into exactly one of those two categories?

No. Simple (and not-so-simple) tools can be studied from many different viewpoints. The likelihood function by itself is a cornerstone of both Bayesian and frequentist statistics, and can be studied from both points of view! If you want, you can study the MLE as an approximate Bayes solution, or you can study its properties with asymptotic theory, in a frequentist way.

$\endgroup$
11
  • 1
    $\begingroup$ I like this answer so far. A senior colleague recently reminded me that "many people in common language talk about frequentist and Bayesian. I think a more valid distinction is likelihood-based and frequentist. Both maximum likelihood and Bayesian methods adhere to the likelihood principle whereas frequentist methods don't." $\endgroup$ Commented Jun 26, 2012 at 18:46
  • 4
    $\begingroup$ That is wrong Aaron. Frequentists do use maximum likelihood estimation and believe in the likelihood principle. Kjetil is right that the likelihood function is a key element of both the Bayesian and frequentist approaches to inference. But they use it differently. $\endgroup$ Commented Jun 26, 2012 at 19:04
  • 4
    $\begingroup$ I have given a very good answer to Aaron's question but for some strange reason people are downvoting it. They must not understand what's going on. There is no way that maximum likelihood estimation can be classified as Bayesian since it maximizes the likelihood and does not consider prior distributions at all! $\endgroup$ Commented Jun 26, 2012 at 19:07
  • 7
    $\begingroup$ @Michael, have you ever witnessed a productive back and forth that begins with "why was I downvoted"? I sure haven't. That's why I (and several other members here) discourage even starting the conversation, regardless of whether or not you think it's justified. It's pointless and generally leads to extended off-topic discussion. $\endgroup$
    – Macro
    Commented Jun 27, 2012 at 15:05
  • 3
    $\begingroup$ @Michael In your email interactions on the ASA listservers you have stood out in that respect: you are really good at helping the OP understand and clarify their question (and often will devote several messages to that effort, showing patience) and then you provide great replies. I have always thought you would do very well here on SE by following the same model: use comments to the question to conduct that initial interaction, then--once you have discerned the real nature of the question--provide a great reply. $\endgroup$
    – whuber
    Commented Jun 28, 2012 at 14:41
10
$\begingroup$

When you're doing Maximum Likelihood Estimation, you consider the value of the estimate and the sampling properties of the estimator in order to establish the uncertainty of your estimate, expressed as a confidence interval. I think this is important regarding your question, because a confidence interval will in general depend on sample points that were not observed, which is seen by some as an essentially unbayesian property.

P.S. This is related to the more general fact that Maximum Likelihood Estimation (Point + Interval) fails to satisfy the Likelihood Principle, while a full ("Savage style") Bayesian analysis does.
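The standard illustration of this failure is the binomial versus negative-binomial stopping-rule example (often attributed to Lindley and Phillips): the same observed data of 9 heads and 3 tails yield proportional likelihoods under the two designs, yet different frequentist p-values for $H_0: p = 0.5$. A quick sketch:

```python
from math import comb

p0 = 0.5             # null hypothesis: fair coin
heads, tails = 9, 3  # the same observed data under both designs
n = heads + tails

# Design 1: fixed n = 12 tosses. One-sided p-value: P(X >= 9 heads).
p_binom = sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
              for k in range(heads, n + 1))

# Design 2: toss until the 3rd tail; the number of heads is negative
# binomial. One-sided p-value: P(at least 9 heads before the 3rd tail).
p_nbinom = 1 - sum(comb(k + tails - 1, k) * p0**k * (1 - p0)**tails
                   for k in range(heads))

# Proportional likelihoods (both proportional to p^9 (1-p)^3),
# yet the p-values differ: about 0.073 vs 0.033.
print(p_binom, p_nbinom)
```

At the conventional 5% level, the two designs reach opposite conclusions from identical data, which is exactly what the Likelihood Principle forbids.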

$\endgroup$
3
  • $\begingroup$ +1. The idea that the truncated normal will result in a different posterior is interesting and surprising! I did comment that I was skeptical, but I deleted that comment. I'll need to think of this a little more. Normally, I find the Likelihood Principle to be 'obviously true', so I should think about this a bit more. $\endgroup$ Commented Jun 27, 2012 at 9:54
  • $\begingroup$ Good point Zen. I guess as a point estimate maximum likelihood estimation is in adherence to the likelihood principle but the frequentist notion of confidence intervals is not. $\endgroup$ Commented Jun 27, 2012 at 15:08
  • $\begingroup$ @Zen, I am not convinced that the posteriors are the same. Do you have a reference for that? I have created a Google Doc with my argument that the posterior will change as we replace a normal with a truncated normal. Thanks in advance. $\endgroup$ Commented Jun 27, 2012 at 22:43
7
$\begingroup$

The likelihood function is a function involving the data and the unknown parameter(s). It can be viewed as the probability density for the observed data given the value(s) of the parameter(s). The parameters are fixed. So by itself the likelihood is a frequentist notion. Maximizing the likelihood is just finding the specific value(s) of the parameter(s) that make the likelihood take on its maximum value. So maximum likelihood estimation is a frequentist method based solely on the data and the form of the model assumed to generate it. Bayesian estimation only enters when a prior distribution is placed on the parameter(s) and Bayes' formula is used to obtain a posterior distribution for the parameter(s) by combining the prior with the likelihood.

$\endgroup$
1
  • $\begingroup$ All comments posted here have been moved to a dedicated chat room. If someone has difficulty to join this room, and in this case only, please flag for moderator attention. No further comments will be accepted. $\endgroup$
    – chl
    Commented Jun 28, 2012 at 17:18
6
$\begingroup$

Assuming that by "Bayesian" you refer to subjective Bayes (a.k.a. epistemic Bayes, de Finetti Bayes) and not the current empirical Bayes meaning: it is far from trivial. On the one hand, you infer based on your data alone. There are no subjective beliefs at hand. This seems frequentist enough... But the critique, levelled even at Fisher himself (a strict non-(subjective-)Bayesian), is that subjectivity has crept in through the choice of the sampling distribution of the data. A parameter is only defined given our beliefs about the data-generating process.

In conclusion: I believe the MLE is typically considered a frequentist concept, although ultimately it is a matter of how you define "frequentist" and "Bayesian".

$\endgroup$
1
  • $\begingroup$ +1: This is what I was trying to get at in my comment above. $\endgroup$
    – Neil G
    Commented Jun 26, 2012 at 22:45
2
$\begingroup$

The point estimator that maximises $P(x|\theta)$ is the MLE. This is a commonly used point estimator in frequentist statistics, but it is less commonly used in Bayesian statistics. In Bayesian statistics it is usual to use a point estimator which is either the posterior expected value, or the value minimising the expected-loss (risk) in a decision problem. There are certainly some cases where the Bayesian estimator will correspond with the MLE (e.g., if we have a uniform prior, or in some special cases of minimising loss), but this is not a common occurrence. Hence, as a general rule, the MLE is usually a frequentist estimator.
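To make the coincidence (and its fragility) concrete: with a Beta prior on a binomial proportion, the posterior is available in closed form. Under a uniform Beta(1, 1) prior the posterior mode equals the MLE, but the posterior mean differs, and a non-uniform prior breaks the coincidence for the mode as well. A small sketch with hypothetical counts:

```python
# Hypothetical data: 7 successes in 10 trials.
k, n = 7, 10
mle = k / n

# Conjugate analysis: a Beta(a, b) prior gives a Beta(a + k, b + n - k)
# posterior, so the Bayesian point estimators have closed forms.
def map_estimate(a, b):
    """Posterior mode under a Beta(a, b) prior."""
    return (a + k - 1) / (a + b + n - 2)

def posterior_mean(a, b):
    """Posterior expected value under a Beta(a, b) prior."""
    return (a + k) / (a + b + n)

# Uniform prior Beta(1, 1): the MAP coincides with the MLE,
# but the posterior mean does not, and a non-uniform prior
# (e.g. Beta(2, 2)) shifts the MAP away from the MLE too.
print(mle, map_estimate(1, 1), posterior_mean(1, 1), map_estimate(2, 2))
```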

$\endgroup$
1
$\begingroup$

(answering own question)

An estimator is a function that takes some data and produces a number (or range of numbers). An estimator, by itself, isn't really 'Bayesian' or 'frequentist' - you can think of it as a black box where numbers go in and numbers come out. You can present the same estimator to a frequentist and to a Bayesian and they will have different things to say about the estimator.

(I'm not happy with my simplistic distinction between frequentist and Bayesian - there are other issues to consider. But for simplicity, let's pretend there are just two well-defined philosophical camps.)

You cannot tell whether a researcher is frequentist or Bayesian just by which estimator they choose. The important thing is to listen to what analyses they perform on the estimator and what reasons they give for choosing it.

Imagine you create a piece of software that finds the value of $\theta$ which maximizes $\mathrm{P}(\mathbf{x}|\theta)$. You present this software to a frequentist and ask them to make a presentation about it. They will probably proceed by analyzing the sampling distribution and testing whether the estimator is biased. And maybe they'll check whether it is consistent. They will either approve or disapprove of the estimator based on properties such as these. These are the types of properties that a frequentist is interested in.

When the same software is presented to a Bayesian, the Bayesian might well be happy with much of the frequentist's analysis. Yes, all other things being equal, bias isn't good and consistency is good. But the Bayesian will be more interested in other things. The Bayesian will want to see whether the estimator takes the shape of some function of a posterior distribution; and if so, what prior was used? If the estimator is based on a posterior, the Bayesian will wonder whether the prior is a good one. If they are happy with the prior, and if the estimator reports the mode of the posterior (as opposed to, say, the mean of the posterior), then they are happy to apply this interpretation to the estimate: "This estimate is the point estimate which has the best chance of being correct."

I often hear it said that frequentists and Bayesians "interpret" things differently, even when the numbers involved are the same. This can be a little confusing, and I don't think it's really true. Their interpretations don't conflict with each other; they simply make statements about different aspects of the system. Let's put aside point estimates for the moment and consider intervals instead. In particular, there are frequentist confidence intervals and Bayesian credible intervals. They will usually give different answers. But in certain models, with certain priors, the two types of interval give the same numerical answer.
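As a concrete (hypothetical) instance of the numbers coinciding: for a normal mean with known variance, the classical 95% confidence interval and the 95% credible interval under a flat prior are numerically identical. A minimal sketch, with made-up data summaries:

```python
import math

# Made-up summary: n draws from N(mu, sigma^2) with sigma known.
n, xbar, sigma = 25, 10.2, 2.0
z = 1.959963984540054  # 97.5% quantile of the standard normal
half = z * sigma / math.sqrt(n)

# Frequentist 95% confidence interval for mu.
ci = (xbar - half, xbar + half)

# Under a flat (improper) prior, the posterior for mu is
# N(xbar, sigma^2 / n), so the 95% credible interval is the
# same pair of numbers.
cred = (xbar - half, xbar + half)

print(ci, cred)
```

The two intervals contain the same numbers; the philosophical disagreement is only about what may be asserted of them before versus after seeing the data.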

When the intervals are the same, how can we interpret them differently? A frequentist will say of an interval estimator:

Before I see the data or the corresponding interval, I can say there is at least a 95% probability that the true parameter will be contained within the interval.

whereas a Bayesian will say of an interval estimator:

After I see the data or the corresponding interval, I can say there is at least a 95% probability that the true parameter is contained within the interval.

These two statements are identical, apart from the words 'Before' and 'After'. The Bayesian will understand and agree with the former statement and also will acknowledge that its truth is independent of any prior, thereby making it 'stronger'. But speaking as a Bayesian myself, I would worry that the former statement mightn't be very useful. The frequentist won't like the latter statement, but I don't understand it well enough to give a fair description of the frequentist's objections.

After seeing the data, will the frequentist still be optimistic that the true value is contained within the interval? Maybe not. This is a bit counterintuitive but it is important for truly understanding confidence intervals and other concepts based on the sampling distribution. You might presume that the frequentist would still say "Given the data, I still think there is a 95% probability that the true value is in this interval". A frequentist would not only question whether that statement is true, they would also question whether it is meaningful to attribute probabilities in this way. If you have more questions on this, don't ask me, this issue is too much for me!

The Bayesian is happy to make that statement: "Conditioning on the data I have just seen, the probability is 95% that the true value is in this range."

I must admit I'm a little confused on one final point. I understand, and agree with, the statement made by the frequentist before the data is seen. I understand, and agree with, the statement made by the Bayesian after the data is seen. However, I'm not so sure what the frequentist will say after the data is seen; will their beliefs about the world have changed? I'm not in a position to understand the frequentist philosophy here.

$\endgroup$
4
  • 1
    $\begingroup$ Although I find much of this clear and thought-provoking, it seems wholly to ignore something fundamental, which is different interpretations of probability altogether. Also, the last two paragraphs do not apply to any analysis or interpretation I have seen. Indeed, I don't recognize any practicing statistician in your "frequentist" (who sounds rather like an ancient philosopher). Who--at least after Aristotle--has ever said that their data analysis is complete before the data have been obtained? Is this a straw man for trying to advance a Bayesian approach? $\endgroup$
    – whuber
    Commented Jun 28, 2012 at 14:48
  • 1
    $\begingroup$ @whuber, if it is a straw man, it is not intentional. It's always difficult to make any attempt to report on others' opinions without accidentally including a judgement on it. And I don't claim to have a broad understanding of the many nuanced positions. I'll try to rethink my final paragraph. Also, you say I left out "different interpretations of probability altogether". I'd rather say nothing than say something incorrect. It's not possible to say everything. I can try to give you the truth and nothing but the truth, but I can't give you the whole truth :-) $\endgroup$ Commented Jun 28, 2012 at 16:17
  • $\begingroup$ (+1) You're right, there's a long debate here and one can't cover every point in one post. I'm upvoting this reply for its careful and thoughtful exposition (but not because I agree with all of it!). $\endgroup$
    – whuber
    Commented Jun 28, 2012 at 16:33
  • $\begingroup$ I've edited the last few paragraphs to try to be fairer; from "After seeing the data..." onwards. I'm no expert, so I'm trying to be honestly vague where I'm getting out of my depth. Thanks for the feedback. $\endgroup$ Commented Jun 28, 2012 at 16:42
