
How would you describe in plain English the characteristics that distinguish Bayesian from Frequentist reasoning?

  • This question about drawing inferences about an individual bowls player when you have two data sets - other players' results and the new player's results - is a good spontaneous example of the difference, which my answer tries to address in plain English. (Jan 24, 2012)
  • Perhaps some of you good folks could also contribute an answer to a question about Bayesian and frequentist interpretations that is asked over at philosophy.stackexchange.com. – Drux (Oct 1, 2013)
  • I have provided an answer to this question on another thread, Bayesian vs frequentist interpretations of probability. (Dec 31, 2020)

14 Answers

Answer (264 votes)

Here is how I would explain the basic difference to my grandma:

I have misplaced my phone somewhere in the home. I can use the phone locator on the base unit to find it: when I press the locator, the phone starts beeping.

Problem: Which area of my home should I search?

Frequentist Reasoning

I can hear the phone beeping. I also have a mental model which helps me identify the area from which the sound is coming. Therefore, upon hearing the beep, I infer the area of my home I must search to locate the phone.

Bayesian Reasoning

I can hear the phone beeping. Now, apart from a mental model which helps me identify the area from which the sound is coming, I also know the locations where I have misplaced the phone in the past. So I combine my inferences from the beeps with my prior information about those past locations to identify an area I must search to locate the phone.

  • I like the analogy. I would find it very useful if there were a defined question (based on a dataset) for which one answer was derived using frequentist reasoning and one using Bayesian reasoning - preferably with an R script to handle both. Am I asking too much? – Farrel (Jul 19, 2010)
  • The simplest thing I can think of is tossing a coin $n$ times and estimating the probability of heads (denote it by $p$). Suppose we observe $k$ heads. The probability of getting $k$ heads is $P(k \text{ heads in } n \text{ trials}) = \binom{n}{k} p^k (1-p)^{n-k}$. Frequentist inference would maximize this to arrive at the estimate $\hat{p} = k/n$. A Bayesian would say: hey, I know that $p \sim \text{Beta}(1,1)$ (which is equivalent to assuming that $p$ is uniform on $[0,1]$), so the updated inference is $p \sim \text{Beta}(1+k,\, 1+n-k)$, and the Bayesian estimate of $p$ is $(1+k)/(2+n)$. I do not know R, sorry. (An R sketch of both calculations follows these comments.) – user28 (Jul 19, 2010)
  • It should be pointed out that, from the frequentist's point of view, there is no reason you can't incorporate prior knowledge into the model. In this sense the frequentist view is simpler: you only have a model and some data. There is no need to separate the prior information from the model. (Sep 9, 2010)
  • @BYS2 The programming language called R. (May 11, 2014)
  • As was commented already in 2010, from the frequentist's point of view there is no reason you can't incorporate prior knowledge into the model. Here is an example of explicitly using informative priors in frequentist reasoning: Using prior knowledge in frequentist tests. figshare. doi.org/10.6084/m9.figshare.4819597.v3 See also alternative definitions in other answers below. – user36160 (May 27, 2017)
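
Following up on the two comments above, here is a minimal R sketch of that coin-tossing example: the frequentist maximum-likelihood estimate alongside the Bayesian posterior under a uniform Beta(1, 1) prior. The particular numbers (60 heads in 100 tosses) are made up purely for illustration.

    # Coin example from the comment above: n tosses, k observed heads
    n <- 100
    k <- 60

    # Frequentist: maximise the binomial likelihood, giving p-hat = k / n,
    # with an exact 95% confidence interval
    p_mle <- k / n
    ci    <- binom.test(k, n)$conf.int

    # Bayesian: a uniform Beta(1, 1) prior gives the posterior Beta(1 + k, 1 + n - k);
    # the posterior mean is (1 + k) / (2 + n), with a 95% credible interval from qbeta
    post_mean <- (1 + k) / (2 + n)
    cred      <- qbeta(c(0.025, 0.975), 1 + k, 1 + n - k)

    c(p_mle = p_mle, post_mean = post_mean)

With this much data the two point estimates barely differ; the prior matters most when the data are few.
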
Answer (145 votes)

Tongue firmly in cheek:

A Bayesian defines a "probability" in exactly the same way that most non-statisticians do - namely an indication of the plausibility of a proposition or a situation. If you ask them a question about a particular proposition or situation, they will give you a direct answer assigning probabilities describing the plausibilities of the possible outcomes for the particular situation (and state their prior assumptions).

A Frequentist is someone who believes probabilities represent long-run frequencies with which events occur; if need be, they will invent a fictitious population from which your particular situation could be considered a random sample, so that they can meaningfully talk about long-run frequencies. If you ask them a question about a particular situation, they will not give a direct answer, but instead make a statement about this (possibly imaginary) population. Many non-frequentist statisticians will be easily confused by the answer and interpret it as a Bayesian probability about the particular situation.

However, it is important to note that most Frequentist methods have a Bayesian equivalent that in most circumstances will give essentially the same result; the difference is largely a matter of philosophy, and in practice it is a matter of "horses for courses".

As you may have guessed, I am a Bayesian and an engineer. ;o)

  • As a non-expert, I think the key to the entire debate is that people actually reason like Bayesians. You have to be trained to think like a frequentist, and even then it's easy to slip up and either reason or present your reasoning as if it were Bayesian. "There's a 95% chance that the value is within this confidence interval." Enough said. – Wayne (Apr 7, 2011)
  • The key is also to think about what kind of lobbying led the statistics of the 20th century to be called "classical" while the statistics that Laplace and Gauss started to use in the 19th century are not... – gwr (Nov 16, 2015)
  • Maybe I've been doing frequentist work too long, but I'm not so sure the Bayesian viewpoint is always intuitive. For example, suppose I am interested in a real-world parameter, such as the average height of a population. If I tell you "there is a 95% chance the parameter of interest is in my credible interval", and then follow up with the question "If we created 100 such intervals for different parameters, what proportion of them would we expect to contain the real values of the parameters?", the fact that the answer is not 95 must be confusing to some people. – Cliff AB (Aug 3, 2016)
  • @CliffAB But why would you ask the second question? The point is that they are different questions, so it is unsurprising that they have different answers. The Bayesian can answer both questions, but the answers may differ (which seems reasonable to me). The frequentist can only answer one of the questions (due to the restrictive definition of probability) and hence (implicitly) uses the same answer for both, which is what causes the problems. A credible interval is not a confidence interval, but a Bayesian can construct both a credible interval and a confidence interval. (Aug 4, 2016)
  • My comment was in response to Wayne's: the idea that people "naturally" think in a Bayesian context, as it's easier to interpret a credible interval. My point is that while it's simpler to construct the right interpretation of a credible interval (i.e. less of a word soup), I think the non-statistician is just as likely to be confused about what that really means. – Cliff AB (Aug 4, 2016)

Answer (88 votes)

Very crudely I would say that:

Frequentist: Sampling is infinite and decision rules can be sharp. Data are a repeatable random sample - there is a frequency. Underlying parameters are fixed, i.e. they remain constant during this repeatable sampling process.

Bayesian: Unknown quantities are treated probabilistically and the state of the world can always be updated. Data are observed from the realised sample. Parameters are unknown and described probabilistically. It is the data which are fixed.

There is a brilliant blog post which gives an in-depth example of how a Bayesian and a Frequentist would tackle the same problem. Why not answer the problem for yourself and then check?

The problem (taken from Panos Ipeirotis' blog):

You have a coin that when flipped ends up head with probability $p$ and ends up tail with probability $1-p$. (The value of $p$ is unknown.)

Trying to estimate $p$, you flip the coin 100 times. It ends up head 71 times.

Then you have to decide on the following event: "In the next two tosses we will get two heads in a row."

Would you bet that the event will happen or that it will not happen?
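
(A minimal R sketch of one way each camp might answer the betting question, assuming a uniform Beta(1, 1) prior for the Bayesian; as the comments below note, other priors are possible and shift the answer slightly.)

    # 71 heads in 100 flips; should we bet on two heads in the next two tosses?

    # Frequentist plug-in: estimate p by maximum likelihood, then square it
    p_hat <- 71 / 100
    p_hat^2                                  # 0.5041 -> bet "yes", but only just

    # Bayesian with a uniform Beta(1, 1) prior: the posterior is Beta(72, 30), and the
    # posterior-predictive probability of two heads is E[p^2] = a(a + 1) / ((a + b)(a + b + 1))
    a <- 1 + 71
    b <- 1 + 29
    a * (a + 1) / ((a + b) * (a + b + 1))    # ~0.5003, still a near-even bet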

  • Since $0.71^2=0.5041$, I would regard this as close enough to an even bet to be prepared to go modestly either way just for fun (and to ignore any issues over the shape of the prior). I sometimes buy insurance and lottery tickets with far worse odds. – Henry (Oct 4, 2011)
  • At the end of that blog post it says "instead of using the uniform distribution as a prior, we can be even more agnostic. In this case, we can use the Beta(0,0) distribution as a prior. Such a distribution corresponds to the case where any mean of the distribution is equally likely. In this case, the two approaches, Bayesian and frequentist, give the same results", which kind of sums it up really! – tdc (Feb 8, 2012)
  • The big problem with that blog post is that it does not adequately characterize what a non-Bayesian (but rational) decision maker would do. It's little more than a straw man. – whuber (May 4, 2012)
  • @tdc: the Bayesian (Jeffreys) prior is Beta(0.5, 0.5), and some would say that it is the only justifiable prior. – Neil G (Aug 3, 2012)
  • Beta(0.5, 0.5) doesn't look at all like an appropriate prior for the $p$ of a coin. (I guess that's supposed to be an "uninformative" prior, but it's not uninformative in the everyday sense; rather it's "opinionated" that $p$ is near 0 or 1, an opinion that seems wrong.) – Qwertie (Sep 8, 2018)

Answer (53 votes)

Let us say a man rolls a six-sided die with outcomes 1, 2, 3, 4, 5, or 6. Furthermore, he says that if it lands on a 3, he'll give you a free textbook.

Then informally:

The Frequentist would say that each outcome has an equal 1 in 6 chance of occurring. She views probability as being derived from long run frequency distributions.

The Bayesian, however, would say: hang on a second, I know that man, he's David Blaine, a famous trickster! I have a feeling he's up to something. I'm going to say that there's only a 1% chance of it landing on a 3, BUT I'll re-evaluate that belief and change it the more times he rolls the die. If I see the other numbers come up equally often, then I'll iteratively increase the chance from 1% to something slightly higher; otherwise I'll reduce it even further. She views probability as degrees of belief in a proposition.

  • I think the frequentist would (verbosely) point out his assumptions and would avoid making any useful prediction. Maybe he'd say, "Assuming the die is fair, each outcome has an equal 1 in 6 chance of occurring. Furthermore, if the die rolls are fair and David Blaine rolls the die 17 times, there is only a 5% chance that it will never land on 3, so such an outcome would make me doubt that the die is fair." (Jun 16, 2011)
  • So would "likelihood" (as in MLE) be the frequentist's "probability"? – Akababa (Nov 30, 2017)
  • Couldn't the frequentist use a hypothetical David Blaine dice model and not necessarily a uniform fair dice model? – qwr (Aug 28, 2020)

Answer (51 votes)

Just a little bit of fun...

A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.

From this site:

http://www2.isye.gatech.edu/~brani/isyebayes/jokes.html

and from the same site, a nice essay...

"An Intuitive Explanation of Bayes' Theorem"

http://yudkowsky.net/rational/bayes

  • In which case, wouldn't the frequentist be one who knows the ratio of the donkey, mule and horse populations, and upon observing a pack of mules starts to calculate the p-value to know whether there has been a statistically significant increase in the population ratio of mules? – Andrew (Apr 20, 2012)
  • This doesn't answer the question at all. – qwr (Aug 28, 2020)

Answer (32 votes)

In plain English, I would say that Bayesian and Frequentist reasoning are distinguished by two different ways of answering the question:

What is probability?

Most differences will essentially boil down to how each answers this question, for it basically defines the domain of valid applications of the theory. Now you can't really give either answer in terms of "plain English" without generating further questions. For me the answer is (as you could probably guess)

probability is logic

My "non-plain English" reason for this is that the calculus of propositions is a special case of the calculus of probabilities, if we represent truth by $1$ and falsehood by $0$; additionally, the calculus of probabilities can be derived from the calculus of propositions. This conforms most closely with "Bayesian" reasoning - although it also extends Bayesian reasoning in applications by providing principles to assign probabilities, in addition to principles to manipulate them. Of course, this leads to the follow-up question "what is logic?" For me, the closest thing I could give as an answer is "logic is the common-sense judgements of a rational person, with a given set of assumptions" (what is a rational person? etc., etc.).

Logic has all the same features that Bayesian reasoning has. For example, logic does not tell you what to assume or what is "absolutely true"; it only tells you how the truth of one proposition is related to the truth of another. You always have to supply a logical system with "axioms" for it to get started on its conclusions, and it has the same limitation that you can get arbitrary results from contradictory axioms. But "axioms" are nothing but prior probabilities which have been set to $1$. For me, to reject Bayesian reasoning is to reject logic: if you accept logic, then because Bayesian reasoning "logically flows from logic" (how's that for plain English :P), you must also accept Bayesian reasoning.

For the frequentist reasoning, we have the answer:

probability is frequency

although I'm not sure "frequency" is a plain-English term in the way it is used here - perhaps "proportion" is a better word. I wanted to add to the frequentist answer that the probability of an event is thought to be a real, measurable (observable?) quantity which exists independently of the person/object who is calculating it. But I couldn't do this in a "plain English" way.

So perhaps a "plain English" version of one of the differences could be that frequentist reasoning is an attempt at reasoning from "absolute" probabilities, whereas Bayesian reasoning is an attempt at reasoning from "relative" probabilities.

Another difference is that frequentist foundations are more vague in how you translate the real world problem into the abstract mathematics of the theory. A good example is the use of "random variables" in the theory - they have a precise definition in the abstract world of mathematics, but there is no unambiguous procedure one can use to decide if some observed quantity is or isn't a "random variable".

In the Bayesian way of reasoning, the notion of a "random variable" is not necessary. A probability distribution is assigned to a quantity because it is unknown - which means that it cannot be deduced logically from the information we have. This provides at once a simple connection between the observable quantity and the theory, as "being unknown" is unambiguous.

You can also see in the above example a further difference in these two ways of thinking: "random" versus "unknown". "Randomness" is phrased in such a way that it seems to be a property of the actual quantity. Conversely, "being unknown" depends on which person you are asking about that quantity - hence it is a property of the statistician doing the analysis. This gives rise to the "objective" versus "subjective" adjectives often attached to each theory. It is easy to show that "randomness" cannot be a property of some standard examples, by simply asking two frequentists who are given different information about the same quantity to decide whether it is "random". One is the usual Bernoulli urn: frequentist 1 is blindfolded while drawing, whereas frequentist 2 stands over the urn, watching frequentist 1 draw the balls. If the declaration of "randomness" were a property of the balls in the urn, then it could not depend on the different knowledge of frequentists 1 and 2 - and hence the two frequentists should give the same declaration of "random" or "not random".

  • I'd be interested if you could rewrite this without the reference to common sense. (Jan 24, 2012)
  • @PeterEllis - What's wrong with common sense? We all have it, and it is usually foolish not to use it... (Jan 24, 2012)
  • It's too contested what it actually is, and too culturally specific. "Common sense" is shorthand for whatever is the perceived sensible way of doing things in a particular culture (which all too often looks far from sensible to another culture in time and space), so referring to it in a definition ducks the key questions. It's particularly unhelpful as part of a definition of logic (and so, I would argue, is the concept of a "rational person" in that particular context - particularly as I am guessing your definition of a "rational person" would be a logical person who has common sense!) (Jan 24, 2012)
  • He can't provide one; his argument is that there is no universal definition, only culturally specific ones. Two people from different cultural backgrounds (and that includes different styles of statistical education) will quite possibly have two different understandings of what is sensible to do in a given situation. – naught101 (Feb 21, 2012)
  • This answer has nuggets of goodness (how's that for plain English?), but I don't believe (how's that for being a Bayesian!) that the following statement is true: "For if you accept logic... you must also accept Bayesian reasoning". For instance, if you think instead of translating the abstract theory of the mathematics into the real world, you'll find that the axiomatic approach can be consistent with both Frequentist and Bayesian reasoning! Arguably, Kolmogorov in the first case and, say, Jeffreys in the second. In essence, it's the theory of probability that's logic, not its interpretation. (Jan 17, 2016)

Answer (30 votes)

The Bayesian is asked to make bets, which may include anything from which fly will crawl up a wall faster, to which medicine will save the most lives, or which prisoners should go to jail. He has a big box with a handle. He knows that if he puts absolutely everything he knows into the box, including his personal opinion, and turns the handle, it will make the best possible decision for him.

The frequentist is asked to write reports. He has a big black book of rules. If the situation he is asked to make a report on is covered by his rulebook, he can follow the rules and write a report so carefully worded that it is wrong, at worst, one time in 100 (or one time in 20, or one time in whatever the specification for his report says).

The frequentist knows (because he has written reports on it) that the Bayesian sometimes makes bets that, in the worst case, when his personal opinion is wrong, could turn out badly. The frequentist also knows (for the same reason) that if he bets against the Bayesian every time he differs from him, then, over the long run, he will lose.

  • "Over the long run, he will lose" is ambiguous. I assume "he" is the Bayesian here? Wouldn't they even out over the long, long run - the Bayesian could learn and change his personal opinion until it matches the actual (but unknown) facts? – lucidbrot (Feb 19, 2018)
  • What is the fundamental difference between a big box and a big rulebook? I cannot understand the analogy. – qwr (Aug 28, 2020)
  • I need a "plain English" explanation of WHY, if you bet against the Bayesian every time he differs from the frequentist, you would LOSE. I will send $50 via PayPal to the preferred charity of the first person to send an explanation that I can understand. – Pseudoego (May 22, 2021)
  • I think there are quite a few false dichotomies here - bets/reports as a metaphor for predictions/inferences. Bayesians and frequentists do both. That frequentist inference is truly agnostic is another fallacy. That frequentists control type 1 error and Bayes estimates are optimal for MSE broadens the picture so much, I can't imagine how you can describe one as "winning" or "losing" - they are merely two different tools for solving mostly the same problems, and the answers are often the same in well-done designs. – AdamO (Jul 19, 2023)

Answer (28 votes)

In reality, I think much of the philosophy surrounding the issue is just grandstanding. That's not to dismiss the debate, but it is a word of caution. Sometimes, practical matters take priority - I'll give an example below.

Also, you could just as easily argue that there are more than two approaches:

  • Neyman-Pearson ('frequentist')
  • Likelihood-based approaches
  • Fully Bayesian

A senior colleague recently reminded me that "many people in common language talk about frequentist and Bayesian. I think a more valid distinction is likelihood-based and frequentist. Both maximum likelihood and Bayesian methods adhere to the likelihood principle whereas frequentist methods don't."

I'll start off with a very simple practical example:

We have a patient. The patient is either healthy (H) or sick (S). We will perform a test on the patient, and the result will either be Positive (+) or Negative (-). If the patient is sick, they will always get a Positive result. We'll call this the correct (C) result and say that $$ P(+ | S ) = 1 $$ or $$ P(Correct | S) = 1 $$ If the patient is healthy, the test will be negative 95% of the time, but there will be some false positives. $$ P(- | H) = 0.95 $$ $$ P(+ | H) = 0.05 $$ In other words, the probability of the test being Correct, for Healthy people, is 95%.

So, the test is either 100% accurate or 95% accurate, depending on whether the patient is healthy or sick. Taken together, this means the test is at least 95% accurate.

So far so good. Those are the statements that would be made by a frequentist. They are quite simple to understand and are true. There's no need to waffle about a "frequentist interpretation".

But, things get interesting when you try to turn things around. Given the test result, what can you learn about the health of the patient? Given a negative test result, the patient is obviously healthy, as there are no false negatives.

But we must also consider the case where the test is positive. Was the test positive because the patient was actually sick, or was it a false positive? This is where the frequentist and the Bayesian diverge. Everybody will agree that this cannot be answered at the moment. The frequentist will refuse to answer. The Bayesian will be prepared to give you an answer, but you'll have to give the Bayesian a prior first - i.e. tell them what proportion of the patients are sick.

To recap, the following statements are true:

  • For healthy patients, the test is very accurate.
  • For sick patients, the test is very accurate.

If you are satisfied with statements such as that, then you are using frequentist interpretations. This might change from project to project, depending on what sort of problems you're looking at.

But you might want to make different statements and answer the following question:

  • For those patients that got a positive test result, how accurate is the test?

This requires a prior and a Bayesian approach. Note also that this is the only question of interest to the doctor. The doctor will say "I know that the patients will either get a positive result or a negative result. I also know that a negative result means the patient is healthy and can be sent home. The only patients that interest me now are those that got a positive result - are they sick?"

To summarize: In examples such as this, the Bayesian will agree with everything said by the frequentist. But the Bayesian will argue that the frequentist's statements, while true, are not very useful; and will argue that the useful questions can only be answered with a prior.

A frequentist will consider each possible value of the parameter (H or S) in turn and ask "if the parameter is equal to this value, what is the probability of my test being correct?"

A Bayesian will instead consider each possible observed value (+ or -) in turn and ask "If I imagine I have just observed that value, what does that tell me about the conditional probability of H-versus-S?"
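
As a minimal sketch of the Bayesian calculation the doctor is asking for - which only becomes possible once a prior is supplied - here is the arithmetic in R. The 30% prevalence among tested patients is an assumed, purely illustrative figure (the same one used in a comment below).

    # Screening example above: P(+ | sick) = 1, P(+ | healthy) = 0.05
    p_pos_sick    <- 1.00
    p_pos_healthy <- 0.05

    # Assumed prior: 30% of the patients who take the test are sick (illustrative only)
    prior_sick <- 0.30

    # Bayes' theorem: P(sick | positive result)
    prior_sick * p_pos_sick /
      (prior_sick * p_pos_sick + (1 - prior_sick) * p_pos_healthy)   # ~0.896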

  • Do you mean "For sick patients, the test is NOT very accurate"? Did you forget the NOT? – agstudy (Jan 6, 2014)
  • It's very accurate in both cases, so no, I did not forget a word. For healthy people, the result will be correct (i.e. "Negative") 95% of the time. And for sick people, the result will be correct (i.e. "Positive") 95% of the time. (Jan 7, 2014)
  • I think the "weakness" in maximum likelihood is that it assumes a uniform prior on the data, whereas "full Bayesian" is more flexible in what prior you can choose. – Joe Z. (Jul 12, 2016)
  • To complete the example, suppose 0.1% of the population is sick with disease D that we're testing for: this is not our prior. More likely, something like 30% of patients who come to the doctor and have symptoms matching D actually have D (this could be more or less depending on details such as how often a different sickness presents with the same symptoms). So 70% of those taking the test are healthy, 66.5% get a negative result, and 30%/33.5% are sick. So given a positive result, our posterior probability that a patient is sick is 89.6%. Next puzzle: how did we know 70% of test-takers have D? – Qwertie (Sep 8, 2018)

Answer (11 votes)

Bayesian and frequentist statistics are compatible in that they can be understood as two limiting cases of assessing the probability of future events based on past events and an assumed model. This requires admitting that, in the limit of a very large number of observations, no uncertainty about the system remains, and that in this sense a very large number of observations is equivalent to knowing the parameters of the model.

Assume we have made some observations, e.g., the outcome of 10 coin flips. In Bayesian statistics, you start from what you have observed and then assess the probability of future observations or model parameters. In frequentist statistics, you start from an idea (a hypothesis) of what is true, by assuming scenarios in which a large number of observations have been made, e.g., the coin is unbiased and gives 50% heads if you throw it many, many times. Based on these scenarios of a large number of observations (= the hypothesis), you assess the frequency of making observations like the one you actually did, i.e., the frequency of the different possible outcomes of 10 coin flips. Only then do you take your actual outcome, compare it to the frequency of possible outcomes, and decide whether the outcome belongs to those that are expected to occur with high frequency. If so, you conclude that the observation made does not contradict your scenarios (= the hypothesis). Otherwise, you conclude that the observation made is incompatible with your scenarios, and you reject the hypothesis.
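
As a rough R sketch of that frequentist recipe, assume the hypothesis "the coin is unbiased" and an illustrative observation of 9 heads in 10 flips:

    # Hypothesis: the coin is fair (p = 0.5); illustrative observation: 9 heads in 10 flips
    k <- 9
    n <- 10

    # Frequency of each possible outcome (0 to 10 heads) under the hypothesis
    dbinom(0:n, n, 0.5)

    # How often would we see something at least as extreme as what was actually observed?
    binom.test(k, n, p = 0.5)$p.value    # ~0.021: rare under the hypothesis, so reject it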

Thus Bayesian statistics starts from what has been observed and assesses possible future outcomes. Frequentist statistics starts with an abstract experiment of what would be observed if one assumes something, and only then compares the outcomes of the abstract experiment with what was actually observed. Otherwise the two approaches are compatible. They both assess the probability of future observations based on some observations made or hypothesized.

I started to write this up in a more formal way:

Positioning Bayesian inference as a particular application of frequentist inference and vice versa. figshare.

http://dx.doi.org/10.6084/m9.figshare.867707

The manuscript is new. If you happen to read it, and have comments, please let me know.

  • I like this way of putting things very much. It is abstract but succinct and clear. Thanks! – sophros (Jan 4, 2021)

Answer (10 votes)

I would say that they look at probability in different ways. The Bayesian is subjective and uses a priori beliefs to define a prior probability distribution on the possible values of the unknown parameters; so he relies on a theory of probability like de Finetti's. The frequentist sees probability as something that has to do with a limiting frequency based on an observed proportion, in line with the theory of probability as developed by Kolmogorov and von Mises.

A frequentist does parametric inference using just the likelihood function. A Bayesian takes that, multiplies it by a prior and normalizes it to get the posterior distribution that he uses for inference.
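
As a rough illustration of that last sentence, here is a small R sketch over a grid of candidate parameter values; the data (7 heads in 10 coin flips) and the Beta(2, 2) prior are made up purely for illustration.

    # Likelihood alone (frequentist) versus likelihood x prior, normalised (Bayesian)
    k <- 7
    n <- 10
    p <- seq(0, 1, by = 0.001)               # grid of candidate values for the parameter

    likelihood <- dbinom(k, n, p)            # what the frequentist works with
    p[which.max(likelihood)]                 # maximum-likelihood estimate: 0.7

    prior     <- dbeta(p, 2, 2)              # an assumed mild prior centred on 0.5
    posterior <- likelihood * prior          # multiply by the prior ...
    posterior <- posterior / sum(posterior)  # ... and normalise over the grid
    sum(p * posterior)                       # posterior mean, roughly 0.64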

  • +1 Good answer, but it ought to be emphasized that the Bayesian approach and the frequency approach differ with respect to their interpretation of probability. Kolmogorov, on the other hand, provides an axiomatic foundation for the theory of probability, which does not require an interpretation (!) like those employed by the Bayesian or frequentist. In a sense, the axiomatic system has a life of its own! From Kolmogorov's six axioms alone, I don't think it's possible to say that his axiomatic system is either Bayesian or frequentist; it could, in fact, be consistent with both. (Jan 16, 2016)

Answer (7 votes)

The simplest and clearest explanation I've seen, from Larry Wasserman's notes on Statistical Machine Learning (with disclaimer: "at the risk of oversimplifying"):

Frequentist versus Bayesian Methods

  • In frequentist inference, probabilities are interpreted as long run frequencies. The goal is to create procedures with long run frequency guarantees.
  • In Bayesian inference, probabilities are interpreted as subjective degrees of belief. The goal is to state and analyze your beliefs.


What's tricky is that we work with two different interpretations of probability, which can get philosophical. For example, if I say "this coin has a 1/2 probability of landing heads", what does that mean? The frequentist viewpoint is that if we performed many coin flips, then the count ("frequency") of heads divided by the total number of flips should, more or less, get closer and closer to 1/2. There is nothing subjective about this, which can be viewed as a good thing; however, we can't really perform infinite flips, and in some cases we can't repeat the experiment at all, so an argument about limits or long-run frequencies might be in some ways unsatisfactory.

On the other hand, the Bayesian viewpoint is subjective, in that we view probability as some kind of "degree of belief", or "gambling odds" if we specifically use de Finetti's interpretation. For example, two people may come into the coin-flipping experiment with different beliefs about the coin (their prior probabilities). After the experiment has collected data/evidence and the two have updated their beliefs in accordance with Bayes' theorem, they leave with different posterior probabilities for the coin, and both can justify their beliefs as "logical"/"rational"/"coherent" (depending on the exact flavor of Bayesian interpretation).
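
To make the "two people, different priors" point concrete, here is a small R sketch using conjugate Beta priors; the two priors and the data are invented purely for illustration.

    # Two people update different prior beliefs about a coin using the same data
    k <- 14
    n <- 20                                  # illustrative data: 14 heads in 20 flips

    prior_sceptic  <- c(a = 50, b = 50)      # strong prior belief that the coin is near fair
    prior_agnostic <- c(a = 1,  b = 1)       # uniform ("agnostic") prior

    # With a Beta prior and binomial data, updating is just adding the observed counts
    post_sceptic  <- prior_sceptic  + c(k, n - k)
    post_agnostic <- prior_agnostic + c(k, n - k)

    post_sceptic["a"]  / sum(post_sceptic)   # posterior mean ~0.53: still thinks near fair
    post_agnostic["a"] / sum(post_agnostic)  # posterior mean ~0.68: leans towards a biased coin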

In practice, statisticians can use either kind of method as long as they are careful with their assumptions and conclusions. Nowadays Bayesian methods are becoming increasingly popular, helped by better computers and algorithms like MCMC. Also, in finite-dimensional models, Bayesian inference may have the same guarantees of consistency and rate of convergence as frequentist methods.

I don't think there is any way around really understanding Bayesian and frequentist reasoning without confronting (or at least acknowledging) the interpretations of probability.

Answer (4 votes)

I've attempted a side-by-side comparison of the two schools of thought here and have more background information here.

Answer (2 votes)

The way I answer this question is that frequentists compare the data they see to what they expected. That is, they have a mental model of how frequently something should happen, then observe data and see how often it did happen - i.e., how likely the data they have seen are, given the model they chose.

Bayesian people, on the other hand, combine their mental models. That is, they have a model based on their previous experiences that tells them what they think the data should look like, and then they combine this with the data they observe to settle upon some "posterior" belief - i.e., they find the probability that the model they seek to choose is valid, given the data they have observed.

  • So, in other words, a frequentist looks at $P(data \mid model)$ whereas a Bayesian looks at $P(model \mid data)$...? (Apr 3, 2020)
  • Sort of. Bayesians essentially do $P(model \mid data) \propto P(data \mid model)\,P(model)$, where $P(model)$ is the prior. The more I learn about this, the more my answer feels inadequate. For example, a hallmark of frequentist statistics is the maximum likelihood estimator, which essentially asks: given the data I've seen, which model parameters make what I saw most likely? Bayesians also want this, but they calculate the model by integrating over all values of the parameter based on some prior distribution for it. Frequentists pick a model parameter such that what they saw was most likely. (Apr 5, 2020)

Answer (1 vote)

In short, in plain English:

In Bayesian inference, parameters vary and the data are fixed.

The object of interest is the posterior $P(\theta|X)=\frac{P(X|\theta)P(\theta)}{P(X)}$, in which the parameters $\theta$ vary and the data $X$ are fixed at their observed values.

In frequentist inference, parameters are fixed and the data vary.

The object of interest is the likelihood $P(X|\theta)$, in which the parameters $\theta$ are treated as fixed and the data $X$ vary.

References:

  1. https://stats.stackexchange.com/a/513020/103153
  2. https://math.stackexchange.com/a/2126820/351322