
I am reading Kotzen's paper Selection Bias in Likelihood Arguments.

The author takes the following principle as a starting point:

[image of the quoted principle; from the discussion below it reads, roughly: E is evidence favoring H1 over H2 iff P(E|H1) > P(E|H2)]

I'm confused as to how to formalize this notion in terms of Bayesian statistics, since it does not take into account the prior distributions.

More precisely, we know that

P(H1|E) ∝ L(H1)Pr(H1)

where L(H1) := P(E|H1) is the likelihood of H1 and Pr(H1) is its prior probability. Similarly,

P(H2|E) ∝ L(H2)Pr(H2)

The claim, then, becomes that E is evidence for H1 over H2 iff L(H1)>L(H2).

I am having a hard time understanding the term "is evidence for H1 over H2", since this can be understood in two ways:

  1. "E favors H1 over H2 if L(H1) > L(H2)" means that E tilts the ratio Posterior(H1)/Posterior(H2) in favor of H1. E.g. if Prior(H1)/Prior(H2) = 1/2, then after conditioning on E we may have Posterior(H1)/Posterior(H2) = 1, so while H2 was more probable beforehand, H1 is now just as probable. In this sense we can say E favors H1 over H2.

  2. But then the terminology "favors" seems to suggest implicitly that Posterior(H1) > Posterior(H2) once E is presented. On this reading the author is saying: "E favors H1 over H2 if L(H1) > L(H2)" means that Posterior(H1) > Posterior(H2), where the posterior and likelihood are computed w.r.t. E. This seems false, since it takes no account of the prior, unless we assume a flat prior.
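To make the contrast concrete, here is a toy calculation in Python (the likelihoods and priors are made up purely for illustration, and I assume H1 and H2 exhaust the possibilities):

    # Made-up numbers: the likelihoods favor H1, but the prior strongly favors H2.
    L_H1, L_H2 = 0.8, 0.4          # P(E|H1), P(E|H2)
    prior_H1, prior_H2 = 0.2, 0.8  # Pr(H1), Pr(H2), assumed exhaustive

    # Bayes' rule: posterior ∝ likelihood × prior, then normalize.
    post_H1 = L_H1 * prior_H1
    post_H2 = L_H2 * prior_H2
    Z = post_H1 + post_H2
    post_H1, post_H2 = post_H1 / Z, post_H2 / Z

    print(prior_H1 / prior_H2)  # 0.25  -- prior ratio H1:H2
    print(post_H1 / post_H2)    # 0.5   -- posterior ratio has shifted toward H1 (reading 1)
    print(post_H1 > post_H2)    # False -- H2 is still more probable than H1 (reading 2 fails)

So on these numbers L(H1) > L(H2) tilts the posterior ratio toward H1 without making H1 more probable than H2 outright.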

The question I have is:

  1. Which of the two interpretations is meant by the Likelihood Principle?

  2. Is my formalization of the notion correct?

  • The likelihood does not depend on the priors, and your reference is specifically the 'law of likelihood' within the likelihood principle: "In Bayesian statistics, this ratio is known as the Bayes factor, and Bayes' rule can be seen as the application of the law of likelihood to inference... Combining the likelihood principle with the law of likelihood yields the consequence that the parameter value which maximizes the likelihood function is the value which is most strongly supported by the evidence. This is the basis for the widely used method of maximum likelihood." Commented Oct 25, 2023 at 1:25
  • Priors are relevant to the Bayes factor in that the hypotheses may have nuisance parameters, and ideally it would be the ratio of marginal likelihoods, after having integrated over those nuisance parameters, weighted by their prior distributions. Commented Oct 25, 2023 at 17:06

2 Answers


The quote you have reproduced is not really a good statement of the likelihood principle. Let's take a few steps back and look at the basic concepts.

What we are doing in statistical inference of this kind is describing a relationship between data, which is a set of observed quantities E, and a model or hypothesis H, which is a function defined over the domain of the observable quantities.

Typically we wish to compare rival hypotheses to see which one gives the better fit with the data. We connect the models to the data by a likelihood function L(H,E) which is intended to be the probability (or probability density) of E for a given hypothesis H. With discrete variables, L(H,E) is simply P(E|H). With continuous variables we have to be a little more sophisticated, but the difference is not important for our purposes.

We can now define the likelihood ratio of two rival hypotheses H1, H2 as P(E|H1) / P(E|H2). This ratio expresses the extent to which the evidence E tends to support hypothesis H1 relative to H2. Within the Bayesian framework, this ratio can be used as follows:

P(H1|E)    P(E|H1)   P(H1)
------- =  ------- . -----
P(H2|E)    P(E|H2)   P(H2)

In other words, the ratio of the posteriors is the ratio of the priors multiplied by the likelihood ratio. If the likelihood ratio is greater than 1, the evidence supports H1 relative to H2. If it is less than 1, it supports H2 relative to H1. The likelihood ratio does not tell us the absolute value of the ratio of the posteriors, only how that ratio compares with the ratio of the priors. So your first interpretation is correct.
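As a rough numerical sketch of this machinery (the coin biases, data, and prior odds below are invented for the example), suppose H1 says a coin lands heads with probability 0.7, H2 says 0.5, and the evidence E is 8 heads in 10 flips:

    from math import comb

    def binom_pmf(k, n, p):
        # P(k heads in n flips | heads-probability p)
        return comb(n, k) * p**k * (1 - p)**(n - k)

    L1 = binom_pmf(8, 10, 0.7)  # P(E|H1) ≈ 0.233
    L2 = binom_pmf(8, 10, 0.5)  # P(E|H2) ≈ 0.044
    LR = L1 / L2                # likelihood ratio ≈ 5.3 in favor of H1

    # Posterior odds = likelihood ratio × prior odds
    prior_odds = 0.5            # suppose P(H1)/P(H2) = 1/2, i.e. H2 starts ahead
    posterior_odds = LR * prior_odds
    print(LR, posterior_odds)   # ≈ 5.3 and ≈ 2.7

Here the likelihood ratio exceeds 1, so E supports H1 relative to H2; whether H1 ends up more probable than H2 still depends on the prior odds (with these numbers it does).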

The likelihood principle itself is the proposition that for a given hypothesis and data set, all of the information or evidence that relates the data to the hypothesis is contained within the likelihood function. This is actually a controversial principle and is disputed by theoreticians. Those who reject it claim that the evidential import of the data depends on other things such as the design of the experiment, or on the stopping rule that was used to determine how the data was gathered. Some have claimed that the likelihood principle can be derived from the more fundamental conditionality principle and sufficiency principle, though again, this is disputed (e.g. see here).
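To see why the stopping rule is the usual battleground, consider a standard illustration (a sketch, not taken from the paper under discussion): 9 heads and 3 tails can arise from flipping a coin exactly 12 times, or from flipping until the third tail appears. The two designs give likelihood functions that differ only by a constant factor, so every likelihood ratio between parameter values is identical:

    from math import comb

    def lik_binomial(p):
        # design 1: flip exactly 12 times, observe 9 heads
        return comb(12, 9) * p**9 * (1 - p)**3

    def lik_negative_binomial(p):
        # design 2: flip until the 3rd tail, which happens to land on the 12th flip
        return comb(11, 9) * p**9 * (1 - p)**3

    for p1, p2 in [(0.9, 0.5), (0.75, 0.6)]:
        print(lik_binomial(p1) / lik_binomial(p2),
              lik_negative_binomial(p1) / lik_negative_binomial(p2))
    # The two columns agree: likelihood ratios ignore the stopping rule,
    # whereas procedures built on the full sampling distribution (e.g. p-values) can differ.

Anyone who accepts the likelihood principle must say the two designs carry the same evidence; those who reject it point to the disagreement of the design-dependent analyses.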

The likelihood principle is often divided into weak and strong versions, but that is beyond the scope of this answer.


Your first interpretation is correct. However, the likelihood principle stands on its own; it doesn't need a Bayesian interpretation. That is extra.

That being said, we can connect the likelihood ratio to the behavior of the posterior:

P(H|E) = P(H)P(E|H)/P(E) ⇒ P(H|E)/P(H) = P(E|H)/P(E) ∝ L(H|E)

Now we can see that the likelihood of H is, up to the constant factor P(E) (which is the same for every hypothesis), just the ratio of the posterior to the prior. That constant cancels whenever we compare hypotheses.

Applying this to your case, we have

L(H1) > L(H2) ⇒ P(H1|E)/P(H1) > P(H2|E)/P(H2) ⇒ P(H2)/P(H1) > P(H2|E)/P(H1|E)

So data E is evidence for H1 over H2 to the extent that it reduces the ratio of the probability of H2 to that of H1 (and, symmetrically, evidence for H2 over H1 to the extent that it raises that ratio).
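A quick numerical check of this chain (the numbers are invented, and for simplicity H1 and H2 are treated as the only hypotheses so that P(E) can be computed directly):

    # Invented numbers: P(E|H1) > P(E|H2), but the prior favors H2.
    P_E_given_H1, P_E_given_H2 = 0.6, 0.3
    P_H1, P_H2 = 0.3, 0.7

    # Treat H1, H2 as exhaustive so the normalizing constant is available.
    P_E = P_E_given_H1 * P_H1 + P_E_given_H2 * P_H2     # = 0.39
    P_H1_given_E = P_E_given_H1 * P_H1 / P_E            # ≈ 0.46
    P_H2_given_E = P_E_given_H2 * P_H2 / P_E            # ≈ 0.54

    # Posterior-to-prior ratios are ordered the same way as the likelihoods ...
    print(P_H1_given_E / P_H1, P_H2_given_E / P_H2)     # ≈ 1.54 vs ≈ 0.77
    # ... and the odds of H2 over H1 shrink relative to the prior odds:
    print(P_H2 / P_H1, P_H2_given_E / P_H1_given_E)     # ≈ 2.33 vs ≈ 1.17

The ratio of H2 to H1 drops from about 2.33 to about 1.17, so E counts as evidence for H1 over H2 even though H2 remains the more probable hypothesis.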
