5
$\begingroup$

I've been reading Casella and Berger's Statistical Inference. In section 6.3 the authors state the likelihood principle: if the likelihood functions from two samples are proportional, then the inferences based on the two samples should be the same. But consider the following simple example: we have a single observation from the distribution $P(X=1)=\frac{5-2p}{3}$ and $P(X=2)=P(X=3)=\frac{p-1}{3}$. The likelihoods of $X=2$ and $X=3$ are the same, so the inference for $p=EX$ based on the observation $X=2$ should be the same as that based on $X=3$. This contradicts my intuition, since $X$ is a sufficient statistic for $p$ (and hence I think we should use $X$ itself to infer $p$).
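(To spell both claims out: the model requires $1\le p\le\tfrac{5}{2}$ for these to be valid probabilities, $$EX = 1\cdot\tfrac{5-2p}{3}+2\cdot\tfrac{p-1}{3}+3\cdot\tfrac{p-1}{3}=p,$$ and the likelihood functions of $X=2$ and $X=3$ are both $\tfrac{p-1}{3}$, i.e. not just proportional but identical.)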

So, is there really a contradiction between the likelihood principle and standard inference about the parameter, or did I do something wrong here?

$\endgroup$
9
  • $\begingroup$ Are you saying your estimate of $p$ would be $x$, because $p=\operatorname{E}X$? $\endgroup$ Commented Mar 21 at 11:22
  • $\begingroup$ I do think so, because $X$ is an unbiased sufficient statistic for $p$. With only a single observation, I cannot come up with any other estimator. $\endgroup$
    – INvisibLE
    Commented Mar 21 at 12:13
  • $\begingroup$ $X$ isn't minimal sufficient, though. A minimal sufficient statistic $T$ partitions the sample space into $X=1$ & $X\in \{2,3\}$. And if you want an unbiased estimator with lower variance than $X$, it's $\operatorname{E}[X \mid T]$ (which also won't make impossible estimates of $p$). $\endgroup$ Commented Mar 21 at 12:29
  • $\begingroup$ Yeah, I agree with what you've said. If we use the UMVUE, then whether we observe $X=2$ or $X=3$, we use $\hat{p}=2.5$ in either case to estimate $p$, which still challenges my intuition though. $\endgroup$
    – INvisibLE
    Commented Mar 21 at 12:36
  • $\begingroup$ Well, $\Pr(X=2|X\in\{2, 3\})= \Pr(X=3|X\in\{2, 3\}) = \tfrac{1}{2}$ regardless of the true value of $p$. It's a coin toss, & including its outcome in your estimator only increases the estimator's variance. $\endgroup$ Commented Mar 21 at 14:50

3 Answers

6
$\begingroup$

$X$ is an unbiased estimator of $p$, & is indeed sufficient, but not minimal sufficient: $$\Pr(X=2 \mid X\in\{2, 3\})= \Pr(X=3 \mid X\in\{2, 3\}) = \tfrac{1}{2}$$ regardless of the true value of $p$; & so a minimal sufficient statistic $T$ partitions the sample space into $X=1$ & $X\in \{2,3\}$. Then if you want an unbiased estimator with lower variance than $X$, Rao–Blackwellize: $$\operatorname{E}[X \mid T]= \begin{cases} 1 & \text{ when } X=1\\ \tfrac{5}{2} & \text{ when } X\in\{2,3\}\end{cases}$$ This is about as 'standard' as inference gets. (Though perhaps more typical is the case where an unbiased estimator is not sufficient, & can be improved by Rao–Blackwellizing it: say, with a sample size greater than 1, $\bar X$ as an estimator of $p$.)
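If it helps, here's a quick simulation sketch of that variance reduction (the true value $p=1.6$ & the number of replications are arbitrary choices; any $p$ in $[1, 2.5]$ works):

```python
import numpy as np

rng = np.random.default_rng(0)

p = 1.6                                        # hypothetical true value; must lie in [1, 2.5]
probs = [(5 - 2*p)/3, (p - 1)/3, (p - 1)/3]    # P(X=1), P(X=2), P(X=3)

n = 200_000                                    # replications of the one-observation experiment
x = rng.choice([1, 2, 3], size=n, p=probs)

est_x  = x.astype(float)                       # estimator 1: X itself (unbiased, sufficient)
est_rb = np.where(x == 1, 1.0, 2.5)            # estimator 2: E[X | T], the Rao-Blackwellized version

print("mean of X        :", est_x.mean(),  " var:", est_x.var())
print("mean of E[X | T] :", est_rb.mean(), " var:", est_rb.var())
# Both means come out close to p = 1.6; the second variance is smaller.
```

With these settings the exact variances are $\operatorname{Var}(X)=0.64$ & $\operatorname{Var}(\operatorname{E}[X \mid T])=0.54$, so the simulated figures should land close to those.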

Note that $T$ may be coded as '0' & '1', for $X=1$ & $X\in\{2,3\}$ respectively; & then writing $\pi=\Pr(T=1)=\tfrac{2(p-1)}{3}$ makes it apparent that we've been concerned with inference about the probability parameter of a single Bernoulli trial. If anything still seems unintuitive, it's perhaps because you'd typically formulate such a model straight off, without drawing unnecessary distinctions between evidentially equivalent outcomes. (Suppose parts coming off an assembly line are tested, & an unknown proportion rejected; then half of those that aren't rejected are painted red and the other half painted blue.)
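Spelled out with indicator notation (a small restatement of the same point): for a single observation the likelihood depends on $p$ only through which cell of the partition occurred, $$L(p; x) \propto (1-\pi)^{\mathbf{1}[x=1]}\,\pi^{\mathbf{1}[x\in\{2,3\}]}, \qquad \pi=\tfrac{2(p-1)}{3},$$ which is exactly a Bernoulli likelihood in $\pi$.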

$\endgroup$
6
$\begingroup$

There are many differently worded versions of the likelihood principle, but in essence the likelihood principle says that data that yield the same (proportional) likelihood function have the same evidential meaning concerning values of the parameter(s) of interest, according to the statistical model(s). Crucially, it does not say anything at all about inferences that might be informed by such evidence.

Some statements of the likelihood principle talk of 'inference', but that is a mistake. Did Casella and Berger make such a mistake? (I really don't know, as I no longer have access to their book...) If so, then here are a couple of sources that agree with me and not with them.

"Within the framework of a statistical model, all of the information which the data provide concerning the relative merits of two hypotheses is contained in the likelihood ratio of those hypotheses." (Edwards 1972, 1992 p. 30)

The likelihood principle (L): If $E$ and $E'$ are any two experiments with the same parameter space, represented respectively by density functions $f(x, \theta)$ and $g(y, \theta)$; and if $x$ and $y$ are any respective outcomes determining the same likelihood function; then $Ev(E, x) = Ev(E', y)$. That is, the evidential meaning of any outcome $x$ of any experiment $E$ is fully characterized by giving the likelihood function $cf(x, \theta)$ (which need be described only up to an arbitrary positive constant factor), without reference to the structure of $E$. (Birnbaum 1962)

Neither of those says anything about inference. See this answer on this site for a description of how equal evidence can lead to different inferences without any violation of the likelihood principle.

Given that the likelihood principle does not say anything about inference, your inferences about the results in question need to be informed by more than just the likelihood principle.

Birnbaum, A. (1962), ‘On the foundations of statistical inference’, Journal of the American Statistical Association 57(298), 269–306.

Edwards, A.W.F. (1992), Likelihood: expanded edition, Johns Hopkins University Press, Baltimore.

$\endgroup$
5
  • $\begingroup$ C&B do discuss precisely this framework of an evidence function and the likelihood principle (my post is based on that). I think they agree with you too, but I will have another look. $\endgroup$ Commented Mar 22 at 3:30
  • $\begingroup$ C&B say that if the likelihood functions are proportional, the conclusions drawn from them are identical, so I guess maybe I misunderstood the principle. Anyway, thanks for replying and providing such a detailed and comprehensive explanation. I will take a closer look later. $\endgroup$
    – INvisibLE
    Commented Mar 22 at 3:44
  • $\begingroup$ @INvisibLE I think that we need to be clear about a distinction between identical "conclusions drawn from them" where those conclusions concern the evidential support of various parameter values and conclusions that are broader inferences. The difficulty of being explicit without being prolix may be behind a lot of misunderstandings in this area. $\endgroup$ Commented Mar 22 at 3:48
  • $\begingroup$ I agree. I do find myself confused when reading about the likelihood principle. I might read your references later. $\endgroup$
    – INvisibLE
    Commented Mar 22 at 3:55
  • $\begingroup$ C&B go on to discuss the binomial/negative-binomial inference problem using the Formal Likelihood Principle to show "equivalent inferences from different experiments", FYI. $\endgroup$ Commented Mar 22 at 4:06
4
$\begingroup$

If $P(X=2)=P(X=3)=\frac{p-1}{3}$, how does observing $X=2$ versus $X=3$ give any different information about $p$?

Given that $X\neq 1$, whether $X=2$ or $X=3$ happens is independent of $p$.

Imagine the following situation for generating $X$:

  • First flip a biased coin with probability $(5-2p)/3$ for heads and $(2p-2)/3$ for tails.

    • If the result is heads, then assign $X:=1$.
    • If the result is tails, then flip another independent fair coin and assign $X:=2$ in case of heads and $X:=3$ in case of tails.

Why should that potential second coin flip, which is independent of $p$, have any influence on the inference about $p$?
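As a rough illustration (the particular values of $p$ below are arbitrary picks from the allowed range $[1, 2.5]$), simulating this two-stage construction shows the split between $X=2$ and $X=3$ staying at one half whatever $p$ is:

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_x(p, size, rng):
    """Generate X by the two-stage coin-flip scheme described above."""
    first_heads = rng.random(size) < (5 - 2*p) / 3   # first (biased) coin
    fair = rng.integers(2, size=size)                # second, fair coin: 0 or 1
    return np.where(first_heads, 1, 2 + fair)        # heads -> X = 1, tails -> X = 2 or 3

for p in (1.2, 1.8, 2.4):                            # arbitrary values in [1, 2.5]
    x = draw_x(p, 200_000, rng)
    frac_two = (x[x != 1] == 2).mean()               # share of 2s among the {2, 3} outcomes
    print(f"p = {p}:  P(X=2 | X in {{2,3}}) ~ {frac_two:.3f}")   # ~0.5 for every p
```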

"This contradicts my intuition since $X$ is a sufficient statistic for $p$ (and hence I think we should use $X$ to infer $p$)."

The variable $X$ is sufficient, but it is not the minimal sufficient statistic.

Instead, the minimal sufficient statistic is the indicator of whether $X=1$ or $X\in\{2,3\}$.

E.g., with repeated observations, a sufficient statistic is the count of the number of times that $X=1$.

In fact, in a way the likelihood function itself is a sufficient statistic. For your example, the likelihood functions for $X=2$ and $X=3$ being equal means that the choice between those two outcomes is independent of $p$, so they appear as a single value of the sufficient statistic.
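To make the counting statistic concrete, take a hypothetical i.i.d. sample $x_1,\dots,x_n$ (the question has $n=1$, but the same factorization applies) and let $n_1=\#\{i: x_i=1\}$. Then $$L(p; x_1,\dots,x_n)=\left(\tfrac{5-2p}{3}\right)^{n_1}\left(\tfrac{p-1}{3}\right)^{n-n_1},$$ which depends on the data only through $n_1$, so $n_1$ is sufficient by the factorization theorem.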


In your situation, you wonder about the likelihood principle when two different observations from the same experiment give the same likelihood function. Other situations occur where different models give the same likelihood function. Some related questions about that other angle are:

An example where the likelihood principle *really* matters?

Does the likelihood ratio test violate the likelihood principle?

$\endgroup$
