
A "coin" has a fixed unknown bias $0\le p\le1$ for heads, and out of $n\ge0$ tosses it yielded $0\le h\le n$ heads. Note that this occurs with probability $P(h\;|\;p,n)=\binom{n}{h}p^h(1-p)^{n-h}$. We would like a "best guess" for $p$.

The frequentist view is that $p$ should be the maximum-likelihood estimate $\frac hn$. Indeed $\frac{d}{d\rho}\binom{n}{h}\rho^h(1-\rho)^{n-h}=0$ on $(0,1)$ occurs exactly at $\rho=\frac hn$.
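As a sanity check (a sketch of my own, not part of the original post; the concrete values $n=10$, $h=3$ are arbitrary), one can verify the interior critical point symbolically:

```python
# Sketch (not from the original post): the binomial likelihood
# rho^h (1-rho)^(n-h) has its interior critical point at rho = h/n.
import sympy as sp

rho = sp.symbols('rho')
n, h = 10, 3
likelihood = rho**h * (1 - rho)**(n - h)   # binomial coefficient dropped: constant in rho
critical = sp.solve(sp.diff(likelihood, rho), rho)
print([c for c in critical if 0 < c < 1])  # [3/10], i.e. h/n
```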

The uniform Bayesian view is that $p$ should be $\frac{h+1}{n+2}$. Indeed the prior distribution is $f(p)=1$, and the posterior distribution conditional on $(n,h)$ is then the Beta$(h+1,\,n-h+1)$ distribution $f(p\;|\;n,h)=\frac{P(h\;|\;p,n)f(p)}{\int_0^1P(h\;|\;\rho,n)f(\rho)d\rho}=\frac{(n+1)!}{h!(n-h)!}p^h(1-p)^{n-h}$, hence $\mathbb E[p\;|\;n,h]=\frac{(n+1)!}{h!(n-h)!}\int_0^1\rho^{h+1}(1-\rho)^{n-h}d\rho=\frac{(n+1)!}{h!(n-h)!}\frac{(h+1)!(n-h)!}{(n+2)!}=\frac{h+1}{n+2}$.
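This is easy to confirm numerically (my own sketch; the values $n=10$, $h=3$ are arbitrary):

```python
# Sketch (my addition): under the uniform prior the posterior mean is (h+1)/(n+2).
from scipy import integrate

n, h = 10, 3
post = lambda p: p**h * (1 - p)**(n - h)      # unnormalised posterior (uniform prior)
Z, _ = integrate.quad(post, 0, 1)
mean, _ = integrate.quad(lambda p: p * post(p), 0, 1)
print(mean / Z, (h + 1) / (n + 2))            # both 0.3333...
```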

I don't yet have intuition for why these two viewpoints agree if and only if $n=2h$; let me know if you do! But my main question is: what is the frequentist's prior, i.e. what distribution $f$ satisfies $\mathbb E_f[p\;|\;n,h]=\frac hn$ for all pairs $\lbrace(n,h)\in\mathbb Z^2\;|\; 0\le h\le n\rbrace$?

Rephrased, $n\int_0^1\rho^{h+1}(1-\rho)^{n-h}f(\rho)d\rho=h\int_0^1\rho^h(1-\rho)^{n-h}f(\rho)d\rho$. Taking $(n,h)=(1,0)$ forces $f$ to obey $\int_0^1\rho(1-\rho)f(\rho)d\rho=0$, so under some natural assumptions a non-negative $f$ must vanish almost everywhere on $(0,1)$; the only remaining proper candidates are normalized sums of Dirac deltas at $0$ and $1$, and those are ruled out by other pairs $(n,h)$. I believe this is Qiaochu's answer below.
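To see numerically how stringent this constraint is (a sketch of my own; the truncation parameter $\varepsilon$ is my device, not part of the question): any proper prior with mass on the interior gives a strictly positive integral, and even a prior piling its mass toward the endpoints only drives the integral to $0$ in the limit:

```python
# Sketch (my addition): the constraint  ∫ p(1-p) f(p) dp = 0  fails for every
# proper prior with interior mass.  Truncating f(p) ∝ 1/(p(1-p)) to [eps, 1-eps]
# concentrates mass near 0 and 1, yet the integral only tends to 0 as eps -> 0.
import numpy as np

for eps in (1e-2, 1e-4, 1e-8):
    Z = 2 * np.log((1 - eps) / eps)   # normaliser: ∫ 1/(p(1-p)) dp over [eps, 1-eps]
    print(eps, (1 - 2 * eps) / Z)     # normalised ∫ p(1-p) f(p) dp; positive for eps > 0
```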

This would be poetic and intuitive: a frequentist by construction would have no a priori guess, consistent with the fact that the vacuous 0 heads out of 0 tosses has undefined quotient $\frac00$ (whereas the uniform Bayesian invokes symmetry to guess $\frac12=\frac{0+1}{0+2}$).

Comments:

  • This video addresses this question or something very similar in slightly different notation: youtube.com/… – user10478, Jun 9 at 1:47
  • @user10478 The video shows that, for a Binomial$(n,p)$ likelihood function, a Beta$(a,b)$ prior for $p$ results in a Beta$(h+a,\,n-h+b)$ posterior. Therefore the posterior mean is $h/n$ only for $a=b=0$ (an improper prior); however, with $a=b=1$ (the uniform prior), the posterior density has its maximum at $h/n$. – r.e.s., Jun 9 at 3:52

2 Answers

Answer (score 8, by N. Virgo):

There is such a prior, but it's an improper one.

It's given by $$ f(p) \propto \frac{1}{p(1-p)}. $$

Formally, you should think of this $f$ as a density function with respect to the uniform prior on $[0,1]$. You can't normalise it, because $\int_{0}^{1}\frac{1}{p(1-p)}dp$ diverges. But you can still use it to calculate posteriors, by defining $$ f(p\mid h,n ) \mathrel{:=} \frac{1}{Z}p^h(1-p)^{n-h}f(p) = \frac{1}{Z}p^{h-1}(1-p)^{n-h-1} $$ where $Z$ is whatever it needs to be to make the posterior normalised, namely $Z=\Gamma(h)\Gamma(n-h)/\Gamma(n)$ when $0<h<n$. The calculations are then basically the same as for the beta distribution, since this prior is in some informal sense "a beta distribution with $\alpha=\beta=0$": the posterior is $\mathrm{Beta}(h,\,n-h)$, whose mean is $h/n$, as desired.
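A quick numeric confirmation (my own sketch; the values $n=10$, $h=3$ are arbitrary):

```python
# Sketch (my addition): under the improper prior f(p) ∝ 1/(p(1-p)), the
# posterior after h heads in n tosses is Beta(h, n-h) for 0 < h < n,
# and its mean equals the frequentist estimate h/n.
from scipy.stats import beta

n, h = 10, 3
posterior = beta(h, n - h)       # Beta(3, 7)
print(posterior.mean(), h / n)   # 0.3 0.3
```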

This improper prior is known as Haldane's prior, after a 1932 paper by J.B.S. Haldane. (Hat tip r.e.s. in the comments.) I originally learned about it from a paper by E. T. Jaynes called Prior Probabilities (1968), which apparently reinvents it but gives some nice invariance arguments in its favour.

Unfortunately, although improper priors are often used in practice, they seem not to be studied much in modern probability theory, so there isn't much formal theory about them.

Comments:

  • Improper priors are still used all the time in practice, because many Bayesian sampling algorithms only use the unnormalized posterior anyway. Nice reference, +1. – whpowell96, Jun 9 at 3:22
  • @whpowell96 That's a good point; I've added a remark that they are used in practice, even though it seems they're not formally studied much. – N. Virgo, Jun 9 at 3:24
  • @N.Virgo This improper prior is historically known as Haldane's prior (published in 1932). The reference is given in the WP article. – r.e.s., Jun 9 at 3:39
  • OK, this all makes sense when I start by computing the posterior under a general $\mathrm{Beta}(\alpha,\beta)$ prior and taking the limit $(\alpha,\beta)\to(0,0)$. This limiting prior is improper, whereas the uniform-Bayes prior ($\alpha=\beta=1$) is proper because the random variable $p\in[0,1]$ is compactly supported. In contrast, the uniform prior for linear regression recovers the OLS solution and is improper because the coefficient vector (slope, intercept) $\in\mathbb R^2$ is not compactly supported (and a non-uniform prior represents a regularized OLS that makes a choice of penalization). – Jun 9 at 19:15
  • A big problem with the Haldane prior is that if $h=0$ or $h=n$ then it concentrates all the posterior distribution at a single point. So if you flip the coin once and see heads then you are almost certain that $p=1$, while if you see tails then you are almost certain that $p=0$; this is not a practical approach. – Henry, Jun 10 at 1:41
Answer (score 5, by Qiaochu):

There is no such (proper) prior. If you flip a single heads then the frequentist says $p = 1$, and if you flip a single tails then the frequentist says $p = 0$, but this is not compatible with any prior in which $\mathbb{P}(\varepsilon < p < 1 - \varepsilon) > 0$ for some $\varepsilon > 0$: if $f(p)$ is the prior, then $\int_{\varepsilon}^{1-\varepsilon} f(p) \, dp > 0$ implies $\int_{\varepsilon}^{1-\varepsilon} p f(p) \, dp > 0$ and similarly for $(1 - p) f(p)$, these being the (unnormalized) posterior masses on $(\varepsilon, 1 - \varepsilon)$ after a single heads and a single tails respectively. The only prior that could produce these results is a discrete prior taking the value $0$ with some probability, the value $1$ with some other probability, and no other values; but that prior isn't compatible with flipping heads and then tails.
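A tiny numeric illustration of the first step (my own sketch; the uniform prior here stands in for any prior with interior mass):

```python
# Sketch (my addition): any prior with mass on the interior of [0,1] gives a
# posterior mean strictly below 1 after a single head, so it cannot reproduce
# the frequentist estimate p = 1 for (n, h) = (1, 1).
from scipy import integrate

f = lambda p: 1.0                                # uniform prior, as an example
Z, _ = integrate.quad(lambda p: p * f(p), 0, 1)  # posterior ∝ p·f(p) after one head
m, _ = integrate.quad(lambda p: p**2 * f(p), 0, 1)
print(m / Z)                                     # 0.666..., strictly less than 1
```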

Comments:

  • Qiaochu! Hilariously, I was discussing the problem an hour ago with Alex Zorn, and all three of us were Berkeley "officemates"; hope you remember our secret office seminars. Let me mull over your answer. – Jun 9 at 1:46
  • Hey, nice to run into you two again on the internet, hope you're both doing well. – Jun 9 at 1:47
  • Indeed, we're both in algorithmic trading for the time being, hence this genre of math problems ;-) Come join. – Jun 9 at 1:57
  • If you allow improper priors then there is such a prior; see my answer. – N. Virgo, Jun 9 at 3:16
