Here is a simple and intuitive explanation.
Start with a binomial distribution (the data model):
$$\mathbb{P}[X=x]=\binom{n}{x}\theta^{x}(1-\theta)^{n-x}$$
where the support is $x=0,1,2,...,n$
This is a *discrete* random variable: $X$ is the variable, $n$ is known, and $\theta \in [0,1]$ is a parameter.
Now change the point of view and look at this expression as a (continuous) function of the variable $\theta$.
Since it is now a function of $\theta$, we can discard every multiplicative factor that does not depend on $\theta$, getting
$$f(\theta|x)=\theta^{x}(1-\theta)^{n-x}=\theta^{(x+1)-1}(1-\theta)^{(n-x+1)-1}$$
Now $\theta$ is the variable and $x$ is a parameter (the observed data), and we recognize the kernel of a Beta distribution:
$$f(\theta\mid x) \propto \text{Beta}(x+1,\; n-x+1)$$
From here it is easy to verify that the Beta family is exactly the conjugate prior of the binomial model: if the prior is $\text{Beta}(a,b)$, the posterior is proportional to $\theta^{a+x-1}(1-\theta)^{b+n-x-1}$, which is again a Beta distribution, namely $\text{Beta}(a+x,\; b+n-x)$.
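If you want to convince yourself numerically, here is a minimal sketch (the prior parameters and the data below are made-up numbers for illustration): it multiplies a Beta prior by the binomial likelihood, normalizes the product on a grid, and compares it to the claimed closed-form posterior $\text{Beta}(a+x,\, b+n-x)$.

```python
import numpy as np
from scipy.stats import beta, binom

# Illustrative choices (not from any particular data set)
a, b = 2.0, 5.0        # Beta prior parameters
n, x = 10, 7           # observed data: x successes in n trials

theta = np.linspace(0.001, 0.999, 500)

# Unnormalized posterior: prior * likelihood, normalized numerically
unnorm = beta.pdf(theta, a, b) * binom.pmf(x, n, theta)
unnorm /= np.trapz(unnorm, theta)

# Claimed closed-form posterior: Beta(a + x, b + n - x)
closed = beta.pdf(theta, a + x, b + n - x)

# The two curves should coincide up to numerical error
print(np.max(np.abs(unnorm - closed)))
```

The printed maximum absolute difference is essentially zero, confirming the conjugacy on this example.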
**Important observation:** to identify the conjugate prior family for a statistical model, there is a very useful factorization theorem.
Let's suppose that the model can be written in the following way:
$$p(\mathbf{x}\mid\theta)=g[t(\mathbf{x}),n,\theta]\cdot\psi(\mathbf{x})$$
for all $\mathbf{x},\theta$, and assuming that $g$ is integrable in $\theta$ over all of $\Theta$.
Then the family
$$\pi(\theta)\propto g(s,m,\theta)$$
(where the hyperparameters $s$ and $m$ play the roles of $t(\mathbf{x})$ and $n$) is the conjugate prior.
Applying this theorem to the binomial model, you immediately identify
$$g[t(\mathbf{x}),n,\theta]=\theta^{t}(1-\theta)^{n-t},\qquad t(\mathbf{x})=x,\qquad \psi(\mathbf{x})=\binom{n}{x}$$
thus the conjugate prior must be of the form
$$\theta^{a}(1-\theta)^{b}$$
which is obviously the kernel of a Beta distribution (to make it a density you have to multiply it by the normalization constant, of course, but that is not a problem, as the Beta distribution is a known density).
Here you can find a very useful table of the most common models with their priors, posteriors, parameters, and so on.