
Suppose I have 1 head and 4 tails from 5 coin tosses. To find the probability of getting 1 head and 4 tails in my coin-toss experiment, I decided to use the Binomial Probability Mass Function to compute the probability of the current observations.

I used Maximum a Posteriori (MAP) estimation with the Beta prior $ (\alpha=5, \beta=5) $, instead of Maximum Likelihood Estimation, to estimate the parameter $ \theta $, and I got $ 0.3846 $. Now that I have the parameter value of $\theta$ and I want to find the probability of observing 1 head and 4 tails, which of the following equations should I plug the estimated parameter value $ 0.3846 $ into?

  1. $ \binom{n}{k} \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i} \, \theta^{\alpha-1} (1-\theta)^{\beta-1} $

The above equation takes the prior into account when calculating the probability for my experiment. Or should I just plug the value of $ \theta $ into the Binomial Probability Mass Function?

  2. $ \binom{n}{k} \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i} $

I know this sounds very naive but I just want to make sure I am thinking correctly.
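
For reference, assuming the usual conjugate Beta-Binomial update, the value $ 0.3846 $ is the mode of the $ \mathrm{Beta}(\alpha + 1,\ \beta + 4) = \mathrm{Beta}(6, 9) $ posterior:

$$ \hat\theta_{\mathrm{MAP}} = \frac{\alpha + k - 1}{\alpha + \beta + n - 2} = \frac{5 + 1 - 1}{5 + 5 + 5 - 2} = \frac{5}{13} \approx 0.3846 $$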


1 Answer


You have data $X_i \sim \mathcal{B}(\theta)$ (Bernoulli with parameter $\theta$); you observe $(X_1,\dots,X_n)$ and set $X_o = \sum_i X_i$, which means that for $k\in \{0,\dots,n\}$,

$$ \mathbb P(X_o = k \mid \theta) = \binom{n}{k} \theta^k(1-\theta)^{n-k} $$

From a Bayesian perspective you also have a prior distribution on $\theta$, which is a Beta distribution with density $p(\alpha,\beta)$.
Then you get the posterior distribution of $\theta$, \begin{align*} p(\theta \mid X_o) &\propto p(\alpha,\beta)\, \mathbb P(X_o \mid \theta) \\ &\propto p(\alpha,\beta)\, \theta^{X_o} (1-\theta)^{n-X_o}, \end{align*} and you can derive the maximum a posteriori (MAP) estimate,

$$ \hat \theta= \arg\max_{\theta} p( \theta \mid X_o) $$
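
As a quick check (not part of the original answer), here is a minimal SciPy sketch that reproduces the MAP estimate, assuming the Beta$(5,5)$ prior and the 1-head-in-5-tosses data from the question:

```python
import numpy as np
from scipy.stats import beta, binom

# Numbers taken from the question: Beta(5, 5) prior, 1 head observed in 5 tosses
a, b = 5, 5      # prior hyperparameters
n, k = 5, 1      # number of tosses, number of heads

# Conjugacy: the posterior is Beta(a + k, b + n - k), and its mode is the MAP
theta_map = (a + k - 1) / (a + b + n - 2)
print(theta_map)                     # 5/13 ≈ 0.3846

# Numerical check: maximise the log of (prior × likelihood) on a grid
grid = np.linspace(0.001, 0.999, 10_000)
log_post = beta.logpdf(grid, a, b) + binom.logpmf(k, n, grid)
print(grid[np.argmax(log_post)])     # ≈ 0.3846
```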

Now if you want to compute the probability of getting 1 head out of 5 (new and independent) coin tosses, given that the probability of head is $\hat \theta$, you can simply put $\hat \theta$ in the likelihood $\mathbb P(X \mid \theta)$:

$$ \mathbb P(X=1 \mid \hat \theta ) = \binom{5}{1} \hat \theta(1-\hat \theta)^4 $$
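
For the plug-in computation, a one-line sketch with SciPy's binomial PMF (assuming the MAP value $5/13$ from above) gives roughly $0.276$:

```python
from scipy.stats import binom

theta_hat = 5 / 13                  # MAP estimate from above (≈ 0.3846)
print(binom.pmf(1, 5, theta_hat))   # ≈ 0.276
```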


But from a Bayesian point of view it may be more relevant to consider all possible values of $\theta$, rather than just a point estimate like $\hat \theta$, and instead compute

$$ \mathbb P(X= 1 \mid X_o) = \int_\Theta P(X = 1 \mid \theta) p(\theta \mid X_o)d\theta $$
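
A minimal sketch of this integral, evaluated numerically with SciPy under the same assumptions (Beta$(5,5)$ prior and 1 head in 5 tosses, so the posterior is Beta$(6,9)$), gives roughly $0.255$, slightly lower than the plug-in value:

```python
from scipy import integrate
from scipy.stats import beta, binom

# Posterior after 1 head in 5 tosses with a Beta(5, 5) prior: Beta(6, 9)
a_post, b_post = 6, 9

# Posterior predictive: integrate the likelihood against the posterior density
pred, _ = integrate.quad(
    lambda t: binom.pmf(1, 5, t) * beta.pdf(t, a_post, b_post), 0, 1
)
print(pred)                      # ≈ 0.255  (Beta-Binomial probability)
print(binom.pmf(1, 5, 5 / 13))   # ≈ 0.276  (plug-in / MAP, for comparison)
```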

  • Thank you for the answer. I am trying to understand the last equation, but it doesn't quite come together for me. Do you mean the prior probability distribution by $ p(\theta \mid X_o) $, and the likelihood by $ P(X=1 \mid \theta) $? If you could expand the last equation a bit further, I would really appreciate it. Commented Jan 23, 2020 at 13:21
  • The last equation is the sum of $\mathbb P (X \mid \theta)$ over all values of $\theta$, weighted by $p(\theta \mid X_o)$, which is your current knowledge about the distribution of $\theta$. This weighted sum is called the posterior predictive distribution. See stats.stackexchange.com/….
    – periwinkle
    Commented Jan 23, 2020 at 13:25
  • Thank you very much!! Commented Jan 23, 2020 at 13:29
