
I believe that the prior and posterior predictive distributions can be viewed as expectations, over $\theta$, of $p(y|\theta)$ (for the prior predictive distribution) and $p(\widetilde{y}|\theta)$ (for the posterior predictive distribution).

But I cannot fully convince myself of this idea, because I am not sure whether $p(y|\theta)$ and $p(\widetilde{y}|\theta)$ are being multiplied by the correct probabilities.

Assuming that $p(\theta)d\theta$ and $p(\theta|y)d\theta$ are the probabilities that $\theta$ lies in an infinitesimal interval of width $d\theta$, we multiply $p(y|\theta)$ and $p(\widetilde{y}|\theta)$ by those probabilities, respectively. If the two expressions for the predictive distributions really computed expectations, we should be able to say, for example, "we get the value $p(y|\theta)$ with probability $p(\theta)d\theta$". Can we say this? (The same question applies to the posterior predictive distribution.) Whether that statement is valid is what I am struggling to understand.

Since $p(y|\theta)$ is a function of $\theta$, it seems we can say it, but I would like confirmation.
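To state the claim precisely: for each fixed $y$, $p(y|\theta)$ is just a function of $\theta$, so what I am asking is whether the predictive distributions are the expectations of this function:

$$p(y) = \int p(y|\theta)\,p(\theta)\,d\theta = \operatorname{E}_{\theta \sim p(\theta)}\!\left[p(y|\theta)\right],$$

$$p(\widetilde{y}|y) = \int p(\widetilde{y}|\theta)\,p(\theta|y)\,d\theta = \operatorname{E}_{\theta \sim p(\theta|y)}\!\left[p(\widetilde{y}|\theta)\right].$$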

For reference, the standard expressions for the two distributions are below.

$p(\cdot)$ denotes a density function.

Prior predictive distribution

$p(y) = \int p(y|\theta )p(\theta )d\theta $
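As a numerical sanity check of this reading, here is a minimal sketch with a toy model of my own choosing (not from the question): $\theta \sim \mathrm{Beta}(a, b)$ and $y|\theta \sim \mathrm{Binomial}(n, \theta)$, whose prior predictive is the Beta-Binomial in closed form. The Monte Carlo average of $p(y|\theta)$ over prior draws should match it:

```python
# Monte Carlo check: the prior predictive p(y) is the average of
# p(y | theta) over draws theta ~ p(theta).
# Toy model (illustrative choice): theta ~ Beta(a, b), y | theta ~ Binomial(n, theta).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, n = 2.0, 3.0, 10
y = 4  # fixed value at which to evaluate p(y)

theta = rng.beta(a, b, size=200_000)          # draws theta_s ~ p(theta)
p_y_mc = stats.binom.pmf(y, n, theta).mean()  # average of p(y | theta_s)

p_y_exact = stats.betabinom.pmf(y, n, a, b)   # closed-form prior predictive
print(p_y_mc, p_y_exact)                      # agree up to Monte Carlo error
```

The two printed numbers agree up to Monte Carlo error, which is exactly the "average of $p(y|\theta)$ weighted by $p(\theta)d\theta$" reading asked about above.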

Posterior predictive distribution ($\widetilde{y}$ is an unobserved quantity, while $y$ is observed)

$p(\widetilde{y}|y) = \int p(\widetilde{y}|\theta )p(\theta|y )d\theta $
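The same check works for the posterior predictive, using the conjugate posterior $\theta|y \sim \mathrm{Beta}(a+y,\, b+n-y)$ of the same toy model (again an illustrative sketch, not part of the question; `y_new` stands for $\widetilde{y}$):

```python
# Monte Carlo check: the posterior predictive p(y_new | y) is the average
# of p(y_new | theta) over draws theta ~ p(theta | y).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a, b, n, y = 2.0, 3.0, 10, 4   # prior hyperparameters and observed data
m, y_new = 5, 2                # new experiment: m trials, evaluate at y_new

# Conjugacy: theta | y ~ Beta(a + y, b + n - y)
theta = rng.beta(a + y, b + n - y, size=200_000)   # theta_s ~ p(theta | y)
p_mc = stats.binom.pmf(y_new, m, theta).mean()     # average of p(y_new | theta_s)

p_exact = stats.betabinom.pmf(y_new, m, a + y, b + n - y)
print(p_mc, p_exact)                               # agree up to Monte Carlo error
```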

  • I'm really not sure where your issue is, but everything you wrote is correct, both the equations and your explanation of them. And we can indeed think of the marginal distributions $p(y)$ and $p(\tilde y | y)$ as "averaging over the unknown parameter $\theta$". The difficulty is normal: it will become clearer with experience. Commented Jul 2 at 8:18
  • I thought it was awkward to take an expectation of $p(y|\theta)$ over $\theta$, because the typical expression for the expectation of a random variable $X$ multiplies a value the variable takes, $x$, by the probability of getting that value, and then integrates. Here, we instead multiply $p(y|\theta)$ by the probability of getting $\theta$. Commented Jul 7 at 3:53
  • What I was trying to say: an expectation is easy to understand when you multiply something by its own probability and integrate, because both factors concern the same thing. But here we multiply by the probability of $\theta$ to compute the expectation of $p(y|\theta)$, and those are not about the same thing: one concerns $\theta$ and the other concerns the distribution of $y$ given $\theta$. Does that make sense? Commented Jul 7 at 4:06
  • That's the standard formula for obtaining the marginal distribution of $Y$ from the joint distribution of $Y$ and $\theta$: we integrate the joint density $f(y,\theta)$ over the values of $\theta$ (summing in the discrete case, integrating in the continuous case; the operations play the same role). We can also read it as computing the average value of $f(y|\theta)$, the probability of $y$ given $\theta$, which is exactly the formula you gave; a tiny worked example follows these comments. Commented Jul 7 at 20:48
  • Again, I don't quite understand which exact step is causing you trouble. It might help to do basic exercises with dice that have you apply Bayes' theorem, marginalization, and conditioning. I think everybody struggles with these formulas before they click; a bit of practical experience is probably all you need before this math suddenly seems obviously true. Commented Jul 7 at 20:49
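To make the marginalization described in the comments concrete, here is a tiny dice version (illustrative numbers, not from the thread): let $\theta$ index one of two dice, a fair one with $p(y{=}6|\theta_{\mathrm{fair}}) = 1/6$ and a loaded one with $p(y{=}6|\theta_{\mathrm{loaded}}) = 1/2$, each with prior probability $1/2$. Then

$$p(y{=}6) = \sum_{\theta} p(y{=}6|\theta)\,p(\theta) = \tfrac{1}{2}\cdot\tfrac{1}{6} + \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{3},$$

which is literally the prior-weighted average of the conditional probabilities, i.e. $\operatorname{E}_{\theta}\!\left[p(y{=}6|\theta)\right]$.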
