$\begingroup$

Could someone explain to me precisely what is meant by a prior predictive check in Bayesian inference? Some documents use the observed data (“in which we compared the observed data to the predictions of the model”), while others do not use the observed data (“summarizing our knowledge prior to observing the data”).

To my (far from expert) knowledge of Bayesian statistics, the first case rather reminds me of what is called the posterior predictive check, which itself seems quite clearly documented and whose technique I believe I understand well. For the prior predictive check, on the other hand, the way to proceed is still not clear to me.

So as not to speak in a vacuum, I give a (slightly artificial) example below.

Let's imagine that I am trying to model the number of vehicles passing a given road point in one minute, for which it seems reasonable to use a Poisson distribution with parameter $\lambda$. I learned that a Gamma distribution is most often used as the prior for $\lambda$. Since in similar situations the average number of vehicles passing in one minute is around $20$, it seems to me that I should use a Gamma($\alpha$, $\beta$) distribution with $\alpha/\beta \approx 20$. Except that as the couple ($\alpha$, $\beta$) I can take (2, 0.1), or (20, 1), or many others…
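
To see why the choice of couple matters even at a fixed mean, here is a quick sketch (standard library only) using the facts that a Gamma($\alpha$, $\beta$) distribution, in the shape/rate parameterization, has mean $\alpha/\beta$ and standard deviation $\sqrt{\alpha}/\beta$; the couples are those considered below:

```python
import math

# Candidate Gamma(alpha, beta) priors: all share mean alpha/beta = 20,
# but their standard deviations sqrt(alpha)/beta differ by two orders
# of magnitude, which is exactly what the prior predictive check probes.
for alpha, beta in [(0.2, 0.01), (2, 0.1), (20, 1), (200, 10)]:
    mean = alpha / beta
    sd = math.sqrt(alpha) / beta
    print(f"Gamma({alpha}, {beta}): mean = {mean:.0f}, sd = {sd:.1f}")
```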

My current understanding of the prior predictive check therefore leads me to proceed as follows:

  1. I decide on the number of observations of my Poisson distribution that I will make, say $n = 100$.
  2. I give myself two values $\alpha$ and $\beta$ such that $\alpha/\beta = 20$.
  3. I sample a value $\lambda_i$ from Gamma($\alpha$, $\beta$).
  4. With this $\lambda_i$, I sample $n$ values from Poisson($\lambda_i$) and note the maximum $M_i$ of the $n$ sampled values.
  5. I repeat steps 3 and 4 $N$ times (for example $N = 1000$).
  6. I plot a histogram of the $N$ maximum values $M_i$.
  7. I create several such histograms for different couples ($\alpha$, $\beta$).
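
A minimal sketch of the steps above, assuming NumPy and summarizing the simulated maxima with quantiles rather than the histograms of step 6:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 100, 1000  # observations per simulated data set, number of prior draws

# The couples (alpha, beta) under consideration, all with mean 20.
couples = [(0.2, 0.01), (2, 0.1), (20, 1), (200, 10)]

for alpha, beta in couples:
    # Steps 3-5: draw lambda_i from the Gamma prior (NumPy uses shape/scale,
    # so scale = 1/beta), then n Poisson counts, and record each maximum M_i.
    lambdas = rng.gamma(shape=alpha, scale=1 / beta, size=N)
    maxima = np.array([rng.poisson(lam, size=n).max() for lam in lambdas])
    print(f"Gamma({alpha}, {beta}): median max = {np.median(maxima):.0f}, "
          f"99th percentile = {np.percentile(maxima, 99):.0f}")
    # Step 6 would plot a histogram of `maxima` here, e.g. with plt.hist.
```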

The result I obtained is given by the plot below (I am not giving the entire program so as not to overload the post):

[Histograms of the $N$ simulated maxima $M_i$ for each couple $(\alpha, \beta)$]

A discussion of the experiment to be carried out concludes that it is impossible for several hundred vehicles to pass the given point in one minute (so the couple $(0.2, 0.01)$ must be eliminated for excessive maximum values); on the other hand, it sometimes happens that a hundred vehicles, or a little more, pass (so the couples $(20, 1)$ and $(200, 10)$ must be eliminated because their maximum values are too low).

Finally, I opt for the prior Gamma$(2, 0.1)$, which appears the most appropriate.

Does this reasoning really constitute a prior predictive check? Is this the usual way of reasoning?

And if not, if I was completely wrong in detailing this example, could you give me a concrete example of how to do a prior predictive check?

Any information to resolve my doubts will be welcome!

$\endgroup$

1 Answer

$\begingroup$

I believe your reasoning about prior predictive checks is correct. For example, consider this brief section of the Stan manual: https://mc-stan.org/docs/stan-users-guide/posterior-predictive-checks.html#prior-predictive-checks

The manual describes a prior predictive check as a special (limiting) case of a posterior predictive check in which no data are included: only the priors are fed into the model, as in your example. (Here “data” refers to the variable being modeled; when the model has predictors, those must still be supplied, since, as the manual says, “[predictors] do not have a generative model from which to be simulated”.)

The goal of the prior predictive check is to assess whether your prior model is appropriate, which is what you do in your example.

Your focus on maximum values is an extra step of analysis that suits your particular question. In general, the prior predictive check in your example could be as simple as generating a $\lambda$, drawing $n$ values from $\text{Poisson}(\lambda)$, and comparing the resulting distribution with your data/what you believe to be reasonable.
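
As a minimal sketch of that simpler check (assuming NumPy, and taking the Gamma$(2, 0.1)$ couple from the question purely as an illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, n = 2, 0.1, 100

# One prior predictive data set: a lambda from the prior, then n counts.
# NumPy's gamma takes shape/scale, so scale = 1/beta for rate beta.
lam = rng.gamma(shape=alpha, scale=1 / beta)
simulated_counts = rng.poisson(lam, size=n)

# Inspect these (or many such replicates) against what you consider a
# plausible minute-by-minute traffic count.
print(f"lambda = {lam:.1f}, first counts: {simulated_counts[:10]}")
```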

The manual also contains a worked example, similar to yours, demonstrating the use of a prior predictive check to rule out unreasonable priors. In fact, I think it is closely related to your example, which should give you some confidence in your understanding.

$\endgroup$
  • $\begingroup$ Thank you for your detailed response. I now understand why some authors spoke of prior predictive check, while using observed data (!), which for me was contradictory with the idea of prior... In fact, it seems to me that the notions of prior predictive and posterior predictive are (conceptually) identical: in the case of a prior prediction, we generate data which we assume could be observed during the experiment; in the case of posterior prediction, we generate data which we believe could be observed after the experiment. $\endgroup$
    – Andrew
    Commented Jun 2 at 10:29
  • $\begingroup$ But, basically, if I understand correctly, it is always a question of generating data outside the experiment itself, for a verification of the adequacy of the supposed distribution of the parameter. So, I better understand the sentence from the Stan manual: “The prior predictive distribution is just like the posterior predictive distribution with no observed data, so that a prior predictive check is nothing more than the limiting case of a posterior predictive check with no data.” $\endgroup$
    – Andrew
    Commented Jun 2 at 10:29
