
I am using the Bayesian bootstrap for some analysis. Given a dataset $X=\{x_1, \dots, x_N\}$, we generate bootstrapped samples $X_1,\dots, X_K$ by sampling from $X$ with replacement. In the classical bootstrap, the weights are equal; that is, each data point $x_n$ in $X$ has probability $\pi_n=1/N$ of being present in $X_k$.

In a Bayesian variant, the vector of probabilities $\pi=(\pi_1,\dots,\pi_N)$ is sampled from a non-informative, flat Dirichlet distribution $$ \pi\sim p(\pi|X)=\mathrm{Dir}(\pi;\alpha) $$ where the hyperparameter is $\alpha=[1,\dots, 1]\in \mathbb{R}^N$. I then use these samples to find the distribution over some statistic $\phi$ of each sample $X_k$.
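For concreteness, here is a minimal sketch of one such draw in Python/NumPy; the Gaussian toy data and the choice of the weighted mean as the statistic $\phi$ are my own assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)   # toy stand-in for the dataset X, with N = 20
N = len(x)

# One draw of the weight vector from the flat Dirichlet Dir(1, ..., 1)
pi = rng.dirichlet(np.ones(N))
assert np.isclose(pi.sum(), 1.0)   # the weights form a probability vector over X

# Evaluate a statistic under these weights, e.g. a weighted mean
phi = np.average(x, weights=pi)
```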

Now my questions are as follows:

  1. In the literature, for the classical case, the distribution of $\phi$ is referred to as the sampling distribution of $\phi$. In the Bayesian case, however, it is called the posterior of $\phi$. According to Bayes' rule, I can say $$ p(\pi|X)\propto p(X|\pi)p(\pi) $$ What I don't understand is how the introduction of a prior over the sampling weights makes the distribution of $\phi$ a posterior.

  2. In the classical case, what does it mean to say that the underlying assumption is that the distribution of the data is the distribution of the population? What is the "population" referring to here?

  • Since there's no single definition of Bayesian bootstrap, could you point to the source you're referring to?
    – Tim
    Commented Feb 2, 2020 at 17:37
  • projecteuclid.org/euclid.aos/1176345338
    – Blade
    Commented Feb 2, 2020 at 17:45

1 Answer

In the classical bootstrap, by taking samples with replacement from your data, you mimic sampling your data from the population. By repeating this process $K$ times you simulate drawing your sample multiple times, which lets you evaluate the variability of your statistic $\phi$ (a function of the data) across different samples from the same population, $\hat\phi_k = \phi(X_k)$. In other words, we are simulating the "sampling distribution" of the statistic (the variability due to the sampling process).
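As a sketch of this process (assuming synthetic Gaussian data and the sample mean as the statistic, both choices being mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, size=100)   # stand-in for the observed sample X
K = 2000                            # number of bootstrap replicates

# Classical bootstrap: resample X with replacement (equal weights 1/N)
# and re-evaluate the statistic on each replicate X_k.
phi_hat = np.array([
    np.mean(rng.choice(x, size=len(x), replace=True))
    for _ in range(K)
])

# The spread of phi_hat approximates the sampling distribution of phi.
print(phi_hat.mean(), phi_hat.std())
```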

In Bayesian bootstrap, as described by Rubin (1981), you are estimating the distribution of your data $X = \{x_1,x_2,\dots,x_N\}$ and the posterior distribution of the estimates of the statistic $\phi(X)$. It is a non-parametric model, where we assume a categorical distribution over your datapoints (the likelihood)

$$ x_i|\boldsymbol{\pi} \sim \mathcal{Cat}(\boldsymbol{\pi}) $$

and for the unknown probabilities $\boldsymbol{\pi} = (\pi_1, \pi_2, \dots, \pi_N)$, we assume a uniform Dirichlet prior

$$ \boldsymbol{\pi} \sim \mathcal{Dir}(\boldsymbol{\alpha}) $$

parameterized by $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \dots, \alpha_N)$ with $\alpha_1 = \alpha_2 = \dots = \alpha_N = 1$. Plugging this into Bayes' theorem, we can estimate the posterior distribution over the probabilities

$$ p(\boldsymbol{\pi}|X) \propto p(X|\boldsymbol{\pi}) \, p(\boldsymbol{\pi}) $$

Knowing the posterior distribution of the probabilities leads us to the posterior predictive distribution (the distribution of new data, as predicted by the model),

$$ p(\tilde x|X) = \int_\boldsymbol{\pi} p(\tilde x|X,\boldsymbol{\pi}) \, p(\boldsymbol{\pi}|X) \, d\boldsymbol{\pi} $$

Next, we can easily estimate the distribution of the statistic by sampling from the posterior predictive distribution and plugging the samples into the statistic, $\phi(\tilde x)$. As you can see, we are not estimating the distribution of $\phi$ directly; rather, we evaluate the statistic over samples from the posterior distribution. This is what is meant by "simulating the posterior distribution of the parameter $\phi$". This distribution accounts for the variability of both the parameters $\boldsymbol{\pi}$ and the data $X$.
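A minimal sketch of this simulation (again with toy data and the mean as the statistic; in practice your own $X$ and $\phi$ go here):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, size=100)   # stand-in for the observed sample X
N, K = len(x), 2000

# Bayesian bootstrap: draw K weight vectors pi from the flat Dirichlet,
# then evaluate the statistic under each weighting. Each row of pi sums
# to 1, so pi @ x is the weighted mean for that draw.
pi = rng.dirichlet(np.ones(N), size=K)   # shape (K, N)
phi_post = pi @ x

# Summarize the simulated posterior of phi, e.g. a 95% credible interval.
print(np.percentile(phi_post, [2.5, 97.5]))
```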

Answering your first question: it is a posterior distribution because we are operating in a Bayesian setting. We have a prior and a likelihood, and by combining them we estimate the posterior distribution over the probabilities $\boldsymbol{\pi}$. The difference is that in the frequentist setting you cannot estimate the distribution of a parameter; you can only evaluate the statistic over the sample, and much of frequentist statistics is centered on overcoming this problem.

As for your second question, I believe it is answered in the What is the difference between a population and a sample? thread. Basically, "population" can be used here interchangeably with the "distribution" of the data. Taking samples from the population is equivalent to observing realizations of a random variable from its distribution. These are statistics vs. probability-theory terms for practically the same thing.

You may also be interested in reading the Is it possible to interpret the bootstrap from a Bayesian perspective? thread, and the two blog posts The Non-parametric Bootstrap as a Bayesian Model and Easy Bayesian Bootstrap in R by Rasmus Bååth, who discusses Bayesian bootstrap in greater detail and gives many examples.

As a side note, Rubin (1981) himself noted that the difference between the two procedures is mostly conceptual, about how we think about the results, as they are "quite similar inferentially", and "operationally they are very similar". The procedure differs slightly: you sample the data with random weights (drawn from the uniform Dirichlet distribution) instead of the fixed $1/N$ weights of the classical bootstrap. The interpretation of the results differs because we account for the variability of the parameters, as described above.
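To make the operational similarity concrete, we can compare the implied weights of the two procedures side by side (a toy sketch; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5

# Classical bootstrap: resampling with replacement implies discrete
# weights (counts / N) drawn from a multinomial distribution.
w_classical = rng.multinomial(N, np.ones(N) / N) / N

# Bayesian bootstrap: continuous weights drawn from a flat Dirichlet.
w_bayesian = rng.dirichlet(np.ones(N))

# Both vectors sum to 1 with expected value 1/N per entry; only the
# granularity of the weights differs.
print(w_classical, w_classical.sum())
print(w_bayesian, w_bayesian.sum())
```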

  • If I call the distribution over $\phi$ a posterior, how should I represent this distribution? $p(\phi|X)$? Then what are the likelihood and prior? $p(X|\phi)$ and $p(\phi)$? And what was $p(\phi)$ in the first place?
    – Blade
    Commented Feb 3, 2020 at 1:07
  • @Blade $\phi$ is a statistic, a function of $X$, not a parameter here.
    – Tim
    Commented Feb 3, 2020 at 5:58
  • You are not consistent with either the paper's notation or mine, and use $\hat{\phi}$ and $\phi$ interchangeably. I'd appreciate it if you could clarify your answer. In the first paragraph, you call $\hat{\phi}$ a statistic. This is in line with the paper, which says: "A statistic $\hat{\phi}$ is chosen to estimate a parameter $\phi$ of the distribution of $X$". But then you say $\phi$ is a statistic, not a parameter.
    – Blade
    Commented Feb 3, 2020 at 17:31
  • @Blade you are right, I took a few shortcuts & omitted some details, assuming they were not important for answering this question. I'll edit to clarify later today or tomorrow.
    – Tim
    Commented Feb 3, 2020 at 18:09
  • @Blade This would be a kind of "poor man's" Bayesian nonparametric estimator that would not offer you much more than the standard bootstrap, & would suffer from the same limitations. While it is an interesting question how much worse it could be compared to a direct Bayesian regression model, I'm afraid I cannot answer it.
    – Tim
    Commented Feb 4, 2020 at 17:15
