In the context of approximating the evidence $Z$ in a Bayesian inference setting, $$ Z = \int d\theta\, \mathcal L(\theta)\,\pi(\theta), $$ with $\mathcal L$ the likelihood and $\pi$ the prior, John Skilling's paper on nested sampling [1] (a similar statement appears in [2]) makes the following claim:

The integral for $Z$ is dominated by wherever the bulk of the posterior mass is to be found. Typically, this occupies a small fraction $e^{-H}$ of the prior, where $H = \text{information} = \int \log(dP/dX)\, dP$. $H$ is (minus) the logarithm of that fraction of prior mass that contains the bulk of the posterior mass, and it may well be of the order of thousands or more in practical problems where the likelihood is concentrated in some obscure corner of the prior domain.

where $dX = \pi(\theta)\,d\theta$ was previously defined as an element of prior mass (prior volume) and $dP = p(\theta)\,d\theta$ is the corresponding element of posterior mass.
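
For reference, [1] also introduces the cumulative prior volume enclosed by a likelihood contour, $$ X(\lambda) = \int_{\mathcal L(\theta) > \lambda} \pi(\theta)\, d\theta, \qquad Z = \int_0^1 \mathcal L(X)\, dX. $$ Writing the posterior density as $p(\theta) = \mathcal L(\theta)\pi(\theta)/Z$ gives $dP/dX = p(\theta)/\pi(\theta) = \mathcal L(\theta)/Z$, so $$ H = \int \log\frac{dP}{dX}\, dP = \int p(\theta)\,\log\frac{p(\theta)}{\pi(\theta)}\, d\theta = D_{\mathrm{KL}}(P\,\|\,\pi), $$ i.e. $H$ is the Kullback–Leibler divergence from the prior to the posterior (the same quantity that appears in my second guess below).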

The question: what justifies the claim that the bulk of the posterior mass occupies a fraction $e^{-H}$ of the prior volume? Is there a proof, or at least an intuitive argument, for it?

My guesses so far

  • Concentration within a volume $e^{-\rm entropy}$ looks a lot like what happens for the typical set of sequences of $n$ i.i.d. random variables (e.g. [3] chapter 3). Is something equivalent happening here?
  • This answer defines the typical set of a posterior distribution $p$ as the set of parameters for which $\log (p/\pi)$ deviates from its expectation $H=\int d\theta\, p(\theta)\log\frac{p(\theta)}{\pi(\theta)}$ (the same quantity as above) by no more than some $\epsilon$. But can we argue that $\log (p/\pi)$ actually concentrates around its expectation? Perhaps in some limit (see the numerical sketch below this list)?
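
To probe that second guess numerically, here is a minimal sketch under a toy assumption (not from [1]): a factorized model with posterior $\mathcal N(0, 0.1^2)$ and prior $\mathcal N(0,1)$ independently in each of $d$ dimensions. It draws $\theta$ from the posterior and compares $\log(p/\pi)$ with the analytic $H = d\, D_{\mathrm{KL}}\!\left(\mathcal N(0,0.1^2)\,\|\,\mathcal N(0,1)\right)$:

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.1  # posterior width per dimension (toy choice)

def log_posterior_prior_ratio(d, n_samples=20_000):
    """Sample theta ~ posterior = prod_i N(0, SIGMA^2) and return
    log p(theta) - log pi(theta), with prior pi = prod_i N(0, 1)."""
    theta = rng.normal(0.0, SIGMA, size=(n_samples, d))
    log_p = np.sum(-0.5 * (theta / SIGMA) ** 2, axis=1) - d * np.log(SIGMA * np.sqrt(2 * np.pi))
    log_pi = np.sum(-0.5 * theta ** 2, axis=1) - d * np.log(np.sqrt(2 * np.pi))
    return log_p - log_pi

for d in (1, 10, 100, 1000):
    r = log_posterior_prior_ratio(d)
    # analytic H = d * KL(N(0, SIGMA^2) || N(0, 1)) for the factorized toy model
    H = d * (np.log(1.0 / SIGMA) + 0.5 * SIGMA**2 - 0.5)
    print(f"d={d:5d}  H={H:8.1f}  mean log(p/pi)={r.mean():8.1f}  "
          f"sd={r.std():6.2f}  sd/H={r.std() / H:.3f}")
```

In this toy case the mean of $\log(p/\pi)$ tracks $H$ (which grows like $d$), while its standard deviation grows only like $\sqrt d$, so the relative spread shrinks and the posterior mass piles up on a region of prior volume $\approx e^{-H}$, exactly as for typical sets of i.i.d. sequences. Whether, and in what limit, the same concentration holds for general non-factorized posteriors is what I would like to understand.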

References

[1] Skilling, John. "Nested sampling for general Bayesian computation." 2006.

[2] Skilling, John. "Bayesian computation in big spaces-nested sampling and Galilean Monte Carlo." 2012.

[3] Thomas M. Cover and Joy A. Thomas. "Elements of Information Theory." 2006.
