17
$\begingroup$

I wish to compute the effective sample size (ESS) for a posterior sample of size $M$. I have looked at several documentations (e.g. WinBUGS p11; Stan sec 15.4) and several other Stack Exchange questions (e.g. this and this). There seems to be several variations of definitions of variance (what correction to use in denominator), autocorrelation, and ESS formulas. I have tried many variations, but none of them match the results from effectiveSize(). At the 'top-level', the first 2 sources concur that the formula for ESS is:

$$ESS=\frac{M}{1+2\sum_{k=1}\rho_k}$$

although variants exist, such as here, suggesting it should be $|\rho|$, and here, suggesting a completely different variant:

$$ESS=M\frac{\lambda^2}{\sigma^2}, ~~~\text{ where } \sigma^2 = \lambda^2 + 2\sum_{k=1}\rho_k \text{ and } \lambda^2 = \text{var}(x)$$

where $x$ is a vector representing an estimate of a parameter at each MCMC iteration. Then there is the issue of how the autocorrelation, $\rho$, is calculated. This suggests:

$$\rho_k = \frac{\text{cov}[x_t, x_{t+k}]}{\text{var}[x_t]}$$

where $\text{var}[x_t]$ could have a denominator of $M$ or $M-1$. But most sources just describe $\rho$ without giving an explicit formula, e.g. "...$\rho_k$ is the lag $k$ autocorrelation for one parameter..." link.

I'm not even confident about the expressions for $x_t$ and $x_{t+k}$. My understanding is

$$x_t = \{x_1,x_2,\ldots,x_{M-k}\}$$ $$x_{t+k} = \{x_{k+1},x_{k+2},\ldots,x_M\}$$

I can compute the ESS in R using the coda package easily enough:

coda::effectiveSize(x)

But I want to understand how this is computed.

So what is the correct definition of the ESS?


For clarity, I have provided R code (updated for new variant) which attempts to demonstrate the lack of agreement.

Here is a user-defined R function which computes several variations of rho ($\rho$) and the ESS formula:

get.ESS <- function(x, autocor = 1, ESS = 1){
    M <- length(x)
    # Re-centre at zero
    x <- x - mean(x)
    # Autocorrelations
    rho <- rep(NA, M-2)
    for(lag in 0:(M-2)){
        # Split time-series in two, separated by 'lag'
        theta.1 <- x[1:(M-lag)]
        theta.2 <- x[(1+lag):M]
        # Autocorrelation
        if(autocor %in% c(1, 4)){
            rho[lag + 1] <- cor(theta.2, theta.1)
        }else if(autocor == 2){
            rho[lag + 1] <- mean(theta.1 * theta.2) / var(x)
        }else if(autocor == 5){
            rho[lag + 1] <- cov(theta.2, theta.1)
        }
    }
    if(autocor == 3){
        rho <- stats::acf(x = x, lag.max = M-2, plot = FALSE)$acf
    }else if(autocor == 4){
        # 'Relative' autocorrelation
        rho <- rho * (M - 0:(M-2)) / M
    }

    # Effective sample size
    if(ESS == 1){
        E <- M / (1 + 2 * sum(rho))
    }else if(ESS == 2){
        E <- M / (1 + 2 * sum(abs(rho))) 
    }else if(ESS == 3){     # To be used with autocor = 5
        lambda.sq <- M * var(x)
        sigma.sq <- lambda.sq + 2 * sum(rho)
        E <- M * lambda.sq / sigma.sq
    }
    return(E)
}

The arguments autocor and ESS set the variation to be computed. Results:

M <- 300    # Number of MCMC iterations
set.seed(1)
x <- rnorm(M, 0, 1)

get.ESS(x, 1, 1)        # -178.352
get.ESS(x, 2, 1)        # 153.4736
get.ESS(x, 3, 1)        # 150.1211
get.ESS(x, 4, 1)        # 149.9286
get.ESS(x, 5, 1)        # 367.5048

get.ESS(x, 1, 2)        # 5.553791
get.ESS(x, 2, 2)        # 6.272066
get.ESS(x, 3, 2)        # 14.85799
get.ESS(x, 4, 2)        # 14.65066
get.ESS(x, 5, 2)        # 6.524995

# Compare with:
coda::effectiveSize(x)          # 300 = M

As you can see, these aren't even close to being correct. The help file on effectiveSize wasn't very helpful on how they compute it.

So what is the correct effective sample size (ESS) calculation (at least that which is used by effectiveSize)?


Update:

Using the variant suggested here, i.e. $ESS=M\frac{\lambda^2}{\sigma^2}$:

get.ESS(x, 1, 3)        # 300.0289
get.ESS(x, 2, 3)        # 299.9897
get.ESS(x, 3, 3)        # 299.9893
get.ESS(x, 4, 3)        # 299.9892
get.ESS(x, 5, 3)        # 300.1979   - the actual definition proposed

The results seem correct, regardless of the variant of $\rho$ (which is strange). Using a highly correlated sample for $x$ instead of white noise shows similar values (300 or larger), which is clearly wrong - it should reduce as correlation increases. So the question still stands.

$\endgroup$
3
  • 1
    $\begingroup$ What do you mean by "correct"? $\endgroup$
    – jaradniemi
    Commented Oct 1, 2019 at 21:10
  • 1
    $\begingroup$ I think the last sentence already clarifies this? Given the sample is white noise, the ESS should be as large as the actual sample size, i.e. 300. $\endgroup$
    – Earlien
    Commented Oct 1, 2019 at 22:19
  • $\begingroup$ Just noticed that the second equation appears messed up if you're viewing this in Chrome, but it appears just fine in Firefox. This is not something I can fix :S $\endgroup$
    – Earlien
    Commented Nov 5, 2019 at 1:12

1 Answer 1

18
$\begingroup$

After further research, I've made some useful discoveries. The answer appears to be anything but straightforward. Let me start by answering my second question above: "What is the correct effective sample size (ESS) calculation (at least that which is used by effectiveSize)?"

The answer is that the effectiveSize function from the R coda package uses the second definition I described in the question, namely

$$ESS = M \frac{\lambda^2}{\sigma^2}$$

where $\lambda^2 = \text{var}(x)$ is the sample variance as defined above, but $\sigma^2$ is defined as an estimate of the spectral density at frequency zero. (effectiveSize uses the function spectrum0.ar to do this, also from the coda package.) More generally, $\sigma^2$ is an estimate of the variance in the Central Limit Theorem.

I wish I understood what "the spectral density at frequency zero" meant so I could elaborate, but hopefully this is a useful starting point for anyone who wishes to a) understand the calculation behind coda::effectiveSize or b) wishes to program a user-defined function for computing sample size.


Now to answer my first question: "What is the correct definition of the ESS?"

As far as I can tell, there isn't one correct definition. What tipped me off was a paper which mentioned two R packages for calculating the ESS. After trying both on toy examples, I was still getting drastically different results. I'm not sure what definition of ESS is being used in the second package (mcmcse::ess), but it demonstrates multiple definitions do exist. On that note, here is a concise list of definitions that I've found so far. (Disclaimer: I cannot vouch for their correctness.)

Note that they all take the general form:

$$ ESS_{\text{i}} = \frac{M}{\tau_i} $$ where $M$ is the un-adjusted sample size (i.e. length of the vector $x$), and subscript $i$ denotes the specific definition. Hence I will focus on the definitions of $\tau_i$. In no particular order:

$$ \tau_1 = 1 + 2\sum_{l=1}^\infty \rho(l) $$

where $\rho(l)$ is the sample autocorrelation at lag $l$.

Sources: https://www.johndcook.com/blog/2017/06/27/effective-sample-size-for-mcmc/, https://mc-stan.org/docs/2_20/reference-manual/effective-sample-size-section.html, https://people.orie.cornell.edu/davidr/or678/handouts/winBUGS.pdf, https://arxiv.org/pdf/1403.5536v1.pdf

$$ \tau_2 = \frac{1+\rho(1)}{1-\rho(1)} $$

Source: https://imedea.uib-csic.es/master/cambioglobal/Modulo_V_cod101615/Theory/TSA_theory_part1.pdf

\begin{align*} \tau_3 = & \frac{1}{M}\sum_{k,l=1}^M \text{cov}(x_k, x_l) \\ = &1 + 2 \left( \frac{M-1}{M}\rho_1 + \frac{M-2}{M}\rho_2 + \cdots + \frac{1}{M}\rho_{M-1}\right) \end{align*}

(Note this is similar to $\tau_1$, but seems to use covariance rather than autocorrelation. Also, it seems $x_k$ and $x_l$ are scalars rather than vectors, which makes no sense to me.)

Source: Definition of autocorrelation time (for effective sample size)

$$ \tau_4 = \frac{\sigma^2}{\lambda^2} $$

where $\lambda^2 = \text{var}(x)$ and $\sigma^2 = \lambda^2 + 2\sum_{l=1}\rho(l)$.

Sources: https://arxiv.org/pdf/1403.5536v1.pdf, Effective Sample Size greater than Actual Sample Size, https://cran.r-project.org/web/packages/mcmcse/mcmcse.pdf

$$ \tau_5 = \frac{\sigma^2}{\lambda^2} $$

where $\lambda^2 = \text{var}(x)$ and $\sigma^2$ is an estimate of the spectral density at frequency zero (the coda::effectiveSize definition above).

Note that the Stan Reference Manual provides an extension to $ESS_1$ for multiple MCMC chains.

$\endgroup$

Not the answer you're looking for? Browse other questions tagged or ask your own question.