$\begingroup$

I am trying to understand, in layman's terms, how the Anscombe transform converts a Poisson distribution into a normal distribution. Why is a log transform not sufficient on its own to obtain a normal distribution?

I understand that the Anscombe transform performs variance stabilization. Is this similar to applying a z-score transform (standardisation), so that the variance tends to 1 or some other constant? Is it that a log transform on its own cannot produce a stable enough variance, even though the distribution becomes normal?

@Henry so is it the case that the Anscombe transform stabilises the variance, whilst the log transform transforms the standard deviation? "Adjusting for the mean of the square root of the sum (a little less than √nμ) also gives convergence in distribution to a normal distribution" - why does the Anscombe transform not take this mean into account to move the distribution towards normal? I understand that "Poisson random variable can take the value 0 with positive probability" is an issue for the log transform, but why not just compute a z-score, i.e. subtract the mean and divide by the standard deviation? That would also result in a normal distribution. Would this not achieve what the log and Anscombe transforms do in combination?

$\endgroup$
  • $\begingroup$ All Poisson distributions have positive chances of being zero. What is the log of zero? $\endgroup$
    – whuber
    Commented Jul 22, 2021 at 19:47
  • $\begingroup$ @whuber apparently the fashion in cases where $X$ can be zero but not negative is to use $\log(X+1)$ $\endgroup$
    – Henry
    Commented Jul 22, 2021 at 22:48
  • $\begingroup$ The Anscombe transform does not convert a Poisson distributed variable into one with a normal distribution. Related / relevant: stats.stackexchange.com/questions/46418/… $\endgroup$
    – Glen_b
    Commented Jul 23, 2021 at 2:13
  • $\begingroup$ @Henry That is problematic. I proposed a better procedure at stats.stackexchange.com/a/30749/919. $\endgroup$
    – whuber
    Commented Jul 23, 2021 at 14:30

1 Answer

$\begingroup$

Taking the square root of the sum of $n$ iid non-negative random variables with mean $\mu>0$ and variance $\sigma^2>0$ is variance stabilising in general (see a related result), in that the variance of the square root of the sum heads to about $\frac{\sigma^2}{4\mu}$ as $n$ increases, a limit that does not depend on $n$. You cannot say that for the logarithm of the sum. Adjusting for the mean of the square root of the sum (a little less than $\sqrt{n\mu}$) also gives convergence in distribution to a normal distribution as $n$ increases.
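
A rough first-order (delta-method) sketch of where that constant comes from, writing $S_n$ for the sum (a symbol introduced here for illustration, with $\mathbb{E}[S_n]=n\mu$ and $\operatorname{Var}(S_n)=n\sigma^2$) and ignoring higher-order terms:

$$\operatorname{Var}\big(g(S_n)\big) \approx g'\big(\mathbb{E}[S_n]\big)^2 \operatorname{Var}(S_n), \qquad g(s)=\sqrt{s}, \quad g'(s)=\frac{1}{2\sqrt{s}}$$

$$\operatorname{Var}\big(\sqrt{S_n}\big) \approx \frac{1}{4n\mu}\cdot n\sigma^2 = \frac{\sigma^2}{4\mu}$$

The same approximation with $g(s)=\log s$ gives $\frac{1}{(n\mu)^2}\cdot n\sigma^2 = \frac{\sigma^2}{n\mu^2}$, which keeps changing with $n$ rather than settling at a constant.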

A Poisson random variable $X$ with mean and variance $n$ can be seen as the sum of $n$ iid Poisson random variables with mean and variance $1$, so you can apply the previous result to conclude that $\sqrt X$ has a variance heading towards $\frac14$ as $n$ increases. Multiply the square root by $2$ and you get that $2\sqrt X$ has a variance heading towards $1$ as $n$ increases. Make a slight adjustment to $2\sqrt{X+\frac38}$ (the Anscombe transform) and the convergence of the variance to $1$ is faster.
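
One way to check this numerically, as a minimal sketch rather than a definitive implementation: the Python below sums over the Poisson probability mass function to get the variance of $2\sqrt X$ and of $2\sqrt{X+\frac38}$ for a few means. The function names and the truncation point of the sum are my own choices for illustration, not part of any library.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean), computed in log space to avoid overflow."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def transformed_variance(mean, transform):
    """Variance of transform(X) for X ~ Poisson(mean), by summing over the pmf."""
    # Assumption: truncating this far into the upper tail omits negligible mass.
    upper = int(mean + 12 * math.sqrt(mean) + 30)
    probs = [poisson_pmf(k, mean) for k in range(upper + 1)]
    values = [transform(k) for k in range(upper + 1)]
    centre = sum(p * v for p, v in zip(probs, values))
    return sum(p * (v - centre) ** 2 for p, v in zip(probs, values))


def two_sqrt(x):
    """Plain square-root transform scaled by 2."""
    return 2.0 * math.sqrt(x)


def anscombe(x):
    """Anscombe transform 2 * sqrt(x + 3/8)."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)


# Print: mean, Var(2*sqrt(X)), Var(2*sqrt(X + 3/8))
for mean in (1, 4, 10, 30, 100):
    print(mean,
          transformed_variance(mean, two_sqrt),
          transformed_variance(mean, anscombe))
```

Both columns should approach $1$ as the mean grows, with the Anscombe column getting there sooner; at small means neither is especially close to $1$.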

Even then a Poisson random variable will be a discrete random variable, and the same will be true of its transform, so you need care comparing its distribution with a normal approximation, especially if the mean is low.

Another issue is that a Poisson random variable can take the value $0$ with positive probability, and that would not give a finite logarithm, so you would probably want to use something like $\log(X+1)$ instead. For a Poisson random variable with mean $n$, the variance of $\log(X+1)$ seems to be close to $\frac1n$ for large $n$, which is not stable as $n$ increases.
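
The "close to $\frac1n$" observation can be checked without simulation. The sketch below (plain Python, with helper names that are my own and a truncation point chosen for illustration) sums over the Poisson pmf to get the variance of $\log(X+1)$ and prints it alongside the mean times that variance.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean), computed in log space to avoid overflow."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def log1p_variance(mean):
    """Variance of log(X + 1) for X ~ Poisson(mean), by summing over the pmf."""
    # Assumption: truncating this far into the upper tail omits negligible mass.
    upper = int(mean + 12 * math.sqrt(mean) + 30)
    probs = [poisson_pmf(k, mean) for k in range(upper + 1)]
    values = [math.log1p(k) for k in range(upper + 1)]
    centre = sum(p * v for p, v in zip(probs, values))
    return sum(p * (v - centre) ** 2 for p, v in zip(probs, values))


# Print: mean, Var(log(X + 1)), mean * Var(log(X + 1))
for mean in (10, 100, 1234):
    v = log1p_variance(mean)
    print(mean, v, mean * v)
```

If the variance behaves like $\frac1n$, the last column should sit near $1$ while the middle column keeps shrinking, which is what "not stable" means here.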

But there are other cases where taking logarithms can be variance stabilising. If you have a family of positive random variables where the standard deviation is proportional to the mean (random variables with gamma distributions of fixed shape are examples - including exponential distributions), then taking logarithms can be variance stabilising even if it does not lead to a normal distribution. Poisson random variables do not fit this condition, since it is their variance not standard deviation which is proportional to the mean.
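
A first-order (delta-method) sketch of that contrast, assuming a positive random variable $Y$ with mean $m$ and standard deviation $cm$ for some constant $c$ (symbols introduced here for illustration) and ignoring higher-order terms:

$$\operatorname{Var}(\log Y) \approx \left(\frac{1}{m}\right)^2 (cm)^2 = c^2$$

which does not depend on $m$. For a gamma distribution with shape $k$, $c^2=\frac1k$. For a Poisson variable with mean $m$ the standard deviation is $\sqrt m$, so the same approximation gives

$$\operatorname{Var}(\log X) \approx \left(\frac{1}{m}\right)^2 m = \frac{1}{m}, \qquad \operatorname{Var}\big(\sqrt{X}\big) \approx \left(\frac{1}{2\sqrt{m}}\right)^2 m = \frac14$$

so it is the square root, not the logarithm, that removes the dependence on $m$.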

$\endgroup$
  • $\begingroup$ Re your remark about "variance of $\log(X+1)$ seems to be close to $1/n$ for large $n$": how do you obtain that? It's not true. $\endgroup$
    – whuber
    Commented Jul 27, 2021 at 22:23
  • $\begingroup$ @whuber For example `1/var(log(rpois(10^6,1234)+1))` seems to be close to $1234$ $\endgroup$
    – Henry
    Commented Jul 27, 2021 at 22:41
  • $\begingroup$ Ah... I had understood from the question that it concerned sampling and therefore $n$ would naturally be the sample size. Although you clearly define it to be the Poisson mean, in this context that is confusing! $\endgroup$
    – whuber
    Commented Jul 29, 2021 at 13:53
  • $\begingroup$ @whuber - my apologies if it was confusing - I had made it $n$ so I could use the argument of the sum of $n$ cases with parameter $1$ $\endgroup$
    – Henry
    Commented Jul 29, 2021 at 14:18
  • $\begingroup$ @Henry - this is great. I have updated the question with a request for further clarification. Please ignore the bottom part of the update, as I now understand the difference: the log and Anscombe transforms give fold differences but the z-score does not. $\endgroup$
    – StatsBio
    Commented Aug 9, 2021 at 12:43
