Say I have a Poisson process that was measured $N$ times, and each measurement produced a count $k_i$. These $k_i$ are events that I have to detect, and my detection probability is $p$. In practice I detect $\widetilde{k}_i$ events, which are related to $k_i$ through $p$.

Knowing the number of detected events $\widetilde{k}_i$, the constant probability $p$, and the constant measurement time $T$, how do I find a confidence interval for the estimate of the Poisson rate $\lambda$?

  • Is the value of $p$ known with certainty, and is it constant among all conditions of interest? – EdM, Jul 16, 2020 at 14:55
  • The value of $p$ is known and constant among all conditions... – Jul 16, 2020 at 16:51
  • @EdM see my answer above – Jul 16, 2020 at 19:30
  • Are your $N$ measurements all for identical time periods? – soakley, Jul 16, 2020 at 19:41
  • This is not a "Poisson process"; it is actually a Poisson-binomial mixture model. – Tomas, Jul 20, 2020 at 13:25

1 Answer


There are at least 19 ways to estimate a confidence interval (CI) based on samples from a Poisson distribution; see this page and its links for extensive discussion. The question here is what is different when you don't observe the "true" underlying process, with rate $\lambda$ per unit time, but instead have a known, fixed probability $p$ of detecting a true event. The principles below will apply whichever CI method you choose.

TL;DR: what you are sampling from is still a Poisson distribution, but now with rate $p \lambda$ per unit time. You first estimate $p \hat\lambda$ and its CI for the Poisson-distributed events you observed. Then, as $p$ is known and fixed, you correct back to the "true" process by dividing both $p \hat\lambda$ and its CI by $p$.

First, recognize that what matters is the total time over which you have collected counts, as counts over different time periods are independent with a Poisson distribution. Whether you have 100 observations each lasting 1 second or 1 observation lasting 100 seconds doesn't matter. So I'll take your total time of observation to be your $T$ and your total observed counts to be $\tilde k$, just adding together any counts and time periods that might in practice have been observed separately.
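As a minimal sketch of that pooling step (Python, with made-up per-measurement counts and durations, not values from the question), you would simply sum everything before estimating anything:

```python
# Hypothetical data: N = 5 measurements, each lasting 20 seconds,
# with the detected counts observed in each measurement.
durations = [20.0, 20.0, 20.0, 20.0, 20.0]   # seconds per measurement
k_tilde_i = [3, 5, 2, 4, 6]                  # detected events per measurement

T = sum(durations)        # total observation time: 100 s
k_tilde = sum(k_tilde_i)  # total detected events: 20
```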

Now think about the derivation of the Poisson distribution from the binomial distribution:

one assumes that there exists a small enough subinterval for which the probability of an event occurring twice is "negligible". With this assumption one can derive the Poisson distribution from the Binomial one, given only the information of expected number of total events in the whole interval.

In your case the word "events" is used in two ways: the number of true events, and the number of events that you observe. The expected number of true events during total time $T$ is $\lambda T$. The expected number of observed events over that time, with a known and fixed probability $p$ of detecting a true event, is $p\lambda T$. The observations are still a sample of rare independent events, the essential characterization of a Poisson distribution, just with a lower rate than the true rate. The critical point is that, as the observations are drawn from a Poisson distribution, all the extra variability introduced by the detection probability $p$ is captured in the expected number of observed events, $p\lambda T$.
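This "thinning" property is easy to check by simulation. The sketch below (Python with NumPy; the parameter values are arbitrary illustrations, not from the question) draws true counts from a Poisson distribution with mean $\lambda T$, keeps each event independently with probability $p$, and confirms that the detected counts behave like a Poisson sample with mean $p\lambda T$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, p, T, n_sim = 2.0, 0.3, 10.0, 100_000   # arbitrary illustrative values

true_counts = rng.poisson(lam * T, size=n_sim)   # true events in each experiment
detected = rng.binomial(true_counts, p)          # each event detected with prob. p
direct = rng.poisson(p * lam * T, size=n_sim)    # Poisson with the thinned rate

# Mean and variance of the detected counts match Poisson(p*lam*T) = Poisson(6).
print(detected.mean(), detected.var())   # both close to 6
print(direct.mean(), direct.var())       # both close to 6
```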

Let's start with the estimate of the observed rate, $p\hat\lambda$. That is simply $\tilde k/T$. For the CI around that estimate, take as a simple example the normal approximation using the square root of the variance. With the variance of a Poisson count equal to its mean, the estimated standard deviation of $\tilde k$ is $\sqrt{\tilde k}$, so the CI around the rate estimate is $\pm 1.96\,\sqrt{\tilde k}/T$.

With the assumption that $p$ is known and fixed, you correct both the point estimate and the CI back to the scale of the "true" distribution by dividing each of them by $p$. That is, you have:

$$\hat\lambda = \frac{\tilde k}{pT} \pm 1.96\, \frac{\sqrt{\tilde k}}{pT}.$$

This same result could also have been derived from the basic properties of the variance under multiplication by the constant $1/p$. Note that the CI is wider by a factor of $\sqrt{1/p}$ than it would have been had you detected all the true events ($p = 1$), since $\tilde k/p$ approaches the true number of events $k$ in your notation as $p$ approaches 1. For other CI estimation methods the principle is the same: calculate the CI for the observed counts, then divide by $p$.
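Putting the pieces together, here is a minimal sketch of the whole calculation (Python; k_tilde, T, and p below are placeholder values, and the normal-approximation interval is used only for simplicity). Any other Poisson CI method could be substituted on the observed scale before dividing by $p$:

```python
import math

k_tilde = 20    # total detected events (placeholder)
T = 100.0       # total observation time in seconds (placeholder)
p = 0.3         # known, fixed detection probability (placeholder)

# Point estimate and normal-approximation 95% CI on the observed (thinned) scale.
rate_obs = k_tilde / T                          # estimate of p*lambda
half_width_obs = 1.96 * math.sqrt(k_tilde) / T

# Correct back to the true scale by dividing the estimate and CI limits by p.
lam_hat = rate_obs / p
ci = (lam_hat - half_width_obs / p, lam_hat + half_width_obs / p)
print(lam_hat, ci)   # about 0.667 events/s, roughly (0.375, 0.959)
```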

  • How come the confidence interval is not a function of the sample size $N$ and the variance of the sampled values $k_i$? – Jul 18, 2020 at 7:50
  • I can see how $N$ can be plugged in, yet I am not sure that the joint probability density function is also Poisson... – Jul 18, 2020 at 13:14
  • @GideonKogan revised extensively in response to your comments. The trick to keep in mind is that the observed counts, with a fixed probability $p$ of observing each true event, still represent rare(r) independent events, the characterization of a Poisson distribution. – EdM, Jul 18, 2020 at 18:20
