A key property of the Poisson probability model is that the variance and the mean of the distribution are equal. This entails that whenever this condition is not met by the data, there will be a mismatch between the confidence intervals implied by the model and what the data actually show.
When characterizing count data, the term "overdispersed" applies when the data show a higher dispersion (in other words, a higher variance) than would be expected under a Poisson model.
To see what this overdispersion looks like, check out this small R snippet:
library(tidyverse)

set.seed(4242)
df <-
  tibble(`poisson (mean 8)` = rpois(600, 8),
         `neg_binomial (mean 8)` = rnbinom(600, mu = 8, size = 4)) %>%
  gather(dist, val)

df %>%
  ggplot(aes(val)) +
  geom_histogram(fill = 'skyblue', color = 'black',
                 binwidth = 1) +
  facet_wrap(~dist, ncol = 1) +
  theme_minimal()
![](https://cdn.statically.io/img/i.sstatic.net/BVGLm.png)
df %>%
  group_by(dist) %>%
  summarise(mean = mean(val), variance = var(val))
#> # A tibble: 2 x 3
#>   dist                   mean variance
#>   <chr>                 <dbl>    <dbl>
#> 1 neg_binomial (mean 8)  7.98     23.6
#> 2 poisson (mean 8)       8.02     8.97
Created on 2021-09-20 by the reprex package (v2.0.1)
As you can see, compared to the negative binomial, the Poisson sample has a much lower variance, even though both have the same mean. This ability of the negative binomial to accommodate count data with higher dispersion is what makes it preferable in situations of overdispersion, and it is why negative binomial regression is a thing.
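As a quick sanity check: under the (mu, size) parametrization used by rnbinom, the mean is mu and the variance is mu + mu^2/size, so the theoretical variance of the simulated negative binomial lines up with the empirical value above:
# Theoretical moments under rnbinom's (mu, size) parametrization:
# E(Y) = mu and Var(Y) = mu + mu^2 / size
mu   <- 8
size <- 4
mu + mu^2 / size  # = 24, close to the empirical variance of ~23.6
Meanwhile, the Poisson variance stays pinned at its mean of 8.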
Now, in order to see the effect of overdispersion on the standard errors of the parameters, let's simulate data for a negative binomial regression with a binary "control" variable that shifts the mean by 3 units, starting from a baseline of 8.
set.seed(123)
df <-
  tibble(x = rbinom(600, size = 1, p = .5),
         y = rnbinom(600, mu = 8 + if_else(x == 1, 3, 0), size = 4))

df %>%
  mutate(variable = if_else(x == 1, 'x = 1', 'x = 0')) %>%
  ggplot(aes(y)) +
  geom_histogram(fill = 'skyblue', color = 'black',
                 binwidth = 1) +
  facet_wrap(~variable, ncol = 1) +
  theme_minimal()
![](https://cdn.statically.io/img/i.sstatic.net/rj3Y8.png)
And as we can see from the mean of each group, when x = 1 the mean is indeed greater (and so is the variance, as the histograms show: the variance growing with the mean is characteristic of count data):
df %>%
  mutate(variable = if_else(x == 1, 'x = 1', 'x = 0')) %>%
  group_by(variable) %>%
  summarise(mean = mean(y))
#> # A tibble: 2 x 2
#>   variable  mean
#>   <chr>    <dbl>
#> 1 x = 0     8.34
#> 2 x = 1    11.0
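We can also check the variances numerically; under the simulation they should be roughly mu + mu^2/4 for each group, i.e. about 24 when x = 0 and about 41 when x = 1:
# Same summary as above, now also computing the within-group variance
df %>%
  mutate(variable = if_else(x == 1, 'x = 1', 'x = 0')) %>%
  group_by(variable) %>%
  summarise(mean = mean(y), variance = var(y))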
In order to show the underestimation of the standard errors, let's fit this negative binomial data with a Poisson GLM, and afterwards with the correct model, a negative binomial GLM.
glm(y ~ x, family = 'poisson', data = df) %>%
  summary()
#>
#> Call:
#> glm(formula = y ~ x, family = "poisson", data = df)
#>
#> Deviance Residuals:
#>     Min      1Q  Median      3Q     Max
#> -4.6970 -1.6597 -0.3155  0.8781  6.7168
#>
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)
#> (Intercept)  2.12091    0.01967  107.83   <2e-16 ***
#> x            0.27980    0.02645   10.58   <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for poisson family taken to be 1)
#>
#> Null deviance: 2157.7 on 599 degrees of freedom
#> Residual deviance: 2045.0 on 598 degrees of freedom
#> AIC: 4395.9
#>
#> Number of Fisher Scoring iterations: 5
MASS::glm.nb(y ~ x, data = df) %>%
  summary()
#>
#> Call:
#> MASS::glm.nb(formula = y ~ x, data = df, init.theta = 3.89401969,
#> link = log)
#>
#> Deviance Residuals:
#>     Min      1Q  Median      3Q     Max
#> -3.2348 -0.9097 -0.1631  0.4797  2.8902
#>
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)
#> (Intercept)  2.12091    0.03486  60.840  < 2e-16 ***
#> x            0.27980    0.04913   5.696 1.23e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for Negative Binomial(3.894) family taken to be 1)
#>
#> Null deviance: 657.73 on 599 degrees of freedom
#> Residual deviance: 625.25 on 598 degrees of freedom
#> AIC: 3686.3
#>
#> Number of Fisher Scoring iterations: 1
#>
#>
#> Theta: 3.894
#> Std. Err.: 0.319
#>
#> 2 x log-likelihood: -3680.255
As you can see, the point estimates of the parameters agree exactly between the two models. Where they disagree is in the standard errors: the Poisson model reports smaller ones, and since we simulated the data ourselves, we know they are too small. The reason is that a Poisson distribution cannot accommodate a variance larger than its mean (higher dispersion, in the language of the exponential family distributions used in GLMs).
This underestimation of the standard errors is exactly what the usual warning is about: if the data-generating process is genuinely overdispersed, a Poisson regression will understate the uncertainty of its estimates. The negative binomial is one distribution we know to be overdispersed, and that is exactly the underestimation we just found.
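As an aside, a common quick remedy is a quasi-Poisson model, which keeps the Poisson mean structure but estimates the dispersion from the data instead of fixing it at 1; its standard errors should land close to the negative binomial ones here (a sketch, not part of the original comparison):
# Quasi-Poisson fit: same point estimates as the Poisson model,
# but standard errors are scaled by the estimated dispersion
glm(y ~ x, family = 'quasipoisson', data = df) %>%
  summary()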
As for why this happens: the parametrization of the negative binomial typically used in negative binomial regression arises from its gamma-Poisson mixture interpretation, which you can learn more about here. Under this parametrization, if $Y_i \sim NB(\mu_i, \phi)$ then:
$$
\text{E}(Y_i) = \mu_i \\
\text{Var}(Y_i) = \mu_i + \frac{\mu_i^2}{\phi}
$$
(In our simulation, rnbinom's size argument plays the role of $\phi$; note that glm.nb estimated $\theta \approx 3.89$, close to the true value of 4.) Compare this with the Poisson case: if $Y_i \sim \text{Poisson}(\mu_i)$, we have:
$$
\text{E}(Y_i) = \mu_i \\
\text{Var}(Y_i) = \mu_i
$$
This should give you a hint of why the standard errors differ: the negative binomial variance carries an extra $\mu_i^2/\phi$ term that the Poisson model simply cannot accommodate.
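In fact, a rough back-of-envelope check is possible (treating the mean as approximately constant across observations, which is only roughly true here): the standard errors should inflate by about $\sqrt{1 + \mu/\phi}$, and that factor matches the ratio of the two reported standard errors for x:
# Approximate SE inflation factor under negative binomial overdispersion,
# assuming a roughly constant mean across observations
mu  <- mean(df$y)  # about 9.7 in this simulation
phi <- 3.894       # theta estimated by glm.nb above
sqrt(1 + mu / phi) # comes out around 1.87
# compare with the observed ratio of standard errors:
# 0.04913 / 0.02645 is about 1.86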