Why use Negative Binomial distribution to model count data?

Question

According to Wikipedia: https://en.wikipedia.org/wiki/Negative_binomial_distribution

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution that models the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of failures (denoted r) occurs.

I see that the Negative Binomial distribution is often used to model count data, especially in the insurance industry. However, I don't see why it should be used when it models the number of successes before some number of failures occurs. For example, it is used to model the number of catastrophic events happening in 1 year, and I don't see what that has to do with the "number of successes before some failures".

Could you please explain to me why we use the Negative Binomial distribution to model count data, even when the concept of "number of successes before $r$ failures" doesn't apply?

Thank you very much for your help!

There is more than one formulation of the negative binomial, but in the case you quote and another, it takes values in the non-negative integers, and so can easily represent a count. So too do the geometric and Poisson distributions, but they have some fixed properties on the relationship between the mean and variance while the negative binomial has more flexibility, which may be useful for some modelling — Henry, Commented Jan 23, 2022 at 17:40

WeakLearner · Accepted Answer · 2022-01-23 17:42:10Z


A better motivation for using the negative binomial for count data is that it is a gamma mixture of Poisson random variables. To see this, suppose that $y|\lambda \sim \text{Poisson}(\lambda)$, i.e. given a fixed value of $\lambda$, $y$ is Poisson distributed. Further, assume that $\lambda \sim \text{Gamma}(a,b)$. Then you can show that the marginal distribution of $y$ is negative binomial by solving:

$$ p(y) = \int p(y|\lambda) p(\lambda) d\lambda $$
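As a sketch of how this integral works out, assume $\text{Gamma}(a,b)$ means shape $a$ and rate $b$ (the parametrization is not stated above, so this is an assumption), i.e. $p(\lambda) = \frac{b^a}{\Gamma(a)} \lambda^{a-1} e^{-b\lambda}$. Then:

$$ p(y) = \int_0^\infty \frac{\lambda^y e^{-\lambda}}{y!} \cdot \frac{b^a}{\Gamma(a)} \lambda^{a-1} e^{-b\lambda} \, d\lambda = \frac{b^a}{y!\,\Gamma(a)} \int_0^\infty \lambda^{y+a-1} e^{-(b+1)\lambda} \, d\lambda = \frac{\Gamma(y+a)}{y!\,\Gamma(a)} \left(\frac{b}{b+1}\right)^{a} \left(\frac{1}{b+1}\right)^{y} $$

Under that assumption, this matches the "successes before $r$ failures" form $\binom{y+r-1}{y}(1-p)^r p^y$ with $r = a$ and success probability $p = 1/(b+1)$, extended to non-integer $r$ through the gamma function. The mean is $a/b$ and the variance is $a(b+1)/b^2$.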

The most basic approach to count data is to assume that $y$ is Poisson with a fixed $\lambda$. This can, however, fail to capture certain aspects of count data, since it assumes that the mean and the variance of $y$ are equal. In the negative binomial case we no longer have this restriction: the variance can exceed the mean (overdispersion).
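A quick simulation can make the mean/variance point concrete. The following is only a minimal sketch: it assumes $\text{Gamma}(a,b)$ is parametrized by shape $a$ and rate $b$, the function names `simulate_gamma_poisson` and `nb_pmf` are made up for illustration, and the values $a = 2$, $b = 0.5$ are arbitrary.

```python
import math
import numpy as np


def simulate_gamma_poisson(a, b, size, seed=0):
    """Draw lambda ~ Gamma(shape=a, rate=b), then y | lambda ~ Poisson(lambda)."""
    rng = np.random.default_rng(seed)
    # NumPy's gamma takes a scale parameter, so rate b becomes scale 1/b.
    lam = rng.gamma(shape=a, scale=1.0 / b, size=size)
    return rng.poisson(lam)


def nb_pmf(y, a, b):
    """Negative binomial pmf implied by the mixture (assumed shape/rate form)."""
    log_p = (
        math.lgamma(y + a) - math.lgamma(a) - math.lgamma(y + 1)
        + a * math.log(b / (b + 1.0))
        - y * math.log(b + 1.0)
    )
    return math.exp(log_p)


a, b = 2.0, 0.5  # illustrative values only
y = simulate_gamma_poisson(a, b, size=200_000)

sample_mean = y.mean()
sample_var = y.var()
theory_mean = a / b                 # mean of the mixture
theory_var = a * (b + 1) / b**2     # variance of the mixture, larger than the mean

print("sample mean, theory mean:", sample_mean, theory_mean)
print("sample var,  theory var: ", sample_var, theory_var)

# Compare the simulated frequency of y = 0 with the closed-form pmf.
empirical_p0 = (y == 0).mean()
print("P(y = 0) simulated vs formula:", empirical_p0, nb_pmf(0, a, b))
```

A plain Poisson model with the same mean would force the variance to equal `theory_mean`, whereas the mixed counts show a clearly larger variance.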


Hi, thank you very much for your answer. I would like to ask whether there is an equivalence between the "success/failure" formulation and the Poisson-Gamma mixture formulation of the Negative Binomial? There is an explanation on Wikipedia, but I found it hard to follow, so if you can suggest a formal paper, I would appreciate it. Thank you for your help!
– InTheSearchForKnowledge
Commented Jan 23, 2022 at 17:51
The Wikipedia discussion is not something I've come across before and, to be honest, it does not seem very useful. I would rather view this approach as a new definition of the negative binomial; it makes sense in the context of count regression, since it generalizes Poisson regression by allowing $\lambda$ to vary.
– WeakLearner
Commented Jan 23, 2022 at 18:22
