Sampling distribution of the proportion of events


Asked 13 days ago

Modified 13 days ago


For categorical variables with $l \ge 2$ categories, what is the sampling distribution of the proportion of events in each category? These are obviously not independent, since they add up to 1.

Does it matter if these variables are ordinal?

For binary variables, the well-known result is that the proportion of events has a normal distribution with mean $p$ and variance $p(1-p)/n$.

With an arbitrary number of categories, is the result a generalization of this?

I've seen the multivariate normal mentioned, with a particular covariance matrix. But I am not sure how that works, since the draws are not guaranteed to sum to 1.

I know the Dirichlet distribution is used for this in Bayesian estimation. Ideally, for my application, I am looking for a frequentist solution, just to keep things simple.

A "natural" solution could be to draw from the multinomial (with the probability parameter set to the sample proportions) and then divide by the number of trials. I've not seen this mentioned anywhere. Does this have a name? If this were indeed the solution, why isn't it the solution for $l = 2$ categories (where we would use the binomial)?

A good reference is a plus. A solution that someone has already implemented in R is also a plus.

edited Jul 7 at 23:04

asked Jul 7 at 22:56

Jessica


For binary variables, I would have thought the proportion in one of the categories had a binomial distribution scaled by $\frac1n$ rather than a normal distribution. So with more categories, I would have thought you had a multinomial distribution, again scaled by $\frac1n$.
– Henry
Commented Jul 7 at 23:15

If certain assumptions hold (that the observations arise from a Bernoulli process), then you'd have a scaled binomial (which is approximately normal in sufficiently large samples). Under analogous assumptions, this would be a scaled multinomial, which also has an asymptotic normal distribution (it's degenerate because of the sum-to-one constraint; it lives on a hyperplane of dimension $k-1$ for $k$ categories). The mean and variance of each term are as for the scaled binomial, and the covariances are $-p_i p_j/n$... ctd
– Glen_b
Commented Jul 7 at 23:25
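
The moments quoted in the comment above can be written out explicitly. This is a sketch under the same assumptions ($n$ independent trials with fixed category probabilities $p_1, \dots, p_k$); the symbols $X_i$ (count in category $i$), $\hat p_i$ (sample proportion) and $\Sigma$ are notation introduced here for illustration, not taken from the thread.

$$
\begin{aligned}
(X_1, \dots, X_k) &\sim \operatorname{Multinomial}(n;\, p_1, \dots, p_k), \qquad \hat p_i = X_i / n, \\
\operatorname{E}[\hat p_i] &= p_i, \\
\operatorname{Var}(\hat p_i) &= \frac{p_i (1 - p_i)}{n}, \\
\operatorname{Cov}(\hat p_i, \hat p_j) &= -\frac{p_i p_j}{n} \quad (i \ne j), \\
\Sigma &= \frac{1}{n}\left(\operatorname{diag}(p) - p\,p^{\top}\right), \qquad \Sigma\,\mathbf{1} = \frac{1}{n}\left(p - p\,(p^{\top}\mathbf{1})\right) = \mathbf{0}.
\end{aligned}
$$

The last line is the degeneracy mentioned in the comment: because $\sum_i p_i = 1$, the covariance matrix annihilates the all-ones vector, so it is singular and the distribution has no spread in that direction.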

I think @Henry was very kind in his comment. The well-known result for $l=2$ (binary variables) is the binomial distribution, which, as $n$ gets larger and $p$ is not too close to 0 or 1, is increasingly well approximated by a Gaussian (but why use an approximation when one can use the true binomial distribution?). For $l \ge 3$, this is the multinomial distribution. Variables can be categorical or ordinal (e.g. a 6-sided die). Wikipedia has good descriptions of both distributions (binomial and multinomial).
– jginestet
Commented Jul 7 at 23:26
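
To make the "exact binomial versus Gaussian approximation" point concrete for the binary case, here is a minimal Python sketch using only the standard library. The values of `n` and `p` and the helper names `scaled_binomial_pmf` and `normal_density` are hypothetical, chosen for illustration; nothing here comes from the thread itself.

```python
import math


def scaled_binomial_pmf(n, p):
    """Exact pmf of the sample proportion X/n for X ~ Binomial(n, p).

    Returns a dict mapping each attainable proportion x/n to its probability.
    """
    return {x / n: math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}


def normal_density(v, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((v - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


n, p = 50, 0.3  # hypothetical values, for illustration only
pmf = scaled_binomial_pmf(n, p)

# The proportion lives on a grid with spacing 1/n, so the normal density
# multiplied by 1/n approximates the probability of each grid point.
approx = {v: normal_density(v, p, p * (1 - p) / n) / n for v in pmf}

# Largest pointwise gap between the exact pmf and the normal approximation.
max_abs_error = max(abs(pmf[v] - approx[v]) for v in pmf)
print(f"largest pointwise gap at n={n}, p={p}: {max_abs_error:.4f}")
```

The exact pmf is cheap to compute here, which is the commenter's point; the normal form mainly earns its keep in hand calculations and large-sample arguments.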

ctd ... I don't know of any reference that directly discusses the scaled multinomial in detail; it would be a waste of time, since it's just a linear rescaling of the multinomial, and any work with it would be a simple undergrad exercise. Some references discuss the multivariate normal approximation to either the multinomial or the scaled multinomial, including Pearson's original 1900 paper on the chi-squared test for multinomial goodness of fit. Also see the refs here: stats.stackexchange.com/questions/34547/…
– Glen_b
Commented Jul 7 at 23:39
Thank you all. I think the correct answer is the binomial / multinomial rescaled by $1/n$. I just can't find anything that explicitly says so. I am also confused about why people bring up the normal approximation. If we have the actual distribution, what's the point of the approximation?
– Jessica
Commented Jul 8 at 14:57
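
The rescaled multinomial that the comments converge on can be simulated directly. Below is a minimal Python sketch using only the standard library (the question asks about R, so treat this as an illustration of the idea rather than a ready-made solution). The probabilities in `p_hat`, the trial count `n`, the number of replicates and the helper name `sample_proportions` are all hypothetical. It assumes independent trials with fixed category probabilities.

```python
import random
from collections import Counter


def sample_proportions(p, n, rng):
    """Draw one Multinomial(n, p) count vector and rescale it by 1/n."""
    k = len(p)
    # n independent categorical draws; tallying them gives multinomial counts.
    draws = rng.choices(range(k), weights=p, k=n)
    counts = Counter(draws)
    return [counts[j] / n for j in range(k)]


rng = random.Random(0)    # fixed seed so the sketch is reproducible
p_hat = [0.5, 0.3, 0.2]   # hypothetical observed sample proportions
n = 200                   # hypothetical number of trials
reps = 2000               # hypothetical number of simulated samples

replicates = [sample_proportions(p_hat, n, rng) for _ in range(reps)]

# Each simulated vector of proportions sums to 1 (up to float rounding).
assert all(abs(sum(r) - 1.0) < 1e-9 for r in replicates)

# Empirical mean of each category's proportion across replicates.
means = [sum(r[j] for r in replicates) / reps for j in range(len(p_hat))]

# Empirical covariance between the first two categories; under the stated
# assumptions it should be close to -p_1 * p_2 / n.
cov_01 = sum((r[0] - means[0]) * (r[1] - means[1]) for r in replicates) / (reps - 1)

print("means:", means)
print("cov(0,1):", cov_01, "theory:", -p_hat[0] * p_hat[1] / n)
```

Because every draw is a count vector divided by `n`, the sum-to-one constraint holds automatically, which is the property that a naive multivariate normal draw does not guarantee.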
