For categorical variables with $l \ge 2$ categories, what is the sampling distribution of the proportions of events in each category? These proportions are obviously not independent, since they add up to 1.
Does it matter if these variables are ordinal?
For binary variables, the well-known result is that, for large $n$, the proportion of events is approximately normally distributed with mean $p$ and variance $p(1-p)/n$.
With an arbitrary number of categories, is the result a generalization of this?
- I've seen the multivariate normal mentioned, with a particular covariance matrix. But I am not sure how that works, since the draws are not guaranteed to sum up to 1.
- I know the Dirichlet distribution is used for this in Bayesian estimation. Ideally, for my application, I am looking for a frequentist solution, just to keep things simple.
- A "natural" solution could be to draw from the multinomial (with the probability parameter set to the sample proportions) and then divide by the number of trials. I've not seen this mentioned anywhere. Does this have a name? If this were indeed the solution, why isn't it the solution for $l = 2$ categories (where we would use the binomial)?
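
To make the third idea concrete, here is a minimal sketch of what I mean by "draw from the multinomial and divide by the number of trials". It is written in Python with only the standard library rather than R, and the values of `p_hat`, `n` and `reps` are made up for illustration, as are the helper names `simulate_proportions`, `mean` and `cov`. It checks that each simulated vector of proportions sums to 1, and compares the empirical variance and covariance with the standard multinomial moments $p_j(1-p_j)/n$ and $-p_j p_k/n$.

```python
import random
from collections import Counter


def simulate_proportions(p_hat, n, reps, seed=0):
    """Draw `reps` multinomial samples of size `n` and return category proportions."""
    rng = random.Random(seed)
    categories = range(len(p_hat))
    draws = []
    for _ in range(reps):
        # n independent categorical draws are equivalent to one multinomial(n, p_hat) draw
        counts = Counter(rng.choices(categories, weights=p_hat, k=n))
        draws.append([counts[j] / n for j in categories])
    return draws


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Unbiased sample covariance; cov(xs, xs) is the sample variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


p_hat = [0.5, 0.3, 0.2]  # hypothetical sample proportions (assumption)
n = 200                  # hypothetical number of trials (assumption)
draws = simulate_proportions(p_hat, n, reps=5000)

first = [d[0] for d in draws]
second = [d[1] for d in draws]

var_first = cov(first, first)   # compare with p_1 (1 - p_1) / n
cov_12 = cov(first, second)     # compare with -p_1 p_2 / n

print("empirical var:", var_first, "theory:", p_hat[0] * (1 - p_hat[0]) / n)
print("empirical cov:", cov_12, "theory:", -p_hat[0] * p_hat[1] / n)
```

With $l = 2$ the same function reduces to drawing from a binomial and dividing by $n$, which is part of why I am unsure how this relates to the normal result above.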
A good reference is a plus. A solution that someone has already implemented in R is also a plus.