Currently, there are more than 10 variants of the Omicron virus. Say the variant proportions are X1 = 0.1, X2 = 0.35, X3 = 0.15, etc. I need to calculate the number of samples needed to estimate the proportion of each variant correctly (for a genomics experiment). The usual sample size formula, $n = Z_{\alpha/2}^2 \, p(1-p) / \mathrm{MOE}^2$ (where MOE is the margin of error), is based on the normal approximation to the binomial, i.e. a yes/no event. Of course, the sample size can be calculated for each proportion separately from its expected value, and the maximum of the calculated sizes chosen, but I am looking for other ways of calculating the sample size that would take the 'multidimensionality' into account.
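For concreteness, here is a minimal sketch of the per-variant calculation described above, using the standard form of the formula with the normal quantile squared. The confidence level (95%) and the margin of error (0.05) are illustrative assumptions, and only the three proportions given above are used; the function name `binomial_sample_size` is just a label for this sketch.

```python
import math
from statistics import NormalDist


def binomial_sample_size(p, moe, alpha=0.05):
    """Sample size for one proportion via the normal approximation.

    n = z_{alpha/2}^2 * p * (1 - p) / moe^2, rounded up.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    return math.ceil(z ** 2 * p * (1 - p) / moe ** 2)


# Only the three example proportions from the question; the remaining
# variants are not specified, so they are left out here.
proportions = [0.1, 0.35, 0.15]

# Assumed margin of error of 0.05 (absolute) for every variant.
sizes = [binomial_sample_size(p, moe=0.05) for p in proportions]

# "Maximum over all variants" rule: take the largest per-variant size.
n_required = max(sizes)

print(sizes, n_required)
```

This treats each variant as a separate yes/no problem, which is exactly the limitation I would like to get around: the confidence level holds per variant, not for the whole vector at once.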
So the questions are:
- How do I correctly calculate the sample size required to estimate a vector of proportions simultaneously?
- I am also interested in predicting the proportions at the next lag (proportions are calculated weekly), so at week 10 I want to predict the vector (x1_week11, x2_week11, ..., x10_week11). Again, given that we have 10 proportions, what methods could be used to predict the vector?
- I need to run a simulation to show that the approach for calculating the sample size works, based on current and/or historical data. What are the approaches, again in the multidimensional case? (I assume that for a single proportion a simulation from a beta distribution could work.)
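To make the last question more concrete, below is a rough sketch of the kind of simulation check I have in mind for the multidimensional case. It is only one possible approach, not something I know to be the right one: it draws repeated multinomial samples of size n and records how often every estimated proportion falls within the margin of error of its true value at the same time. The four-category vector is hypothetical (the three proportions above plus an "other" category lumping the remaining variants), and the function name, margin of error, and number of replicates are assumptions for illustration.

```python
import random
from collections import Counter


def simultaneous_coverage(p, n, moe, n_sims=1000, seed=0):
    """Fraction of simulated samples in which ALL estimated proportions
    are within `moe` of the true values in `p`."""
    rng = random.Random(seed)
    categories = range(len(p))
    hits = 0
    for _ in range(n_sims):
        # One multinomial sample of size n, drawn as n categorical draws.
        counts = Counter(rng.choices(categories, weights=p, k=n))
        # counts[k] is 0 for categories that were never drawn.
        if all(abs(counts[k] / n - p[k]) <= moe for k in categories):
            hits += 1
    return hits / n_sims


# Hypothetical proportion vector: three variants from the question plus
# an "other" category so that the proportions sum to 1.
p_true = [0.1, 0.35, 0.15, 0.4]

# Check the sample size from the per-variant "maximum" rule (assumed 350).
coverage = simultaneous_coverage(p_true, n=350, moe=0.05)
print(f"simultaneous coverage at n=350: {coverage:.3f}")
```

If the simultaneous coverage comes out below the target confidence level, that would suggest the per-variant maximum is too small for the vector as a whole, which is the situation I would like a proper method for.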