$\begingroup$

Currently there are more than 10 Omicron variants circulating. Say the variant proportions are X1 = 0.1, X2 = 0.35, X3 = 0.15, etc. I need to calculate the number of samples required to estimate the proportion of each variant correctly (for a genomics experiment). The usual sample-size formula, $n = z_{\alpha/2}^{2}\,p(1-p)/\mathrm{MOE}^{2}$, is based on the normal approximation to the binomial, i.e. a yes/no event. Each proportion could indeed be handled separately with that formula, using its expected value, and the maximum of the resulting sample sizes chosen; but I am looking for other ways of calculating the sample size that take the 'multidimensionality' into account.
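For reference, the per-proportion calculation described above can be sketched as follows (the proportions are the example values from above, and `sample_size` is just an illustrative helper name):

```python
import math
from statistics import NormalDist  # stdlib, Python 3.8+

def sample_size(p, moe, alpha=0.05):
    """Sample size for one proportion via the normal approximation:
    n = z_{alpha/2}^2 * p * (1 - p) / MOE^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z * z * p * (1 - p) / moe**2)

# Example variant proportions from the question
proportions = [0.10, 0.35, 0.15]
sizes = [sample_size(p, moe=0.05) for p in proportions]
print(sizes)        # one n per variant
print(max(sizes))   # the conservative choice: the maximum over all variants
```

The maximum is driven by the proportion closest to 0.5, since $p(1-p)$ is largest there.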

So the questions are:

  1. How do I correctly calculate the sample size required to estimate a vector of proportions simultaneously?
  2. I am also interested in predicting the proportions at the next lag (proportions are calculated weekly): at week 10 I want to predict the vector (x1_week11, x2_week11, ..., x10_week11). Given that we have 10 proportions, what methods could be used to predict this vector?
  3. I need to run a simulation to show that the sample-size approach works, based on current and/or historic data. What are suitable approaches, again in the multidimensional case? (I assume that for a single proportion a simulation from a beta distribution could work.)
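For question 2, one idea I am considering is to remove the sum-to-one constraint with an additive log-ratio transform and then forecast each transformed coordinate with ordinary time-series tools. A minimal sketch with made-up weekly data (the Dirichlet parameters and the per-coordinate linear-trend forecast are purely illustrative assumptions, not a recommended model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical weekly proportion history: 10 weeks x 4 variants, rows sum to 1
history = rng.dirichlet(alpha=[8, 28, 12, 10], size=10)

# Additive log-ratio (alr) w.r.t. the last variant removes the sum-to-one
# constraint, so each coordinate can be forecast as an unconstrained series
alr = np.log(history[:, :-1] / history[:, -1:])

# Minimal forecast: fit a linear trend per alr coordinate, extrapolate 1 week
t = np.arange(len(alr))
coef = np.polyfit(t, alr, deg=1)            # shape (2, k-1): slope, intercept
next_alr = coef[0] * len(alr) + coef[1]

# Inverse alr: back-transform to proportions that sum to 1
z = np.exp(np.append(next_alr, 0.0))
forecast = z / z.sum()
print(forecast, forecast.sum())
```

The back-transform guarantees the predicted vector is again a valid composition, which a per-proportion forecast would not.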
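For question 3, my current thought on the multidimensional analogue of the beta simulation is to draw multinomial counts at a candidate sample size and check how often *all* estimated proportions land within the target margin simultaneously. A sketch with assumed inputs (the `true_p` vector, n = 350, and the margin are illustrative, not real data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" variant proportions (must sum to 1) -- an assumption
true_p = np.array([0.10, 0.35, 0.15, 0.12, 0.08, 0.07, 0.05, 0.04, 0.03, 0.01])
n = 350          # candidate sample size (e.g. the per-proportion maximum)
moe = 0.05       # target absolute error for every component at once
n_sims = 5000

# Draw multinomial samples and compute the fraction of runs in which EVERY
# estimated proportion is within `moe` of its true value -- this is
# simultaneous coverage, which is stricter than per-component coverage
counts = rng.multinomial(n, true_p, size=n_sims)
p_hat = counts / n
all_within = np.all(np.abs(p_hat - true_p) <= moe, axis=1)
print("simultaneous coverage:", all_within.mean())
```

If the simultaneous coverage falls short of the target (e.g. 95%), the candidate n is increased and the simulation repeated.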
$\endgroup$
  •
    $\begingroup$ The proportions can be 'estimated' at essentially any sample size. It depends on 'how close' you want the estimates to be. You're going to need to specify some criterion for closeness of the estimate-vector to the 'true' (population) vector. There are many ways you might specify accuracy. Most analyses focus on a sort of confidence bound on the absolute error of each component. There's no reason you need that. You might bound the relative error of the worst-case component, for example, or you might have some combined/overall measure of inaccuracy. Among other possibilities. This is up to you. $\endgroup$
    – Glen_b
    Commented May 28 at 18:18
  •
    $\begingroup$ However, whatever you choose, the thing you're estimating is a constantly moving target, and your data are collected across a period in which it is changing, so the notion of a single value we can accurately estimate is something of an illusion. Even if you have weekly estimates, it wasn't constant over that week. $\endgroup$
    – Glen_b
    Commented May 28 at 18:19
