I am trying to figure out how to simulate bootstrap samples from a dataset with unbalanced clusters. The approach I would like to adopt is the non-parametric pairs bootstrap, which makes it easy to preserve the dependence structure within clusters.
Suppose for a moment that the data were balanced (e.g., 500 mothers, each with 2 children). The two-level simulation algorithm with $B$ iterations would be:
For $b = 1, \dots, B$:
- Sample 500 mothers with replacement.
- For each sampled mother, sample her 2 children without replacement (i.e., keep both children).
Hence, the internal composition of each cluster is unchanged with respect to the initial sample, and the final sample size equals that of the original dataset ($N = 1000$).
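To make the balanced case concrete, here is a minimal Python sketch of the two steps above. The data, the function name `pairs_cluster_bootstrap`, and the choice of $B = 200$ are all hypothetical and used only for illustration. Since drawing 2 children out of 2 without replacement just returns both, the second step reduces to keeping each sampled mother's children as they are.

```python
import random

def pairs_cluster_bootstrap(clusters, rng):
    """Draw one replicate: resample whole clusters (mothers) with replacement
    and keep every child of each drawn mother."""
    ids = list(clusters)
    drawn = rng.choices(ids, k=len(ids))  # sampling with replacement
    # Tag each draw with its position so that a mother drawn twice
    # is treated as two separate clusters in the replicate.
    return [(draw, m, child)
            for draw, m in enumerate(drawn)
            for child in clusters[m]]

# Hypothetical balanced data: 500 mothers, 2 children each (N = 1000).
clusters = {m: [f"child_{m}_{j}" for j in range(2)] for m in range(500)}

rng = random.Random(0)
B = 200
sizes = [len(pairs_cluster_bootstrap(clusters, rng)) for _ in range(B)]
```

Under these assumptions every entry of `sizes` is 1000, because each of the 500 draws contributes exactly 2 observations.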
Now suppose that some mothers have 3 children. With the strategy above (resampling mothers and keeping all of their children), the size of the simulated sample will in general differ from that of the original sample.
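The varying sample size is easy to see in a small simulation. The sketch below is only illustrative and rests on made-up numbers: 400 mothers with 2 children and 100 mothers with 3 children (so the original sample has 1100 observations), with `resample_clusters` a hypothetical helper name.

```python
import random

def resample_clusters(clusters, rng):
    """Resample mothers with replacement, keeping all children of each draw."""
    ids = list(clusters)
    drawn = rng.choices(ids, k=len(ids))
    return [(draw, m, child)
            for draw, m in enumerate(drawn)
            for child in clusters[m]]

# Hypothetical unbalanced data: mothers 0..399 have 2 children,
# mothers 400..499 have 3 children.
clusters = {m: [f"child_{m}_{j}" for j in range(2 if m < 400 else 3)]
            for m in range(500)}
n_original = sum(len(children) for children in clusters.values())

rng = random.Random(0)
B = 200
sizes = [len(resample_clusters(clusters, rng)) for _ in range(B)]
mean_size = sum(sizes) / B
```

Here the number of mothers is fixed at 500 in every replicate, but the number of observations changes from one replicate to the next. It always lies between 1000 and 1500 and is centred on `n_original`, since each draw contributes 2.2 observations on average.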
To your knowledge, does this second case raise any statistical issues? If so, how would you proceed?
After reading in Davison and Hinkley's book [1] that the case of unbalanced clusters requires more advanced techniques, I searched the literature extensively, but I found little or nothing about it in terms of simulation algorithms.
UPDATE
For the actual clustered bootstrap implementation in R, see this question.
[1] Davison, A. C., Hinkley, D. V. (1997). Bootstrap Methods and their Application. Cambridge University Press.