Bootstrapping unbalanced clustered data (non-parametric bootstrap)

Question

I am trying to figure out how to draw bootstrap samples from a dataset with unbalanced clusters. The approach I would like to adopt is the non-parametric pairs bootstrap, which makes it easy to preserve the dependence structure within clusters.

Suppose for a moment that data were balanced (e.g., 500 mothers, each with 2 children). The two-level simulation algorithm with B iterations would be:

For $b = 1,\dots, B$:

1. Sample 500 mothers with replacement.
2. For each sampled mother, sample her 2 children without replacement (that is, keep both).

Hence, the internal composition of each cluster is left unaltered with respect to the initial sample, and the final sample size equals that of the original dataset ($N = 1000$).
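A minimal sketch of the balanced scheme described above, in Python with only the standard library. The function name `resample_clusters` and the toy data layout (a dict mapping a mother ID to a list of child records) are assumptions made for illustration, not part of the original question.

```python
import random

def resample_clusters(clusters, rng):
    """Draw len(clusters) clusters with replacement, keeping each one whole."""
    ids = list(clusters)
    drawn = rng.choices(ids, k=len(ids))  # step 1: mothers, with replacement
    # Step 2: keeping every child of a drawn mother is equivalent to sampling
    # all of her children without replacement (up to ordering).
    # A fresh ID is assigned so that a mother drawn twice counts as two clusters.
    return [(new_id, list(clusters[m])) for new_id, m in enumerate(drawn)]

# Hypothetical balanced data: 500 mothers, 2 children each
rng = random.Random(0)
clusters = {m: [(m, child) for child in range(2)] for m in range(500)}

sample = resample_clusters(clusters, rng)
n_obs = sum(len(children) for _, children in sample)
```

In the balanced case every resample contains exactly 500 clusters and 1000 observations, so repeating the call $B$ times gives $B$ bootstrap datasets of the original size.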

Now, suppose that some mothers have 3 children. With the above strategy, the simulated sample will in general no longer have the same number of observations as the original dataset.
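To make the point concrete, here is a small sketch that tracks only the cluster sizes. The split used (400 mothers with 2 children and 100 with 3) is a made-up example, not data from the question.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical unbalanced data: 400 mothers with 2 children, 100 with 3
sizes = [2] * 400 + [3] * 100
n_original = sum(sizes)  # 1100 observations in the original sample

boot_sizes = []
for _ in range(1000):
    # Resample mothers with replacement; each brings all of her children
    drawn = rng.choices(sizes, k=len(sizes))
    boot_sizes.append(sum(drawn))

mean_size = statistics.mean(boot_sizes)
spread = statistics.stdev(boot_sizes)
```

The number of clusters is fixed at 500 in every resample, but the number of observations is random: it fluctuates around the original size rather than matching it exactly.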

To your knowledge, are there statistical issues in this second case? If so, how would you proceed?

After reading in Davison and Hinkley's book [1] that the unbalanced-cluster case requires more advanced techniques, I did an extensive literature search, but I found little or nothing about it in terms of simulation algorithms.

UPDATE

For the actual clustered bootstrap implementation in R, see this question.

[1] Davison, A. C., Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge University Press.

possible duplicate of Bootstrapping hierarchical/multilevel data (resampling clusters) — StasK, Commented Jan 4, 2013 at 17:01

StasK · Accepted Answer · 2013-01-04 16:59:36Z

With clustered data, you effectively have 500 degrees of freedom anyway, one per cluster. It does not matter that your nominal sample size may be 1005 or 1320 or whatever the number turns out to be. The sampling variance of your estimates will generally improve only to the extent that you increase the number of clusters. So I would not see the random sample size as an issue.
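A rough illustration of this argument, not the Stata code referred to in this answer: the sketch below simulates hypothetical unbalanced data with a shared mother-level effect, resamples whole clusters, and recomputes the pooled mean on each resample regardless of its size. The data-generating values (effect standard deviations of 1 and 0.5, a 3:1 mix of 2-child and 3-child mothers) are assumptions chosen only to make the within-cluster correlation visible.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical data: 500 mothers with 2 or 3 children each. Children share a
# mother-level effect u, so outcomes within a cluster are correlated.
clusters = []
for _ in range(500):
    u = rng.gauss(0, 1)
    n_children = rng.choice([2, 2, 2, 3])
    clusters.append([u + rng.gauss(0, 0.5) for _ in range(n_children)])

def pooled_mean(cl):
    """Mean over all children in a list of clusters."""
    values = [y for c in cl for y in c]
    return sum(values) / len(values)

B = 500
boot_means = []
for _ in range(B):
    # Whole clusters, with replacement; the number of children varies by draw
    resample = rng.choices(clusters, k=len(clusters))
    boot_means.append(pooled_mean(resample))

cluster_se = statistics.stdev(boot_means)

# Naive standard error that treats every child as an independent observation
all_y = [y for c in clusters for y in c]
naive_se = statistics.stdev(all_y) / len(all_y) ** 0.5
```

Under these assumptions the cluster bootstrap standard error comes out larger than the naive one, which is consistent with the idea that precision is governed by the 500 clusters rather than by the nominal number of observations.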

I have written cluster bootstrap code in Stata, see http://www.stata-journal.com/article.html?article=st0187.

To your knowledge, has there been any update in code to do this in R? – RNB, Commented May 30, 2017 at 8:56

You can check this question for the implementation in R. – Stefano Lombardi, Commented Jun 1, 2017 at 23:05
