
I have been trying to find a simple way to use the bootstrap for a hypothesis test that involves more than two samples. The motivation for using the bootstrap is the usual one: the test statistic is complicated, and we don't want to make parametric assumptions. One method that I think would work for my purpose, called the basic bootstrap, is described in the article on bootstrapping, which cites the textbook Bootstrap Methods and Their Application (Davison and Hinkley 1997, eq. 5.6, p. 194).

Problem formulation: We have 16 independent observations

$$ \{ x_i \}_{i=1}^{16} $$

where $\{ x_1, x_2, x_3, x_4 \}$, $\{ x_5, x_6, x_7, x_8 \}$, $\{ x_9, x_{10}, x_{11}, x_{12} \}$, $\{ x_{13}, x_{14}, x_{15}, x_{16} \}$ are four random samples drawn from four different populations. We denote the means of the respective populations as $\mu_1, \mu_2, \mu_3, \mu_4$. I want to test

$$ H_0: (\mu_1 - \mu_2) - (\mu_3 - \mu_4) = 0 \\ H_1: (\mu_1 - \mu_2) - (\mu_3 - \mu_4) \neq 0 $$

The test statistic is

$$ t = \left(\frac{x_1 + x_2 + x_3 + x_4}{4} - \frac{x_5 + x_6 + x_7 + x_8}{4}\right) - \left(\frac{x_9 + x_{10} + x_{11} + x_{12}}{4} - \frac{x_{13} + x_{14} + x_{15} + x_{16}}{4}\right) $$

Bootstrap: I resample each of the 4 sets independently. That is, I use functions $\sigma : \{ 1, \dots, 16 \} \to \{ 1, \dots, 16 \}$ such that

$$ \sigma(\{1, 2, 3, 4\}) \subseteq \{1, 2, 3, 4\} \\ \sigma(\{5, 6, 7, 8\}) \subseteq \{5, 6, 7, 8\} \\ \sigma(\{9, 10, 11, 12\}) \subseteq \{9, 10, 11, 12\} \\ \sigma(\{13, 14, 15, 16\}) \subseteq \{13, 14, 15, 16\} $$

Then the resampled statistic would be

$$ t^* = \left(\frac{x_{\sigma(1)} + x_{\sigma(2)} + x_{\sigma(3)} + x_{\sigma(4)}}{4} - \frac{x_{\sigma(5)} + x_{\sigma(6)} + x_{\sigma(7)} + x_{\sigma(8)}}{4}\right) - \left(\frac{x_{\sigma(9)} + x_{\sigma(10)} + x_{\sigma(11)} + x_{\sigma(12)}}{4} - \frac{x_{\sigma(13)} + x_{\sigma(14)} + x_{\sigma(15)} + x_{\sigma(16)}}{4}\right) $$
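As a concrete sketch in Python (with made-up data; the names `stat` and `resample` and the 4×4 layout are my own, not from the references), the stratified resampling above might look like:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: four independent samples of size 4, one per row.
samples = rng.normal(size=(4, 4))

def stat(s):
    # t = (mean of group 1 - mean of group 2) - (mean of group 3 - mean of group 4)
    m = s.mean(axis=1)
    return (m[0] - m[1]) - (m[2] - m[3])

def resample(s, rng):
    # Draw within each group independently, with replacement -- this is
    # exactly the restriction on sigma described above.
    idx = rng.integers(0, s.shape[1], size=s.shape)
    return np.take_along_axis(s, idx, axis=1)

t = stat(samples)
t_star = np.array([stat(resample(samples, rng)) for _ in range(999)])
```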

Assume we have $N = 999$ resamples $t^*_i$ with order statistics $t^*_{(i)}$. Then using the basic bootstrap method, we would have the $95\%$ (or $\alpha = 0.05$) two-sided confidence interval

$$ \left[ 2 t - t^*_{((N+1)(1-\alpha/2))}, 2 t - t^*_{((N+1)(\alpha/2))} \right] = \left[ 2 t - t^*_{(975)}, 2 t - t^*_{(25)} \right] $$

or one-sided confidence intervals

$$ \left[ 2 t - t^*_{((N+1)(1-\alpha))},\infty\right) = \left[ 2 t - t^*_{(950)}, \infty \right) \\ \left(-\infty, 2 t - t^*_{((N+1)\alpha)}\right] = \left(-\infty, 2 t - t^*_{(50)}\right] $$
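Assuming a sorted array `t_star` of the 999 replicates (stand-in values below, not real data), the basic-bootstrap limits can be read off directly. Note that the formulas index order statistics from 1, so the array index is offset by one:

```python
import numpy as np

rng = np.random.default_rng(0)
t = 0.3                                                   # hypothetical observed statistic
t_star = np.sort(rng.normal(loc=t, scale=0.5, size=999))  # stand-in replicates, sorted

N, alpha = 999, 0.05
# Two-sided 95% interval: [2t - t*_(975), 2t - t*_(25)].
lo2 = 2 * t - t_star[round((N + 1) * (1 - alpha / 2)) - 1]
hi2 = 2 * t - t_star[round((N + 1) * (alpha / 2)) - 1]
# One-sided bounds: [2t - t*_(950), inf) and (-inf, 2t - t*_(50)].
lo1 = 2 * t - t_star[round((N + 1) * (1 - alpha)) - 1]
hi1 = 2 * t - t_star[round((N + 1) * alpha) - 1]

reject_two_sided = not (lo2 <= 0.0 <= hi2)
```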

Thus, we can reject $H_0$ at the 5% significance level if $0$ is not in this interval. Although not stated in the reference above, I believe we can also use this process to determine one-sided P-values for the test statistic $t$ using (respectively)

$$ p = \frac{1 + \sum_{i=1}^{999} \mathbf{1}\{ 2t - t^*_i \leq 0 \}}{1000} \\ p = \frac{1 + \sum_{i=1}^{999} \mathbf{1}\{ 2t - t^*_i \geq 0 \}}{1000} $$

or a two-sided P-value

$$ p = 2 \left(\frac{1 + \min\left(\sum_{i=1}^{999} \mathbf{1}\{ 2 t - t^*_i \geq 0 \}, \sum_{i=1}^{999} \mathbf{1}\{ 2 t - t^*_i \leq 0 \}\right)}{1000}\right) $$

(Note: the last value above can be $> 1$, in which case I would set it to $1$).
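A sketch of these counts in Python (stand-in replicates; labelling which count goes with which one-sided interval is my own reading of the CI inversion, not something stated in the reference):

```python
import numpy as np

rng = np.random.default_rng(1)
t = 0.3                                          # hypothetical observed statistic
t_star = rng.normal(loc=t, scale=0.5, size=999)  # stand-in replicates

# One-sided p-values by counting, with the usual +1 correction:
p_greater = (1 + np.sum(2 * t - t_star <= 0)) / 1000  # inverts [2t - t*_(950), inf)
p_less    = (1 + np.sum(2 * t - t_star >= 0)) / 1000  # inverts (-inf, 2t - t*_(50)]
# Two-sided p-value, capped at 1:
p_two = min(1.0, 2 * min(p_greater, p_less))
```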

Question: Does the above procedure for determining the confidence intervals and P-values seem correct, even though it uses a difference of four means instead of the usual two shown in most examples?

  • In your setup, you're thinking about four different populations characterized by four random variables: $X_1$, $X_2$, $X_3$, $X_4$. You could instead think about two different populations $Z_1$ and $Z_2$, where $Z_1 = X_1 - X_2$ and $Z_2 = X_3 - X_4$. Then you can use the bootstrap to estimate the sampling distributions of $\bar Z_1$ and $\bar Z_2$. On the surface this looks to be the same as what you're doing, but it may be easier to reason about.
    – BenA
    Commented Mar 5, 2023 at 16:56
  • Have you tried running a simulation to see if the test statistic is well-behaved and meets the assumptions of a bootstrap analysis for a p-value?
    – David B
    Commented Mar 5, 2023 at 21:14
  • With such a small sample I'm curious why you'd use a large-sample method like the bootstrap when there's another form of resampling test, permutation tests, which (as long as you have a suitable exchangeable quantity under $H_0$) should be small-sample exact.
    – Glen_b
    Commented Mar 5, 2023 at 21:46
  • @BenA That is an interesting suggestion, though as you say I don't know if it would be any different computationally from what is being done here. As far as the reasoning, it is true it may be similar to the usual application with two-sample tests. However, I am also interested in more general cases, such as an arbitrary function $f(X_1, X_2, X_3, X_4)$. Commented Mar 6, 2023 at 15:23
  • @DavidB The permutation test I described above tests $H_0: \mu_1 - \mu_2 = 0 \text{ and } \mu_3 - \mu_4 = 0$, which is a stronger assumption than $H_0: (\mu_1 - \mu_2) - (\mu_3 - \mu_4) = 0$. I don't know if this null hypothesis can be easily tested with a permutation test. Commented Mar 6, 2023 at 18:17

1 Answer


It appears that the situation described above has been addressed in Efron and Tibshirani's book An Introduction to the Bootstrap (1994), specifically in Chapter 8 on more complicated data structures. The authors treat the two-sample problem as a more complicated data structure than the one-sample problem. In the two-sample case, they recommend constructing each bootstrap replication by resampling independently from each of the two samples and then recomputing the bootstrap test statistic. They further suggest that more complex data structures can be handled by ensuring that the bootstrapping procedure mimics how the original data were generated and that the test statistic is computed from the bootstrap resamples in the same way as the original estimate.

The four-sample case here is a straightforward extension of the one- and two-sample cases, so the procedure outlined above should be valid. As for computing P-values, one approach is to invert the confidence-interval (CI) construction: the P-value is the smallest $\alpha$ such that the $1-\alpha$ CI does not contain the null hypothesis value ($0$ in this case). This should be equivalent to the P-value formulas given above.
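One way to check that equivalence numerically (a sketch with stand-in replicates, not real data; the smallest-$\alpha$ search runs over the grid $\alpha = j/1000$ so that the order-statistic index stays an integer):

```python
import numpy as np

rng = np.random.default_rng(2)
t = 0.4                                                   # hypothetical observed statistic
t_star = np.sort(rng.normal(loc=t, scale=0.5, size=999))  # stand-in replicates, sorted

# P-value by inverting the one-sided CI [2t - t*_((N+1)(1-alpha)), inf):
# the smallest alpha on the grid j/1000 whose interval excludes 0.
p_inverted = 1.0
for j in range(1, 1000):           # alpha = j / 1000
    k = 1000 - j                   # (N + 1) * (1 - alpha), computed exactly
    if 2 * t - t_star[k - 1] > 0:  # CI excludes 0 -> reject at this alpha
        p_inverted = j / 1000
        break

# P-value by direct counting (same count as the one-sided formula above):
p_count = (1 + np.sum(t_star >= 2 * t)) / 1000
```

With continuous data (no ties among the replicates) the two quantities agree exactly.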

