Short Background on Permutation Testing

Suppose I have two sets of samples $P$ and $Q$ drawn iid from distributions $\mathcal{P}$ and $\mathcal{Q}$ over $X$.

I also have access to a test function $T: X^n\times X^n\to \mathbb{R}$ that takes two sets of samples of size $n$ and outputs a real number. Say we have designed $T$ so that we expect it to be lower when $\mathcal{P}\neq \mathcal{Q}$ and higher otherwise; in other words, $T$ is just a heuristic, such as a (negated) MMD statistic with a simple kernel.
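
For concreteness, here is one possible $T$. This is a minimal sketch that assumes 1-D real-valued samples and a Gaussian kernel with a fixed bandwidth of 1 (both are assumptions for illustration, not part of the question). It returns the negated biased estimate of $\text{MMD}^2$, so that, matching the convention above, it is lower when the two samples look different.

```python
import math


def rbf(x, y, bandwidth=1.0):
    # Gaussian (RBF) kernel on scalars; bandwidth is an illustrative assumption.
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))


def neg_mmd2(P, Q, bandwidth=1.0):
    """Negated biased (V-statistic) estimate of squared MMD for 1-D samples.

    MMD^2 is large when the samples differ, so its negation is low when
    P != Q and close to zero when they look alike.
    """
    n, m = len(P), len(Q)
    k_pp = sum(rbf(a, b, bandwidth) for a in P for b in P) / (n * n)
    k_qq = sum(rbf(a, b, bandwidth) for a in Q for b in Q) / (m * m)
    k_pq = sum(rbf(a, b, bandwidth) for a in P for b in Q) / (n * m)
    return -(k_pp + k_qq - 2 * k_pq)
```

The biased estimate is used only because it is the simplest to write; any statistic with the same "lower means more different" orientation fits the setup.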

To turn $T$ into a two-sample test at significance level $\alpha$, I can use a permutation test to estimate the $\alpha$-quantile of the distribution of $T$ under the null hypothesis $\mathcal{P} = \mathcal{Q}$. We do this by pooling $P$ and $Q$, permuting the pooled sample many times, and each time dividing it again into two sets of size $n$ and evaluating $T$ on them. Letting $T_{\text{null}}$ denote the set of $T$ values obtained over these permutations, we estimate $\tau=\text{quantile}(T_{\text{null}},\alpha)$ and use $\tau$ as the threshold: $T(P,Q)$ is deemed significant if $T(P,Q) \leq \tau$.
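
The permutation procedure above can be sketched as follows. This is an illustrative sketch only: it uses a fixed number of random permutations rather than all of them, a lower empirical quantile, and a hypothetical stand-in statistic `neg_mean_gap` (negated absolute difference of sample means); the helper names are assumptions, not an established API.

```python
import math
import random


def empirical_quantile(values, alpha):
    # Lower empirical alpha-quantile: the ceil(alpha * m)-th smallest of m values.
    s = sorted(values)
    k = max(1, math.ceil(alpha * len(s)))
    return s[k - 1]


def permutation_threshold(P, Q, T, alpha=0.05, num_perms=1000, seed=0):
    """Estimate tau = quantile(T_null, alpha) by re-splitting the pooled sample."""
    n = len(P)
    if len(Q) != n:
        raise ValueError("T expects two samples of the same size n")
    rng = random.Random(seed)
    pooled = list(P) + list(Q)
    t_null = []
    for _ in range(num_perms):
        rng.shuffle(pooled)  # a random relabelling, valid under H0: P = Q
        t_null.append(T(pooled[:n], pooled[n:]))
    return empirical_quantile(t_null, alpha)


def neg_mean_gap(A, B):
    # Hypothetical stand-in for T: lower when the sample means differ.
    return -abs(sum(A) / len(A) - sum(B) / len(B))


if __name__ == "__main__":
    gen = random.Random(1)
    P = [gen.gauss(0.0, 1.0) for _ in range(50)]
    Q = [gen.gauss(1.0, 1.0) for _ in range(50)]
    tau = permutation_threshold(P, Q, neg_mean_gap, alpha=0.05)
    print("tau =", tau, "significant:", neg_mean_gap(P, Q) <= tau)
```

Any callable with the signature `T(A, B) -> float` can be passed in place of `neg_mean_gap`.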

Question

My question: when we have many more samples in $P$ than in $Q$ (i.e., $|P| \gg |Q| = n$) but still only a test function $T: X^n\times X^n\to \mathbb{R}$, can we estimate $\text{quantile}(T_{\text{null}},\alpha)$ using samples from $P$ alone, e.g., by computing $T$ on pairs of mutually exclusive random draws of $n$ samples from $P$?
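
One reading of "mutually exclusive random draws" can be sketched as follows: shuffle $P$ once, cut it into disjoint blocks of size $n$, and evaluate $T$ on consecutive pairs of blocks, so each null value uses fresh samples and both arguments come from $\mathcal{P}$. The function name and this pairing scheme are assumptions made for illustration.

```python
import math
import random


def empirical_quantile(values, alpha):
    # Lower empirical alpha-quantile: the ceil(alpha * m)-th smallest of m values.
    s = sorted(values)
    k = max(1, math.ceil(alpha * len(s)))
    return s[k - 1]


def threshold_from_P_only(P, n, T, alpha=0.05, seed=0):
    """Hypothetical helper: estimate quantile(T_null, alpha) from P alone.

    P is shuffled once and split into disjoint blocks of size n; T is
    evaluated on block pairs (0, 1), (2, 3), ... so no sample is reused and
    the null P = P holds by construction.
    """
    num_pairs = len(P) // (2 * n)
    if num_pairs < 1:
        raise ValueError("need at least 2n samples from P")
    rng = random.Random(seed)
    pool = list(P)
    rng.shuffle(pool)
    t_null = []
    for i in range(num_pairs):
        first = pool[2 * i * n:(2 * i + 1) * n]
        second = pool[(2 * i + 1) * n:(2 * i + 2) * n]
        t_null.append(T(first, second))
    return empirical_quantile(t_null, alpha)
```

With $|P| = N$ this yields only $\lfloor N/(2n) \rfloor$ null values; whenever $\alpha \lfloor N/(2n) \rfloor \leq 1$, the lower empirical quantile used here is simply the minimum of those values.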

To run the actual test, we draw one random subsample $P_n:=\text{RandomSample}(P, n)$ and check whether $T(P_n,Q) \leq \text{quantile}(T_{\text{null}},\alpha)$.
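
The decision step can be sketched as below, assuming the threshold `tau` has already been estimated (for instance from samples of $P$ only) and that `RandomSample` means sampling without replacement. `run_test` is a hypothetical name used only for illustration.

```python
import random


def run_test(P, Q, T, tau, seed=0):
    """Hypothetical helper: True if T(P_n, Q) <= tau (deem P and Q different).

    P_n plays the role of RandomSample(P, n) with n = |Q|; tau is assumed to
    be a precomputed estimate of quantile(T_null, alpha).
    """
    rng = random.Random(seed)
    P_n = rng.sample(list(P), len(Q))  # n samples from P without replacement
    return T(P_n, Q) <= tau
```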

This approach seems reasonable to me and closely resembles the idea behind conformal prediction. However, I have not seen it referenced in the two-sample testing literature, so I'm seeking some advice before building a more involved research method on top of it.

Thank you
