6
$\begingroup$

When using the function hoeffd in the CRAN package Hmisc I get unusual p-values for pairs of data sets that are independent. The function hoeffd is an implementation of Hoeffding's $D$ statistic. Given a pair of independent data sets, the p-value from the $D$ statistic should be a uniform random variable. However, I get a distribution that looks uniform up to $0.4$, then the density spikes and decreases linearly to $1$.

Suppose I run the following code, which generates $10{,}000$ pairs of independent Gaussian data sets (each set of length $100$), calculates the p-value for the independence test on each of them, and then plots a histogram of the p-values:

library(Hmisc)
N = 10000
n = 100
set.seed(n)
pvals = numeric(N)
for (i in 1:N) {
    pvals[i] = hoeffd(rnorm(n), rnorm(n))$P[1, 2]
}
hist(pvals, freq = FALSE, main = paste("P-Values: n = ", n, sep = ""), xlab = "P-Values")

Then the output is Hoeffding's P-Values for a pair of 100 independent Gaussian observations

Running the same simulation, but with each data set in each pair having $1{,}000$ observations:

N = 10000
n = 1000
set.seed(n)
pvals = numeric(N)
for (i in 1:N) {
    pvals[i] = hoeffd(rnorm(n), rnorm(n))$P[1, 2]
}
hist(pvals, freq = FALSE, main = paste("P-Values: n = ", n, sep = ""), xlab = "P-Values")

gives

Hoeffding's P-Values for a pair of 1000 independent Gaussian observations

It takes a long time to compute this with $n = 10{,}000$ but a pattern like this is still visible with a small number of simulations.

If you run the code above but replace

rnorm(n)

with, for example,

rexp(n, 1)

you get essentially the same results.

I'm assuming that this is due to the fact that the p-values are asymptotic approximations of the true p-values, which is stated in the reference manual. But shouldn't the distribution look more uniform than this? Why do the p-values have this distribution?

I am using R version 3.5.2 ("Eggshell Igloo") on my Mac (but I get the same distribution on Windows) and I'm using Hmisc version 4.2-0.

$\endgroup$
1
  • 1
    $\begingroup$ An excellent question and well laid out, although we may not care as much whether a p-value should be 0.7 vs 0.9. I hope you get some responses from those knowledgeable about how the null distribution approximation works with Hoeffding's test. $\endgroup$ Commented Feb 3, 2019 at 13:13

0

Browse other questions tagged or ask your own question.