Non-uniform p-values from hoeffd function in Hmisc when data sets are independent


Asked 5 years, 5 months ago

Modified 5 years, 5 months ago

Viewed 193 times

When I use the function hoeffd from the CRAN package Hmisc, I get unusual p-values for pairs of data sets that are independent. The function hoeffd implements the independence test based on Hoeffding's $D$ statistic. For a pair of independent data sets, the p-value from this test should be uniformly distributed on $[0, 1]$. Instead, I get a distribution that looks uniform up to $0.4$; then the density spikes and decreases linearly to $1$.
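For comparison, here is a minimal sketch of what uniform p-values look like in the same kind of setup. It assumes an exact test is swapped in for hoeffd: Pearson's `cor.test` is exact for independent Gaussian data, so its p-values should pass a uniformity check. The names `N_ref`, `n_ref`, `pvals_ref`, `ks_ref`, `thresholds` and `shares` are illustrative and not part of the original simulation.

```r
# Baseline sketch: p-values from an exact test on independent Gaussian pairs.
# cor.test() (Pearson) is swapped in for hoeffd() purely for comparison.
set.seed(1)
N_ref <- 2000   # number of simulated pairs (illustrative choice)
n_ref <- 100    # observations per data set

pvals_ref <- replicate(N_ref, cor.test(rnorm(n_ref), rnorm(n_ref))$p.value)

# Kolmogorov-Smirnov comparison of the empirical CDF with Uniform(0, 1)
ks_ref <- ks.test(pvals_ref, "punif")

# Under uniformity, the share of p-values below a threshold is close to
# the threshold itself
thresholds <- c(0.05, 0.25, 0.5, 0.75)
shares <- vapply(thresholds, function(t) mean(pvals_ref <= t), numeric(1))

print(ks_ref$statistic)
print(round(shares, 3))
```

Applying the same two checks to the hoeffd p-values from the simulation in this question would quantify how far they are from uniform, rather than relying on the histogram alone.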

Suppose I run the following code, which generates $10{,}000$ pairs of independent Gaussian data sets (each set of length $100$), calculates the p-value for the independence test on each of them, and then plots a histogram of the p-values:

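library(Hmisc)
N = 10000  # number of simulated pairs
n = 100    # observations per data set
set.seed(n)
pvals = numeric(N)
for (i in 1:N) {
    # P is a matrix of pairwise p-values; [1, 2] is the x-versus-y entry
    pvals[i] = hoeffd(rnorm(n), rnorm(n))$P[1, 2]
}
# Density histogram; a uniform distribution would give a flat profile at height 1
hist(pvals, freq = FALSE, main = paste("P-Values: n = ", n, sep = ""), xlab = "P-Values")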

Then the output is

[Figure: histogram titled "P-Values: n = 100", x-axis "P-Values"]

Running the same simulation, but with each data set in each pair having $1{,}000$ observations:

N = 10000
n = 1000
set.seed(n)
pvals = numeric(N)
for (i in 1:N) {
    pvals[i] = hoeffd(rnorm(n), rnorm(n))$P[1, 2]
}
hist(pvals, freq = FALSE, main = paste("P-Values: n = ", n, sep = ""), xlab = "P-Values")

gives

[Figure: histogram titled "P-Values: n = 1000", x-axis "P-Values"]

It takes a long time to compute this with $n = 10{,}000$ but a pattern like this is still visible with a small number of simulations.

If you run the code above but replace

rnorm(n)

with, for example,

rexp(n, 1)

you get essentially the same results.

I assume this happens because the p-values are asymptotic approximations of the true p-values, as stated in the reference manual. But shouldn't the distribution look more uniform than this? Why do the p-values have this distribution?
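One way to separate the test itself from the asymptotic approximation is a permutation p-value, which is valid under independence by construction (up to discreteness). The sketch below is only an illustration: `hoeffding_d` and `perm_pvalue` are hypothetical helper names, the statistic is hand-written from the usual rank-based formula with the factor of 30, it assumes data without ties, and it is not Hmisc's implementation. Whether it matches hoeffd output digit for digit is an assumption that has not been checked here.

```r
# Hoeffding's D for untied data (hand-written sketch, not Hmisc's code).
hoeffding_d <- function(x, y) {
  n <- as.numeric(length(x))
  stopifnot(length(y) == n, n >= 5)
  R <- rank(x)
  S <- rank(y)
  # Q_i = 1 + number of points with both coordinates below point i
  Q <- 1 + vapply(seq_along(x), function(i) sum(x < x[i] & y < y[i]), numeric(1))
  D1 <- sum((Q - 1) * (Q - 2))
  D2 <- sum((R - 1) * (R - 2) * (S - 1) * (S - 2))
  D3 <- sum((R - 2) * (S - 2) * (Q - 1))
  30 * ((n - 2) * (n - 3) * D1 + D2 - 2 * (n - 2) * D3) /
    (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))
}

# Permutation p-value: shuffling y breaks any dependence on x, so the
# shuffled statistics are draws from the null distribution of D.
perm_pvalue <- function(x, y, B = 199) {
  d_obs <- hoeffding_d(x, y)
  d_perm <- replicate(B, hoeffding_d(x, sample(y)))
  (1 + sum(d_perm >= d_obs)) / (B + 1)
}

set.seed(1)
x <- rnorm(50)
p_indep <- perm_pvalue(x, rnorm(50))  # independent pair
p_dep <- perm_pvalue(x, x)            # perfectly dependent pair
print(c(independent = p_indep, dependent = p_dep))
```

If permutation p-values computed this way over many independent pairs look uniform while the hoeffd p-values do not, that would point at the approximation to the null distribution rather than at the statistic.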

I am using R version 3.5.2 ("Eggshell Igloo") on my Mac (but I get the same distribution on Windows) and I'm using Hmisc version 4.2-0.

edited Feb 2, 2019 at 13:34

asked Jan 30, 2019 at 1:43

Alex


An excellent question and well laid out, although we may not care as much whether a p-value should be 0.7 vs 0.9. I hope you get some responses from those knowledgeable about how the null distribution approximation works with Hoeffding's test.
– Frank Harrell
Commented Feb 3, 2019 at 13:13
