Test for statistically significant difference: Wilcoxon vs Median Permutation

Question

I have a dataset of measured 'perceived quality' (the dependent variable) for two different sizes at 4 different intensity levels (200 values of perceived quality in each category; the distributions are non-normal). The data look like this: [ see the data here ]

           Perceived     Perceived
          quality when  quality when
Intensity  size = X      size = Y
     5       0.76         0.8
     5       0.80         1.1
     5       1.18         1.1
     5       1.18         1.1
    10       0.82         1.0
    10       0.96         1.4
    10       1.00         1.2
    10       0.96         1.2
    12       1.00         1.4
    12       0.96         1.4
    12       1.10         0.2
    12       1.08         0.7
    20       1.10         0.7
    20       1.14         0.8
    20       1.06         0.8
    20       1.16         0.9

I tried the Wilcoxon signed rank test to see if there is a significant difference between the perceived qualities of the two groups (different sizes) and got p-value = 0.00049.

Now, I have organized the data by intensity level, and I want to see if there is any significant difference between the perceived quality values of the two sizes within each intensity level. So I have four sets like this (again, non-normal distribution):

         Intensity=5,   Intensity=5,
          Perceived      Perceived
         quality when   quality when
          size = X       size = Y
             0.76           0.8
             0.80           1.1
             1.18           1.1
             1.18           1.1

I tried Wilcoxon signed rank again, but I got these two errors in R: "Warning: cannot compute exact p-value with ties" and "Warning: cannot compute exact p-value with zeroes".

After searching, I realized that a permutation test may be better when there are ties in the data, so I performed a permutation test with the median (since the data in each group seemed skewed). My question is: can I continue with the permutation test (with median) in the subgroups if I have done a Wilcoxon test for the large sample? Also, I am not sure I am on the right path. I tried the permutation test with the median on the large sample, for which Wilcoxon initially gave p-value = 0.00049, and I am getting p-value = 1. However, when I change the median to the mean, I get p-value = 0.00582.

I would appreciate any help. Thanks!

Added: This is the boxplot of the main data, for which Wilcoxon gives me p-value = 0.00049 for the difference.

How did you manage to control for intensity when first applying the Wilcoxon signed-rank test? There might also be another issue lurking here: these look like summary data, such as averages of 25 evaluations in one column and just 10 evaluations in the other. Why aren't you using the original data in your analysis, which potentially carry more information? — whuber, Commented Dec 13, 2023 at 18:56
Warnings aren't errors. These warnings are really not that serious, but generally sample size 4 isn't very informative for testing. Another issue is that if there are systematic differences between Intensity values, the Wilcoxon for the "large" sample is problematic because it treats the data as coming from the same distribution, which they aren't. — Christian Hennig, Commented Dec 13, 2023 at 18:57
The smallest possible two tailed permutation-test p-value with 4 pairs (as you have at the end there) is 2/16 = 0.125 (even without ties). With ties it can sometimes be worse. If you're using a 5% or even a 10% significance level with 4 pair-differences you are wasting your time -- your power is exactly 0. If you get any smaller p-value than that ... you did something wrong. — Glen_b, Commented Dec 14, 2023 at 1:15
This is the case (the minimum p-value being 0.125 at best) whether you use means, medians, or a signed rank statistic. — Glen_b, Commented Dec 14, 2023 at 1:42
I don't follow what you mean by two pairs. Your second table shows $4$ rows, each row being one pair. From that, there are $2^4$ possible sets of $4$ signs to assign to pair differences. This yields 16 distinct values for a permutation statistic - at most. — Glen_b, Commented Dec 14, 2023 at 2:36
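
The 2/16 bound discussed in the comments can be checked by brute force. The following is a minimal Python sketch (not the asker's R code) that enumerates all $2^4$ sign assignments for the four Intensity = 5 pairs shown in the question. The function name `exact_sign_flip_pvalue` and the choice of the mean difference as the default statistic are assumptions made for illustration.

```python
from itertools import product


def exact_sign_flip_pvalue(diffs, stat=lambda d: sum(d) / len(d)):
    """Exact two-sided sign-flip p-value for paired differences.

    Enumerates all 2**n sign assignments, so it is only practical for small n.
    """
    observed = abs(stat(diffs))
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = [s * d for s, d in zip(signs, diffs)]
        # Small tolerance so that exact ties are not lost to rounding.
        if abs(stat(flipped)) >= observed - 1e-12:
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


# The four Intensity = 5 pairs from the question (size X, size Y).
x = [0.76, 0.80, 1.18, 1.18]
y = [0.8, 1.1, 1.1, 1.1]
diffs = [a - b for a, b in zip(x, y)]

p_observed = exact_sign_flip_pvalue(diffs)

# Most favourable case: all four differences have the same sign, so only
# the observed assignment and its mirror image are as extreme.
p_minimum = exact_sign_flip_pvalue([1.0, 2.0, 3.0, 4.0])

print(p_observed, p_minimum)
```

Here `p_minimum` comes out as 2/16 = 0.125, so with four pairs no choice of statistic can produce a two-sided p-value below that.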

Christian Hennig · Accepted Answer · 2023-12-14 11:08:44Z

I assume that all your computations are correct, which of course we can't know. I also assume you have 200 observations in an intensity category despite showing just four. I also assume that the permutation tests based on median/mean are run correctly to test the null hypothesis of zero median or mean of the differences between X and Y.

Firstly, if there are systematic differences between the intensities, the Wilcoxon test for the full data set (all intensities combined) is wrong, as not all data come from the same distributions.

Regarding testing within categories, the tests you discuss do different things. The test based on medians will look for the median of the differences, the Wilcoxon will look for deviations from symmetry of the distributions of differences about median zero (which is stronger than just having median zero), and the test based on means will look for deviations of the mean of the differences from zero. Note that for asymmetric distributions mean and median are different, so the mean and median test will test different null hypotheses.
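
To make the distinction concrete, here is a small Python sketch that computes the three statistics on the same set of paired differences. The data are hypothetical, invented only for illustration, and the helper `signed_rank_sum` is an assumed name; it uses midranks for ties and drops zero differences, which is one common convention for the signed-rank statistic.

```python
import statistics


def signed_rank_sum(diffs):
    """Sum of signed midranks of |d|; zero differences are dropped."""
    nonzero = [d for d in diffs if d != 0]
    # Indices of the nonzero differences, ordered by absolute value.
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        # Find the block of tied absolute values starting at position i.
        j = i
        while (j + 1 < len(order)
               and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]])):
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return sum(r if d > 0 else -r for r, d in zip(ranks, nonzero))


# Hypothetical paired differences (X minus Y), invented for illustration.
diffs = [-0.2, -0.1, 0.0, 0.1, 1.5]

mean_diff = statistics.mean(diffs)      # pulled upward by the single large value
median_diff = statistics.median(diffs)  # the middle value, here exactly 0
rank_sum = signed_rank_sum(diffs)       # positive: the large values are positive

print(mean_diff, median_diff, rank_sum)
```

The three numbers summarise different features of the same differences, so tests built on them need not agree.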

From the boxplot the distributions seem clearly different but the medians about the same. As the boxplot doesn't show how the values are paired, we can't exactly see the information that is used by these tests, and therefore it is impossible to predict with some confidence (in the literal sense, not in the "confidence interval" sense) what these tests should give.

Still, given that the medians look the same, I'm not surprised that the test based on the median does not reject. However, the distribution of differences may well be asymmetric, prompting the signed Wilcoxon to reject, and the mean to be different from the median. So this may all be correct, and you need to decide what to make of all of this information. (If you want to choose just one test, the question really is what kind of null hypothesis is most relevant for you, which may be a hard question but depends on subject matter background, what the data mean, how the result will be used etc.)
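
As a rough illustration of how a median-based and a mean-based permutation test can disagree this strongly, here is a Python sketch on hypothetical differences with median exactly 0 and a long right tail. The data, the function name `sign_flip_pvalue`, and the Monte Carlo settings are all assumptions; this is not the asker's data or code. Note that whenever the observed median difference is exactly 0, every resample is at least as extreme, so a two-sided p-value of 1 follows automatically. Whether that is what happened in the question cannot be told from the information given.

```python
import random
import statistics


def mean_of(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


def sign_flip_pvalue(diffs, stat, n_resamples=2000, seed=0):
    """Monte Carlo two-sided sign-flip p-value for paired differences."""
    rng = random.Random(seed)
    observed = abs(stat(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null, each difference is equally likely to have either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(stat(flipped)) >= observed - 1e-12:
            hits += 1
    # Count the observed arrangement itself so the p-value is never 0.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical differences (size X minus size Y), invented for illustration:
# 200 values, median exactly 0, with a long right tail.
diffs = [-0.1] * 90 + [0.0] * 20 + [0.1] * 60 + [1.0] * 30

p_median = sign_flip_pvalue(diffs, statistics.median)
p_mean = sign_flip_pvalue(diffs, mean_of)

print(p_median, p_mean)
```

In this constructed example the median test returns exactly 1 while the mean test returns a small p-value, which matches the qualitative pattern described in the question without implying the same cause.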

The warnings for the signed Wilcoxon are not much of a problem in my view. Warnings are not errors.

Side note: Tests based on sample medians are not efficient enough for their use to be recommended. — Frank Harrell, Commented Dec 14, 2023 at 13:59
