I have some data on how frequently a particular claim is referenced by two distinct types of sources. The first is traditional news media, and the second is conspiracy theory sites. I want to test the hypothesis that conspiracy sites cite a given claim more frequently than conventional mainstream media sites.
I have data on how many times the claim was cited in 83 news sources, given below:
news = [1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1 3 1 1 1 2 1 1 1 2 1 1 1 1 1 7 1 1 1 6 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 5 1 1 4 1 1];
and for how often the same claim was cited in 17 conspiracy theory sources:
ct = [1 2 1 6 2 20 1 1 1 2 9 1 4 2 2 7 7];
I do not have any a priori reason to believe this data will be normally distributed, nor am I interested in means, so my understanding is that a non-parametric test ought to be employed. From my initial reading, I think the Wilcoxon rank sum test might be appropriate for this hypothesis, and that I should use the right-tailed version because the hypothesis specifies a direction. Running this in MATLAB:
[p,h,stats] = ranksum(ct,news,'tail','right');
which yields $p = 3.5166 \times 10^{-6}$, and hence strong rejection of the null hypothesis that the two samples come from populations with the same median. But is this the right test to employ, or would another non-parametric method be better for analysing data of this sort? I will have to run similar analyses in future, so I am open to suggestions or corrections!
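For anyone wanting to reproduce this, here is a self-contained sketch of what I ran, assuming the Statistics and Machine Learning Toolbox is available for `ranksum` and `tiedrank`. The last few lines are not part of my original analysis: they are my attempt at turning the rank sum into the Mann-Whitney estimate of how often a conspiracy source outranks a news source, and the variable names `U` and `A` are my own.

```matlab
% Citation counts per source, copied from the vectors above.
news = [1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 ...
        1 1 1 1 3 1 1 1 2 1 1 1 2 1 1 1 1 1 7 1 ...
        1 1 6 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 ...
        1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 5 1 1 4 ...
        1 1];
ct = [1 2 1 6 2 20 1 1 1 2 9 1 4 2 2 7 7];

% Right-tailed Wilcoxon rank sum test: the alternative is that
% ct tends to take larger values than news.
[p, h, stats] = ranksum(ct, news, 'tail', 'right');

% Assumed add-on (not in my original analysis): Mann-Whitney U for ct,
% computed from midranks so that ties are shared equally.
n1 = numel(ct);
n2 = numel(news);
r  = tiedrank([ct news]);            % midranks of the pooled sample
U  = sum(r(1:n1)) - n1*(n1 + 1)/2;   % U statistic for the ct sample
A  = U/(n1*n2);                      % estimates P(ct > news) + 0.5*P(ct == news)

fprintf('p = %.4g, h = %d, A = %.3f\n', p, h, A);
```

Given how many tied values there are (most sources cite the claim exactly once), I would be particularly interested in whether the ties affect the choice of test.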