
I would like to perform a signed-rank test on two paired, ordinal variables in R. A non-parametric test is required, as neither variable follows a normal distribution. Both variables come from a survey in which respondents describe a situation 'before' (variable 1) and a situation 'after' (variable 2).

The sample is weighted: a variable assigns a given weight to each row. The weights are non-integer (decimal) numbers. Respondents are users of different services, and the weights ensure that the proportion of users of each service is the same as in a previous survey, so the two surveys can be compared.

The built-in wilcox.test function in R does not take weights into account. The 'survey' package does offer a Wilcoxon test for weighted data, but it does not seem to work for paired variables.

Is there any way to perform a weighted Wilcoxon signed-rank test in R, or any alternative test with a similar purpose that takes weights into account?

As I have not (yet) found a way to perform a signed-rank test on a weighted sample, I have considered and rejected two workarounds that would let me ignore the weighting and use the built-in wilcox.test function:

  • replicating rows in proportion to the weights, but this seems impractical as the weights are not integers;

  • transforming the variables so that they follow a normal distribution, but it does not seem wise to apply such a treatment to ordinal variables.
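For what it's worth, a rough sketch of the first idea: since the example weights below happen to have one decimal place, they can be scaled and rounded to whole counts (the factor 10 is specific to these weights, not a general rule). Note the caveat in the comments — this is only an illustration, not a recommendation:

```r
# data from the example below
ord.before <- c(4, 1, 1, 2, 3, 6, 5, 7, 6, 1)
ord.after  <- c(2, 1, 1, 2, 5, 2, 1, 5, 3, 1)
w <- c(1.3, 1.3, 0.7, 0.5, 1.5, 1.6, 1.6, 0.4, 0.4, 0.7)

k <- round(w * 10)               # scale weights to integer counts (exact here)
idx <- rep(seq_along(w), times = k)  # replicate each row k[i] times

# caveat: the replicated sample has sum(k) = 100 rows instead of 10,
# so the p-value below is artificially precise (the test thinks n = 100)
res <- suppressWarnings(          # ties trigger exactness warnings
  wilcox.test(ord.before[idx], ord.after[idx], paired = TRUE)
)
res
```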

If a weighted paired Wilcoxon test is not possible, I could consider weighting the ranks using the wtd.rank function from the 'Hmisc' package: the package manual says weighted ranks can be used to obtain Wilcoxon tests. Yet, is it the same thing to attribute a weight to each respondent as to weight the answers directly, i.e. to change the values?

Here is some example data formatted as in the sample I am using:

ord.before = c(4, 1, 1, 2, 3, 6, 5, 7, 6, 1) # ordinal "before" variable with 7 levels
ord.after = c(2, 1, 1, 2, 5, 2, 1, 5, 3, 1) # ordinal "after" variable with 7 levels
w = c(1.3, 1.3, 0.7, 0.5, 1.5, 1.6, 1.6, 0.4, 0.4, 0.7) # weights
df = data.frame(ord.before, ord.after, w)

Thanks a million for your help!

Gautier

  • Could you explain what the weights mean? What, specifically, do they measure?
    – whuber
    Commented Jun 29, 2016 at 21:23
  • @whuber Sure, I have edited the question.
    Commented Jun 29, 2016 at 21:28
  • It's not quite clear to me what the specific intent is here: "I am afraid such a method will hurt data quality and cause lower test relevance" -- 1. what aspect of the data are you concerned about, and what is it about your weights that would harm it? 2. in what sense do you mean 'test relevance', and what is it about your weights that would harm it? (Aren't the weights designed to make the test more relevant rather than less?)
    – Glen_b
    Commented Jun 29, 2016 at 21:32
  • @Glen_b Thanks, I realise this point is still more a question than an assertion to me; I have edited the question. My point is: if weighting respondents and weighting values have different effects, tests might give different results depending on the method used. Therefore, I would like to find out whether weighting values instead of respondents introduces biases, and which ones.
    Commented Jun 29, 2016 at 21:48
  • Did you ever find a statistical test to solve your problem? I'm now trying to perform the same task (a paired Wilcoxon test with weights), but I don't know what type of test to apply, or whether there is a package for it. Any advice is welcome.
    Commented Jun 23, 2017 at 11:05

1 Answer


"replicating rows in proportion to the weights, but this seems impractical as the weights are not integers;"

This might be worth trying a little more, but maybe I'm missing something. If you have 100 weighted observations, you could try randomly sampling one observation at a time (with replacement? hmm... I'm not sure what the implications are off the top of my head), then draw a uniform random number x between 0 and 1. If the weights are between 0 and 1, add the observation to the final set if x < that observation's weight. That way, an observation with weight 0.3 should have three times the probability of being selected as one with weight 0.1.

Keep doing that until you have 100 final observations (i.e. you may have to draw n rows and n uniforms, n > 100, to fill the set). You could then plot a distribution of the Wilcoxon statistic across many such resamples, or take a mean, or something like that... Hope I didn't misunderstand your attempt!
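A minimal sketch of this resampling idea in base R, using the question's example data. The accept/reject step described above is equivalent to sampling with probability proportional to the weights, so the sketch uses sample() with prob = w as a shortcut; the number of resamples B = 1000 is an arbitrary choice:

```r
set.seed(1)  # reproducibility

# data from the question
ord.before <- c(4, 1, 1, 2, 3, 6, 5, 7, 6, 1)
ord.after  <- c(2, 1, 1, 2, 5, 2, 1, 5, 3, 1)
w <- c(1.3, 1.3, 0.7, 0.5, 1.5, 1.6, 1.6, 0.4, 0.4, 0.7)
n <- length(w)

B <- 1000                  # number of weighted resamples (arbitrary)
V <- rep(NA_real_, B)      # signed-rank statistics, one per resample
for (b in seq_len(B)) {
  # weighted resampling with replacement: each row's selection
  # probability is proportional to its weight, as in the
  # accept/reject scheme described above
  idx <- sample(n, size = n, replace = TRUE, prob = w)
  d <- ord.before[idx] - ord.after[idx]
  if (all(d == 0)) next    # degenerate draw: no non-zero pairs, skip
  V[b] <- suppressWarnings(  # ties and zeros trigger exactness warnings
    wilcox.test(ord.before[idx], ord.after[idx], paired = TRUE)$statistic
  )
}
summary(V)  # distribution of the V statistic across resamples
```

One could look at the spread of V (or of the resampled p-values) to judge how stable the conclusion is under the weighting, though this is an ad hoc procedure rather than a formally weighted test.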

  • Thanks a lot for this suggestion; I think it is a good way to approximate the results I would get with the weighted database. I'll adopt it unless somebody comes up with a way to perform a signed-rank test on a weighted sample.
    Commented Jun 30, 2016 at 14:05
