1
$\begingroup$

What type of non-parametric test would be suitable for the dataset shown in the image?

The sample size is 200 for each of the two variables. Values range from 0 to 4 for both variables. Neither the histogram nor the Q-Q plot suggests normality. Hence, I tried the Mann-Whitney U test, with both variables defined as ordinal. However, SPSS is unable to compute any test statistics. I am looking for recommendations.

$\endgroup$
  • 7
    $\begingroup$ What is the question you are hoping to answer from the data? $\endgroup$
    – Galen
    Commented Sep 23, 2023 at 4:36
  • 5
    $\begingroup$ The data are not the only thing when it comes to choosing a test; it depends on other things. What is the precise question of interest / what null hypothesis and alternative you want to test? Are these Likert type items? Did you want to treat them as ordinal or did you have it in mind to treat them as interval? Are the values actually paired? $\endgroup$
    – Glen_b
    Commented Sep 23, 2023 at 6:21
  • 2
    $\begingroup$ And what do the rows represent? $\endgroup$ Commented Sep 23, 2023 at 12:07
  • $\begingroup$ Ideally, I would like to see whether there is a statistically significant difference between the means of the two variables. Yes, these are Likert-type items. The null hypothesis would be that there is no statistical difference between the two variables. The values are independent, and I would like to treat them as ordinal. The two variables come from independent populations, and each participant selected a value from 0-4. $\endgroup$
    – scouse_s
    Commented Sep 23, 2023 at 15:04
  • $\begingroup$ And by chance, the two independent samples are of equal size? $\endgroup$
    – Michael M
    Commented Sep 24, 2023 at 14:28

2 Answers

5
$\begingroup$

As the data are responses to a Likert-type item with five response options (0-4), it's best to treat the data as ordinal. It probably doesn't even make sense to think of the mean of such a variable. That is, you can't meaningfully take an average of, say, responses that are "Never", "Rarely", "Frequently", "All the time".

A simple approach would be the Wilcoxon-Mann-Whitney test mentioned in the question. Note that this test is not usually a test of the mean or median, but tests if the responses in one group tend to be higher than in the other group.

You could also use a test of the median, like Mood's median test. But I'll warn you that this test can get funky when you have discrete responses like yours, because many observations can tie with the median itself.
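A rough sketch of what goes wrong, hand-rolling the median split in base R rather than calling a packaged Mood's test. It uses the sample values listed later in this answer. The variable names are illustrative, and assigning ties at the median to the upper group is just one convention, chosen here as an assumption.

```r
# Sample values copied from the data listed later in this answer
Item1_g <- c(3,4,4,3,3,4,3,3,4,4,3,4,4,3,4,3,4,0,4,4,4,3,3,4,4,4,3,3,3,3,2,3,2)
Item1_p <- c(3,3,3,4,4,3,4,3,3,4,4,3,3,4,4,4,3,3,4,4,4,4,4,4,4,4,4,3,4,3,4,3,3)

y     <- c(Item1_g, Item1_p)
group <- factor(rep(c("Item1_g", "Item1_p"),
                    times = c(length(Item1_g), length(Item1_p))))

# Median of the pooled responses from both groups
pooled_median <- median(y)
print(pooled_median)

# Usual split: strictly above the pooled median or not.
# Fixing the levels keeps both columns even if one is empty.
above <- factor(y > pooled_median, levels = c(FALSE, TRUE))
median_table <- table(group, above)
print(median_table)

# Alternative split (an assumption): ties at the median go to the upper group
at_or_above <- y >= pooled_median
split_table <- table(group, at_or_above)
print(split_table)

# Exact test on the 2 x 2 split table
print(fisher.test(split_table)$p.value)
```

With these values the pooled median is 4, the highest category, so no response lies above it and the usual above/not-above split is degenerate. Any result then hinges on how ties at the median are assigned.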

As to why you aren't getting results from SPSS, I wouldn't know. But the test is easy enough in Jamovi or even R or Python.

ADDITION:

You can run the following in R or at https://rdrr.io/snippets/ without installing software.

At least for the data you supplied --- if I got the numbers correct --- here's what I get.

For these samples, the item "p" tends to have higher values than "g": more "4"'s, and fewer "0"'s and "2"'s.

Item1_g = c(3,4,4,3,3,4,3,3,4,4,3,4,4,3,4,3,4,0,4,4,4,3,3,4,4,4,3,3,3,3,2,3,2)
Item1_p = c(3,3,3,4,4,3,4,3,3,4,4,3,3,4,4,4,3,3,4,4,4,4,4,4,4,4,4,3,4,3,4,3,3)

wilcox.test(Item1_g, Item1_p)

   ### Wilcoxon rank sum test with continuity correction
   ### W = 457.5, p-value = 0.2087

library(rcompanion)

vda(x=Item1_g, y=Item1_p, verbose=TRUE)

   ###            Statistic Value
   ### 1 Proportion Ya > Yb 0.193
   ### 2 Proportion Ya < Yb 0.353
   ### 3    Proportion ties 0.455

Data = data.frame(Y = factor(c(Item1_g, Item1_p), 
                             levels=c("0","1","2","3","4")), 
                  Group = c(rep("Item1_g", length(Item1_g)),
                            rep("Item1_p", length(Item1_p))))

round(prop.table(xtabs(~ Group + Y, data=Data), margin=1),2)

   ###            Y
   ###  Group      0    1    2    3    4
   ###  Item1_g 0.03 0.00 0.06 0.45 0.45
   ###  Item1_p 0.00 0.00 0.00 0.42 0.58

This additional code creates bar plots to compare the two groups.

library(lattice)

histogram(~ Y | Group,
          data=Data,
          layout=c(1,2),
          drop.unused.levels=FALSE)

[Figure: bar plots of the response distribution for Item1_g and Item1_p]

$\endgroup$
  • 2
    $\begingroup$ Thank you, this answer helps validate that I am on the right track. I will try it out in Python. $\endgroup$
    – scouse_s
    Commented Sep 24, 2023 at 15:02
  • $\begingroup$ I made an addition to the post. It's helpful for this kind of data to look at the proportion of each response in each group, either by bar plot or a table of proportions. $\endgroup$ Commented Sep 24, 2023 at 15:04
  • $\begingroup$ Thank you so much! I really appreciate it. $\endgroup$
    – scouse_s
    Commented Sep 24, 2023 at 15:13
  • 2
    $\begingroup$ If the data are paired, the rank difference test is in order, and it handles ties better than the Wilcoxon signed-rank test. See this and a section below that on how to use the proportional odds model for paired data. This allows for extreme ties in the data. $\endgroup$ Commented Sep 24, 2023 at 15:31
  • 1
    $\begingroup$ Thank you so much! Your answers have been very helpful. $\endgroup$
    – scouse_s
    Commented Sep 26, 2023 at 3:21
1
$\begingroup$

I would not run a parametric test (e.g. a t-test, because the data are ordinal Likert responses) or a non-parametric rank test (e.g. Mann-Whitney U, because there are far too many ties: the vast majority of the comparisons result in ties). I would instead build a 5x5 contingency table and run a chi-square test of association. This fits your situation: the variables are categorical, and you want to see whether the proportions of the 5 possible outcomes differ. Your sample size (100) is also large enough that it should work. However, you have some 0's in that 5x5 table, so the chi-square test will not even compute, and you will need to apply Fisher's exact test to the table instead.
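A minimal sketch of the contingency-table approach in base R. It assumes the two groups are independent (as the asker states in the comments), so the table has one row per group and one column per observed response rather than 5x5. The values are the sample posted in the other answer, and the variable names are illustrative.

```r
# Sample values copied from the other answer in this thread
Item1_g <- c(3,4,4,3,3,4,3,3,4,4,3,4,4,3,4,3,4,0,4,4,4,3,3,4,4,4,3,3,3,3,2,3,2)
Item1_p <- c(3,3,3,4,4,3,4,3,3,4,4,3,3,4,4,4,3,3,4,4,4,4,4,4,4,4,4,3,4,3,4,3,3)

response <- c(Item1_g, Item1_p)
group    <- rep(c("Item1_g", "Item1_p"),
                times = c(length(Item1_g), length(Item1_p)))

# Cross-tabulate group by observed response. Categories nobody chose
# (here "1") do not appear, so no column has an expected count of zero.
tab <- table(group, response)
print(tab)

# Fisher's exact test treats the response as nominal, ignoring its ordering
result <- fisher.test(tab)
print(result$p.value)
```

Note that this test ignores the ordering of the categories, so it answers whether the response distributions differ at all, not whether one group tends to respond higher.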

$\endgroup$
  • $\begingroup$ For ordinal data with many ties, classic rank based tests like Wilcoxon-Mann-Whitney and Kruskal-Wallis work reasonably well. You can compare the results of these tests to ordinal regression with simulation studies. I have a few results of such simulations here: rcompanion.org/handbook/E_01.html $\endgroup$ Commented Feb 24 at 13:18
  • $\begingroup$ The real question here is if they want to treat the response variable as nominal categorical or ordinal categorical. Usually, if data can be treated as ordinal, it's better to treat it as ordinal. But not always. $\endgroup$ Commented Feb 24 at 13:21
