3
$\begingroup$

I want to compare two independent groups on a Likert-like item. To explain, the dependent variable is structured so that 1 = <1 unit, 2 = 1 to <2, 3 = 2 to <3, and so on up to option 7 = >6. Initially I planned to use a t-test to compare the two groups, as I have read that with seven response options even a Likert scale item can be treated as interval data.

However, I have a single outlier on this dependent variable. I have gone through the data and found nothing to suggest the response is inaccurate, but it is very influential on my distribution: it raises the skewness from .792 to 1.2, above the rule-of-thumb cutoff of |skew| < 1 and above the CI cutoff. I would turn to a Mann-Whitney U test, but this one outlier also substantially alters the distribution of one group, making the two group distributions unequal; from what I have read, this rules out the Mann-Whitney U as well.

Since a single data point is creating the problem, I would love to simply remove it (my sample is large enough that losing one participant won't make much of an impact), but I do not want to bias my results by doing so. I am seriously stuck on where to go with this analysis!

Additional Information:

Running the t-test with and without the outlier yields p-values of .047 and .014, respectively. While both are significant at the .05 level, I am also running t-tests on two other dependent variables (they are not correlated, so multivariate methods were not warranted). After a Bonferroni correction, only p-values < .016 can be accepted as significant. This makes the t-test with the outlier non-significant and without the outlier significant.

Out of 179 responses, only one had a value of 7, and the next highest observed value was 5. Most responses were between 1 and 4, so the 7 has pulled the distribution to the right.

$\endgroup$
3
  • 1
    $\begingroup$ To what extent do your results change when you remove the outlier? If the changes don't really matter, you may make a judgment call concerning what to do (remove or not) and report both sets of results to satisfy sceptical readers or decision makers. $\endgroup$
    – whuber
    Commented Oct 6, 2022 at 19:22
  • 1
    $\begingroup$ The dependent variable takes values {1,2,3,..,7} and a single response is very influential to the distribution? Can you explain what this means in terms of the observed responses? I'm wondering whether it might be something like: a single value of 1 and all other values are at least 3 or 4. $\endgroup$
    – dipetkov
    Commented Oct 6, 2022 at 20:28
  • 1
    $\begingroup$ This is a tough situation. It happened to me once when one respondent in a small survey obviously had misinterpreted it by reversing the meaning of the scale: their responses were strongly negatively correlated with everyone else's responses. There was nothing I could do about it except present the analyses with and without their response and even that wasn't worth much because I had no control over future (institutional) uses of the survey results. $\endgroup$
    – whuber
    Commented Oct 8, 2022 at 15:19

3 Answers

1
$\begingroup$

I think you are asking the right questions and have some realistic concerns, but the situation is probably not irresolvable.

You are right that, when using a Mann-Whitney U test on ordinal data, there is (in my understanding and experience) some reason to be concerned about the ability to detect the effect in question on your outcome.

The same concern applies to a t-test on non-normal data with small samples. With sufficiently large samples, however, this is much less of a worry.

You are also right that removing the outlier introduces bias. I would personally leave it in.
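To illustrate the large-sample robustness point, here is a small simulation sketch (with made-up response probabilities, not the OP's data): even with a markedly skewed 7-point response, the t-test's false-positive rate under the null stays close to the nominal 5% at a sample size like this.

```r
set.seed(42)

# Skewed 7-point responses; both groups share the same distribution,
# so the null hypothesis of no group difference is true
probs <- c(0.35, 0.25, 0.15, 0.10, 0.08, 0.05, 0.02)

p.values <- replicate(5000, {
  g1 <- sample(1:7, size = 90, prob = probs, replace = TRUE)
  g2 <- sample(1:7, size = 90, prob = probs, replace = TRUE)
  t.test(g1, g2)$p.value
})

# Proportion of false rejections at alpha = .05;
# a value close to .05 indicates the t-test is well calibrated despite the skew
mean(p.values < 0.05)
```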

$\endgroup$
1
$\begingroup$

The observation that, out of 179 respondents, all but one report a value in the range [1, 5] and a single respondent reports a 7 is interesting in its own right. You should highlight it whether or not the 7 is an "outlier".

And for the analysis, an alternative to the t-test is a permutation test:

  1. Compute the observed difference in group means.
  2. Repeat B times: Shuffle the group labels and compute the difference in means.
  3. Compute the p-value as the proportion of simulated differences that are at least as large, in absolute value, as the observed difference.

Unlike the t-test, the permutation test makes no distributional assumptions. Since the group labels are shuffled at random, the outlier ends up sometimes in one group and sometimes in the other.

A permutation test (with two groups) is also easy to implement. Here it is in R:

permutation.test <- function(x, y, B = 5000, statistic = mean) {
  # Pool the two samples and record the original group labels
  values <- c(x, y)
  labels <- rep(1:2, times = c(length(x), length(y)))

  # Null distribution: difference in the statistic under random relabeling
  replicates <- replicate(B, {
    shuffled <- sample(labels)
    statistic(values[shuffled == 1]) - statistic(values[shuffled == 2])
  })

  observed <- statistic(x) - statistic(y)

  # Two-sided p-value: proportion of shuffled differences at least as extreme
  p.value <- mean(abs(replicates) >= abs(observed))
  list(p.value = p.value, replicates = replicates)
}

Below I generate some fake data for illustration. In this case, the permutation test gives a larger p-value than both t-tests (with or without the outlier).

I suspect this might be the case with your data as well, since the large value skews one group mean or the other as the labels are shuffled. And since you don't think the unusual data point was recorded incorrectly, the argument for the 7 being an outlier is, to an extent, about how rare responses of 7 or more units are in the population. (Actually, 6 or more units: even though no 6 is observed in this study, a value of 6 could presumably occur.) Perhaps you can use your domain knowledge to estimate this.

set.seed(1234)

n <- 178

x <- sample(c(1:5), size = n / 2, prob = c(0.3, 0.2, 0.2, 0.1, 0.2), replace = TRUE)
# The outlier is in the first group
x <- c(7, x)
y <- sample(c(1:5), size = n / 2, prob = c(0.1, 0.2, 0.3, 0.3, 0.1), replace = TRUE)

# with outlier
t.test(x, y)$p.value
#> [1] 0.02363841

# without outlier
t.test(x[-1], y)$p.value
#> [1] 0.009900959

permutation.test(x, y)$p.value
#> [1] 0.0318
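To put a number on the "how rare" question: if responses of 6 or more units occur with population probability p, the chance of seeing at least one among 179 respondents is 1 - (1 - p)^179. A quick sketch (the candidate values of p are invented for illustration):

```r
# Probability of observing at least one response of 6+ units in 179 draws,
# for a few hypothetical population rates p
p <- c(0.001, 0.005, 0.01, 0.05)
prob.at.least.one <- 1 - (1 - p)^179
round(prob.at.least.one, 3)
```

Even a population rate as low as 1 in 200 makes observing at least one such response more likely than not, so a single 7 is not, by itself, strong evidence of an error.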
$\endgroup$
1
$\begingroup$

In the case of a Likert scale, I wouldn't, in principle, worry too much about outliers, since the responses have a limited range anyway.

However, if you really do care, then an option may be to apply robust statistics, using for instance an MM-estimator such as the one implemented in the lmrob function of the robustbase package in R.
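A minimal sketch of what that could look like, with simulated data in the same spirit as the other answer (the group labels and response probabilities here are invented for illustration):

```r
library(robustbase)

set.seed(1)
# Two groups of Likert-type responses; the lone 7 sits in group A
group <- factor(rep(c("A", "B"), each = 89))
y <- c(7, sample(1:5, size = 88, replace = TRUE),
       sample(1:5, size = 89, replace = TRUE))

# MM-estimation downweights the outlying response automatically;
# the groupB coefficient is a robust estimate of the group difference
fit <- lmrob(y ~ group)
summary(fit)$coefficients["groupB", ]
```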

Borrowing from @whuber's comment, you may just run a two-sample $t$-test on the original data and another two-sample $t$-test on the cleaned data and report both results.

$\endgroup$
2
  • $\begingroup$ I don't know what $t$-test you have in mind here, as the data with and without the outlier are overlapping sets. $\endgroup$
    – Nick Cox
    Commented Oct 7, 2022 at 13:15
  • $\begingroup$ @NickCox thanks for the comment. The OP may apply a two-sample $t$-test on both the original data and the cleaned data and report both results; see my updated answer. $\endgroup$
    – utobi
    Commented Oct 7, 2022 at 13:41
