What non-parametric test for multivariate binary data should I use?

Question

I have two different groups of participants ("g" and "b") answering the same set of questions. Group "g" answered questions in the same order. Group "b" answered questions in a randomized order. The answers are binary Y/N (0/1).

This is my dataset:

             |   group   | question1 | question2 | question3 | ... | questionN | 
             |-----------|-----------|-----------|-----------| ... |-----------|
participant1 |     g     |     1     |     0     |     1     | ... |     1     | 
participant2 |     g     |     0     |     1     |     0     | ... |     0     | 
participant3 |     g     |     0     |     1     |     0     | ... |     0     | 
...          |    ...    |    ...    |    ...    |    ...    | ... |    ...    | 
participant8 |     b     |     1     |     0     |     1     | ... |     1     | 
participant9 |     b     |     0     |     0     |     1     | ... |     0     | 
participantN |     b     |     0     |     1     |     0     | ... |     0     |

What is the best test to perform to understand if there is a statistical difference in responses between the groups ("g" and "b")? In other words, what test should be used to understand if the order of questions has an impact on the responses?

MANOVA seems not to be suitable due to the lack of multivariate normality (binary data Y/N 0/1 inherently follows a Bernoulli distribution). How about the Kruskal-Wallis test or Fisher's Exact test?

I'm looking for the best non-parametric test for multivariate binary data.

Any suggestions would be highly appreciated!

From your question it is not completely clear: what exactly do you want to compare between the groups? – CaroZ, Commented Oct 18, 2023 at 20:11
Great question, thanks! I'll edit my original post. I didn't mention that the questions for group b were randomized. I want to prove that the order of the questions is not influencing the responses. – Roland D, Commented Oct 18, 2023 at 20:28
I wonder if it would be legitimate to run an interaction regression, modeling the binary outcome as a function of the group, the question, and the group-question interaction. I have played with this idea in other contexts and have gotten good simulation results, though my simulations might not generalize beyond the particular cases I’ve used. – Dave, Commented Oct 18, 2023 at 20:50
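
The interaction-regression idea in the last comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not Dave's actual procedure. When every participant answers every question, the likelihood-ratio statistic comparing a logistic model with question effects only against one with group, question and their interaction equals the sum of the per-question 2x2 G statistics. Because answers from the same participant are correlated, the sketch does not use a chi-square reference distribution; it calibrates the statistic by permuting group labels across participants, which is a technique swapped in here. The function names and the toy data are hypothetical.

```python
import math
import random


def g_statistic(yes_g, n_g, yes_b, n_b):
    """2x2 likelihood-ratio (G) statistic for one question."""
    observed = [yes_g, n_g - yes_g, yes_b, n_b - yes_b]
    p_yes = (yes_g + yes_b) / (n_g + n_b)
    expected = [n_g * p_yes, n_g * (1 - p_yes), n_b * p_yes, n_b * (1 - p_yes)]
    # Cells with an observed count of 0 contribute 0 to the sum.
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def joint_statistic(groups, answers):
    """Sum of per-question G statistics over all questions.

    Assumes no missing answers; equals the LR statistic for
    'question only' versus 'group * question' in a logistic model.
    """
    n_g = sum(1 for grp in groups if grp == "g")
    n_b = len(groups) - n_g
    total = 0.0
    for q in range(len(answers[0])):
        yes_g = sum(a[q] for grp, a in zip(groups, answers) if grp == "g")
        yes_b = sum(a[q] for grp, a in zip(groups, answers) if grp == "b")
        total += g_statistic(yes_g, n_g, yes_b, n_b)
    return total


def permutation_p_value(groups, answers, n_perm=2000, seed=0):
    """Permute group labels across participants (assumption of this sketch).

    Each participant's whole answer vector stays intact, so the
    correlation between questions is preserved under the null.
    """
    rng = random.Random(seed)
    observed = joint_statistic(groups, answers)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if joint_statistic(shuffled, answers) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: one label and one 0/1 answer vector per participant.
groups = ["g", "g", "g", "g", "b", "b", "b", "b"]
answers = [
    [1, 0, 1], [0, 1, 0], [0, 1, 0], [1, 1, 0],
    [1, 0, 1], [0, 0, 1], [0, 1, 0], [1, 0, 1],
]
statistic, p_value = permutation_p_value(groups, answers)
print(f"joint statistic = {statistic:.3f}, permutation p-value = {p_value:.3f}")
```

The permutation step relies on participants being exchangeable between groups under the null hypothesis, which is plausible if participants were randomly assigned to the fixed or randomized order.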

CaroZ · Accepted Answer · 2023-10-18 20:42:10Z

One solution would be to fit one binomial GLM (logistic regression) per question. In R, it would look something like this: glm(question_n ~ group, family = binomial), where question_n stands for the column holding the answers to the n-th question. Each model tells you whether the outcome of that particular question is explained by the group the participant belongs to, i.e. whether the answers to that question differ when the order of the questions is randomized. However, this approach is not multivariate: each question is tested separately.
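
As a rough illustration of the per-question approach, here is a minimal standard-library Python sketch. It relies on the fact that, with a single two-level predictor, the likelihood-ratio test of the logistic model with group against the intercept-only model reduces to the G test on the 2x2 table of group by answer, with 1 degree of freedom. The Holm adjustment across questions is an addition of this sketch, not part of the answer, and the function names and toy data are hypothetical.

```python
import math


def g_test_2x2(yes_g, n_g, yes_b, n_b):
    """Likelihood-ratio (G) test that P(answer = 1) is the same in both groups.

    Returns (statistic, p_value), with the p-value taken from a
    chi-square distribution with 1 degree of freedom.
    """
    observed = [yes_g, n_g - yes_g, yes_b, n_b - yes_b]
    p_yes = (yes_g + yes_b) / (n_g + n_b)
    expected = [n_g * p_yes, n_g * (1 - p_yes), n_b * p_yes, n_b * (1 - p_yes)]
    # Cells with an observed count of 0 contribute 0 to the sum.
    g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    # Upper tail of chi-square(1): P(X > g) = erfc(sqrt(g / 2)).
    return g, math.erfc(math.sqrt(g / 2.0))


def holm_adjust(p_values):
    """Holm step-down adjustment for testing several questions at once."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical toy data: (group, answers to question1..question3).
rows = [
    ("g", [1, 0, 1]), ("g", [0, 1, 0]), ("g", [0, 1, 0]), ("g", [1, 1, 0]),
    ("b", [1, 0, 1]), ("b", [0, 0, 1]), ("b", [0, 1, 0]), ("b", [1, 0, 1]),
]
n_g = sum(1 for grp, _ in rows if grp == "g")
n_b = len(rows) - n_g

p_values = []
for q in range(3):
    yes_g = sum(ans[q] for grp, ans in rows if grp == "g")
    yes_b = sum(ans[q] for grp, ans in rows if grp == "b")
    p_values.append(g_test_2x2(yes_g, n_g, yes_b, n_b)[1])

adjusted = holm_adjust(p_values)
for q, (raw, adj) in enumerate(zip(p_values, adjusted), start=1):
    print(f"question{q}: raw p = {raw:.3f}, Holm-adjusted p = {adj:.3f}")
```

With very small cell counts the chi-square approximation is unreliable, which is where an exact test such as Fisher's, mentioned in the question, would be the safer per-question choice.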

edited Oct 18, 2023 at 20:42

answered Oct 18, 2023 at 20:30
