I am looking for a way to compare the proportions between two data sets of surveys, where individuals can be placed in more than one classification.
One of my groups contains the survey results of 10,000 individuals, and the other is a subset of 100 individuals that I randomly selected from the first data set (so the two groups are not independent).
                   pop samp
    Dutch   0.03539377 0.05
    French  0.13623071 0.18
    English 0.98779873 0.98
I want to know how likely it is that a random sample (of 100 individuals) will have significantly different proportions when compared to the full population.
For this I have generated 50000 random sets of 100 individuals each, and now I want to calculate how many of those are significantly different from the whole population (with a p.value <= 0.05).
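A minimal sketch of this kind of simulation is shown here. Several things in it are assumptions and not my real setup: the population matrix `pop` is a stand-in generated from the proportions in the table (the real survey data would replace it), the number of subsets is reduced from 50000 to 2000 to keep it fast, and `binom.test` is used only as a placeholder per-classification test, not as a claim that it is the right one.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# ASSUMPTION: stand-in population generated from the proportions in the table.
# Each column is one classification; an individual can be TRUE in several.
N <- 10000
p <- c(Dutch = 0.03539377, French = 0.13623071, English = 0.98779873)
pop <- sapply(p, function(pr) rbinom(N, 1, pr) == 1)  # N x 3 logical matrix

pop_prop <- colMeans(pop)  # proportions actually realised in the stand-in data

n_sets <- 2000  # ASSUMPTION: reduced from 50000 to keep the sketch quick
n <- 100        # individuals per random subset

# For one random subset, test each classification separately against the
# population proportion. binom.test is a placeholder choice of test.
one_set <- function() {
  idx <- sample(N, n)  # sampling without replacement
  counts <- colSums(pop[idx, , drop = FALSE])
  mapply(function(x, p0) binom.test(x, n, p = p0)$p.value, counts, pop_prop)
}

pvals <- t(replicate(n_sets, one_set()))  # n_sets x 3 matrix of p-values

# Share of subsets with p-value <= 0.05, per classification
colMeans(pvals <= 0.05)
```

Because the classifications overlap, each one is tested on its own here; treating the three proportions as a single multinomial vector would require them to be mutually exclusive and sum to 1, which they do not.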
chisq.test, prop.test and wilcox.test all gave similarly inappropriate results (out of 50000 random subsets of 100 individuals each, only 4% of subsets were significantly different). I am now trying multinomial.test, which I think might be closer to what I want, but I am not really sure.
Any suggestions?