Tags edited by RobPratt

Pearson's chi-square test with categories of fixed size and a binary property counted within each category

I have a dataset that falls into 4 categories, and I would like to use Pearson's chi-square test, but I am unsure how to set up the test statistic. The scenario is as follows: I have experimentally tested 231 proteins, which fall into classes 1, 2, 3 and 4. Each protein either has a specific property (call it TRUE) or does not (FALSE). Within each of the 4 classes, I know how many proteins are TRUE. I would now like to test statistically whether the number of TRUE proteins per class is consistent with chance (null hypothesis) or biased, so that one of the classes contains more TRUE proteins than expected.
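One common way to frame this (an interpretation of the setup, not something stated above) is as a chi-square test of homogeneity on a 2 × 4 contingency table: class $i$ has a fixed size $n_i$ and an unknown probability $p_i$ that a protein in it is TRUE. The notation $p_i$ is introduced here for illustration.

$$
H_0:\; p_1 = p_2 = p_3 = p_4 \qquad \text{vs.} \qquad H_1:\; p_i \neq p_j \text{ for at least one pair } (i, j)
$$

Note that this alternative is two-sided across classes: the test detects unequal TRUE proportions, not specifically an excess in one named class.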

To use Pearson's test, I am struggling to calculate the expected number of TRUE proteins per class, because I have to take into account that the number of proteins per class is fixed a priori. Can anyone help with this?
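Because the class sizes $n_i$ are fixed, a standard approach (a sketch, assuming the test-of-homogeneity framing) is to estimate the common TRUE proportion under the null hypothesis by pooling all classes and then scale it by each class size. Here $T_i$ and $F_i = n_i - T_i$ are the observed TRUE and FALSE counts in class $i$ and $N = \sum_i n_i$; these symbols are illustrative notation, not taken from the question.

$$
\hat p = \frac{\sum_{i=1}^{4} T_i}{N}, \qquad E_i^{\text{TRUE}} = n_i \hat p, \qquad E_i^{\text{FALSE}} = n_i (1 - \hat p)
$$

$$
\chi^2 = \sum_{i=1}^{4} \left[ \frac{(T_i - E_i^{\text{TRUE}})^2}{E_i^{\text{TRUE}}} + \frac{(F_i - E_i^{\text{FALSE}})^2}{E_i^{\text{FALSE}}} \right], \qquad \text{df} = (2 - 1)(4 - 1) = 3
$$

The fixed class sizes enter through $n_i$ in the expected counts, and estimating $\hat p$ from the data is what costs the extra degree of freedom.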

To be more specific: I have 231 proteins in total: 75 in class 1, 97 in class 2, 36 in class 3 and 23 in class 4. Of these, 16 are TRUE in class 1, 39 in class 2, 15 in class 3 and 18 in class 4. From these numbers, how would I calculate the expected number of TRUE proteins per class and the chi-square test statistic? Thank you very much.
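The arithmetic for these counts can be sketched in Python with only the standard library, assuming the test-of-homogeneity framing (pooled TRUE proportion, 2 × 4 table, 3 degrees of freedom). The helper names `chi2_sf` and `chi_square_homogeneity` are hypothetical; `chi2_sf` uses the closed-form chi-square upper tail for integer degrees of freedom in place of a statistics library.

```python
import math

# Counts from the question: class sizes and number of TRUE proteins per class.
class_sizes = [75, 97, 36, 23]
true_counts = [16, 39, 15, 18]


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Hypothetical helper: uses the closed-form series for integer degrees of
    freedom so that only the standard library is needed.
    """
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum x^k / (1*3*...*(2k+1))
    result = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range((df - 1) // 2):
        result += term
        term *= x / (2 * k + 3)
    return result


def chi_square_homogeneity(sizes, successes):
    """Pearson chi-square test on the 2 x k table of TRUE / FALSE counts by class."""
    n_total = sum(sizes)
    p_pooled = sum(successes) / n_total  # overall TRUE fraction under H0
    stat = 0.0
    expected_true = []
    for n_i, o_true in zip(sizes, successes):
        e_true = n_i * p_pooled  # expected TRUE count, class size held fixed
        e_false = n_i - e_true
        o_false = n_i - o_true
        expected_true.append(e_true)
        stat += (o_true - e_true) ** 2 / e_true + (o_false - e_false) ** 2 / e_false
    df = len(sizes) - 1  # (2 - 1) * (k - 1)
    return stat, df, chi2_sf(stat, df), expected_true


stat, df, p_value, expected = chi_square_homogeneity(class_sizes, true_counts)
print("expected TRUE per class:", [round(e, 2) for e in expected])
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.2g}")
```

With these counts the pooled TRUE fraction is 88/231 ≈ 0.381, the expected TRUE counts are about 28.57, 36.95, 13.71 and 8.76, and the statistic comes out at roughly 25.05 on 3 degrees of freedom, with a p-value on the order of 10⁻⁵. All expected counts are above 5, so the usual chi-square approximation should be reasonable. If SciPy is available, `scipy.stats.chi2_contingency([[16, 39, 15, 18], [59, 58, 21, 5]])` should give the same statistic, since its Yates continuity correction is only applied when there is one degree of freedom. A significant result only says the TRUE proportions are not all equal; it does not by itself identify which class drives the difference.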