$\begingroup$

I have a dataset that falls into 4 categories, and I would like to use Pearson's chi-square test, but I am unsure how to set up the chi-square test statistic. The scenario is as follows: I have experimentally tested 231 proteins, which fall into classes 1, 2, 3 and 4. Each protein either has a specific property (let's call it TRUE) or does not have it (FALSE). Within each of the 4 classes, I know how many proteins are TRUE. Now I would like to test statistically whether the number of TRUE proteins I get per class is random (null hypothesis) or biased, so that one of the classes has more TRUE proteins than expected.

To use Pearson's test, I need the expected number of TRUE proteins per class, and I am struggling to calculate it because I have to take into account that the number of proteins per class is fixed a priori. Can anyone help with this?

To be more specific: I have 231 proteins in total, of which 75 are in class 1, 97 in class 2, 36 in class 3 and 23 in class 4. Of these, 16 are TRUE in class 1, 39 in class 2, 15 in class 3 and 18 in class 4. From these numbers, how would I calculate the expected number of TRUE proteins per class and the chi-square test statistic? Thank you very much!

$\endgroup$


$\begingroup$

If you construct a contingency table for these data, you will see that the total number of TRUE proteins is $88$ and the total in class $1$ is $75$, so under the hypothesis of independence the expected number of proteins that are both TRUE and in class $1$ is $$\frac{75\times88}{231}=\frac{200}{7}\approx 28.57$$
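Applying the same rule to every cell, $E_{ij}=R_iC_j/N$ with $R_i$ the TRUE or FALSE total, $C_j$ the class total and $N=231$, gives the full table. Each entry shows the observed count with the expected count (rounded to two decimals) in parentheses; the FALSE counts are simply class size minus TRUE count.

$$E_{ij}=\frac{R_i\,C_j}{N}$$

$$\begin{array}{l|cccc|c}
 & \text{Class }1 & \text{Class }2 & \text{Class }3 & \text{Class }4 & \text{Total}\\ \hline
\text{TRUE} & 16\ (28.57) & 39\ (36.95) & 15\ (13.71) & 18\ (8.76) & 88\\
\text{FALSE} & 59\ (46.43) & 58\ (60.05) & 21\ (22.29) & 5\ (14.24) & 143\\ \hline
\text{Total} & 75 & 97 & 36 & 23 & 231
\end{array}$$

The expected counts in each row add back up to the row totals ($88$ and $143$), and those in each column add up to the class size, which is a quick check on the arithmetic.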

Since the observed count for this cell is $16$, its contribution to the $\chi^2$ statistic is $$\frac{(16-\frac{200}{7})^2}{\frac{200}{7}}\approx 5.5314$$
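The same $(O-E)^2/E$ calculation can be repeated for all eight cells. Below is a minimal Python sketch that uses exact fractions so that $\frac{200}{7}$ is kept exactly; the variable names are illustrative and not part of the original answer.

```python
from fractions import Fraction

# Data from the question: proteins per class and TRUE proteins per class.
class_sizes = [75, 97, 36, 23]
true_counts = [16, 39, 15, 18]

N = sum(class_sizes)  # grand total: 231
false_counts = [n - t for n, t in zip(class_sizes, true_counts)]
observed = {"TRUE": true_counts, "FALSE": false_counts}
row_totals = {label: sum(counts) for label, counts in observed.items()}  # 88 and 143

# Pearson contribution (O - E)^2 / E for each cell,
# where E = (TRUE or FALSE total) * (class total) / N.
contributions = {}
for label, counts in observed.items():
    for j, (o, n_j) in enumerate(zip(counts, class_sizes), start=1):
        e = Fraction(row_totals[label] * n_j, N)
        contributions[(label, j)] = (o - e) ** 2 / e

for (label, j), c in contributions.items():
    print(f"{label:<5} class {j}: {float(c):.4f}")
```

The first value printed, for TRUE in class 1, matches the $5.5314$ above. Within each class the TRUE and FALSE cells have the same $|O-E|$, because the observed and expected counts in a class both add up to the fixed class size.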

In the same way you can calculate the contributions from the other seven cells of the $2\times4$ table and add them up to get the total $\chi^2$ value, which comes to about $25.05$. The table has $(2-1)(4-1)=3$ degrees of freedom, so this is well above the $5\%$ critical value of $7.815$ (and even the $0.1\%$ critical value of $16.27$), although it’s up to you which level you test at.
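Putting the pieces together, here is a small self-contained sketch that computes the total statistic, the degrees of freedom and a p-value. The p-value uses the closed form of the chi-square upper tail for exactly 3 degrees of freedom, $P(X>x)=\operatorname{erfc}\sqrt{x/2}+\sqrt{2x/\pi}\,e^{-x/2}$, so it only applies to a table like this $2\times4$ one; the function names are hypothetical. If SciPy is available, `scipy.stats.chi2_contingency` on the same table should give the same statistic and 3 degrees of freedom (its Yates correction is only applied when there is 1 degree of freedom).

```python
import math
from fractions import Fraction


def pearson_chi2(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = Fraction(0)
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = Fraction(row_totals[i] * col_totals[j], n)
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return float(stat), dof


def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-square distribution with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Rows: TRUE, FALSE; columns: classes 1-4 (FALSE = class size - TRUE count).
table = [
    [16, 39, 15, 18],
    [59, 58, 21, 5],
]

stat, dof = pearson_chi2(table)
if dof != 3:
    raise ValueError("chi2_sf_3df only applies to 3 degrees of freedom")
p_value = chi2_sf_3df(stat)
critical_5pct = 7.815  # tabulated 95th percentile of the chi-square distribution with 3 df

print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.2g}")
print("reject H0 at 5%:", stat > critical_5pct)
```

For these data it reports $\chi^2\approx 25.05$ with 3 degrees of freedom and a p-value of roughly $1.5\times10^{-5}$.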

$\endgroup$
  • Thank you, David! I don't quite understand yet why it is $\frac{75\times 88}{231}$. Commented Jul 5 at 10:33
  • To work out the expected values, we assume that being in class “1” and being “TRUE” are independent, so the observed probabilities (i.e. relative frequencies) are multiplied together and then by the grand total to get the expected number, i.e. $\frac{75}{231}\times\frac{88}{231}\times 231=\frac{75\times 88}{231}$. Commented Jul 5 at 10:56

