If I have a very small n for one group and a very large number of features, should I choose a parametric or a non-parametric test?

Question

I have a dataset that contains human metabolite concentrations in a fluid. One group has about 12 samples, while another has only 5. My question is whether I can assume normality for this data and do ANOVA/t-tests or whether, given the small dataset and large number of features, I should do non-parametric tests.

So, for example, one of the features looks like this for the control group:

I mean, it follows a normal distribution maybe, but with so few points, can I really say it does? Also, should each feature follow a normal distribution for the 2 groups individually?

Then there's features that look like this:

Now if I plot their densities I get stuff like this (note that the data is log2-transformed):

@Subhash C. Davar yes they are, the smallest is the control group for some bizarre reason. — maglorismyspiritanimal, Commented Sep 5, 2023 at 8:29

Peter Flom · Accepted Answer · 2023-09-04 15:29:33Z

In general, if the assumptions are met, parametric tests are more powerful than their non-parametric equivalents.

But the problem here is not parametric vs. non-parametric, it's sample size and power and maybe overfitting.

Suppose you have only one feature to test. Then the difference would have to be huge to be statistically significant, and the parameter estimate will be very imprecisely measured. The exact numbers you will get depend on the exact situation, but let's say the groups have means that are 2 SD apart.

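set.seed(1234)  # fix the seed so the simulation is reproducible

x <- rnorm(12, 10, 1)  # larger group: n = 12, mean 10, SD 1
y <- rnorm(5, 12, 1)   # smaller group: n = 5, mean 12, SD 1

t.test(x, y)  # Welch two-sample t-test (R's default for two samples)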

The result is statistically significant, but the 95% CI for the difference runs from about -3.25 to -1.48, which is quite wide.

Welch Two Sample t-test

data:  x and y
t = -5.94, df = 10.461, p-value = 0.0001193
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.250345 -1.484742
sample estimates:
mean of x mean of y 
 9.557737 11.925281

And, if you start to get into more complex models, you run the risk of overfitting (even if the model is just a little more complex).
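To make the point about power concrete, here is a minimal simulation sketch comparing the Welch t-test with the Wilcoxon rank-sum test at these group sizes. The normal data, SD of 1, effect sizes, number of simulations, and the helper name `estimate_power` are all assumptions chosen for illustration, not something taken from the asker's data.

```r
# Monte Carlo power comparison: Welch t-test vs. Wilcoxon rank-sum test.
# Assumptions (illustrative only): normal data, SD = 1, n = 12 and n = 5, alpha = 0.05.
set.seed(1234)

# estimate_power is a hypothetical helper: it simulates n_sim datasets with a
# mean shift of `delta` SDs and returns the rejection rate of each test.
estimate_power <- function(delta, n_x = 12, n_y = 5, n_sim = 2000, alpha = 0.05) {
  p_t <- numeric(n_sim)
  p_w <- numeric(n_sim)
  for (i in seq_len(n_sim)) {
    x <- rnorm(n_x, mean = 10, sd = 1)
    y <- rnorm(n_y, mean = 10 + delta, sd = 1)
    p_t[i] <- t.test(x, y)$p.value       # Welch t-test (R's default)
    p_w[i] <- wilcox.test(x, y)$p.value  # Wilcoxon rank-sum test
  }
  c(welch_t = mean(p_t < alpha), wilcoxon = mean(p_w < alpha))
}

deltas <- c(0.5, 1, 2)  # true differences in SD units
power_table <- t(sapply(deltas, estimate_power))
rownames(power_table) <- paste0("delta = ", deltas, " SD")
print(round(power_table, 2))
```

Each row gives the estimated probability of rejecting the null at that true difference. Under these assumptions, the choice of test should matter less than the sheer lack of data when the difference is modest.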


But how can I know if the assumptions are met when the sample size is so small? – maglorismyspiritanimal, Commented Sep 5, 2023 at 8:30

It can be tricky, for sure. One more problem with small samples. – Peter Flom, Commented Sep 5, 2023 at 10:03

Could I try fitting an ANOVA for each feature and then doing QQ plots on the residuals? Because I don't think I could do a single ANOVA on something like 50 features with the sample size I have, right? I say ANOVA because I could add factors like age and sex. – maglorismyspiritanimal, Commented Sep 5, 2023 at 10:13

Why would metabolite concentrations have a normal distribution? What biomathematics dictates that? Nonparametric methods are very efficient even if normality were to miraculously hold. – Frank Harrell, Commented Sep 5, 2023 at 12:36

Why do you think anything should be Gaussian? It is true that heights tend to be more normally distributed than most variables, but normality of raw measurements is pretty much an accident of nature. The shape of the distribution also depends on study enrollment criteria. – Frank Harrell, Commented Sep 5, 2023 at 15:49
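The per-feature approach raised in the comments (one model per metabolite with group, age, and sex, followed by a QQ plot of the residuals) could be sketched roughly as follows. Everything here is a hypothetical stand-in: the simulated data, the column names `group`, `age`, and `sex`, and the `metabolite_*` feature names are assumptions for illustration, not the asker's data. With 17 samples and four estimated coefficients, only 13 residual degrees of freedom remain, so such a QQ plot is at best a rough check.

```r
# Hypothetical stand-in data: 17 samples (5 control, 12 case), 50 log2-scale features.
set.seed(1234)
n_features <- 50
meta <- data.frame(
  group = factor(rep(c("control", "case"), times = c(5, 12)),
                 levels = c("control", "case")),
  age   = round(runif(17, 20, 70)),
  sex   = factor(sample(c("F", "M"), 17, replace = TRUE))
)
features <- matrix(
  rnorm(17 * n_features, mean = 10, sd = 1), nrow = 17,
  dimnames = list(NULL, paste0("metabolite_", seq_len(n_features)))
)

# Fit one linear model per feature: feature ~ group + age + sex.
fits <- lapply(seq_len(n_features), function(j) {
  lm(features[, j] ~ group + age + sex, data = meta)
})
names(fits) <- colnames(features)

# QQ plot of the residuals for a single feature.
res <- residuals(fits[["metabolite_1"]])
qqnorm(res, main = "metabolite_1: residual QQ plot")
qqline(res)
```

Looping the last three lines over `names(fits)` would give one plot per feature.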
