$\begingroup$

For example, I want to test each variable independently. Some variables are normally distributed, some are log-normally distributed, and some just can't be transformed to normality at all (not even by Box-Cox, etc.). In each of these situations, one of the three methods (t-test, t-test after log transformation, nonparametric test) has higher power than the other two. Can I simply use a different method for each situation, on the grounds of higher power?

I guess the answer is no, because of an inflated Type I error rate. But I just cannot understand how this increases the Type I error. Isn't it right to use the test whose assumptions are actually satisfied?


To make the question clearer: suppose I have a metabolomics data set with two groups, healthy (n = 15) and ill (n = 15), and 1000 metabolite columns, so the data set is 30 × 1000. I am trying to figure out which metabolites may be influenced by the treatment. First I do a univariate analysis, a t-test on each metabolite (suggestions on statistical methods for omics data analysis are more than welcome). We don't know the distribution of each metabolite, so we can only assess normality from the data we have. What I am trying to do is this: for each metabolite, if it looks normally distributed, use a t-test; if it can be transformed to normality, use a t-test after transformation; otherwise, use a nonparametric method for comparing the two groups ("nonparametric t-test" was a poor choice of words).
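To pin down what I mean, here is a minimal sketch of the per-metabolite decision rule I am describing (the Shapiro-Wilk test, the 0.05 normality cutoff, and Mann-Whitney U as the nonparametric fallback are my own concrete choices for illustration, not fixed parts of the question):

```python
import numpy as np
from scipy import stats

def compare_metabolite(healthy, ill, alpha_norm=0.05):
    """Pick a two-sample test for one metabolite, following the rule above:
    t-test if roughly normal, t-test on logs if roughly log-normal,
    otherwise Mann-Whitney U."""
    pooled = np.concatenate([healthy, ill])
    if stats.shapiro(pooled).pvalue > alpha_norm:
        return "t-test", stats.ttest_ind(healthy, ill).pvalue
    if np.all(pooled > 0) and stats.shapiro(np.log(pooled)).pvalue > alpha_norm:
        return "log t-test", stats.ttest_ind(np.log(healthy), np.log(ill)).pvalue
    return "Mann-Whitney", stats.mannwhitneyu(healthy, ill).pvalue

# Hypothetical data for one metabolite: 15 healthy vs 15 ill samples.
rng = np.random.default_rng(0)
healthy = rng.lognormal(0.0, 1.0, 15)
ill = rng.lognormal(0.5, 1.0, 15)
method, p = compare_metabolite(healthy, ill)
```

With 1000 metabolites, whichever per-column test is chosen, the resulting p-values would of course still need a multiple-testing correction (e.g. Benjamini-Hochberg) before declaring any metabolite affected.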

My question is: can I do this? Thank you!

To be clear, I am not trying to use whichever statistical test gives the smallest p-value for each variable; I agree that would not be correct.

$\endgroup$
  • $\begingroup$ Can you be more precise about what you mean by "nonparametric t-test"? (e.g. Can you describe it/do you have a reference? Do you just mean a Wilcoxon-Mann-Whitney test or do you mean something else?) How are you calculating the power? Do you actually mean that you're choosing on the basis of p-value (which is not power)? What is it you're testing about each variable separately (what's the question of interest?) Please clarify your question and be more explicit about what you're doing/proposing to do. $\endgroup$
    – Glen_b
    Commented Nov 16, 2015 at 2:36
  • $\begingroup$ Thank you Glen. I've added more details. For power, I mean the ability of detecting the existence of difference between group means when the groups means are truly different. Not p value. $\endgroup$
    – WCMC
    Commented Nov 16, 2015 at 3:11

3 Answers

$\begingroup$

If you choose between tests on the basis of $p$-value, then you will inflate your type I error rate, so that you are not conducting your tests at the advertised significance level.

[This does increase power, of course, as it does any time you inflate the type I error rate.]

How does this occur? Note that if the null hypothesis is true* and you have a continuously distributed test statistic, the p-value for a test will be uniform on $(0,1)$. Now, if you compute several test statistics on the same data they won't be independent, but neither will they be perfectly dependent. As a result, in some samples one test will yield the smaller p-value, and in other samples another will. So while each test individually has a uniform p-value when the null is true (and hence an $\alpha$ chance of falling below $\alpha$), the smallest of several p-values will fall below $\alpha$ more often than that.

* At the least, let's assume that the distributions under the null are identical, and that the individual nulls and their associated assumptions are at least approximately true.
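A small simulation (my own illustration, not part of the original answer) makes this concrete: under a true null, picking whichever of the t-test and Wilcoxon-Mann-Whitney gives the smaller p-value rejects at least as often as either test alone, and typically more often than the nominal level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n, alpha = 2000, 15, 0.05
p_t = np.empty(n_sims)
p_w = np.empty(n_sims)
for i in range(n_sims):
    # Both groups drawn from the same distribution, so the null is true.
    a = rng.normal(size=n)
    b = rng.normal(size=n)
    p_t[i] = stats.ttest_ind(a, b).pvalue
    p_w[i] = stats.mannwhitneyu(a, b).pvalue

# "Choose the test with the smaller p-value" amounts to taking the minimum.
p_min = np.minimum(p_t, p_w)
rate_t = (p_t < alpha).mean()
rate_w = (p_w < alpha).mean()
rate_min = (p_min < alpha).mean()
```

Since `p_min` can never exceed either individual p-value, `rate_min` is necessarily at least as large as `rate_t` and `rate_w`, which sit near the nominal 0.05.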


If you choose between tests based on a test of normality, or even on a more informal visual assessment, you end up with a very similar issue: the distribution of the p-value of the test you ultimately run is not uniform under the null hypothesis.
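The two-stage procedure can be checked by simulation as well (again my own sketch, with an arbitrary skewed null distribution and a 0.05 Shapiro-Wilk cutoff): pre-test normality, run the t-test if the pre-test passes and Wilcoxon-Mann-Whitney otherwise, and look at the p-values the combined procedure produces under the null:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n, alpha = 2000, 15, 0.05
p_two_stage = np.empty(n_sims)
for i in range(n_sims):
    # Both groups from the same skewed distribution: the null is true.
    a = rng.exponential(size=n)
    b = rng.exponential(size=n)
    if stats.shapiro(np.concatenate([a, b])).pvalue > 0.05:
        p_two_stage[i] = stats.ttest_ind(a, b).pvalue
    else:
        p_two_stage[i] = stats.mannwhitneyu(a, b).pvalue
rate = (p_two_stage < alpha).mean()
```

Even when the overall rejection rate `rate` happens to land near 0.05, the distribution of the p-value conditional on which branch was taken is not uniform, which is the distortion described above.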

$\endgroup$
  • $\begingroup$ Thank you again, Glen. Sorry, I should have made my question clearer. I didn't select the test based on p-value; I selected it based on whether normality is violated. Basically, my question is: is this feasible? $\endgroup$
    – WCMC
    Commented Nov 16, 2015 at 3:17
  • $\begingroup$ @WCMC I'm sorry, I don't know how I missed this comment before. It's commonly done but several papers recommend not using this approach -- the properties of both tests (the two you're choosing between) are affected by using a test of assumptions to choose between them. If you're not pretty sure that the distribution is pretty close to normal (in particular, heavy tails can be an issue), it may be much better to use the Wilcoxon-Mann-Whitney test. $\endgroup$
    – Glen_b
    Commented Apr 17, 2017 at 16:57
  • $\begingroup$ I agree, and I am currently using Mann-Whitney (or another nonparametric testing procedure) when I am not sure about the distribution. $\endgroup$
    – WCMC
    Commented Apr 17, 2017 at 21:24
$\begingroup$

If you are concerned about the data being non-Normal, note that the $t$-test is fairly robust to deviations, so unless the Normality assumption is severely violated, you should be fine. That said, the Wilcoxon-Mann-Whitney test can have more power than the $t$-test when the data are non-Normal, and comparable power when the data are Normal.
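A quick power check under a skewed alternative illustrates this (my own sketch; the lognormal shift, sample sizes, and seed are arbitrary choices, not drawn from the answer):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, n, alpha = 2000, 15, 0.05
reject_t = reject_w = 0
for _ in range(n_sims):
    # Two lognormal groups whose locations genuinely differ.
    a = rng.lognormal(0.0, 1.0, n)
    b = rng.lognormal(0.8, 1.0, n)
    reject_t += stats.ttest_ind(a, b).pvalue < alpha
    reject_w += stats.mannwhitneyu(a, b).pvalue < alpha
power_t = reject_t / n_sims
power_w = reject_w / n_sims
```

On data this skewed, the Wilcoxon-Mann-Whitney rejection rate comes out clearly above the $t$-test's; under Normal data the two would be close.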

I don't think it's incorrect, per se, to use different tests for different variables, but conventionally one would apply the same test to every variable. This is especially true if, for example, one were publishing a paper or assessing the significance of variables in a regression analysis.

$\endgroup$
  • $\begingroup$ Thank you Matt. Your answer reminds me of something. So I added more on my question. $\endgroup$
    – WCMC
    Commented Nov 16, 2015 at 3:12
$\begingroup$

Given that the t-test's normality assumption applies to the residuals rather than to the raw data, you can't actually check the assumption until you have fit the model and obtained the residuals (for a two-sample comparison, each observation minus its own group's mean). If the residuals are non-normal, you can then discard the test's result and use a nonparametric method instead, so long as the t-test's p-value plays no part in that decision.
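A minimal sketch of this residual check for the two-sample case (the Shapiro-Wilk test and the 0.05 cutoff are my assumptions, not specified in the answer):

```python
import numpy as np
from scipy import stats

def residuals_normal(healthy, ill, alpha=0.05):
    """Check normality of the residuals, i.e. each observation's
    deviation from its own group's mean."""
    resid = np.concatenate([healthy - healthy.mean(), ill - ill.mean()])
    return stats.shapiro(resid).pvalue > alpha

# Hypothetical data for one variable: 15 samples per group.
rng = np.random.default_rng(3)
ok = residuals_normal(rng.normal(size=15), rng.normal(size=15))
```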

$\endgroup$
