
I am stuck trying to understand the following two statements (from the Wikipedia article on p-values):

  1. The p-value is the probability of obtaining at least as extreme results given that the null hypothesis is true whereas the significance level $\alpha$ is the probability of rejecting the null hypothesis given that it is true.

  2. If one defines a false positive rate as the fraction of all “statistically significant” tests in which the null hypothesis is actually true, several arguments suggest that this is at least about 30 percent for p-values that are close to 0.05.

This is more or less explained in Regina Nuzzo's 2014 Nature editorial. Suppose I predefined a significance level of 0.05, ran a single test, and got a p-value of 0.049. The second statement tells me that the chance of replicating this result with another sample is not 95%, but much lower. (I think it should depend on the prior probabilities of the hypotheses, but the Wikipedia statement draws a more general conclusion.)
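If I write statement 2 out myself (my own attempt, not from the article), with $\pi_0$ the prior proportion of true null hypotheses, $\pi_1 = 1 - \pi_0$, and $1 - \beta$ the power, I get something like

$$\Pr(H_0 \text{ true} \mid \text{significant}) = \frac{\alpha\,\pi_0}{\alpha\,\pi_0 + (1 - \beta)\,\pi_1},$$

which clearly depends on the priors.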

The questions are:

  1. Is the second statement correct? Does it assume that the a priori probabilities of the two hypotheses are equal to 0.5?

  2. How can one understand it intuitively?

  • In your statement #2 it should read "false discovery rate" instead of "false positive rate". – amoeba, Jan 14, 2016 at 10:19
  • See stats.stackexchange.com/questions/166323/…; section 1 is about p-values and section 2 about the FDR. – user83346, Jan 14, 2016 at 10:32
  • Why? Is it because the second statement suggests multiple testing? – Commented Jan 14, 2016 at 10:33
  • Any estimate of a false discovery rate (as opposed to calculating a rate conditional on the truth of some hypothesis) must involve assumptions about the prior probabilities of hypotheses. The assumptions may be plausible for some collections of tests (say, those appearing in papers published in the journals of a particular field) but not for others. So unqualified statements like those in the Wikipedia article are unwise. [But now that I look at the article in question, I see that the statement is followed by: "In order to arrive at this number, one needs to postulate something about the prior ..." – Commented Jan 14, 2016 at 12:30
  • I see. I did not realize it's an exact quote. I have now formatted it as such. – amoeba, Jan 14, 2016 at 12:35

1 Answer


I suggest you read http://rsos.royalsocietypublishing.org/content/1/3/140216, which contains most of the elements you need.

To answer your first question: for a set of tests yielding $p$-values $\in [0.045, 0.05]$ with power $= 0.8$, the FDR (as defined in the second statement of your question) is 26% when there are as many tests with a true effect as tests with no true effect (page 9 of the paper). Notice that the restriction to $p$-values $\in [0.045, 0.05]$ is very important: the FDR decreases when smaller $p$-values are allowed and/or when the proportion of true-effect tests increases.
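As a rough illustration of where a figure of this order comes from, here is a minimal simulation sketch (my own, not the paper's code; the paper uses two-sample $t$-tests, while I use a one-sample $z$-test with the shift chosen to give about 80% power at two-sided $\alpha = 0.05$):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_tests = 1_000_000                    # half true nulls, half real effects
    mu_alt = 1.96 + stats.norm.ppf(0.8)    # shift giving ~80% power at two-sided alpha = 0.05

    null_true = rng.random(n_tests) < 0.5  # prior proportions P(H0) = P(H1) = 0.5
    z = rng.normal(np.where(null_true, 0.0, mu_alt), 1.0)
    p = 2 * stats.norm.sf(np.abs(z))       # two-sided p-values

    band = (p >= 0.045) & (p <= 0.05)      # keep only the "just significant" tests
    print(f"fraction of true nulls among p in [0.045, 0.05]: {null_true[band].mean():.2f}")

Under these assumptions the printed fraction comes out around 0.26-0.28, in line with the paper's 26%.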

To answer your second question: the two statements are radically different because they condition on different things. In statement 2 (the FDR), the ratio is taken over all significant tests, which come from both the real-effect and the no-real-effect cases (in given proportions). In statement 1 (the type I error rate), the ratio is computed only over the (hypothetical) tests that could be generated under the null hypothesis of no real effect.
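To make the contrast concrete (my own arithmetic, under the same assumptions $\pi_0 = \pi_1 = 0.5$ and power $1 - \beta = 0.8$):

$$\Pr(p \le \alpha \mid H_0) = \alpha = 0.05, \qquad \Pr(H_0 \mid p \le \alpha) = \frac{\alpha\,\pi_0}{\alpha\,\pi_0 + (1 - \beta)\,\pi_1} = \frac{0.025}{0.025 + 0.4} \approx 0.06.$$

The latter rises to the 26% above precisely because that figure conditions on $p$ falling in the narrow band $[0.045, 0.05]$ rather than on $p \le 0.05$.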

  • Yes, but again, from the abstract: "If you use p = 0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time." Isn't that true only for two hypotheses with P(H0) = P(H1) = 0.5? (I guess your words about the ratio of true-effect to no-effect tests mean the same.) Also, they say nothing there about the power of the test. I believe it can be explained from two points of view: yours, using the power of the tests, and a Bayesian one, using only prior probabilities... – Commented Jan 14, 2016 at 11:42
  • @GermanDemidov Yes, the 26% is derived with P(H0) = P(H1) = 0.5 (though I am not sure it can be written in these terms, given that everything here is frequentist) and a given power that they consider reasonable, i.e. 0.8. Obviously the results depend on the power of the test, but it may be hard to generalize... – beuhbbb, Jan 14, 2016 at 13:07
  • IMHO, the interest is not in the figure of 30% itself but in being aware of the effect. Nevertheless, I agree that a Bayes factor with a well-designed model prior (which is another problem) yields a more satisfactory solution (see swfsc.noaa.gov/uploadedFiles/Divisions/PRD/Programs/… for this aspect). Hope it helps. – beuhbbb, Jan 14, 2016 at 13:15
  • Yes, I provided this link in the comments to the main question =) The next question for me is how to calculate the priors; do you know of papers/books that cover this? I have big trouble understanding the Bayesian way of thinking... I also do not understand how to compare the "power-of-tests" method with the Bayesian one; they can definitely give different FDRs for the same data... – Commented Jan 14, 2016 at 13:21
  • @GermanDemidov IMHO this last question cannot be answered without a more precise description of what you're trying to do, i.e. what kind of comparison you are looking for. As mentioned in the comments on your question, the "30% statement" is very unspecific and puts a figure on something that in practice depends heavily on the application... maybe some other users can answer this more specifically. – beuhbbb, Jan 14, 2016 at 13:30
