6
$\begingroup$

Consider the following two statistical principles: 1) an exact test's $p$-value gives the exact frequency with which the observed random sample appears by chance, i.e., under a true null hypothesis; and 2) the Fisher information in a statistic is an inverse function of the estimator's standard error, the extent to which its observed value varies around its true value when computed on a sample of size $n$.

My interpretation of these two principles is that an exact $p$-value can contain no Fisher information, and in fact the same must be true of all permutation statistics and the samples on which they are computed. Is this correct? If so, is the exact $p$-value considered to contain some other kind of information defined under estimation theory?

Edit: My question implicitly assumes that an exact $p$-value is a unique, sample-specific quantity, and that "inexact" $p$-values estimate the "true value" of a probability parameter: not a parameter of the population from which the sample has been drawn, but a parameter of the experiment, conditional on the sample and the null hypothesis. An acceptable answer would be to show this assumption is false, and why. But even if $p$ is not estimating a probability parameter, it is inarguably conveying information in some sense. I'd still like an explanation of what interpretation, if any, estimation theory gives that information.

To be clear, this is a conceptual question, a question of estimation theory, not a computational question. I understand one could easily compute the expected information in both the $p$-value and the sample as their combinatorial entropy. But I'm asking whether there's a conception of information under estimation theory that applies here, either as an alternative to Fisher information or as a broader definition than I give above.

$\endgroup$
11
  • 3
    $\begingroup$ Could you please explain what you mean by "contain no Fisher information"? $\endgroup$
    – whuber
    Commented Nov 3, 2023 at 18:50
  • 2
    $\begingroup$ @virtuolie Ronald Fisher was a pretty big deal in statistics and has multiple ideas named for him. Just because the p-value from Fisher’s exact test tells you some information about something does not mean that it tells you Fisher information. $\endgroup$
    – Dave
    Commented Nov 3, 2023 at 19:27
  • 1
    $\begingroup$ I don't really understand what you are asking, but if an exact value has less information (of any kind) than an approximate one, something is very wrong. If Fisher information is an inverse function of the SE, then an exact parameter estimate would have 1/0 Fisher info, which is nonsense, but if the denominator is close to 0, information would be high. Finally, I learned a long time ago: Parameters come from Populations. Statistics come from Samples. A p value is always from a sample -- the p value of a population would be 0. $\endgroup$
    – Peter Flom
    Commented Nov 3, 2023 at 19:47
  • 1
    $\begingroup$ Fisher information is not the same as information and is not something that can be 'contained'. See also Conflicting Definition of Information in Statistics | Fisher Vs Shannon $\endgroup$ Commented Nov 4, 2023 at 0:33
  • 2
    $\begingroup$ "1) an exact test's p-value gives the exact frequency with which the observed random sample appears by chance, i.e., under a true null hypothesis" - not true. The p-value is the probability under H0 of a bigger event that contains not only the observed sample but also all other samples that would provide evidence against the H0 as strongly or stronger. $\endgroup$ Commented Nov 4, 2023 at 11:27

4 Answers

6
$\begingroup$

An exact p-value that you calculate in a significance test is not an estimate or approximation of anything. It is the exact probability, according to the statistical model, of obtaining data at least as extreme as those that you did obtain when the null hypothesis is true.

It is exactly that, and it is an exact value of that. There is no 'correct' or 'true' value that it estimates or approximates.

The Fisher information that you get from the same sample (and, often, the same statistical model) is the same type of thing: exactly what it is and not an estimate or approximation.

The reason that they are not estimates or approximations is that they relate to the specific data obtained and the statistical model chosen, not to the true value of a parameter of a 'true' model.

If you are interested in the relationships between p-values and likelihood functions (and hence, Fisher information) then you might enjoy this paper that I ArXived a long time ago: https://arxiv.org/abs/1311.0081

$\endgroup$
7
  • $\begingroup$ I'm posting below a comment I just left here: stats.stackexchange.com/questions/94974/…. Please let me know where my error is. $\endgroup$
    – virtuolie
    Commented Nov 3, 2023 at 21:16
  • $\begingroup$ Since p is an exact function of the statistic theta-hat, it must have a sampling distribution exactly correlated with that of theta-hat. If the reality is that theta = 0, p has a null distribution with an expected value and a standard error. Yes, once you have a sample in hand, p is what it is, but theta-hat also is what it is. One can nonetheless construct a CI around either, no? In other words, if you say you'll reject the null 5% of the time when theta = 0, you're implying a population distribution of p-values, from which you will draw a p-value <= .05 5% of the time. $\endgroup$
    – virtuolie
    Commented Nov 3, 2023 at 21:18
  • $\begingroup$ That said, if my comment above is correct, I guess it answers my question: the exact p-value contains Fisher information just like a non-exact p-value, in that both vary over samples. In which case, the misapprehension that underlies my question is that "exact" doesn't mean I'd get the same p-value every time, but that the p-value is obtained as an exact frequency with respect to a combinatorial set (in the case of a permutation statistic), rather than as a proportion of an area under a curve. $\endgroup$
    – virtuolie
    Commented Nov 3, 2023 at 21:24
  • $\begingroup$ @virtuolie Yes, you can construct an interval around a p-value by varying the model, but that relates to the model, not the specific observed p-value. You can also construct an expectation range for p-values without having observed one, but that (obviously) does not relate to the observed p-value because it does not relate to the actual data and data generating system. $\endgroup$ Commented Nov 3, 2023 at 22:39
  • $\begingroup$ @virtuolie You don't get the same p-value every time because you don't get the same data every time! I'd say that the germinal error in your response to the question that you linked is that you are trying to impose a meaning on a (neo-Fisherian) p-value using (Neyman–Pearsonian) error rates. The two things are different and using one to 'explain' the other only adds confusion. Read this paper: link.springer.com/chapter/10.1007/164_2019_286 $\endgroup$ Commented Nov 3, 2023 at 22:46
5
$\begingroup$

The Fisher information matrix* is defined as the variance of the score

$$I(\theta) = E\left[\left(\frac{d}{d\theta} \log f(X|\theta)\right)^2\right]$$

where $\theta$ is the parameter value of a distribution and $X$ is the observation.

The Fisher information matrix is a function of the parameter value and a family of parametric probability distributions $f(X|\theta)$ (it is not the function of a sample).
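As a concrete check of the variance-of-score definition (my own sketch, not part of the answer): for a single Bernoulli($\theta$) observation the score is $x/\theta - (1-x)/(1-\theta)$, and the closed form is $I(\theta) = 1/(\theta(1-\theta))$. A Monte Carlo estimate of the variance of the score should agree.

```python
import numpy as np

# Closed-form Fisher information for one Bernoulli(theta) observation.
def fisher_info_bernoulli(theta):
    return 1.0 / (theta * (1.0 - theta))

# Monte Carlo check: Fisher information is the variance of the score,
# E[(d/dtheta log f(X|theta))^2], a property of the family f(X|theta).
def fisher_info_monte_carlo(theta, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, theta, size=n)
    score = x / theta - (1 - x) / (1 - theta)  # d/dtheta log f(x|theta)
    return np.mean(score**2)                   # E[score] = 0, so this is Var

theta = 0.3
print(fisher_info_bernoulli(theta))    # 1/(0.3 * 0.7) ≈ 4.76
print(fisher_info_monte_carlo(theta))  # agrees to within simulation error
```

Note that the answer comes out the same whatever sample happens to be drawn; the information is a property of the family at $\theta$, not of the data.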

A p-value is a statistic computed from a sample $X$. It doesn't relate so clearly to the concept of the Fisher information matrix, because it is not the parameter of a parametric probability distribution. However, from two different perspectives, one might connect a p-value with this 'information' about a parameter.

Your confusion seems to stem from projecting the distinction between an exact p-value and an inexact p-value onto the idea that a p-value has some true value that we 'estimate' by means of inference. But the estimate is often a mathematical approximation, not a statistical (inferential) one.

Yet, while I don't believe that this relates to your original thoughts, a p-value can still be related to the Fisher information matrix.

P-value and the Fisher information matrix, I

Sometimes, a p-value can be statistically estimated by means of a Monte Carlo approach, and then the estimation of a p-value can be considered as statistical inference. In that case you can apply the concept of the Fisher information matrix.
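A minimal sketch of that Monte Carlo perspective, with made-up data (the two groups and the number of resamples are my own choices): the estimate $\hat p$ is a binomial proportion, so it carries a genuine standard error $\sqrt{\hat p(1-\hat p)/B}$ and the usual inference machinery applies.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([2.1, 2.5, 3.0, 2.8])  # hypothetical group A
y = np.array([1.2, 1.9, 1.5, 2.0])  # hypothetical group B
obs = x.mean() - y.mean()           # observed difference in means

pooled = np.concatenate([x, y])
B = 10_000
count = 0
for _ in range(B):
    perm = rng.permutation(pooled)  # random relabeling under exchangeability
    if perm[:4].mean() - perm[4:].mean() >= obs - 1e-9:
        count += 1

p_hat = count / B                        # Monte Carlo estimate of the p-value
se = np.sqrt(p_hat * (1 - p_hat) / B)    # its binomial standard error
print(p_hat, se)
```

Here the exact p-value plays the role of the estimand, and $\hat p$ is an ordinary statistical estimate of it.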

For that case/perspective, you could consider an exact p-value as a 'statistical' estimate of the true p-value with perfect precision, and one that follows a degenerate distribution. The score (and the Fisher information matrix) is in that case infinite or undefined. This is because the derivative $\frac{\text{d}}{\text{d} p} \log f(\hat{p}|p)$ is undefined. The function $f(\hat{p}|p)$ is not a smooth function. For a given $\hat{p}$ it has a value 1 for $p= \hat{p}$ and is zero when $p\neq \hat{p}$.

P-value and the Fisher information matrix, II

The Fisher information matrix is not a function of an estimator. It is a function of the parametric distribution family of that estimator $f(X|\theta)$ evaluated at a particular value of the parameter $\theta$. The $X$ that the frequency distribution $f$ describes can be an estimator, but it can also be a statistic.

One can, however, consider the difference in the Fisher information matrix for the distributions of different statistics $f(X_1|\theta)$ and $f(X_2|\theta)$, and in that way consider the Fisher information matrix as a property of the estimator.

With this perspective we can use a p-value as the statistic and consider the Fisher information matrix for the frequency distribution of the observed p-value $f(p|\theta)$. Since that distribution is not constant as a function of $\theta$, the Fisher information matrix will not be zero.

In several cases the p-value has a one-to-one relationship with the estimate of the parameter. An example is a one-sided z-test of the mean of a normal distribution with known variance.
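A sketch of that one-to-one relationship (sample size, $\sigma$, and $\mu_0$ are arbitrary choices of mine), using only the standard library: the one-sided p-value is a strictly decreasing function of $\bar x$, so it can be inverted to recover the estimate.

```python
import math

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(xbar, mu0=0.0, sigma=1.0, n=25):
    # One-sided z-test of H0: mu = mu0 against H1: mu > mu0, known sigma
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 1.0 - phi(z)

def xbar_from_p(p, mu0=0.0, sigma=1.0, n=25, lo=-10.0, hi=10.0):
    # Invert by bisection (p_value is strictly decreasing in xbar)
    for _ in range(100):
        mid = (lo + hi) / 2
        if p_value(mid, mu0, sigma, n) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

xbar = 0.5
p = p_value(xbar)
print(p, xbar_from_p(p))  # the inversion recovers xbar
```

Because the map is invertible, the p-value and the point estimate are informationally equivalent statistics in this setting.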


*I place stress on 'matrix', since Fisher information is not 'information' and is not something that can be 'contained'. See also the discussion here: Conflicting Definition of Information in Statistics | Fisher Vs Shannon

$\endgroup$
2
  • 1
    $\begingroup$ I'd just add that a statistic is not synonymous with an estimator, which is already implicit in your answer but may be helpful to make explicit. A statistic is "any quantity computed from values in a sample which is considered for a statistical purpose," while an estimator is "a rule for calculating an estimate of a given quantity based on observed data." (Both definitions per Wikipedia.) An estimator is a statistic that implies Fisher information, but a p-value is not an estimator, just a function of the data "used for a statistical purpose." $\endgroup$
    – virtuolie
    Commented Nov 4, 2023 at 11:12
  • 2
    $\begingroup$ The Fisher information matrix is not a function of an estimator. It is a function of the parametric distribution family of that estimator $f(X|\theta)$ evaluated at a particular value of the parameter $\theta$. The $X$ that the frequency distribution $f$ describes can be an estimator, but it can also be a statistic. One can however consider the difference in the Fisher information matrix for different distribution of different parameters $f(X_1|\theta)$ and $f(X_2|\theta)$ and consider the Fisher information matrix as a property of the estimator.... $\endgroup$ Commented Nov 4, 2023 at 11:27
3
$\begingroup$

Michael Lew's answers are exactly right, and I believe I can enhance them by describing the insight that finally allowed me to see that. A valid test statistic is an ancillary statistic, meaning its distribution does not depend on the parameters of the model. Because the $p$-value is a function of the test statistic, it is also an ancillary statistic. This, I believe, is what Michael means when he says in his answer that $p$-values "relate to the specific data obtained and the statistical model chosen, not to the true value of a parameter of a 'true' model."

In other words, because the $p$-value is ancillary, it is not a statement about parameters. It is just a fact about (say) the central (null) $t$-distribution with degrees of freedom implied by $n$. We know the attributes of that distribution a priori, independent of the population or the data.
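To illustrate (the sample size and cutoff below are my own illustrative choices): the tail areas of the central $t$-distribution are fixed once the degrees of freedom are fixed, which can be verified by simulating the null rather than consulting tables, with no reference to any particular dataset or population parameter.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10, 200_000
samples = rng.standard_normal((reps, n))   # data generated under H0
# Null distribution of the one-sample t statistic with n-1 = 9 df
t_null = samples.mean(axis=1) / (samples.std(axis=1, ddof=1) / np.sqrt(n))
t_obs = 2.262                              # ~97.5% point of t with 9 df
p = np.mean(t_null >= t_obs)               # one-sided tail area
print(p)  # close to 0.025, a fact about the null distribution alone
```

The tail area is known a priori, which is the sense in which the p-value is "just a fact" about the assumed null distribution.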

In fact, when trying to construct a counterargument, I learned that Neyman was highly insistent that $p$-values and confidence intervals do not convey information. Instead, they are just facts upon which statisticians base practical decisions about how to interpret the empirical data. To wit: "The parameter is an unknown constant, and no probability statement concerning its value may be made." (Several quotes from the same article by Neyman may be found in this post.)

As for my extended question, about whether estimation theory defines the information in the $p$-value, Sextus Empiricus pointed to an excellent explanation here. Basically, Fisher information is not information in the general sense, but a function of estimator variability. Again, the $p$-value is just a fact about null distributions of the type assumed by the probability model, which we invoke because it happens to be useful for making decisions based on the sample in hand. The notion of variability does not apply to mere facts, so neither does Fisher information.

TL;DR: What is the information in an exact p-value? None. A p-value, exact or otherwise, is just a general fact about the assumed null distribution, which happens to be a useful tool for making decisions based on the sample.

$\endgroup$
2
$\begingroup$

This is a second answer, based on a comment by the OP concerning 'exact' p-values from permutation tests. It's too long for a comment, so I make it an answer (to a question not asked!).

The 'exact'ness of a permutation test's p-value relates to the fact that the statistical model is an exact fit to the data generating system, not to some precision of the p-value. The statistical model for a permutation test requires only some level of data exchangeability, and that is a safe assumption unless the experimental design or implementation is terribly flawed. Thus the statistical model of a permutation test can be confidently assumed to be an exact fit to the data generating system (really, the sampling system, I suppose), and thus the observed p-value will inherit a degree of 'exactness'.
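A minimal sketch of that exactness, with hypothetical data: for two groups of four, the p-value is a literal count over the complete set of $\binom{8}{4}=70$ relabelings, so no distributional approximation enters.

```python
from itertools import combinations

x = [2.1, 2.5, 3.0, 2.8]  # hypothetical group A
y = [1.2, 1.9, 1.5, 2.0]  # hypothetical group B
pooled = x + y
obs = sum(x) / len(x) - sum(y) / len(y)  # observed difference in means

total = 0
extreme = 0
# Enumerate every way of assigning 4 of the 8 values to group A
for idx in combinations(range(len(pooled)), len(x)):
    a = [pooled[i] for i in idx]
    b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    if sum(a) / len(a) - sum(b) / len(b) >= obs - 1e-9:
        extreme += 1

p_exact = extreme / total  # an exact count over C(8,4) = 70 splits
print(extreme, total, p_exact)
```

The only modeling assumption consumed is exchangeability of the labels; the arithmetic itself is exact.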

For a Student's t-test, for example, the model is almost always only an approximation of the actual data generating system and so the resulting p-value is in some way more 'approximate' as a result, even when it is expressed with enough decimal places to be practically exact in another sense.

$\endgroup$
2
  • $\begingroup$ Actually, I had edited my question to say that any disproof of my assumptions, implicit or explicit, is a good answer. +1! $\endgroup$
    – virtuolie
    Commented Nov 4, 2023 at 0:32
  • $\begingroup$ Per this answer, however Fisher information does (or does not) apply to exact p-values, I agree that it does so in the same way as to parametric p-values. $\endgroup$
    – virtuolie
    Commented Nov 4, 2023 at 0:47
