Suppose a test has ~$16.67\%$ power to detect some arbitrary but fixed effect size when the sample size is $3$, and that as we increase the sample size by adding IID random observations ($n = 4, 5, 6, 7, \dots$) the power approaches a limit (~$20.83\%, 21.67\%, 21.81\%, 21.83\%, \dots$), so that beyond a certain point additional observations provide no meaningful increase in information/precision. (In this example, power increases according to a convergent sequence, but that's just for convenience of illustration.) Is this possible?

Stated as a more general question: can it be shown that, for any random variable that is computed from an IID random sample of a population and that contains information about a parameter of that population, we can always increase its information (precision) to any desired level (up to perfect information/exact precision/zero standard error) by including some number of additional IID random observations in the sample?

A few specifications: First, this could be an NHST statistic or an estimator; if the former, there's an upper limit on the probability of rejecting the null; if the latter, there's a lower limit on the standard error. Second, I'm referring to a property of the test (or statistic), not of the construct or the data collection method. Third, I'm not asking whether there are statistics with this property that people actually use; I assume no one would consider it practical. My question is about the theoretical possibility, and whether it has been discussed or proven (one way or the other) in the literature.

Fourth, I assume one could construct a trivial example of such a test by arbitrarily restricting how much information is used from the sample as a function of sample size. For example, one might use a statistic that is the sum of all of the first sixteen observations, half of the next sixteen, a quarter of the next sixteen, and so on (see the sketch below); or one might use all of the observations but intentionally add noise in proportion to the sample size. I'm not necessarily interested in such examples, but if it can be shown that this is the only way to create such a statistic, or that no non-trivial examples of such a statistic have been found, that would be very interesting.
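
To make that first construction concrete, here is a minimal simulation sketch (the standard normal data, the block size of sixteen, and the halving weights are all choices made for illustration). The geometrically decaying weights cap the effective sample size at $48$, so the standard error of this weighted mean plateaus near $1/\sqrt{48} \approx 0.144$ rather than shrinking to zero:

```python
import numpy as np

rng = np.random.default_rng(0)

def capped_mean(x):
    """Weighted mean that halves the weight of each successive block of
    16 observations, bounding the effective sample size at 48."""
    w = 0.5 ** (np.arange(len(x)) // 16)
    return np.sum(w * x) / np.sum(w)

for n in [16, 48, 160, 1600]:
    estimates = [capped_mean(rng.normal(size=n)) for _ in range(20000)]
    print(n, np.std(estimates))  # plateaus near 1/sqrt(48) ≈ 0.144
```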


2 Answers


Fisher information increases

The Fisher information (which is a property of the sample, not of an estimator) scales with the size of the sample. See for instance https://en.m.wikipedia.org/wiki/Fisher_information#Discrepancy_in_definition:

"... if the data are i.i.d. the difference between two versions is simply a factor of $n$, the number of data points in the sample."

The Fisher information of a sample of size $n$ (with i.i.d. measurements) is the Fisher information of a single measurement times $n$.
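
As a standard textbook illustration (not part of the original answer): for i.i.d. $X_i \sim N(\theta, \sigma^2)$ with $\sigma^2$ known,

$$\mathcal{I}_1(\theta) = \frac{1}{\sigma^2}, \qquad \mathcal{I}_n(\theta) = n\,\mathcal{I}_1(\theta) = \frac{n}{\sigma^2}, \qquad \frac{1}{\mathcal{I}_n(\theta)} = \frac{\sigma^2}{n} \to 0,$$

so the Cramér–Rao lower bound on the variance of an unbiased estimator shrinks to zero as $n$ grows.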

Efficient estimate precision increases

You write

any random variable ... and that contains Fisher information about a parameter of that population

So if you are talking about an efficient estimate (an estimate whose precision equals the Fisher information), then yes: the precision will increase with increasing sample size.

Similarly, any estimator whose efficiency has some non-zero lower bound for all $n$, $$e(T_n) = \frac{1}{\operatorname{Var}(T_n) \times n \times \mathcal{I}(\theta)} \geq e_{min} > 0$$ (where $\mathcal{I}(\theta)$ is the information of a single measurement and $n\mathcal{I}(\theta)$ that of $n$ measurements), will have $\operatorname{Var}(T_n) \leq \frac{1}{e_{min}\, n\, \mathcal{I}(\theta)} \to 0$ for increasing $n$.

Other estimates can do anything

But note that there are many non-efficient estimators/statistics whose precision does not improve with increasing sample size.

  • Pathological estimator

    A well-known example is the sample mean of a Cauchy distribution as an estimator for the location parameter: the sample mean of $n$ i.i.d. standard Cauchy variables is itself standard Cauchy, so its distribution remains exactly the same for increasing sample size (and I believe there are also examples where the variance of the sample mean even increases for larger sample size). See the simulation sketch after this list.

  • Oracle estimator

    If you do not like the example with the Cauchy distribution, because it is a pathological distribution, then you can consider this estimator:

    $$\hat \theta_n = 42$$

    This is an estimator that can be used for the parameter $\theta$ of a non-pathological distribution, and it does not improve (increase in precision) when we increase $n$. (I agree that it is an example that makes little practical sense, but it indicates that maybe you need to be more precise about the definition of 'estimator'.)

  • Stupid estimator

    You could argue that this oracle estimator $\hat{\theta}_n = 42$ does not contain information (and in your edit you write about estimators that contain information). In that case, you can use this stupid estimator $$\hat{\theta}_n = (n+1)\min\lbrace x_1, x_2, \dots, x_n \rbrace$$ to estimate the parameter of a continuous uniform distribution between $0$ and $\theta$ (see the simulation sketch after this list).

    The distribution of $\min\lbrace x_1, x_2, \dots, x_n \rbrace/\theta$ is a $\mathrm{Beta}(1,n)$ distribution, so we can easily compute the mean and variance of the estimator from the mean and variance of the beta distribution.

    $$\begin{array}{rcl} E[\hat{\theta}_n] &=& \theta \\ Var[\hat{\theta}_n] &=& \theta^2 \frac{n}{(n+2)} \end{array}$$

    So the variance of this unbiased estimator will grow towards $\theta^2$ for increasing sample size.
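
Here is a minimal simulation sketch of the first and third bullets (my own illustration, not part of the original answer; the sample sizes and $\theta = 5$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Cauchy: the sample mean has a standard Cauchy distribution for every n,
# so its spread (measured by the IQR, since the variance is undefined)
# never shrinks.
for n in [10, 100, 1000]:
    means = rng.standard_cauchy((20000, n)).mean(axis=1)
    q1, q3 = np.percentile(means, [25, 75])
    print(f"Cauchy, n={n}: IQR of sample mean ≈ {q3 - q1:.2f}")  # ≈ 2 for all n

# "Stupid" estimator (n+1) * min(x) for Uniform(0, theta): unbiased,
# but its variance grows towards theta^2 = 25 here.
theta = 5.0
for n in [2, 10, 100, 1000]:
    x = rng.uniform(0, theta, size=(20000, n))
    est = (n + 1) * x.min(axis=1)
    print(f"Uniform, n={n}: mean ≈ {est.mean():.2f}, var ≈ {est.var():.2f}")
```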

Obviously, these examples are all silly, non-pragmatic estimators. But that is inherent to what you are looking for: estimators that do not work well with increasing sample size, and therefore the examples are necessarily silly.

See also: https://en.wikipedia.org/wiki/Consistent_estimator

  • Thanks, I've corrected my question per your note that Fisher information is a property of the sample, not of the estimator. But your Cauchy example is off point: the sample mean becomes more variable with larger $n$ because the population mean does not exist, so the statistic has nothing to estimate. Ditto for the Cauchy sample variance. Can you give examples of statistics with non-zero asymptotic limits on their information for real parameters? Pathological statistics computed for ordinary distributions, rather than ordinary statistics computed for pathological distributions, one might say.
    – virtuolie
    Commented Oct 8, 2020 at 7:40
  • @MichaelNelson you wrote, "Third, I'm not asking whether there are statistics with this property that people actually use. I assume no one would consider it practical. My question is about the theoretical possibility." The Cauchy distribution is, I believe, a nice example. It is irrelevant that the population mean does not exist, because the sample mean does not need to be an estimator for the population mean; you can also consider the sample mean as an estimator for the location parameter of the distribution. Commented Oct 8, 2020 at 8:43
  • I have added some additional examples of estimators. Like the sample mean for estimating the location parameter of a Cauchy distribution, they are all examples of silly, non-practical estimators, but that is because of the issue you are looking for: estimators that do not work well with increasing sample size, and therefore you get silly estimators as examples. Commented Oct 8, 2020 at 9:17
  • I agree that the "Cauchy location estimator" is a relevant and interesting example. It illustrates the point that almost any estimator can display this property when applied to a weird enough distribution, something I hadn't considered when writing the original question. I would still also like to know whether a statistic computed without discarding observations (i.e., not "stupid") can have a limit on power for estimating a non-pathological parameter. That is another dimension of the question for which the pathological case has no implications.
    – virtuolie
    Commented Oct 8, 2020 at 17:30

An example of a statistic whose Fisher information does not increase as sample size increases is the matching statistic. The matching statistic $m$ (Vernon, 1936) is computed for a pair of vectors of ranked scores as the number of paired ranks that match. Gordon Rae (1987, 1991) showed that, when the population correlation between the vectors is zero, $m$ has an asymptotic relative efficiency of zero. This means that, if we compute both $m$ and Spearman's rho (or another relatively efficient correlation estimator) on the same data, the Pearson correlation between $m$ and rho will be sizeable for small $n$ but will go to zero as $n$ goes to infinity. It can also be shown that, when the population correlation between the vectors is greater than zero, $m$'s asymptotic relative efficiency is negative. This means that $m$'s standard error increases with increasing $n$, implying a loss of Fisher information.
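
A minimal simulation sketch of the zero-correlation case (my own illustration; it assumes continuous data, so that the rank vectors are uniform random permutations). Under independence, $m$ behaves like the number of fixed points of a random permutation, whose mean and variance both stay at $1$, so the standard error of $m$ never shrinks:

```python
import numpy as np

rng = np.random.default_rng(2)

def matching_statistic(x, y):
    """Number of positions at which the ranks of x and y coincide."""
    rx = np.argsort(np.argsort(x))  # ranks 0..n-1
    ry = np.argsort(np.argsort(y))
    return int(np.sum(rx == ry))

# With independent continuous data, m is the number of fixed points of a
# uniform random permutation: E[m] = 1 and Var[m] = 1 for every n >= 2,
# so more data never sharpens this statistic.
for n in [5, 20, 100, 1000]:
    m = [matching_statistic(rng.normal(size=n), rng.normal(size=n))
         for _ in range(20000)]
    print(n, np.mean(m), np.var(m))  # both ≈ 1 at every n
```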

  • Somehow the matching statistic has information about the correlation, and this decreases in a larger sample. It makes intuitive sense, because finding a match becomes more chaotic in larger samples, which might decrease the information that this statistic gives. I imagine we could construct many types of statistics that have a similar drop in performance for larger samples. Commented Dec 4, 2021 at 9:29
  • @SextusEmpiricus: Yes, interestingly, you don't even need "matches." The most peculiar one I've found simply counts the number of common factors between each (integer) pair and then sums them. As for the "somehow," $m$'s distribution is $\mathrm{Poisson}(1)$ when the data are uncorrelated, giving it $E(m) = 1$, so observed deviations from $1$ imply a probable non-Poisson distribution. The reason $m$'s parameter information decreases is that its sampling distribution approaches its limiting distribution with increasing $n$, which has a variance of no less than $1$, resulting in a floor for its standard error.
    – virtuolie
    Commented Dec 5, 2021 at 3:49
  • But do you have examples of statistics that are efficient, in some sense? Apart from finite populations ... Commented Nov 22, 2023 at 18:21
  • The matching statistic may converge to its own expected value faster than any other statistic, as a function of $n$, for all I know. In that case, it would be an efficient statistic, just not for the correlation. If you're looking for an efficient statistic that estimates something practical: well, $m = n - h$, where $h$ is the Hamming distance between two permutations, an important distance metric in information theory and computer science. The number of fixed points in a permutation is also important in combinatorics. But no, I have no proven examples.
    – virtuolie
    Commented Nov 22, 2023 at 19:47
