
I can't understand the paragraph in Completeness (statistics) - Wikipedia:

We have an identifiable model space parameterised by $\theta$, and a statistic $T$. Then consider the map $f:p_{\theta }\mapsto p_{T|\theta }$ which takes each distribution on model parameter $\theta$ to its induced distribution on statistic $T$. The statistic $T$ is said to be complete when $f$ is surjective, and sufficient when $f$ is injective.

(added on 2023-07-12 without any citation, and there's no revision after that)

  • What does “distribution on $\theta$” mean?

    Is the domain of $f$ the set of all prior distributions of $\theta$ (in the Bayesian sense), or the family of distributions of the samples ($X_1, \ldots, X_n$)?

  • What is the codomain of $f$?

    I guess the image of $f$ is the set of all possible distributions of $T$ (i.e. the family of distributions of $T$), but the codomain should be larger than that, or else $f$ would always be surjective.

My thoughts

Sufficiency and completeness are related but independent concepts, as discussed in the questions linked below. If the statement in Wikipedia is true, then it gives a clear explanation of sufficiency and completeness.

Roughly speaking:

  • $T$ is sufficient: $T$ provides all the information about $\theta$ that is contained in $X$, and we can recover the whole distribution of $X$ once $T$ is given.

    $f$ is injective: If we know that $y$ is $f$ of some $x$, then there is exactly one $x$ that satisfies $y = f(x)$.

  • $T$ is complete: For any function $g$, whenever $\operatorname{\mathbb{E}} g(T) \equiv 0$, then $\Pr(g(T) = 0) \equiv 1$, where “$\equiv$” means $\forall \theta$.

    $f$ is surjective: For any two functions $g, g'$, whenever $g \circ f = g' \circ f$, then $g = g'$.

I can feel that these concepts are connected, but I can’t make it rigorous…

Alternative ways to understand sufficiency and completeness

Basic intuition about minimal sufficient statistic - Cross Validated: $T$ can be seen as an indexed partition of the sample space.

st.statistics - Is a function of complete statistics again complete? - MathOverflow: $\operatorname{\mathbb{E}} g(T) \equiv 0$ means the distributions of $T$ for varying $\theta$ span the whole space of functions of $T$.

  • I'll accept my own answer because there's no other answer…
    – Y.D.X.
    Commented Oct 29, 2023 at 12:30
  • See stats.stackexchange.com/questions/196601/…
    Commented Oct 29, 2023 at 14:08
  • @kjetil-b-halvorsen I've read both answers there. They provide good intuition on how estimators work, but I can't see how they are related to the map $f$ here. Could you explain more specifically?
    – Y.D.X.
    Commented Oct 29, 2023 at 15:35

2 Answers


We need to clarify the statistical model $(\Omega,\mathcal{P})$, where $\Omega$ is the sample space, $\mathcal{P} = \{P_\theta: \theta \in \Theta \}$ is the set of probability distributions on $\Omega$, and $\Theta$ is the parameter space. Here $\Omega$ and $P_\theta$ are the ones appearing in the probability triple $(\Omega, \mathcal{F}, P_\theta)$.

  • $\Omega \overset{X}{\to} \mathcal{X} \overset{T}{\to} \mathcal{T}$, where $X$ is called a random variable, $T$ is called a statistic.
  • $\mathcal{F} \overset{P_\theta}{\to} \mathbb{R}$.

We say that the model is identifiable if the mapping $\theta \mapsto P_\theta$ is injective.

What does “distribution on $\theta$” mean?

The domain of $f$ is neither the set of prior distributions nor the family of distributions of $X$; it is $\mathcal{P}$ itself.

What is the codomain of $f$?

All functions $\mathcal{T} \to \mathbb{R}$. This is different from, and larger than, the family of induced distributions of $T$: the latter is $\{P_{T|\theta}: \theta \in \Theta\}$ where $P_{T|\theta}(t) = P_\theta(X^{-1}(T^{-1} (t)))$, while the former also includes functions that are not induced by any $P_\theta$ (they need not even be probability distributions).

$T$ is sufficient $\iff$ $f$ is injective

My rough thoughts above turn out to be right here.

  • $\implies$

    If $T$ is sufficient, then $P_{X|T,\Theta}(x, t;\theta)$ does not depend on $\theta$. That is to say, $\forall x \in \mathcal{X}, \forall t \in \mathcal{T}, \forall \theta_1, \theta_2 \in \Theta$, $P_{X|T,\Theta}(x,t;\theta_1) = P_{X|T,\Theta}(x,t;\theta_2)$.

    Now let’s prove $f$ is injective.

    For any $P_{T|\theta_1}$ and $P_{T|\theta_2}$ in the image of $f$, if $P_{T|\theta_1} = P_{T|\theta_2}$ (i.e. $\forall t \in \mathcal{T}, P_{T|\theta_1}(t) = P_{T|\theta_2}(t)$), then $\forall x \in \mathcal{X}, \forall t \in \mathcal{T}$, $$ P_{\theta_1}(x) = P_{T|\theta_1}(t) \times P_{X|T,\Theta}(x,t;\theta_1) = P_{T|\theta_2}(t) \times P_{X|T,\Theta}(x,t;\theta_2) = P_{\theta_2}(x). $$ This is exactly the definition of $P_{\theta_1} = P_{\theta_2}$.

  • $\impliedby$

    Similarly, for any $P_{T|\theta_1}, P_{T|\theta_2}$ in the image of $f$ with $P_{T|\theta_1} = P_{T|\theta_2}$, injectivity gives $P_{\theta_1} = P_{\theta_2}$. By the definition of conditional distribution, $$ P_{X|T,\Theta}(x,t;\theta) = \frac{P_{\theta}(X =x \land T(x) = t)}{P_{T|\theta}(t)} = \frac{P_{\theta}(x)}{P_{T|\theta}(t)}. $$ Therefore $P_{X|T,\Theta}$ is the same for $\theta_1$ and $\theta_2$. (A concrete numerical check of this factorization is sketched right after this list.)
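As a quick sanity check of the factorization used above, here is a minimal sketch with an assumed toy model (two i.i.d. Bernoulli($\theta$) observations and $T(x) = x_1 + x_2$, chosen purely for illustration): the conditional distribution $P_{X \mid T}$ comes out identical for every $\theta$, which is exactly the sufficiency of $T$, while the induced distributions $P_{T|\theta}$ differ across $\theta$.

```python
from itertools import product

def joint(theta):
    # P_theta(x) for x in {0,1}^2, with X_i i.i.d. Bernoulli(theta); T(x) = x1 + x2.
    return {x: theta**sum(x) * (1 - theta)**(2 - sum(x)) for x in product([0, 1], repeat=2)}

def conditional_given_T(theta):
    # P_{X|T,theta}(x | t) = P_theta(x) / P_{T|theta}(t), with t = T(x).
    p = joint(theta)
    p_T = {t: sum(v for x, v in p.items() if sum(x) == t) for t in (0, 1, 2)}
    return {x: v / p_T[sum(x)] for x, v in p.items()}

for theta in (0.2, 0.5, 0.8):
    print(theta, conditional_given_T(theta))
# The conditional distribution of X given T is identical for every theta
# (e.g. P(X = (1,0) | T = 1) = 1/2), which is exactly sufficiency of T = X1 + X2.
```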

$T$ is complete $\iff$ $f$ is surjective

It seems that this has nothing to do with cancellability. Instead it's related to the geometric intuition:

st.statistics - Is a function of complete statistics again complete? - MathOverflow: $\operatorname{\mathbb{E}} g(T) \equiv 0$ means the distributions of $T$ for varying $\theta$ span the whole space of functions of $T$.

  1. $\operatorname{\mathbb{E}} g(T)$ is an inner product of $P_{T|\theta}$ and $g$.
  2. $\operatorname{\mathbb{E}} g(T) \equiv 0$ means $g \perp \{P_{T|\theta}: \theta \in \Theta\}$.
  3. Moreover, if the only perpendicular $g$ is $0$ (almost surely), then $\{P_{T|\theta}: \theta \in \Theta \}$ spans the whole space of $\mathcal{T} \to \mathbb{R}$ functions. This is the meaning of “$f$ is surjective”. (It is not surjectivity in the usual sense; here we only require the image to span the codomain. A small numerical illustration follows this list.)
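Here is a minimal numerical sketch of this spanning view, under an assumed binomial model (not from the original post): $X_1,\dots,X_4$ i.i.d. Bernoulli($\theta$) and $T=\sum_i X_i$, so $T\mid\theta \sim \mathrm{Bin}(4,\theta)$. The pmfs $P_{T|\theta}$ for five distinct values of $\theta$ already span all of $\mathbb{R}^5$, i.e. every function $g:\{0,\dots,4\}\to\mathbb{R}$, so the only $g$ orthogonal to all of them is $g = 0$.

```python
import numpy as np
from scipy.stats import binom

# Assumed toy model: T | theta ~ Binomial(4, theta); this family is complete.
# Completeness <=> the pmfs P_{T|theta} span the whole space of functions g: {0,...,4} -> R.
n = 4
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]                      # any five distinct values in (0, 1)
pmfs = np.array([binom.pmf(np.arange(n + 1), n, th) for th in thetas])

print(np.linalg.matrix_rank(pmfs))                      # 5 -> the pmfs span R^{n+1}

# Hence the only g with E_theta[g(T)] = <g, P_{T|theta}> = 0 for all these theta is g = 0.
g = np.linalg.solve(pmfs, np.zeros(n + 1))
print(np.allclose(g, 0))                                # True
```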

Examples

Sufficient but not complete

$\Theta = \mathbb{R}$, $X \sim \mathcal{U}(\theta, \theta+2\pi)$ and $T = X$.

  • $T$ is sufficient: $T$ gives the same information about $\theta$ as $X$.

    $f$ is injective: It’s the identity map!

  • $T$ is not complete: $\operatorname{\mathbb{E}}_\theta \sin X = \frac{1}{2\pi}\int_\theta^{\theta+2\pi} \sin x \,\mathrm{d}x = 0$ for every $\theta$, yet $\sin X$ is not almost surely zero. (A numerical check is sketched after this example.)

    $f$ is not surjective:

    • $\mathcal{T} = \mathbb{R}$, so the codomain of $f$ is all functions on $\mathbb{R}$.

    • The image of $f$ is $\{\tfrac{1}{2\pi}\boldsymbol{1}_{[\theta, \theta+2\pi]}: \theta \in \mathbb{R} \}$, where $\boldsymbol{1}$ is the indicator function and $\tfrac{1}{2\pi}$ normalizes the uniform density.

      The intersection of its span with the set of $2\pi$-periodic functions is the set of constant functions. In other words, the image of $f$ cannot span the non-constant $2\pi$-periodic functions, such as $\sin(kx)$ and $\cos(kx)$ for integers $k \ge 1$.

      Alternatively, the Fourier transform of $\boldsymbol{1}_{[\theta, \theta+2\pi]}$ is $2\pi \operatorname{sinc}(\pi \omega)\, e^{-i(\theta+\pi)\omega}$, which vanishes at every nonzero integer $\omega$.
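A quick Monte Carlo check of this example (a sketch; the particular values of $\theta$ and the sample size are arbitrary): for $X \sim \mathcal{U}(\theta, \theta+2\pi)$, the expectation $\operatorname{\mathbb{E}}_\theta \sin X$ is $0$ for every $\theta$, even though $\sin X$ is not almost surely zero, so $T = X$ cannot be complete.

```python
import numpy as np

# X ~ Uniform(theta, theta + 2*pi), T = X.  E_theta[sin X] should be 0 for every theta.
rng = np.random.default_rng(0)
for theta in [0.0, 1.3, -2.7, 10.0]:
    x = rng.uniform(theta, theta + 2 * np.pi, size=1_000_000)   # draws of X under this theta
    print(f"theta = {theta:6.2f}:  E[sin X] ~ {np.sin(x).mean():+.4f}")
# All estimates are ~0 (Monte Carlo error ~0.001), yet sin(X) != 0 with probability 1,
# so g(T) = sin(T) witnesses that T = X is not complete.
```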

Constant statistic

  • $T$ is not sufficient: Obvious.

    $f$ is not injective: $f$ maps every $P_\theta$ to the same point-mass distribution.

  • $T$ is complete: $g(T)$ is deterministic. It has to be $0$ if the expectation is $0$.

    $f$ is surjective: $\mathcal{T}$ is a one-point set, so any nonzero $\mathcal{T} \to \mathbb{R}$ function spans the codomain of $f$.

Ignore some samples

First we work out a complete and sufficient statistic for $n$ samples. Now we're given more samples but we stick to the old statistic.

  • $T$ is not sufficient: Obvious.

    $f$ is not injective: $\Omega \overset{X}{\to} \mathcal{X} \overset{T}{\to} \mathcal{T}$, now $\Omega$ becomes $\Omega \times \Omega'$ and $\mathcal{X}$ becomes $\mathcal{X} \times \mathcal{X'}$. The domain of $f$ expands but the codomain does not change.

  • $T$ is complete: Completeness is about the family of distributions of $T$; it does not involve $X$. So $T$ is still complete even if we receive more samples.

    $f$ is surjective: The codomain and the image of $f$ do not change.

  • I think your proof for "Injective ⟹ sufficient" is incorrect. I have given an example here where it is clearly not true. It's surprising (and annoying) that Wikipedia has that incorrect claim. I do believe the converse to be true, i.e. "sufficient ⟹ injective", although I don't know why we need to assume that we have an identifiable model space?
    – Shreyans
    Commented May 21 at 21:06
  • Hi, I think your example stands. What breaks my proof is the discrete distribution. My proof needs all probability densities to be regular numbers, not $0$ or $\infty$. @Shreyans
    – Y.D.X.
    Commented May 22 at 4:50
  • As for identifiability, I agree with you. We only mention $\Theta$ and $P_\theta$ here, and we don't mind if multiple $\theta$'s refer to the same $P_\theta$. To be honest, I didn't think of it last year. I thought it was just a necessary but useless convention to be rigorous.
    – Y.D.X.
    Commented May 22 at 4:53
  • I don't think the issue is discreteness. I don't completely understand your proof, so it's difficult for me to point to the incorrect step. Can you maybe explain the steps one by one? To me it seems like we assume $P_{\theta_1} = P_{\theta_2}$, then show the definition of sufficiency holds. But we need to show that the definition holds for all $\theta$, even when $P_{\theta_1} \neq P_{\theta_2}$.
    – Shreyans
    Commented May 22 at 6:27
  • Also "necessary but useless" is an oxymoron :D
    – Shreyans
    Commented May 22 at 6:29

A complete statistic $T$ is one for which any proposed distribution on the domain of $T$ is predicted by one or more prior distributions on the model parameter space.

I doubt whether that statement is true. Say our sample is $X_1, \dots, X_n$ with

$$X_i \sim N(\mu, 1)$$

then the sample mean is a sufficient and complete statistic and is distributed as

$$\bar{X}\mid\mu \sim N(\mu, 1/n)$$

The distribution of the sample mean conditional on a prior distribution for the parameter $\mu$ will be a convolution, which is similar to Gaussian smoothing, and its variance will be at least $1/n$.

This means that not every distribution for $\bar{X}$ can be mapped backwards to a prior on $\mu$. And the mapping in the question is not surjective for this example. Yet, the statistic is a complete statistic.
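A quick Monte Carlo sketch of this point (the normal prior, its standard deviation, and the sample size are arbitrary assumptions for illustration): whatever prior we put on $\mu$, the marginal variance of $\bar{X}$ is the prior variance plus $1/n$, so no prior can produce a distribution for $\bar{X}$ with variance below $1/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 25, 200_000
prior_sd = 0.7                                # assumed prior: mu ~ N(0, prior_sd^2)

mu = rng.normal(0.0, prior_sd, size=reps)     # draw mu from the prior
xbar = rng.normal(mu, 1 / np.sqrt(n))         # draw Xbar | mu ~ N(mu, 1/n)

print(np.var(xbar))                           # ~ prior_sd**2 + 1/n = 0.53
print(prior_sd**2 + 1 / n)                    # the marginal variance can never drop below 1/n
```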


On MathOverflow there's an explanation of completeness that gets close to it:

https://mathoverflow.net/a/182661

Geometrically, completeness means something like this: if a vector $g(T)$ is orthogonal to the p.d.f. $f_\theta$ of $T$ for each $\theta$, $$\mathbb E_\theta g(T) = \langle g(T),f_\theta\rangle=0$$ then $g(T)=0$ i.e., the functions $f_\theta$ for varying $\theta$ span the whole space of functions of $T$.

So the space of linear combinations of the pdfs $f_{\theta}(T)$ of the statistic is complete in this sense (it contains any function $g(T)$). Any function $g(T)$ can be described as an integral

$$g(T) = \int h(\theta) f_{\theta}(T)\, \text{d}\theta$$

but here $h(\theta)$ is not a distribution on $\theta$; it doesn't need to integrate to 1, and it can have negative values.
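A small discrete sketch of this last point (the binomial family and the target function are arbitrary assumptions, not from the answer): with $T\mid\theta \sim \mathrm{Bin}(2,\theta)$ and $g(t) = t^2$, we can write $g$ as a linear combination of three of the pmfs $f_\theta$, but the weights $h(\theta)$ are partly negative and do not sum to one.

```python
import numpy as np
from scipy.stats import binom

thetas = np.array([0.25, 0.5, 0.75])                  # three arbitrary parameter values
F = np.array([binom.pmf(np.arange(3), 2, th) for th in thetas]).T   # columns = pmfs f_theta
g = np.arange(3) ** 2.0                               # target function g(t) = t^2 on {0, 1, 2}

h = np.linalg.solve(F, g)                             # weights with sum_theta h(theta) * f_theta = g
print(h, h.sum())                                     # [2, -7, 10], sum = 5: negative and not 1
print(F @ h)                                          # reproduces g = [0, 1, 4]
```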

  • I agree with you (see "$T$ is complete $\iff$ $f$ is surjective" in my answer), and I think the surjective here should not be the normal meaning; we let them span here. I am wondering, if we restrict the codomain to all functions that integrate to $1$, can $f$ then be surjective in the normal sense?
    – Y.D.X.
    Commented Oct 30, 2023 at 2:13
  • I guess my hypothesis is still wrong. As you said, the variance will be at least $1/n$.
    – Y.D.X.
    Commented Oct 30, 2023 at 2:25
