
I can't understand the paragraph in Completeness (statistics) - Wikipedia:

We have an identifiable model space parameterised by $\theta$, and a statistic $T$. Then consider the map $f:p_{\theta }\mapsto p_{T|\theta }$ which takes each distribution on model parameter $\theta$ to its induced distribution on statistic $T$. The statistic $T$ is said to be complete when $f$ is surjective, and sufficient when $f$ is injective.

(added on 2023-07-12 without any citation, and there's no revision after that)

  • What does “distribution on $\theta$” mean?

    Is the domain of $f$ the set of all prior distributions of $\theta$ (in the Bayesian sense), or the family of distributions of the sample ($X_1, \ldots, X_n$)?

  • What is the codomain of $f$?

    I guess the image of $f$ is the set of all possible distributions of $T$ (i.e. the family of distributions of $T$), but the codomain should be larger than that, or else $f$ is always surjective.

My thoughts

Sufficiency and completeness are related but independent concepts, as discussed in the following questions. If the statement in Wikipedia is true, then it’s a clear explanation of sufficiency and completeness.

Roughly speaking:

  • $T$ is sufficient: $T$ provides all information of $\theta$ from $X$, and we can recover the whole distribution of $X$ once given $T$.

    $f$ is injective: If we know that $y$ is $f$ of some $x$, then there is exactly one $x$ that satisfies $y = f(x)$.

  • $T$ is complete: For any function $g$, whenever $\operatorname{\mathbb{E}} g(T) \equiv 0$, then $\Pr(g(T) = 0) \equiv 1$, where “$\equiv$” means $\forall \theta$.

    $f$ is surjective: For any two functions $g, g'$, whenever $g \circ f = g' \circ f$, then $g = g'$.

I can feel that these concepts are connected, but I can’t make it rigorous…

Alternative comprehension of sufficiency and completeness

Basic intuition about minimal sufficient statistic - Cross Validated: $T$ can be seen as an indexed partition of the sample space.

st.statistics - Is a function of complete statistics again complete? - MathOverflow: $\operatorname{\mathbb{E}} g(T) \equiv 0$ means the distributions of $T$ for varying $\theta$ span the whole space of functions of $T$.

We need to clarify the statistical model $(\Omega,\mathcal{P})$, where $\Omega$ is the sample space, and $\mathcal{P} = \{P_\theta: \theta \in \Theta \}$ is the set of probability distributions on $\Omega$, and $\Theta$ is the parameter space. Those $\Omega, P_\theta$ are those in the probability triple $(\Omega, \mathcal{F}, P_\theta)$.

  • $\Omega \overset{X}{\to} \mathcal{X} \overset{T}{\to} \mathcal{T}$, where $X$ is called a random variable, $T$ is called a statistic.
  • $\mathcal{F} \overset{P_\theta}{\to} \mathbb{R}$.

We say that the model is identifiable if the mapping $\theta \mapsto P_\theta$ is injective.

What does “distribution on $\theta$” mean?

The domain of $f$ is neither prior distributions, nor the family of distributions of $X$. It is $\mathcal{P}$.

What is the codomain of $f$?

All $\mathcal{T} \to \mathbb{R}$ functions. This is different from and larger than the family of induced distributions of $T$, because the latter is $\{P_{T|\theta}: \theta \in \Theta\}$ where $P_{T|\theta}(t) = P_\theta(X^{-1}(T^{-1} (t)))$, and the former includes functions that cannot be represented by $P_\theta$.

$T$ is sufficient $\iff$ $f$ is injective

My rough thoughts are right here.

  • $\implies$

    If $T$ is sufficient, then $P_{X|T,\Theta}(x, t;\theta)$ does not depend on $\theta$. That is to say, $\forall x \in \mathcal{X}, \forall t \in \mathcal{T}, \forall \theta_1, \theta_2 \in \Theta$, $P_{X|T,\Theta}(x,t;\theta_1) = P_{X|T,\Theta}(x,t;\theta_2)$.

    Now let’s prove $f$ is injective.

    For any $P_{T|\theta_1}$ and $P_{T|\theta_2}$ in the image of $f$, if $P_{T|\theta_1} = P_{T|\theta_2}$ (i.e. $\forall t \in \mathcal{T}, P_{T|\theta_1}(t) = P_{T|\theta_2}(t)$), then $\forall x \in \mathcal{X}, \forall t \in \mathcal{T}$, $$ P_{\theta_1}(x) = P_{T|\theta_1}(t) \times P_{X|T,\Theta}(x,t;\theta_1) = P_{T|\theta_2}(t) \times P_{X|T,\Theta}(x,t;\theta_2) = P_{\theta_2}(x). $$ This is exactly the definition of $P_{\theta_1} = P_{\theta_2}$.

  • $\impliedby$

    Similarly, suppose $f$ is injective: for any $P_{T|\theta_1}, P_{T|\theta_2}$ in the image of $f$, if $P_{T|\theta_1} = P_{T|\theta_2}$ then $P_{\theta_1} = P_{\theta_2}$. By the definition of conditional distribution, $$ P_{X|T,\Theta}(x,t;\theta) = \frac{P_{\theta}(X =x \land T(x) = t)}{P_{T|\theta}(t)} = \frac{P_{\theta}(x)}{P_{T|\theta}(t)}. $$ Therefore $P_{X|T,\Theta}$ is the same for $\theta_1$ and $\theta_2$.
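
The definition of sufficiency used in the $\implies$ step can be checked by brute force on a small model. The sketch below is only an illustration under assumptions that are not in the text above: $n = 3$ i.i.d. Bernoulli($\theta$) draws with $T = \sum_i X_i$, and the helper name `conditional_given_sum` is hypothetical.

```python
from itertools import product
from math import comb, isclose

def conditional_given_sum(theta, n=3):
    """Return {x: P_theta(X = x | T = sum(x))} for n i.i.d. Bernoulli(theta) draws."""
    cond = {}
    for x in product((0, 1), repeat=n):
        t = sum(x)
        p_x = theta**t * (1 - theta)**(n - t)                # P_theta(X = x)
        p_t = comb(n, t) * theta**t * (1 - theta)**(n - t)   # P_theta(T = t)
        cond[x] = p_x / p_t
    return cond

# Two arbitrary (assumed) parameter values.
cond_a = conditional_given_sum(0.2)
cond_b = conditional_given_sum(0.7)

# The conditional law of X given T agrees for both parameter values.
same = all(isclose(cond_a[x], cond_b[x]) for x in cond_a)
print(same)
```

Here the conditional probability reduces to $1/\binom{n}{t}$, which contains no $\theta$; that is what "does not depend on $\theta$" means in this toy case.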

$T$ is complete $\iff$ $f$ is surjective

It looks like this has nothing to do with cancelability. Instead it’s related to the geometric intuition:

st.statistics - Is a function of complete statistics again complete? - MathOverflow: $\operatorname{\mathbb{E}} g(T) \equiv 0$ means the distributions of $T$ for varying $\theta$ span the whole space of functions of $T$.

  1. $\operatorname{\mathbb{E}} g(T)$ is an inner product of $P_{T|\theta}$ and $g$.
  2. $\operatorname{\mathbb{E}} g(T) \equiv 0$ means $g \perp \{P_{T|\theta}: \theta \in \Theta\}$.
  3. Moreover, if the only perpendicular $g$ is $0$ (almost surely), then $\{P_{T|\theta}: \theta \in \Theta \}$ spans the whole space of $\mathcal{T} \to \mathbb{R}$ functions. This is the meaning of “$f$ is surjective”. (It’s not the normal meaning: here we take the span of the image.)
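
On a finite $\mathcal{T}$ the three steps above become plain linear algebra: stack the pmfs of $T$ as rows of a matrix, and "the only $g$ orthogonal to every row is $0$" is the statement that the matrix has full column rank. A minimal sketch, assuming a Binomial($n$, $\theta$) statistic with $n = 4$ and a hypothetical grid of parameter values (neither appears in the text above):

```python
import numpy as np
from math import comb

n = 4
thetas = np.linspace(0.1, 0.9, 9)          # hypothetical grid of parameter values
ts = np.arange(n + 1)                      # possible values of T
binom = np.array([comb(n, t) for t in ts])

# pmf[i, t] = P_{theta_i}(T = t); each row is one induced distribution of T.
pmf = binom * thetas[:, None]**ts * (1 - thetas[:, None])**(n - ts)

# E_theta g(T) = pmf @ g, so E g(T) = 0 for all rows forces g = 0
# exactly when the rows span all of R^(n+1), i.e. rank == n + 1.
rank = np.linalg.matrix_rank(pmf)
print(rank, n + 1)
```

Full rank here means the rows span every function on $\{0, \ldots, n\}$, which is the "span" reading of surjectivity, not surjectivity of $f$ itself.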


Sufficient but not complete

$\Theta = \mathbb{R}$, $X \sim \mathcal{U}(\theta, \theta+2\pi)$ and $T = X$.

  • $T$ is sufficient: $T$ gives the same information of $\theta$ as $X$.

    $f$ is injective: It’s the identity map!

  • $T$ is not complete: $\operatorname{\mathbb{E}} \sin X \equiv 0$ for every $\theta$, although $\sin X$ is not almost surely $0$.

    $f$ is not surjective:

    • $\mathcal{T} = \mathbb{R}$ so codomain of $f$ is all functions on $\mathbb{R}$.

    • The image of $f$ is $\{\tfrac{1}{2\pi}\boldsymbol{1}_{[\theta, \theta+2\pi]}: \theta \in \mathbb{R} \}$, where $\boldsymbol{1}$ is the indicator function.

      Its intersection with the set of $2\pi$-periodic functions is the set of constant functions. In other words, the image of $f$ cannot span functions with period $2\pi, 4\pi, 6\pi$ and so on.

      Alternatively, the Fourier transform of $\boldsymbol{1}_{[\theta, \theta+2\pi]}$ (with kernel $e^{-i\omega x}$ and $\operatorname{sinc} x = \sin x / x$) is $2\pi \operatorname{sinc}(\pi \omega) e^{-i(\theta+\pi)\omega}$, which has zeros at $\omega = \pm 1, \pm 2,\ldots$.
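
The claim $\operatorname{\mathbb{E}} \sin X \equiv 0$ can be checked numerically. This is a rough sketch, not a proof: the function name `mean_sin`, the midpoint rule, and the particular values of $\theta$ are all assumptions chosen for illustration.

```python
import numpy as np

def mean_sin(theta, m=1000):
    """Midpoint-rule approximation of E[sin X] for X ~ U(theta, theta + 2*pi)."""
    width = 2 * np.pi
    x = theta + (np.arange(m) + 0.5) * width / m   # midpoints of m equal cells
    # Density 1/(2*pi) times cell width 2*pi/m gives weight 1/m per cell,
    # so the integral is just the average of sin over the midpoints.
    return np.sin(x).mean()

# A few arbitrary (assumed) parameter values.
values = [mean_sin(theta) for theta in (-3.0, 0.0, 0.5, 10.0)]
print(np.allclose(values, 0.0))
```

So $g = \sin$ has zero expectation under every $\theta$ without vanishing on any interval, which is exactly the kind of $g$ that completeness rules out.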

Constant statistic

  • $T$ is not sufficient: Obvious.

    $f$ is not injective: $f$ always maps to the same singleton distribution.

  • $T$ is complete: $g(T)$ is deterministic. It has to be $0$ if the expectation is $0$.

    $f$ is surjective: $\mathcal{T}$ is a one-point set, so any nonzero $\mathcal{T} \to \mathbb{R}$ function is a basis of the codomain of $f$.

Ignore some samples

First we work out a complete and sufficient statistic for $n$ samples. Now we're given more samples but we stick to the old statistic.

  • $T$ is not sufficient: Obvious.

    $f$ is not injective: $\Omega \overset{X}{\to} \mathcal{X} \overset{T}{\to} \mathcal{T}$, now $\Omega$ becomes $\Omega \times \Omega'$ and $\mathcal{X}$ becomes $\mathcal{X} \times \mathcal{X'}$. The domain of $f$ expands but the codomain does not change.

  • $T$ is complete: Completeness is a property of the family of distributions of $T$; it’s not related to $X$. So $T$ is still complete even if we receive more samples.

    $f$ is surjective: The codomain and the image of $f$ do not change.

A complete statistic T is one for which any proposed distribution on the domain of T is predicted by one or more prior distributions on the model parameter space.

I doubt whether that statement is true. Say we have a sample $X_1, \dots, X_n$ with

$$X_i \sim N(\mu, 1)$$

then the sample mean is a sufficient and complete statistic and is distributed as

$$\bar{X}|\mu \sim N(\mu, 1/n)$$

The marginal distribution of the sample mean under a prior distribution for the parameter $\mu$ is a convolution of the prior with this Gaussian, similar to Gaussian smoothing, so its variance is at least $1/n$ (the standard deviation is at least $1/\sqrt{n}$).

This means that not every distribution for $\bar{X}$ can be mapped backwards to a prior on $\mu$. And the mapping in the question is not surjective for this example. Yet, the statistic is a complete statistic.
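
To spell out the variance bound, assuming $N(\mu, 1)$ above denotes unit variance and writing $\pi$ for a hypothetical prior on $\mu$ with finite variance, the law of total variance gives

$$\operatorname{Var}(\bar{X}) = \mathbb{E}_\pi\!\left[\operatorname{Var}(\bar{X} \mid \mu)\right] + \operatorname{Var}_\pi\!\left(\mathbb{E}[\bar{X} \mid \mu]\right) = \frac{1}{n} + \operatorname{Var}_\pi(\mu) \ge \frac{1}{n}.$$

So, for instance, a point mass on $\bar{X}$ (variance $0$) is not the marginal of $\bar{X}$ under any prior.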

On MathOverflow there's an explanation of completeness that gets close to it


Geometrically, completeness means something like this: if a vector $g(T)$ is orthogonal to the p.d.f. $f_\theta$ of $T$ for each $\theta$, $$\mathbb E_\theta g(T) = \langle g(T),f_\theta\rangle=0$$ then $g(T)=0$ i.e., the functions $f_\theta$ for varying $\theta$ span the whole space of functions of $T$.

So the space of linear combinations of pdfs $f_{\theta}(T)$ of the statistic is complete (it contains any function $g(T)$). Any function $g(T)$ can be described as an integral

$$g(T) = \int h(\theta) f_{\theta}(T)\, \text{d}\theta$$

but here $h(\theta)$ is not a distribution on $\theta$; it doesn't need to integrate to 1, and it can have negative values.
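
A discrete analogue makes the point about negative weights concrete. The sketch below is an illustration under assumptions that are not part of the argument above: a Binomial($2$, $\theta$) statistic, three hypothetical parameter values, and a sum in place of the integral.

```python
import numpy as np
from math import comb

n = 2
thetas = np.array([0.25, 0.5, 0.75])      # hypothetical parameter values
ts = np.arange(n + 1)
binom = np.array([comb(n, t) for t in ts])

# pmf[i, t] = f_{theta_i}(t), one row per parameter value.
pmf = binom * thetas[:, None]**ts * (1 - thetas[:, None])**(n - ts)

g = np.array([0.0, 1.0, 0.0])             # target: point mass at T = 1

# Solve sum_i h[i] * pmf[i, t] = g[t] for the weights h.
h = np.linalg.solve(pmf.T, g)
print(h)
```

The weights come out as approximately $(-2, 5, -2)$: they sum to $1$ but two of them are negative, so the target is in the span of the pmfs without being a mixture of them under any prior.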

