We need to clarify the statistical model $(\Omega,\mathcal{P})$, where $\Omega$ is the sample space, $\mathcal{P} = \{P_\theta: \theta \in \Theta \}$ is a set of probability distributions on $\Omega$, and $\Theta$ is the parameter space. Here $\Omega$ and $P_\theta$ are the same objects as in the probability triple $(\Omega, \mathcal{F}, P_\theta)$.
- $\Omega \overset{X}{\to} \mathcal{X} \overset{T}{\to} \mathcal{T}$, where $X$ is called a random variable, $T$ is called a statistic.
- $\mathcal{F} \overset{P_\theta}{\to} \mathbb{R}$.
We say that the model is identifiable if the mapping $\theta \mapsto P_\theta$ is injective.
What is the domain of $f$ — is it some “distribution on $\theta$”?
The domain of $f$ is neither the set of prior distributions on $\theta$, nor the family of induced distributions of $X$. It is $\mathcal{P}$ itself: $f$ sends $P_\theta$ to the induced distribution $P_{T|\theta}$ of $T$.
What is the codomain of $f$?
All functions $\mathcal{T} \to \mathbb{R}$. This is different from, and larger than, the family of induced distributions of $T$: the latter is $\{P_{T|\theta}: \theta \in \Theta\}$, where $P_{T|\theta}(t) = P_\theta(X^{-1}(T^{-1} (t)))$, while the former also contains functions that are not of the form $P_{T|\theta}$ (indeed, not even probability distributions).
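To make $f$ concrete, here is a toy sketch in Python. The model (two i.i.d. Bernoulli($\theta$) coordinates with $T(x) = x_1 + x_2$) and all names are my own illustrative assumptions, not part of the setup above:

```python
from itertools import product

# Toy model (an illustrative assumption): X = (X1, X2) with
# X1, X2 i.i.d. Bernoulli(theta), and the statistic T(x) = x1 + x2.
def P_theta(x, theta):
    """P_theta(x) for a sample point x = (x1, x2)."""
    return (theta ** sum(x)) * ((1 - theta) ** (2 - sum(x)))

def f(theta):
    """f sends P_theta to the induced distribution P_{T|theta} on {0, 1, 2}."""
    dist = {t: 0.0 for t in range(3)}
    for x in product([0, 1], repeat=2):
        dist[sum(x)] += P_theta(x, theta)
    return dist

for theta in [0.2, 0.5, 0.8]:
    print(theta, {t: round(p, 4) for t, p in f(theta).items()})
# e.g. f(0.5) is {0: 0.25, 1: 0.5, 2: 0.25} -- one point of the image of f.
# The codomain (all functions {0,1,2} -> R) is strictly larger than the image.
```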
$T$ is sufficient $\iff$ $f$ is injective
My rough thoughts are right here.
$\implies$
If $T$ is sufficient, then $P_{X|T,\Theta}(x, t;\theta)$ does not depend on $\theta$. That is to say, $\forall x \in \mathcal{X}, \forall t \in \mathcal{T}, \forall \theta_1, \theta_2 \in \Theta$, $P_{X|T,\Theta}(x,t;\theta_1) = P_{X|T,\Theta}(x,t;\theta_2)$.
Now let’s prove $f$ is injective.
For any $P_{T|\theta_1}$ and $P_{T|\theta_2}$ in the image of $f$, if $P_{T|\theta_1} = P_{T|\theta_2}$ (i.e. $\forall t \in \mathcal{T}, P_{T|\theta_1}(t) = P_{T|\theta_2}(t)$), then for every $x \in \mathcal{X}$, writing $t = T(x)$,
$$
P_{\theta_1}(x)
= P_{T|\theta_1}(t) \times P_{X|T,\Theta}(x,t;\theta_1)
= P_{T|\theta_2}(t) \times P_{X|T,\Theta}(x,t;\theta_2)
= P_{\theta_2}(x).
$$
This is exactly the definition of $P_{\theta_1} = P_{\theta_2}$.
$\impliedby$
Similarly, take any $P_{T|\theta_1}, P_{T|\theta_2}$ in the image of $f$ with $P_{T|\theta_1} = P_{T|\theta_2}$; injectivity of $f$ gives $P_{\theta_1} = P_{\theta_2}$. By the definition of conditional distribution, with $t = T(x)$,
$$
P_{X|T,\Theta}(x,t;\theta)
= \frac{P_{\theta}(X = x \land T(X) = t)}{P_{T|\theta}(t)}
= \frac{P_{\theta}(x)}{P_{T|\theta}(t)}.
$$
Therefore $P_{X|T,\Theta}$ is the same for $\theta_1$ and $\theta_2$, i.e. $T$ is sufficient.
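As a sanity check of the $\theta$-free conditional, here is a toy computation (the Bernoulli model and helper names are my own assumptions, not from the argument above):

```python
from itertools import product

# Toy model (assumed for illustration): X = (X1, X2) i.i.d. Bernoulli(theta)
# with the sufficient statistic T(x) = x1 + x2.  Then
# P_{X|T,Theta}(x, t; theta) = P_theta(x) / P_{T|theta}(t) has no theta left.
def P_theta(x, theta):
    return (theta ** sum(x)) * ((1 - theta) ** (2 - sum(x)))

def cond(x, theta):
    """P_theta(X = x | T = T(x))."""
    t = sum(x)
    p_t = sum(P_theta(y, theta) for y in product([0, 1], repeat=2)
              if sum(y) == t)
    return P_theta(x, theta) / p_t

print(cond((1, 0), 0.3), cond((1, 0), 0.9))  # both 0.5: theta has cancelled
```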
$T$ is complete $\iff$ $f$ is surjective
It looks like this has nothing to do with cancellability. Instead, it is related to the following geometric intuition:
st.statistics - Is a function of complete statistics again complete? - MathOverflow: $\operatorname{\mathbb{E}} g(T) \equiv 0$ means the distributions of $T$ for varying $\theta$ span the whole space of functions of $T$.
- $\operatorname{\mathbb{E}}_\theta g(T) = \sum_t P_{T|\theta}(t)\, g(t)$ is an inner product of $P_{T|\theta}$ and $g$.
- $\operatorname{\mathbb{E}}_\theta g(T) \equiv 0$ (for all $\theta$) means $g \perp \{P_{T|\theta}: \theta \in \Theta\}$.
- Moreover, if the only such perpendicular $g$ is $0$ (almost surely), then $\{P_{T|\theta}: \theta \in \Theta \}$ spans the whole space of $\mathcal{T} \to \mathbb{R}$ functions. This is the meaning of “$f$ is surjective”. (It is not the usual set-theoretic surjectivity: we take the span of the image here.)
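This span reading can be checked numerically in a tiny model. Below I assume (purely for illustration) $T \sim \operatorname{Binomial}(2,\theta)$ with $\theta$ on a three-point grid; $T$ is complete here, and correspondingly the vectors $P_{T|\theta}$ span the whole three-dimensional space of functions of $T$:

```python
import numpy as np

# Illustrative assumption: T ~ Binomial(2, theta) on the range {0, 1, 2}.
def p_T(theta):
    """The vector P_{T|theta} over t = 0, 1, 2."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]

A = np.array([p_T(th) for th in [0.2, 0.5, 0.8]])  # one row per theta

# E_theta g(T) = A @ g: each row gives the inner product <P_{T|theta}, g>.
# "Only g = 0 is orthogonal to every row" <=> the rows span all of R^3,
# which is the surjectivity-as-span reading of completeness.
print(np.linalg.matrix_rank(A))  # 3
```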
Examples
Sufficient but not complete
$\Theta = \mathbb{R}$, $X \sim \mathcal{U}(\theta, \theta+2\pi)$ and $T = X$.
$T$ is sufficient: $T$ gives the same information about $\theta$ as $X$ does, since $T = X$.
$f$ is injective: It’s the identity map!
$T$ is not complete: $\operatorname{\mathbb{E}}_\theta \sin X = 0$ for every $\theta$ (the integral of $\sin$ over any interval of length $2\pi$ vanishes), yet $\sin X$ is not almost surely $0$.
$f$ is not surjective:
$\mathcal{T} = \mathbb{R}$, so the codomain of $f$ is all functions on $\mathbb{R}$.
The image of $f$ is $\{\frac{1}{2\pi}\boldsymbol{1}_{[\theta, \theta+2\pi]}: \theta \in \mathbb{R} \}$, where $\boldsymbol{1}$ is the indicator function; the constant $\frac{1}{2\pi}$ does not affect spans, so I drop it below.
The span of this image meets the set of $2\pi$-periodic functions only in the constant functions. In other words, the image of $f$ cannot span any non-constant function of frequency $1, 2, 3, \ldots$ (period $2\pi, \pi, 2\pi/3, \ldots$), such as $\sin x$.
Alternatively, the Fourier transform of $\boldsymbol{1}_{[\theta, \theta+2\pi]}$ is $2\pi \operatorname{sinc}(\pi \omega)\, e^{-i(\theta+\pi)\omega}$ (with $\operatorname{sinc} x = \sin x / x$), which has zeros exactly at the nonzero integers $\omega = \pm 1, \pm 2, \ldots$
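Both claims are easy to check numerically; a quick sketch (midpoint quadrature, the helper name `E_g` is mine):

```python
import math

# For X ~ U(theta, theta + 2*pi): E_theta sin X = 0 for every theta,
# yet sin X is not almost surely 0, so g = sin defeats completeness.
def E_g(theta, g, n=100_000):
    """Midpoint-rule estimate of E_theta g(X) under U(theta, theta + 2*pi)."""
    h = 2 * math.pi / n
    return sum(g(theta + (k + 0.5) * h) for k in range(n)) * h / (2 * math.pi)

for theta in [0.0, 1.0, -2.7]:
    assert abs(E_g(theta, math.sin)) < 1e-9  # sin is annihilated for every theta

# A non-integer frequency is not annihilated (sinc has no zero at omega = 1/2):
print(round(E_g(0.0, lambda x: math.sin(x / 2)), 4))  # 0.6366, i.e. 2/pi
```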
Constant statistic
$T$ is not sufficient: Obvious (the conditional distribution of $X$ given the constant $T$ is $P_\theta$ itself, which still depends on $\theta$).
$f$ is not injective: $f$ maps every $P_\theta$ to the same distribution on a one-point set.
$T$ is complete: $g(T)$ is deterministic. It has to be $0$ if the expectation is $0$.
$f$ is surjective: $\mathcal{T}$ is a one-point set, so the codomain of $f$ is one-dimensional, and any nonzero $\mathcal{T} \to \mathbb{R}$ function spans it.
Ignore some samples
First we work out a complete and sufficient statistic for $n$ samples. Now we're given more samples but we stick to the old statistic.
$T$ is not sufficient: Obvious (given $T$, the distribution of the ignored samples still depends on $\theta$).
$f$ is not injective: $\Omega \overset{X}{\to} \mathcal{X} \overset{T}{\to} \mathcal{T}$; now $\Omega$ becomes $\Omega \times \Omega'$ and $\mathcal{X}$ becomes $\mathcal{X} \times \mathcal{X}'$. The domain of $f$ expands but the codomain does not change.
$T$ is complete: Completeness only concerns the family of distributions of $T$; it does not involve $X$. So $T$ is still complete even when we receive more samples.
$f$ is surjective: Neither the codomain nor the image of $f$ changes.
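A minimal numeric illustration of the lost sufficiency, with a toy model of my own (two i.i.d. Bernoulli($\theta$) samples, where the statistic $T(x) = x_1$ ignores the second sample):

```python
from itertools import product

# Toy model (assumed for illustration): X = (X1, X2) i.i.d. Bernoulli(theta),
# T(x) = x1.  The conditional distribution of X given T still depends on
# theta (through the ignored X2), so T is not sufficient.
def joint(x, theta):
    return (theta ** sum(x)) * ((1 - theta) ** (2 - sum(x)))

def conditional(x, theta):
    """P_theta(X = x | T = x[0])."""
    p_t = sum(joint(y, theta) for y in product([0, 1], repeat=2)
              if y[0] == x[0])
    return joint(x, theta) / p_t

print(round(conditional((1, 1), 0.3), 10))  # 0.3
print(round(conditional((1, 1), 0.7), 10))  # 0.7 -- the theta-dependence remains
```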