
I understand the heuristic definition: say you know a statistic, $T$, of some sample that you want to use to estimate the corresponding population parameter - but you don't know the data points of the sample themselves. 'We say $T$ is a sufficient statistic if the statistician who knows the value of $T$ can do just as good a job of estimating the unknown parameter $\theta$ as the statistician who knows the entire random sample' - that's the definition of a sufficient statistic I've read online, and understand.

But then comes the factorisation theorem, which I'm struggling with: a statistic $T$ is sufficient for a sample $\boldsymbol{X} = (X_1,X_2,\ldots,X_n)$ if $f(\boldsymbol{X} \mid\theta)$, the conditional pdf for $\boldsymbol{X}$ given the parameter $\theta$ and the statistic $T$, does not depend on $\theta$. This is equivalent to factorising $f(\boldsymbol{X} \mid\theta)$ into two functions:

$$f(\boldsymbol{X} \mid\theta) = h(X_1,X_2,\ldots,X_n) \cdot g(T(X_1,X_2,\ldots,X_n),\theta).$$

$T$ would then be a sufficient statistic, as the conditional probability $f(\boldsymbol{X} \mid\theta)$ now does not depend on $\theta$. But here's my question - how can the new factorised $f(\boldsymbol{X}\mid\theta)$ not depend on $\theta$ when $\theta$ is still in the final equation? In the examples I've seen, the final equations still have $\theta$ in them, as well as the statistic as some function of $X_1,X_2,\ldots,X_n$ - so how can the conditional probability depend on $T$ alone?

If $T$ is supposed to be all you need to know to determine the conditional distribution, how can $\theta$ still appear as a variable whose value you would need? I think I've gone wrong in some basic understanding of what's supposed to be going on here, so apologies if this is elementary.

  • I have edited your question to put the maths in LaTeX form and reduce the length a bit. I also note that you were regularly referring to sufficient statistics as 'satisfactory statistics', which is not the correct terminology. I have corrected that also, but please note the correct term. Please check to see that my changes are consistent with the intention of your question.
    – Ben
    Commented Jul 3, 2018 at 6:06

2 Answers


Perhaps a specific example [similar to one in Bain & Engelhardt, 2nd ed. (1992); Example 10.2.1, p. 338] will help by showing the required functional independence.

Let data $\mathbf{X} = (x_1, \dots, x_n)$ be a random sample from $\mathsf{Exp}(\lambda),$ an exponential distribution with rate $\lambda;$ and let $t = \sum_i x_i.$ We wish to show that $t$ is sufficient for $\lambda.$

First, the joint density function is

$$f_{\mathbf{X};\lambda}(x_1, \dots, x_n;\lambda) = \lambda^ne^{-\lambda t},\; \text{for}\; x_i > 0.$$

Also, one can show using moment generating functions that $t \sim \mathsf{Gamma}(n, \lambda),$ so that $$f_{t;\lambda}(t;\lambda) = \frac{\lambda^n}{\Gamma(n)}t^{n-1}e^{-\lambda t},\; \text{for}\; t > 0.$$
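
In case that step is unfamiliar, the MGF argument runs as follows: each $x_i \sim \mathsf{Exp}(\lambda)$ has MGF $M_{x_i}(s) = \lambda/(\lambda - s)$ for $s < \lambda,$ so by independence

$$M_t(s) = \prod_{i=1}^n M_{x_i}(s) = \left(\frac{\lambda}{\lambda - s}\right)^n,$$

which is exactly the MGF of the $\mathsf{Gamma}(n, \lambda)$ distribution.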

Thus

$$f_{\mathbf{X}|t}(x_1,\dots,x_n|t) = \frac{f_{\mathbf{X};\lambda}(x_1, \dots, x_n;\lambda)}{f_{t;\lambda}(t;\lambda)} = \frac{\Gamma(n)}{t^{n-1}},$$ which is functionally independent of the parameter $\lambda,$ so the statistic $t$ is sufficient for $\lambda.$
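
A quick numerical check of that independence (a minimal sketch; it relies on the standard fact that $x_1/t \sim \mathsf{Beta}(1, n-1)$ whatever the rate $\lambda$ is, and the object names ratio3 and ratio10 are just chosen for this sketch):

# If t is sufficient, the conditional distribution of the data given t
# cannot involve lambda. Here x[1]/sum(x) ~ BETA(1, n-1) for ANY rate,
# so its simulated distribution should match across different lambdas.
set.seed(2025);  m = 10^5;  n = 5
ratio3  = replicate(m, {x = rexp(n, 3);  x[1]/sum(x)})
ratio10 = replicate(m, {x = rexp(n, 10); x[1]/sum(x)})
mean(ratio3);  mean(ratio10)       # both approx 1/n = 0.2
ks.test(ratio3, ratio10)$p.value   # no evidence the distributions differ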


Note: The simulation below illustrates (for $n=5$ and $\lambda = 3$) that $\hat \lambda = \frac{n-1}{t}$ is an unbiased estimator of $\lambda$ and that $t \sim \mathsf{Gamma}(n, \lambda).$

set.seed(1884);  m = 10^6;  n = 5;  lam = 3
t = replicate(m, sum(rexp(n, lam)))   # m simulated totals of n Exp(lam) observations
mean((n-1)/t)                         # unbiased estimator (n-1)/t of lam
# [1] 2.999782  (approx E(4/t) = 3)

hist(t, prob=T, col="skyblue2", br=30,
     main="Simulated Total with GAMMA(5, 3) Density")
curve(dgamma(x, n, lam), add=T, lwd=2)  # overlay exact Gamma(n, lam) density

[Figure: histogram of the simulated totals $t$ with the $\mathsf{Gamma}(5, 3)$ density curve overlaid.]
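
The unbiasedness can also be checked analytically: for $t \sim \mathsf{Gamma}(n, \lambda),$

$$E\left[\frac{n-1}{t}\right] = (n-1)\int_0^\infty \frac{1}{t}\cdot\frac{\lambda^n}{\Gamma(n)}t^{n-1}e^{-\lambda t}\,dt = (n-1)\cdot\frac{\lambda^n}{\Gamma(n)}\cdot\frac{\Gamma(n-1)}{\lambda^{n-1}} = \lambda,$$

using $\Gamma(n) = (n-1)\,\Gamma(n-1).$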


You have slightly misunderstood sufficiency and the factorisation theorem here. You can see from the form of the factorisation theorem that the conditional density $f(\boldsymbol{X}|\theta)$ does depend on $\theta$. (You are right - because the value $\theta$ is in the equation, this density does indeed depend on $\theta$.) However, if the factorisation in the factorisation theorem holds, then it can be shown that:

$$f(\boldsymbol{X}|T(\boldsymbol{X}), \theta) = f(\boldsymbol{X}|T(\boldsymbol{X})) = \text{Function depending on }T \text{ but not }\theta,$$

and thus, the conditional density $f(\boldsymbol{X}|T(\boldsymbol{X}), \theta)$ does not depend on $\theta$. That latter property is what is required for sufficiency of $T$. The factorisation theorem just says that if the density $f(\boldsymbol{X}|\theta)$ has a certain form, then the required condition for sufficiency will emerge from this.

Remember that sufficiency means that if you already know the sufficient statistic (i.e., when you condition on $T$), then the parameter has no further influence on the density of the observed data. If you don't condition on the sufficient statistic then nothing special happens - the data still depends on the parameter of the underlying distribution.
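
For instance (a minimal sketch with a Bernoulli sample; arrange_freq is just an illustrative helper name, not a standard function): conditional on the total number of successes, each arrangement of the 1s is equally likely, whatever $\theta$ is.

# Bernoulli(theta) samples of size 3: given T = sum(x) = 1, the single 1
# is equally likely to sit in any slot, for ANY theta.
# (arrange_freq is an illustrative helper written for this sketch.)
set.seed(1);  m = 10^5;  n = 3
arrange_freq = function(theta) {
  x = matrix(rbinom(m*n, 1, theta), ncol = n)
  keep = x[rowSums(x) == 1, ]   # condition on T = 1
  colMeans(keep)                # relative frequency of the 1 in each slot
}
arrange_freq(0.2);  arrange_freq(0.7)   # both approx (1/3, 1/3, 1/3)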

  • Thanks very much for your reply! Though, unfortunately, it largely still feels over my head. May I ask some follow-up questions? Originally, my big hang-up was that I was struggling to see intuitively how the factorisation theorem implied the 'heuristic definition' of a suff. stat. I was looking at the fact. theorem, and couldn't see how it related to the heuristic definition. But am I correct in now thinking, from what you're saying, that the fact. theorem isn't the maths version of the heuristic definition - i.e. it doesn't show why, for a given suff. stat, the heuristic def holds ... Commented Jul 5, 2018 at 11:29
  • it just says that if this factorisation is possible, then $T(X)$ is indeed a sufficient stat? Is that what you're saying? That would explain why I was struggling to see the intuition behind why the fact. theorem and the definition of a sufficient stat are linked - because that intuition isn't there. Further then, is your maths explanation an explanation of why a suff. stat being a suff. stat implies what the definition implies? Supposing it is for a sec, I'm still not quite sure how your explanation implies the definition (though it feels more right) ... Commented Jul 5, 2018 at 11:37
  • Tell me, from what you've said, whether this is then right: if $T(X)$ is a sufficient statistic, then $f(X|\theta) = f(X|T(X),\theta) = f(X|T(X))$. So if, say, you had some sample data for a distribution you know to be binomial but with unknown parameter $\theta$, and you calculated $f(X|\theta)$ for varying $\theta$ - then, if $T(X)$ is sufficient for $\theta$, and you calculated $f(X|T(X))$ and varied $T(X)$ across the same values of $\theta$, it would be that $f(X|\theta) = f(X|T(X))$. Because $f(X|\theta) = f(X|T(X),\theta) = f(X|T(X))$, and that's what it means for $T(X)$ to be sufficient? Is that right? Many thanks for all your help, either way! Commented Jul 5, 2018 at 11:44
  • I cannot quite make sense of what you are asking. But in the binomial case, a sufficient statistic for $\theta$ is the sample proportion $\bar{x}_n = \frac{1}{n}\sum_i x_i$. So, if you already know the sample proportion, then (conditional on this) the distribution of the values no longer depends on the parameter $\theta$.
    – Ben
    Commented Nov 15, 2019 at 10:23
