
The definition of sufficient statistic is as follows:

A statistic $T(X_1,...,X_n)$ is sufficient for parameter $\theta$ if the conditional distribution of $X_1,...,X_n$, given that $T=t$, does not depend on $\theta$ for any value of $t$.

I often see in textbooks that, for iid $X_1,...,X_n \sim N(\mu, \sigma^2)$, $\bar{X}$ is a sufficient statistic for $\mu$ when $\sigma^2$ is known. The claim is that $\sigma^2$ has to be known, because otherwise the Factorization Theorem does not yield a proper factorization.

However, regardless of whether $\sigma^2$ is known or unknown, $f(X_1,...,X_n \mid \bar{X}; \mu, \sigma^2)=f(X_1,...,X_n \mid \bar{X}; \sigma^2)$ (from multivariate normal theory). Even when $\sigma^2$ is unknown, the conditional distribution of the data $\textbf{X}$ given $\bar{X}$ does not depend on $\mu$, suggesting that $\bar{X}$ is a sufficient statistic for $\mu$.
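Concretely, from multivariate normal theory, $\mathbf{X}-\bar{X}\mathbf{1}_n$ is independent of $\bar{X}$, so

$$\mathbf{X} \mid \bar{X}=t \;\sim\; \mathcal{N}_n\!\left(t\,\mathbf{1}_n,\; \sigma^2\Big(I_n-\tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top\Big)\right),$$

which involves $\sigma^2$ but not $\mu$, whether or not $\sigma^2$ is known.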

In general, it seems that if I have a density indexed by a parameter vector $\theta^\top=(\theta_1^\top, \theta_2^\top)$, where $T_1$ is a sufficient statistic for $\theta_1$ when $\theta_2$ is known, then regardless of whether $\theta_2$ is known or unknown, $f(X_1,...,X_n \mid T_1; \theta_1, \theta_2)=f(X_1,...,X_n \mid T_1; \theta_2)$ (Essential Statistical Inference, p. 59, Boos and Stefanski).

  1. Why does the Factorization Theorem fail to identify $\bar{X}$ as a sufficient statistic when $\sigma^2$ is unknown, even though it satisfies the sufficient statistic definition?
  2. How necessary is it to declare that a statistic is sufficient only when certain nuisance parameters are known, if the conditional distribution is still independent of the parameter of interest?
In some cases, technically, Yes. Of course, the answer depends on which parameters are considered nuisances. For normal data, if $\mu$ is known, the sufficient statistic for $\sigma^2$ is $\frac 1 n \sum_{i=1}^n(X_i-\mu)^2.$ By contrast, if neither parameter is known, we use $\bar X = \frac 1 n\sum_{i=1}^n X_i$ for $\mu$ and $\frac{1}{n-1} \sum_{i=1}^n(X_i-\bar X)^2$ for $\sigma^2.$ Also, in the first case we can use just $\sum_{i=1}^n X_i^2$ and in the second case both $\sum_{i=1}^n X_i$ and $\sum_{i=1}^n X_i^2.$
    – BruceET
    Commented Feb 3, 2021 at 4:37

2 Answers


While, in the Normal $\mathcal N(\mu,\sigma^2)$ model, $\bar X$ brings the same (Fisher) information on $\mu$ as the entire sample, regardless of whether $\sigma$ is known or unknown, the Normal model also exhibits a distinction between known and unknown (nuisance) parameters when $\mu$ is considered the nuisance parameter. When $\mu$ is known, $$s^2_\mu(\mathbf x)=\sum_{i=1}^n (x_i-\mu)^2$$ is a sufficient statistic, but because it depends on $\mu$, it does not remain a sufficient statistic when $\mu$ is unknown. In this case, even $$s^2 (\mathbf x)=\sum_{i=1}^n (x_i-\bar x_n)^2$$ is not sufficient and one needs to keep both $\bar x_n$ and $s^2$ to achieve sufficiency for $\sigma^2$ at all (unknown) values of $\mu$.
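Indeed, writing the joint density in exponential family form,
$$f(\mathbf x;\mu,\sigma^2) \propto \sigma^{-n}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n x_i^2+\frac{\mu}{\sigma^2}\sum_{i=1}^n x_i-\frac{n\mu^2}{2\sigma^2}\right\},$$
shows that, when both $\mu$ and $\sigma^2$ are unknown, the pair $\left(\sum_i x_i,\sum_i x_i^2\right)$, equivalently $(\bar x_n,s^2)$, is sufficient, while $s^2$ alone cannot absorb the $\mu\sum_i x_i/\sigma^2$ term.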

A wider issue with the question is whether or not this conditional sufficiency notion (or any alternative) has statistical relevance: estimators based on the conditionally sufficient statistics (e.g., MLEs) will generally have a distribution that depends on both the parameter of interest and the nuisance parameters, with the latter needed in the end to construct confidence regions, etc., hence requiring estimators of their own. Fisher's definition of sufficiency was introduced as a means to summarise the data with no loss of information; it is unclear this is feasible with this generalisation.


Following Fraser (1956) and Rao (1965), Sprott (1975) developed some notions of (conditional and marginal) sufficiency in the presence of nuisance parameters. The notion is not standard, in that alternative definitions lead to different solutions. See for instance Basu's (1977) (fantastic!) discussion of the "logical nightmare" of defining partial sufficiency.

The great Kolmogorov himself (1942) proposed a definition of partial sufficiency under which $T$ is partially sufficient for $\theta$ iff, for all prior distributions $\pi$ on $(\theta,\xi)$, where $\xi$ is the nuisance parameter, the marginal posterior distribution of $\theta$ depends on the data only through $T(x)$. Unfortunately, as recalled by Ghosh (1988), "Hajek (1965) pointed out that if $T$ is partially sufficient for $\theta$ in the sense of Kolmogorov, then $T$ must be sufficient for the universal parameter $(\theta,\xi)$."
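In symbols, Kolmogorov's requirement is that the marginal posterior satisfy
$$\pi(\theta\mid x)=\pi(\theta\mid T(x))\qquad\text{for every prior } \pi \text{ on } (\theta,\xi),$$
i.e., conditioning on the full data or on $T(x)$ alone leads to the same marginal posterior on $\theta$.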

  1. Why does the Factorization Theorem fail to identify $\bar{X}$ as a sufficient statistic when $\sigma^2$ is unknown, even though it satisfies the sufficient statistic definition?

The likelihood for a sample of i.i.d. normally distributed variables can be factorized as

$$f(X; \mu, \sigma) = \underbrace{(2 \pi \sigma^2)^{-n/2} \exp \left( -\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2 \right)}_{\text{part that depends on $X$}} \cdot \underbrace{ \exp \left( \frac{\mu}{\sigma^2} T(X) \right) \cdot \exp \left( -\frac{n\mu^2}{2\sigma^2} \right)}_{\text{part that depends on $T(X) = \sum_{i=1}^n x_i$}}$$

The part that depends on $T(X)$ also depends on $\mu$, but the remaining part, which depends on $X$, does not depend on $\mu$. So we have factorized the density function.
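As a quick numerical sketch of this (assuming `numpy` and `scipy` are available; the sample size and parameter values below are arbitrary): two samples with the same value of $T(X)=\sum_i x_i$ have a likelihood ratio that is constant in $\mu$ for any fixed $\sigma$, which is the likelihood-ratio characterisation of sufficiency for $\mu$, but the same ratio changes with $\sigma$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5
x = rng.normal(size=n)
y = rng.normal(size=n)
y = y - y.mean() + x.mean()   # force sum(y) == sum(x), i.e. T(x) == T(y)

def loglik(data, mu, sigma):
    # log of the normal likelihood for an i.i.d. sample
    return norm.logpdf(data, loc=mu, scale=sigma).sum()

# log of the ratio f(x; mu, sigma) / f(y; mu, sigma):
for mu in (-2.0, 0.0, 3.0):
    print(mu, loglik(x, mu, 1.0) - loglik(y, mu, 1.0))    # identical for every mu

for sigma in (0.5, 1.0, 2.0):
    print(sigma, loglik(x, 0.0, sigma) - loglik(y, 0.0, sigma))  # changes with sigma
```

The dependence of the ratio on $\sigma$ reflects the issue discussed next.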


The problem is that the part of the factorization that depends only on $X$ still contains the unknown (nuisance) parameter $\sigma^2$, which can make it difficult to compute a maximum likelihood estimate.

In the case of the normal distribution, it turns out fine, because after taking the logarithm and differentiating with respect to $\mu$, the nuisance parameter factors out.
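Explicitly, the score equation for $\mu$ is
$$\frac{\partial}{\partial\mu}\log f(X;\mu,\sigma)=\frac{1}{\sigma^2}\left(\sum_{i=1}^n x_i-n\mu\right)=0 \iff \hat\mu=\bar x,$$
so the maximizing value of $\mu$ does not involve the unknown $\sigma^2$.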
