6
$\begingroup$

I am interested in the Pitman-Koopman-Darmois theorem. I'm having a hard time finding a simple, rigorous version of this theorem, as I struggle to find sources.

This helpful post provides three sources for the theorem, by Pitman, Koopman, and Darmois.

(There is also a reference by Don Fraser, which I left out because it seems a bit controversial.)

The last reference, by Darmois, does not provide any proof, only a short informal statement of the theorem. I think the theorem is proved in another publication by the author, but I couldn't find it online.

The first reference, by Pitman, provides a proof but no clear statement of the result. I also find it insufficiently rigorous, with loose notation, hypotheses not clearly stated, and some "... it is evident that ..." steps which I do not find evident at all.

The Koopman reference is clear and rigorous (though the proof is only given in the particular case where the sufficient statistic is of dimension 2), but the statement of the theorem is a bit technical and only deals with continuous one-dimensional real random variables.

Hence my question: is the Pitman-Koopman-Darmois theorem also valid for discrete random variables?

More generally, is there any other reference stating and proving this theorem?

$\endgroup$
4
  • 2
$\begingroup$ Concerning Darmois, the Comptes Rendus de l'Académie des Sciences de Paris are briefs that report on major results within 4 (in French) to 6 (in French and English) pages, the main publication appearing in a proper journal with the necessary details and proofs. $\endgroup$
    – Xi'an
    Commented Apr 4 at 19:00
$\begingroup$ It's too bad that nothing points to the main publication. This brief report ends up being the one cited by most articles on the PKD theorem, even though it is not really the relevant one. $\endgroup$
    – Pohoua
    Commented Apr 5 at 7:51
  • $\begingroup$ I have looked on-line but found nothing available. Darmois' books are stored by the (national) French Library (BN) and thus accessible in person, but not through their website. $\endgroup$
    – Xi'an
    Commented Apr 5 at 10:44
  • $\begingroup$ Thanks for looking. $\endgroup$
    – Pohoua
    Commented Apr 5 at 20:09

3 Answers

5
$\begingroup$

If you look at Fraser's (1962) re-evaluation of the (F-)D-P-K theorem, he begins with the following

[excerpt from the opening of Fraser (1962), shown as an image in the original post]

which clearly indicates that there is no such restriction on the support $\mathsf X$ of the random variable $X$ for the theorem to apply. The change is simply in the dominating measure.
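
For instance (my addition, not Fraser's text): for discrete families one takes the counting measure on $\mathsf X$ as the dominating measure, so that the probability mass function plays the role of the density and the exponential-family representation reads exactly the same.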

$\endgroup$
4
$\begingroup$ Thanks for the source. What do you think about considering the likelihood function as a minimal sufficient statistic? The definition of sufficiency involves conditioning on the sufficient statistic, which, I think, requires having a probability distribution for this sufficient statistic, and probability distributions on a set of functions aren't something trivial. Is the result still valid? I read elsewhere that you didn't like this argument, but do you still think that the proof is valid? $\endgroup$
    – Pohoua
    Commented Apr 5 at 7:49
  • 2
    $\begingroup$ Indeed, I do not like the argument, but the reasoning in the paper is completely valid, of course. (Don Fraser was one of the giants of classical mathematical statistics, hence the "of course"!) $\endgroup$
    – Xi'an
    Commented Apr 5 at 8:03
$\begingroup$ Thanks for the article, which I finally read more attentively. Fraser's argument was enlightening: I had never made the connection between the exponential family and the space of log-likelihood functions being stable under addition. $\endgroup$
    – Pohoua
    Commented Apr 11 at 8:09
$\begingroup$ However, it doesn't look like a proof of the DPK theorem (I don't think it intends to be), but rather like another way of seeing this result. There is no clear statement of the theorem (with clear hypotheses), and there are some holes in the reasoning which are not obvious to fill (for me at least). In particular: why, for a given model admitting an infinite family of linearly independent log-likelihoods, can a sufficient statistic not be of fixed dimension? (This is false if sufficient statistics are not restricted to be continuous: $n$ numbers can be compressed into one.) $\endgroup$
    – Pohoua
    Commented Apr 11 at 8:16
5
$\begingroup$

Here is a one-dimensional rendering of a 1964 version of the Pitman-Koopman-Darmois-Fisher theorem, with a proof, as presented by Peter Bickel during Erich Lehmann's course at Berkeley, recovered from the notes of a then-graduate student (who kindly sent them to me!). My own comments are within square brackets.

Theorem Let $\theta$ be a real parameter, $\theta\in\Theta\subset\mathbb R$, and let $p_\theta(\cdot)$ be the common density of the $n$ i.i.d. real random variables $X_1,\ldots,X_n$. Assume $(X_1,\ldots,X_n)$ admits a [real] sufficient statistic $T(X_1,\ldots,X_n)$ such that

  1. $\frac{\partial T}{\partial x_i}$ exists for all $i$'s and $x_i$'s
  2. the support $A=\{x; p_\theta(x)>0\}$ is the same open set for all $\theta$'s
  3. $\frac{\partial^2 \log p_\theta(x)}{\partial x\,\partial\theta}$ exists for all $(\theta,x)$'s, is continuous in $(\theta,x)$, and is different from zero on $A\times\Theta$.

Then there exist $Q(\cdot)$ and $T^*(\cdot)$ such that $$p_\theta(x)=C(\theta)\exp\{Q(\theta)\cdot T^*(x)\}h(x)$$ [when $x\in A$ and $\theta\in\Theta$].

Proof. By the factorisation theorem, $$p_\theta(x_1)\cdots p_\theta(x_n) = h_\theta(T(x_1,\ldots,x_n))\cdot g(x_1,\ldots,x_n)$$ Let $$A^*=\{(x_1,\ldots,x_n,\theta)\,|\,p_\theta(x_1)\cdots p_\theta(x_n)>0\}$$ which is open. Then $$\sum_{j=1}^n \log\, p_\theta(x_j)=\log\,h_\theta(T(x_1,\ldots,x_n))+\log\,g(x_1,\ldots,x_n)$$ and $$\frac{\partial^2}{\partial x_i\partial\theta} \sum_{j=1}^n \log\,p_\theta(x_j)=\frac{\partial^2}{\partial x_i\partial\theta} \,\log\,h_\theta(T(x_1,\ldots,x_n))=\frac{\partial^2\log\,h_\theta(T)}{\partial T\partial\theta} \cdot \frac{\partial T} {\partial x_i}$$ Since only the $j=i$ term of the sum depends on $x_i$, this reduces to $$\frac{\partial^2}{\partial x\partial\theta} \log\, p_\theta(x)\Big|_{x=x_i}= \frac{\partial^2\log\,h_\theta(T)}{\partial T\partial\theta} \cdot \frac{\partial T(x_1,\ldots,x_n)}{\partial x_i}$$ [Since the LHS only depends on $\theta$ and $x_i$, this implies that ${\partial T(x_1,\ldots,x_n)}\big/{\partial x_i}$ only depends on $x_i$, hence that $T(x_1,\ldots,x_n)$ is of the form $\sum_i \tilde T(x_i)+C$]

Set $\theta=\theta_0$ and define $$u(x)=\frac{\partial}{\partial\theta} \log\, p_\theta(x)\Big|_{\theta=\theta_0}$$ Then $$\frac{\text d u(x)}{\text dx}\Big|_{x=x_i}= \frac{\partial^2\log\,h_\theta(T)}{\partial T\partial\theta}\Big|_{\theta=\theta_0} \cdot \frac{\partial T(x_1,\ldots,x_n)}{\partial x_i}$$ and $$\sum_{i=1}^n u(x_i)=\frac{\partial\log\,h_\theta(T)}{\partial\theta}\Big|_{\theta=\theta_0}=f(T)$$ Note that, by assumption 3., $$\frac{\text d u(x)}{\text dx}\ne 0\quad\text{and}\quad \frac{\partial^2\log\,h_\theta(T)}{\partial T\partial\theta}\Big|_{\theta=\theta_0}\ne 0$$ Then $$\dfrac{\dfrac{\partial^2\log\,p_\theta(x)}{\partial x\partial\theta}\Big|_{x=x_i}}{\dfrac{\text d u(x)}{\text dx}\Big|_{x=x_i}}=\dfrac{\dfrac{\partial^2\log\,h_\theta(T)}{\partial T\partial\theta}}{\dfrac{\partial^2\log\,h_\theta(T)}{\partial T\partial\theta}\Big|_{\theta=\theta_0}}\tag{1}$$ We claim that the RHS of (1) is a function of $\theta$ only. Indeed, the LHS is a function of $\theta$ and $x_i$ alone, and (1) holds for all values of $(x_1,\ldots,x_n)$. Fix $\theta$ and take $(x_1,\ldots,x_n)\ne(y_1,\ldots,y_n)$. Considering $i=1$, the LHS is the same at $(x_1,y_2,\ldots,y_n)$ and $(x_1,\ldots,x_n)$; considering $i=2$, the points $(x_1,y_2,\ldots,y_n)$ and $(y_1,\ldots,y_n)$ also give the same LHS. The RHS is therefore independent of $(x_1,\ldots,x_n)$, say equal to $\nu(\theta)$. Since $f^\prime(T)={\partial^2\log\,h_\theta(T)}/{\partial T\partial\theta}\big|_{\theta=\theta_0}$, this gives \begin{align}\dfrac{\partial^2\log\,h_\theta(T)}{\partial T\partial\theta}&=\nu(\theta)f^\prime(T)\\ \log\,h_\theta(T)&=\nu^\star(\theta)f(T)+\gamma(\theta)+\rho(T) \end{align} [the integration term $\rho(T)$ is absorbed into $g(x_1,\ldots,x_n)$ below]. Thus $$p_\theta(x_1)\cdots p_\theta(x_n) = C^\star(\theta)\,\exp\left\{\nu^\star(\theta)\sum_{i=1}^n u(x_i)\right\}\cdot g(x_1,\ldots,x_n)$$ and \begin{align}\log\,g(x_1,\ldots,x_n) &= \sum_{i=1}^n \log\,p_\theta(x_i) - \log\,C^\star(\theta) - \nu^\star(\theta)\sum_{i=1}^nu(x_i)\\ &:= \sum_{i=1}^n \log \tilde h(x_i)\end{align} Hence $$p_\theta(x_1)\cdots p_\theta(x_n) = C^\star(\theta)\,\exp\left\{\nu^\star(\theta)\sum_{i=1}^n u(x_i)\right\}\cdot \tilde h(x_1)\cdots \tilde h(x_n)$$ leading to $$p_\theta(x)= C(\theta)\,\exp\left\{\nu^\star(\theta)u(x)\right\}\cdot \tilde h(x)$$
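
To see the separation-of-variables structure that drives this proof on a concrete case, here is a small sympy sketch (my own illustration, not part of the lecture notes): for a density of the form $C(\theta)\exp\{Q(\theta)T^*(x)\}h(x)$, the mixed partial $\partial^2\log p_\theta(x)/\partial x\,\partial\theta$ factors as $Q^\prime(\theta)\,T^{*\prime}(x)$, whereas for a non-member such as the Cauchy location family it does not factor.

```python
# Sketch (illustration only): compute d^2 log p_theta(x) / (dx dtheta) for an
# exponential-family member and for a non-member.
import sympy as sp

x, theta = sp.symbols('x theta', positive=True)

# Exponential distribution: Q(theta) = -theta, T*(x) = x, so the mixed
# partial should equal Q'(theta) * T*'(x) = -1.
p = theta * sp.exp(-theta * x)
print(sp.diff(sp.log(p), x, theta))              # -1

# Cauchy location family: the mixed partial depends on x and theta jointly
# and does not factor, consistent with it lying outside the exponential family.
q = 1 / (sp.pi * (1 + (x - theta)**2))
print(sp.simplify(sp.diff(sp.log(q), x, theta)))
```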

$\endgroup$
2
$\begingroup$ Thank you very much for taking the time to write this proof! Here, we are talking about continuous random variables, correct? (we are taking derivatives with respect to $x$). I haven't fully absorbed the proof yet, but I don't see why we can claim that "since the LHS only depends on $\theta$ and $x_i$, this implies that $\partial T(x_1, \ldots, x_n) / \partial x_i$ only depends on $x_i$". It seems to me that we would need that $\left[\partial^2 \log p_\theta (x)/\partial\theta\partial x\right]\big/\left[\partial^2\log h_\theta(T)/\partial T\partial\theta\right]$ only depends on $x_i$. $\endgroup$
    – Pohoua
    Commented Apr 12 at 14:28
$\begingroup$ I see why now, if we use the fact that the cross derivative of $\log h_\theta(T)$ doesn't depend on $x$. Thanks for this proof. $\endgroup$
    – Pohoua
    Commented Apr 12 at 19:32
1
$\begingroup$

Continuous case

I managed to get a proof of the following statement of the Pitman-Koopman-Darmois theorem:

Let $X_1, \ldots, X_n$ be $n$ i.i.d. real random variables from a distribution with density $f_\theta$ such that:

  • the support of $f_\theta$ does not depend on $\theta$
  • the function $x \mapsto f_\theta(x)$ is continuously differentiable for all $\theta$.

If there exists a continuous sufficient statistic taking values in $\mathbb{R}^p$ with $p < n$, then $f_\theta$ is of the form

$f_\theta(x) = g(x) \exp\left[\,\sum_{i=1}^p a_i(x)b_i(\theta) + c(\theta)\,\right]$,

i.e. it belongs to the exponential family.

The proof is a bit long and combines the proof of Koopman with that of Dynkin ("Necessary and sufficient statistic for a family of probability distributions", Selected Papers of Dynkin, pp. 400-401); I hope I'll find the time to post it soon.

The continuity condition on the statistic $T$ is crucial, because there exist non-continuous one-to-one maps from $\mathbb{R}^n$ to $\mathbb{R}$. Such maps can compress all the information in the data into a single number, and thus yield sufficient statistics with values in $\mathbb{R}$ whatever the distribution of the data.
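
As a sketch of such a map (my own construction, purely to illustrate the remark above, not taken from any of the cited papers), one can interleave the decimal digits of two numbers in $[0,1)$: with a fixed convention for the digit expansions this is one-to-one but not continuous, and iterating it packs the whole sample into a single real number.

```python
# Sketch (illustration only): pack two numbers in [0, 1) into a single one by
# interleaving their (truncated) decimal digits. With a fixed digit-expansion
# convention this map is one-to-one but not continuous; applied repeatedly it
# compresses any finite sample into one real, which is why the theorem must
# require the sufficient statistic T to be continuous.
def interleave(u: float, v: float, digits: int = 8) -> float:
    """Interleave the first `digits` decimal digits of u and v."""
    du = f"{u:.{digits}f}"[2:]   # digits of u after "0."
    dv = f"{v:.{digits}f}"[2:]   # digits of v after "0."
    return float("0." + "".join(a + b for a, b in zip(du, dv)))

print(interleave(0.12345678, 0.87654321))  # 0.1827365445362718
```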

Discrete case

For discrete random variables, I found a result for rank-1 exponential families, from Andersen (1970).

The continuity of $T$, which does not make sense for discrete values of the $X_i$, is replaced by another, less intuitive, condition.

Let $X_1, \ldots, X_n$ be $n$ i.i.d. discrete random variables from a distribution with probability mass function $p_\theta$ such that the support of $p_\theta$ does not depend on $\theta$. If there exists a sufficient statistic $T$ such that:

  • $T$ takes values in a totally ordered set $\mathcal{T}$ (i.e. for all $t_1, t_2\in \mathcal{T}$, either $t_1 \leq t_2$ or $t_2 \leq t_1$, and if $t_1 \leq t_2$ and $t_2\leq t_1$ then $t_1 = t_2$);
  • for all possible values $t_1 = T(x_1, \ldots, x_{j-1}, x_j, x_{j+1}, \ldots, x_n)$ and $t_2 = T(x_1, \ldots, x_{j-1}, x_j', x_{j+1}, \ldots, x_n)$ of $T$, and every possible value $t_3$ of $T$ between $t_1$ and $t_2$, there exists $x$ between $x_j$ and $x_j'$ such that $t_3 = T(x_1, \ldots, x_{j-1}, x, x_{j+1}, \ldots, x_n)$.

Then $p_\theta$ can be written in the form $p_\theta(x) = g(x) \exp\left[\,a(x)b(\theta) + c(\theta)\,\right]$ (i.e. it belongs to the exponential family and has rank 1).
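
For instance (my own illustration, not from Andersen's paper): the Poisson family fits this rank-1 form, $$p_\theta(x) = \frac{e^{-\theta}\theta^x}{x!} = g(x)\,\exp\left[\,a(x)b(\theta) + c(\theta)\,\right]$$ with $g(x) = 1/x!$, $a(x) = x$, $b(\theta) = \log\theta$ and $c(\theta) = -\theta$; the sufficient statistic $T = \sum_i x_i$ is integer-valued, and changing one coordinate $x_j$ moves $T$ through every intermediate value, so the second condition above holds.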

The paper also provides a (very cool) example of a statistic which doesn't satisfy the regularity condition and yet is sufficient for any distribution on $\mathbb{N}$:

$T(x_1, \ldots, x_n) = \sum_{i = 1}^n \frac{1}{1 + \pi x_i}$.

It is claimed (and proved for $n=2$) that if $T(x_1, \ldots, x_n) = T(x_1', \ldots, x_n')$, then $(x_1, \ldots, x_n)$ and $(x_1', \ldots, x_n')$ are equal up to a permutation, using the fact that the $x_i$'s and $x_i'$'s are integers and that $\pi$ is transcendental. So this statistic $T$ contains as much information as the whole (unordered) sample!
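
Here is a quick numerical sanity check of this claim (my own sketch, not from the paper; the genuine argument rests on $\pi$ being transcendental):

```python
# Sketch: check numerically that Andersen's statistic T separates small
# multisets of integers. High precision is used so that distinct values are
# not confused by rounding; the actual proof uses the transcendence of pi.
from itertools import combinations_with_replacement
from mpmath import mp, pi

mp.dps = 50  # work with 50 significant digits

def T(xs):
    return sum(1 / (1 + pi * x) for x in xs)

# All multisets of size 3 with entries in {0, ..., 20}.
samples = list(combinations_with_replacement(range(21), 3))
values = {mp.nstr(T(s), 40) for s in samples}
assert len(values) == len(samples)  # T takes a distinct value on each multiset
print(f"T separated all {len(samples)} multisets")
```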

I don't know of any result for exponential families of higher rank. The regularity conditions on $T$, which are (in my mind) the discrete equivalent of "$T$ is real-valued and continuous", would, I guess, have to be modified somehow.

$\endgroup$
