
Original question:

In the Wikipedia article on the Rao–Blackwell theorem, we read:

In case the sufficient statistic is also a complete statistic, i.e., one which "admits no unbiased estimator of zero", the Rao–Blackwell process is idempotent.

and indeed this is commonly heard and quite easy to prove (just apply definitions).

Suppose the hypothesis of completeness were dropped. What would be an example in which idempotence would not hold?

Later edit for the sake of self-containment:

Consider a parametrized family of probability distributions of a random variable $X$, with parameter $\theta$. In typically seen examples $X$ is a sequence $X_1,\ldots,X_n$ of independent identically distributed random variables and $\theta$ is a tuple of one or two scalars. $\theta$ may be $(\mu,\sigma)$, the expectation and standard deviation of a normal distribution, or may be $p$, the expected value of a Bernoulli-distributed random variable (thus e.g. the proportion of the population who have blue eyes), or may be the shape and scale parameters of a gamma distribution, etc.

A statistic is an "observable" random variable, i.e. its value depends on $(X,\theta)$ only through $X$. E.g. heights of 21-year-old men may be normally distributed with expectation $\mu$ and standard deviation $\sigma$; we observe the heights of $20$ such men, randomly chosen. The average of the $20$ heights is observable; the amount by which that average exceeds the population average $\mu$ is unobservable. Often unobservable things are those that depend on the values of parameters represented by Greek letters.

A sufficient statistic is a statistic $T(X)$ such that the conditional distribution of $X$ given $T(X)$ does not depend on $\theta$.

For example:

  • If $X_1,\ldots,X_n\sim \operatorname{i.i.d.} N(\mu,\sigma^2)$, it can be shown that the conditional distribution of $X_1,\ldots,X_n$ given the pair $(X_1+\cdots+X_n,X_1^2+\cdots+X_n^2)$ does not depend on $(\mu,\sigma^2)$. Thus that pair of random variables is a sufficient statistic for that family of probability distributions.
  • If $X_1,\ldots,X_n\sim\operatorname{i.i.d.} \operatorname{Uniform}(0,\theta)$, then the conditional distribution of $X_1,\ldots,X_n$ given $\max\{X_1,\ldots,X_n\}$ does not depend on $\theta$. Thus $\max\{X_1,\ldots,X_n\}$ is a sufficient statistic for that family of distributions. (A quick verification is sketched just after this list.)
  • If $X_1,\ldots,X_n\sim\operatorname{i.i.d.}\operatorname{Gamma}(\alpha,\beta)$ (i.e. with density proportional to $x\mapsto x^{\alpha-1}e^{-x/\beta}$ for $x > 0$), then the conditional distribution of $X_1,\ldots,X_n$ given $(X_1+\cdots+X_n,\ X_1\cdots X_n)$ does not depend on $(\alpha,\beta)$. Thus the pair whose components are the sum and the product is a sufficient statistic for that family of distributions.
  • If $X$ has density proportional to $x\mapsto e^{-|x-\theta|}$ for $x\in\mathbb{R}$, and the parameter space is given by $\theta\in\{1, -1\}$, then $$\begin{cases} 1 & \text{if }X\ge 1 \\ -1 & \text{if }X\le -1 \\ X & \text{if }-1<X<1 \end{cases}$$ is a sufficient statistic.
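
For the uniform example above, here is a minimal sketch of the verification, using the Fisher–Neyman factorization criterion rather than computing the conditional distribution directly. The joint density of $X_1,\ldots,X_n$ is
$$f_\theta(x_1,\ldots,x_n)=\prod_{i=1}^n \frac{1}{\theta}\,\mathbf 1_{\{0<x_i<\theta\}}
=\underbrace{\theta^{-n}\,\mathbf 1_{\{\max_i x_i<\theta\}}}_{\text{depends on }x\text{ only through }\max_i x_i}\ \cdot\ \underbrace{\mathbf 1_{\{\min_i x_i>0\}}}_{\text{does not involve }\theta},$$
and a factorization of this form is equivalent to sufficiency of $\max\{X_1,\ldots,X_n\}$.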

Intuitively, a sufficient statistic contains all information in the data that is relevant to estimating the value of $\theta$, if the model, which is the parametrized family of distributions, is right. But a sufficient statistic contains none of the information in the data that might indicate that the model is wrong (unless what's wrong is the support of the distribution; e.g. you get a negative number when you thought the Gamma family was right).

A complete statistic is a statistic $S(X)$ (i.e. an observable random variable) that admits no unbiased estimator of $0$ other than $0$ itself. An unbiased estimator of $0$ is a function $f(S(X))$, where $f$ is not allowed to depend on $\theta$, whose expected value remains equal to $0$ regardless of the value of $\theta$. Thus in the first example above, $(X_1,X_2)$ is a statistic, but is not complete, since the expectation of $X_1-X_2$ is $0$ regardless of the values of $\mu$ and $\sigma^2$, although $X_1-X_2$ is not itself the zero random variable. The sum $X_1+\cdots+X_n$ is a complete statistic in that case.
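
In symbols, the standard textbook form of the definition: $S(X)$ is complete if, for every function $f$ not depending on $\theta$,
$$\mathbb E_\theta\bigl(f(S(X))\bigr)=0\ \text{ for all }\theta \quad\Longrightarrow\quad \Pr{}_\theta\bigl(f(S(X))=0\bigr)=1\ \text{ for all }\theta.$$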

An unbiased estimator of $g(\theta)$ is a statistic (an observable random variable) whose expected value remains equal to $g(\theta)$ regardless of the value of $\theta$. (The emphasis on "observable" is a sort of warning against a newbie mistake that says a biased estimator can be changed to an unbiased estimator by subtracting the bias from it. That doesn't work since the bias is unobservable in all cases where one would need to seek an estimator. The resulting random variable is not a statistic; it cannot be used as an estimator.)
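
To put that warning in symbols: $\delta(X)$ is an unbiased estimator of $g(\theta)$ precisely when
$$\mathbb E_\theta\bigl(\delta(X)\bigr)=g(\theta)\ \text{ for all }\theta,$$
and if $\delta(X)$ is biased, its bias $b(\theta)=\mathbb E_\theta(\delta(X))-g(\theta)$ is a function of $\theta$; the "corrected" quantity $\delta(X)-b(\theta)$ therefore depends on $\theta$, so it is not a statistic and cannot be used as an estimator.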

The Rao–Blackwell theorem says that if $h(X)$ is an unbiased estimator of $g(\theta)$ and $T(X)$ is a sufficient statistic for $\theta$ (i.e. for the family of distributions parametrized by $\theta$) then $j(X)=\mathbb{E}(h(X) \mid T(X))$ is an unbiased estimator that is no worse, in the sense of mean-square error or any of a variety of similar loss functions, than $h(X)$ is. In typical applications, $j$ is far better than $h$, and that's why Rao–Blackwell is used.
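
A minimal sketch of the standard argument (for mean-square error, assuming $h(X)$ has finite variance): sufficiency is what makes $j(X)$ a statistic, since the conditional expectation given $T(X)$ does not involve $\theta$; unbiasedness is the tower property,
$$\mathbb E_\theta\bigl(j(X)\bigr)=\mathbb E_\theta\bigl(\mathbb E(h(X)\mid T(X))\bigr)=\mathbb E_\theta\bigl(h(X)\bigr)=g(\theta);$$
and the improvement is the law of total variance,
$$\operatorname{Var}_\theta\bigl(h(X)\bigr)=\mathbb E_\theta\bigl(\operatorname{Var}(h(X)\mid T(X))\bigr)+\operatorname{Var}_\theta\bigl(j(X)\bigr)\ \ge\ \operatorname{Var}_\theta\bigl(j(X)\bigr),$$
so that, both estimators being unbiased, the mean-square error cannot increase.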

For example: suppose one wants a good unbiased estimator of the probability that no phone call comes into the switchboard in the next two minutes, when calls arrive according to a Poisson process with a rate of $\lambda$ per minute. The parameter $\lambda$ is unobservable. One wants to estimate $e^{-2\lambda}$. The data consist of the numbers of calls that arrived in each of $10$ disjoint two-minute intervals in the past $20$ minutes. A crude unbiased estimator is the one that equals $1$ if no call arrived in the first two minutes and $0$ if one or more arrived. Exercise: Find the conditional expected value of that crude estimator given the data. You should get $(9/10)^\text{(number of calls in the past 20 minutes)}$. That is the improved unbiased estimator.
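
A hedged sketch of one way to do the exercise: write $X_1,\ldots,X_{10}$ for the counts in the ten intervals and $S=X_1+\cdots+X_{10}$ for the total, which is a sufficient statistic for $\lambda$. A standard fact about independent Poisson counts is that, given $S=s$, the $s$ calls fall into the ten (equal-length) intervals independently and uniformly, so $X_1\mid S=s\sim\operatorname{Binomial}(s,1/10)$, and therefore
$$\mathbb E\bigl(\mathbf 1_{\{X_1=0\}}\mid S=s\bigr)=\Pr(X_1=0\mid S=s)=(9/10)^s.$$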

You start with a crude estimator $h(X)$ and find a better estimator $j(X)$. What happens if you apply the process again, starting with $j(X)$?

If the sufficient statistic happens to be complete, you just get $j(X)$ again. That's idempotence. The proof of idempotence is just definition-chasing.
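
Spelled out, the definition-chasing for that step is just this: $j(X)=\mathbb E(h(X)\mid T(X))$ is itself a function of $T(X)$, so conditioning again on the same sufficient statistic changes nothing,
$$\mathbb E\bigl(j(X)\mid T(X)\bigr)=j(X).$$
(As the comments below observe, this particular computation does not itself invoke completeness; completeness is what guarantees that $j(X)$ is the unique unbiased estimator that is a function of $T(X)$.)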

  • (+1) But, this question would benefit from an edit to make it more self-contained.
    – cardinal
    Commented Apr 12, 2012 at 13:31
  • OK, I've added a terse account of the Rao–Blackwell theorem and the proposition about completeness and idempotence. It should be comprehensible to probabilists. Commented Apr 12, 2012 at 17:45
  • Michael, that's way more ambitious of an edit than I was envisioning with my previous comment. :)
    – cardinal
    Commented Apr 12, 2012 at 18:27
  • I do not see how completeness comes into the picture at all. Indeed, let $U = \mathbb E(S \mid T)$. Then $U$ is $\sigma(T)$-measurable and so $\mathbb E(U \mid T) = U$; that is, we have idempotence. Completeness, in my mind, comes into play in establishing uniqueness. Then we know that $f_1(T) := \mathbb E(S_1 \mid T) = \mathbb E(S_2 \mid T) =: f_2(T)$ almost surely, since otherwise we'd have $\mathbb E_\theta(f_1(T) - f_2(T)) = 0$ for all $\theta$ with $f_1(T) \ne f_2(T)$ on a set of positive probability, contradicting completeness.
    – cardinal
    Commented Apr 12, 2012 at 18:31
  • @cardinal : I think it's clear that you don't need completeness to get idempotence. Commented Jun 3, 2012 at 15:51
