A sufficient statistic is a function of the data, typically of lower dimension than the data itself, that contains all the information the sample carries about a given parameter.

Consider $n$ independent observations $X_1, X_2,...,X_n$ of a random variable $X$, with joint density $f(x_1,...,x_n|\theta)=\prod_{i=1}^{n}p(x_{i}|\theta)$, where $\theta$ is a parameter to be estimated. A sufficient statistic reduces the whole data to a function of $(x_1,...,x_n)$ that carries all relevant information about $\theta$.
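
One standard way to check sufficiency is the Fisher-Neyman factorization criterion. The symbols $T$, $g$ and $h$ are notation introduced here for illustration: a statistic $T(x_1,...,x_n)$ is sufficient for $\theta$ exactly when the joint density can be factored as $$f(x_1,...,x_n|\theta) = g\big(T(x_1,...,x_n), \theta\big)\, h(x_1,...,x_n)$$ where $h$ does not depend on $\theta$, so that $\theta$ interacts with the data only through $T$.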

As an example, let $X \sim \operatorname{Exponential}(\lambda)$ and suppose all $x_i > 0$. The joint density $$f(x_1,...,x_n|\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_{i}}$$ can be written as $$f(x_1,...,x_n|\lambda) = \lambda^{n} e^{-\lambda \sum_{i=1}^{n} x_{i}}$$ so the data enter only through the sum $\sum_{i=1}^{n} x_{i}$, which is therefore a sufficient statistic for $\lambda$ (the sample size $n$ being known). Equivalently, knowing the sample average $\bar{x}$ and the number of observations $n$ is enough to estimate $\lambda$, for instance by the maximum likelihood estimate $\hat{\lambda} = n / \sum_{i=1}^{n} x_{i} = 1/\bar{x}$, without considering the whole data vector $(x_1,...,x_n)$.
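
The exponential example can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names and the two toy samples are made up for illustration. It shows that two different samples with the same size and the same sum give the same log-likelihood for any $\lambda$, and hence the same maximum likelihood estimate.

```python
import math


def exp_log_likelihood(xs, lam):
    """Log-likelihood of i.i.d. Exponential(lam) data, computed from the full sample."""
    return sum(math.log(lam) - lam * x for x in xs)


def exp_log_likelihood_from_stat(n, total, lam):
    """Same log-likelihood, computed only from n and the sufficient statistic sum(x_i)."""
    return n * math.log(lam) - lam * total


def exp_mle(n, total):
    """Maximum likelihood estimate of lam: n / sum(x_i), i.e. 1 / sample mean."""
    return n / total


# Two hypothetical samples with the same size and the same sum (illustrative values only).
sample_a = [0.5, 1.5, 4.0]
sample_b = [2.0, 2.0, 2.0]

for lam in (0.2, 0.7, 1.5):
    full_a = exp_log_likelihood(sample_a, lam)
    full_b = exp_log_likelihood(sample_b, lam)
    reduced = exp_log_likelihood_from_stat(3, 6.0, lam)
    # All three agree up to floating-point rounding.
    print(lam, full_a, full_b, reduced)

print(exp_mle(len(sample_a), sum(sample_a)))
```

Once `n` and the sum are stored, the individual observations are no longer needed to evaluate the likelihood or the estimate of $\lambda$.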