Distribution of sample variance of Cauchy distributed variables

Question

Assume $X_i$, $i\in\left\{1,\ldots,n\right\}$, are i.i.d. standard Cauchy distributed random variables.

I know that $\bar{X}_n:=\frac{1}{n}\sum_{i=1}^n X_i$ is standard Cauchy distributed.

I would like to know the distribution of the sample variance $$ \frac{1}{n}\sum_{i=1}^n \left(X_i-\bar{X}_n\right)^2 .$$

My foreknowledge:

I know that moments such as $\mathbb{E}(X)$ and $\mathbb{V}(X)$ do not exist for Cauchy distributed $X$. I also know that a linear combination of independent Cauchy random variables is again Cauchy distributed.

Weaker question:

If nobody knows the exact distribution of the sample variance, it would still be interesting to know whether that distribution is independent of the number of samples $n$, in the same way that the distribution of the sample mean $\bar{X}_n$ is standard Cauchy for every $n$. The Wikipedia article on the Cauchy distribution says:

"Similarly, calculating the sample variance will result in values that grow larger as more observations are taken."

I think this statement is not correct, because the article uses a similar (in my opinion very bad) formulation for the sample mean:

"the sample mean will become increasingly variable as more observations are taken"

which is not a correct statement, since the distribution of the sample mean $\bar{X}_n$ does not depend on $n$.

Here is the whole (in my opinion very badly written) paragraph:

"Although the sample values $x_{i}$ will be concentrated about the central value $x_{0}$, the sample mean will become increasingly variable as more observations are taken, because of the increased probability of encountering sample points with a large absolute value. In fact, the distribution of the sample mean will be equal to the distribution of the observations themselves; i.e., the sample mean of a large sample is no better (or worse) an estimator of $x_{0}$ than any single observation from the sample. Similarly, calculating the sample variance will result in values that grow larger as more observations are taken."

After reading it, I am really not sure what the author of the article wanted to say about how the distribution of the sample variance depends on the number of samples $n$.

Do you know more about the distribution of the sample variance of $n$ i.i.d. Cauchy distributed random variables?

If $X_1,\ldots,X_n$ are i.i.d. with a standard Cauchy distribution (standard = median $0$ and IQR $2$) then $\overline X_n = (X_1+\cdots+X_n)/n$ also has the same Cauchy distribution, i.e. median $0$ and IQR $2.$ That can be readily shown by using characteristic functions. — Michael Hardy, Commented Jan 2, 2020 at 18:55
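A short sketch of the characteristic-function argument mentioned in the comment above. It relies on the standard fact that a standard Cauchy variable has characteristic function $e^{-|t|}$; the symbol $\varphi$ is notation introduced here and does not appear in the thread. By independence,

$$ \varphi_{X}(t)=\mathbb{E}\,e^{itX}=e^{-|t|}, \qquad \varphi_{\overline X_n}(t)=\prod_{j=1}^n \varphi_{X}\!\left(\frac{t}{n}\right)=\left(e^{-|t|/n}\right)^n=e^{-|t|} . $$

Since the characteristic function determines the distribution, $\overline X_n$ is standard Cauchy for every $n$.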

pre-kidney · Accepted Answer · 2019-07-13 21:27:51Z

You are asking (among other things) how the sequence of means $(\overline{X}_n)_{n\geq 1}$ can become "increasingly variable" given that each element has the same distribution.

In fact, this statement is not a contradiction; it depends on what precisely is meant by the phrase "increasingly variable". The intuition is that a Cauchy random variable $X$ takes very large values with a probability that decays only slowly to zero. Indeed $\mathbb P(X>t)\approx (\pi t)^{-1}$ as $t\to\infty$, so among $X_1,\ldots,X_n$ the probability that at least one exceeds some huge number $N$ grows like $n/(\pi N)$ while $n$ is small compared to $N$. Say $N$ is a million for the sake of illustration. While you would be very surprised to see $X_1$ or $X_2$ be larger than $N$ (probabilities on the order of 1 in a million), you should not be surprised to see values of size around a million among the outliers in $X_1,\ldots,X_{N}$. When we take the sample mean, it is dominated by these outlier terms: the mean doesn't care if most of your numbers are tiny, since just a few outliers taking values in the millions are enough to skew the entire sample mean.
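The tail heuristic in the previous paragraph can be checked without any simulation, since the standard Cauchy tail is available in closed form: $\mathbb P(X>t)=\tfrac12-\tfrac1\pi\arctan t$. The following is a minimal sketch; the function names `cauchy_tail` and `prob_any_exceeds` are made up for this illustration. It compares the exact probability that at least one of $n$ samples exceeds $N$ with the linear approximation $n/(\pi N)$.

```python
import math

def cauchy_tail(t):
    """Exact P(X > t) for a standard Cauchy X, written as arccot(t)/pi
    to avoid cancellation for large t."""
    return math.atan2(1.0, t) / math.pi

def prob_any_exceeds(n, big):
    """Exact P(max(X_1, ..., X_n) > big) for i.i.d. standard Cauchy X_i."""
    p = cauchy_tail(big)
    # 1 - (1 - p)^n, evaluated stably when p is tiny
    return -math.expm1(n * math.log1p(-p))

N = 1_000_000
for n in (1, 2, 1_000, N):
    # columns: n, exact probability, linear approximation n / (pi N)
    print(n, prob_any_exceeds(n, N), n / (math.pi * N))
```

For $n$ much smaller than $N$ the two columns should agree closely; once $n$ is comparable to $N$ the linear approximation overshoots, but the exact probability is no longer small.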

Keep in mind what happened here: we started with a sequence of Cauchy random variables $(X_n)_{n\geq 1}$ and we obtained a new sequence $(\overline{X}_n)_{n\geq 1}$ of Cauchy random variables. But the two random sequences do not have the same distribution: the former has independent elements, the latter does not.

An interesting question, if you want to explore this topic further, is to consider the asymptotics of the distribution of the running maxima $M_n=\max_{1\leq k\leq n}|X_k|$ and $\overline{M}_n=\max_{1\leq k\leq n}|\overline{X}_k|$.
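For anyone who wants to experiment with these running maxima, here is a small simulation sketch along a single sample path. It draws standard Cauchy variates by inverting the CDF, and the helper names `standard_cauchy` and `running_maxima` are hypothetical. It only prints the two maxima at a few values of $n$ and does not establish any asymptotics.

```python
import math
import random

def standard_cauchy(rng):
    """Draw one standard Cauchy variate by inverting the CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def running_maxima(n, rng):
    """Return the lists (M_1..M_n) and (Mbar_1..Mbar_n) for one sample path,
    where M_k = max_{j<=k} |X_j| and Mbar_k = max_{j<=k} |mean of X_1..X_j|."""
    total = 0.0
    m = m_bar = 0.0
    maxima, maxima_of_means = [], []
    for k in range(1, n + 1):
        x = standard_cauchy(rng)
        total += x
        m = max(m, abs(x))
        m_bar = max(m_bar, abs(total / k))
        maxima.append(m)
        maxima_of_means.append(m_bar)
    return maxima, maxima_of_means

rng = random.Random(0)  # fixed seed so the run is reproducible
M, M_bar = running_maxima(100_000, rng)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, M[n - 1], M_bar[n - 1])
```

One sanity check that holds on every path: $|\overline{X}_k|\leq\max_{j\leq k}|X_j|$, so the running maximum of the means can never exceed the running maximum of the samples.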

Answering your weaker question: the sample variances do not all have the same distribution. For example, when $n=1$ the sample variance is identically zero, and for $n=2$ it is a quantity related to the difference of two i.i.d. Cauchy random variables. For larger $n$ more work is needed to show that the distributions change; one way is to compute sufficiently detailed asymptotics for the characteristic functions.

Your last paragraph goes in the right direction and already partially answers the weaker question :) The rest of the answer is not really connected to the sample variance, but I agree with you that Wikipedia's statement about the sample mean being "increasingly variable" is neither correct nor a contradiction, because the phrase "increasingly variable" is not mathematically defined. — Jakob, Commented Jul 14, 2019 at 1:03

lupus · Accepted Answer · 2024-04-17 10:15:01Z

FWIW I can offer a straightforward explicit result for the case $n=2$. Call the sample variance $S$, and let the $x_i$ be independent standard Cauchy random variables. We have the elementary algebraic identity $$ S = \frac{1}{2} \sum_{i=1}^2 (x_i - \overline{x})^2 = \left( \frac{x_1-x_2}{2} \right)^2 . $$ By the symmetry of the Cauchy distribution, $x_1-x_2$ has the same distribution as $x_1+x_2$, so $S$ is distributed as $\left( \frac{x_1+x_2}{2} \right)^2$, the squared sample mean. As Jakob recalls, for the Cauchy distribution the sample mean $\overline{x}=\frac{x_1+x_2}{2}$ is itself standard Cauchy, and so $S$ is distributed as the square of a standard Cauchy variable.

Let $F$ and $f$ be the distribution function and the density of $S$, respectively. If $X$ denotes a standard Cauchy variable, then for $s>0$ $$ F(s) = {\bf P}(S \leq s) = {\bf P}(|X| \leq \sqrt{s}) = \frac{2}{\pi} \, \arctan(\sqrt{s}) $$ and correspondingly the density is $$ f(s) = \frac{1}{\pi \sqrt{s}} \, \frac{1}{1+s} . $$ For example, the median of $S$ equals $1$, since $F(1)=\frac{2}{\pi}\cdot\frac{\pi}{4}=\frac12$.
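The closed form for $n=2$ is easy to sanity-check by Monte Carlo. The sketch below uses only the Python standard library and made-up helper names. It draws pairs of standard Cauchy variates, computes the sample variance with the $1/n$ normalisation used in the question, and compares the empirical distribution function with $\frac{2}{\pi}\arctan(\sqrt{s})$ at a few points.

```python
import math
import random

def standard_cauchy(rng):
    """Draw one standard Cauchy variate by inverting the CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def sample_variance_n2(rng):
    """Sample variance (1/n normalisation) of two i.i.d. standard Cauchy draws."""
    x1, x2 = standard_cauchy(rng), standard_cauchy(rng)
    mean = (x1 + x2) / 2
    return ((x1 - mean) ** 2 + (x2 - mean) ** 2) / 2

def cdf_n2(s):
    """Claimed distribution function F(s) = (2/pi) * arctan(sqrt(s)) for s > 0."""
    return 2 / math.pi * math.atan(math.sqrt(s))

rng = random.Random(1)  # fixed seed so the run is reproducible
draws = [sample_variance_n2(rng) for _ in range(200_000)]
for s in (0.1, 1.0, 10.0):
    empirical = sum(d <= s for d in draws) / len(draws)
    # columns: s, empirical CDF, claimed CDF
    print(s, empirical, cdf_n2(s))
```

With 200,000 draws the empirical and claimed values should agree to roughly two decimal places.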

In the case of the normal distribution one generalizes this result to $n \geq 3$ by applying orthogonal linear transformations to the basic random variables $x_i$. The crucial point of that approach is that for jointly normal variables zero covariance (which is what an orthogonal linear transformation produces) implies stochastic independence, but that critical feature does not carry over to the Cauchy case.

Of course, it is easy to simulate the sample variance in the Cauchy case, which shows that the answer to Jakob's "weaker question" is "no" -- that is, the distribution of the sample variance does vary with $n$; specifically, its spread increases with $n$.
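A minimal version of such a simulation might look as follows. This is a sketch rather than the code used for the answer: the sample sizes, repetition count and helper names are arbitrary choices made here. Because the sample variance has no finite moments, the summary statistics are the median and the interquartile range.

```python
import math
import random
import statistics

def standard_cauchy(rng):
    """Draw one standard Cauchy variate by inverting the CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def sample_variance(xs):
    """Sample variance with the 1/n normalisation used in the question."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def simulate(n, reps, rng):
    """Return `reps` independent sample variances, each from n Cauchy draws."""
    return [sample_variance([standard_cauchy(rng) for _ in range(n)])
            for _ in range(reps)]

rng = random.Random(2)  # fixed seed so the run is reproducible
medians = {}
for n in (2, 10, 100):
    draws = simulate(n, 10_000, rng)
    q1, q2, q3 = statistics.quantiles(draws, n=4)  # quartiles
    medians[n] = q2
    # columns: n, median of the sample variance, interquartile range
    print(n, q2, q3 - q1)
```

If the claim above is right, the printed median and interquartile range should both grow with $n$, and the median for $n=2$ should be close to the exact value $1$.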
