
Let $X_i$ be a sequence of random variables taking values in the finite set $K$, and write $X^n = (X_1, \ldots, X_n)$.

Suppose, for clarity, that the $X_i$ are i.i.d. Then for any $\epsilon, \beta > 0$ and all sufficiently large $n$, there is a subset $T = T^{(n)}_{\epsilon, \beta} \subset K^n$ such that $P(T) > 1 - \epsilon$ and every $x \in T$ satisfies $| (1/n)[-\log p(x)] - H| < \beta$, where $H$ is the Shannon entropy of $X_1$ and $p$ is the joint distribution of $X^n$. $T$ is called a typical set.

This has the consequence that $p$ is roughly uniform on $T$: every $x \in T$ has $p(x) = 2^{-nH}$ up to a multiplicative factor of $2^{n\beta}$. I want to know whether it is truly uniform in the limit.
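
For concreteness, here is a quick simulation illustrating this concentration (a minimal sketch with an i.i.d. Bernoulli(0.3) source on $K = \{0,1\}$ and base-2 logarithms; the specific values of $n$, $\beta$, and the number of samples are arbitrary):

```python
import numpy as np

# Minimal sketch: empirical check of the asymptotic equipartition property
# for an i.i.d. Bernoulli(0.3) source.  Logs are base 2, so H is in bits.
rng = np.random.default_rng(0)
p = np.array([0.7, 0.3])               # distribution of each X_i on K = {0, 1}
H = -np.sum(p * np.log2(p))            # Shannon entropy H(X_1)

n, trials, beta = 1000, 10_000, 0.05
samples = rng.choice(2, size=(trials, n), p=p)            # each row is x = (x_1, ..., x_n)
per_symbol = -np.sum(np.log2(p[samples]), axis=1) / n     # (1/n)[-log2 p(x)]

typical = np.abs(per_symbol - H) < beta                   # membership in T^{(n)}_{eps, beta}
print(f"H = {H:.4f} bits")
print(f"fraction of sampled sequences in the typical set: {typical.mean():.4f}")
# For x in the typical set, p(x) lies between 2^{-n(H + beta)} and 2^{-n(H - beta)},
# i.e. it equals 2^{-nH} up to a multiplicative factor of 2^{n*beta}.
```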

I take $X^{\infty}$ to be the Kolmogorov limit (the sequence space $K^{\infty}$ with the product measure), and let $T_{\beta}$ be the set of sequences satisfying $|(1/n) [-\log P_X(x_1 \ldots x_n)] - H| < \beta$ eventually (for all large enough $n$). Let $T = \bigcap_{m \geq 1} T_{1/m}$. The strong law of large numbers guarantees that $T$ has measure 1, and it is the set of typical sequences.

Is there a reasonable sense in which the induced measure on $T$ is "uniform"?

$X^{\infty}$ naturally has a profinite topology, which $T$ inherits. Perhaps one can use this?


1 Answer


Yes, more or less.

Given a stationary sequence $Y=(Y_1,Y_2,\ldots)$ of $K$-valued random variables, let $h(Y)$ denote the entropy rate of $Y$, that is, \begin{align*} h(Y) &:=\lim_{n\to\infty} \frac{H(Y_1,Y_2,\ldots,Y_n)}{n} = H(Y_1|Y_2,Y_3,\ldots) \end{align*} where $H(Y_1,Y_2,\ldots,Y_n)$ is the joint Shannon entropy of $(Y_1,Y_2,\ldots,Y_n)$ and $H(Y_1|Y_2,Y_3,\ldots)$ is the conditional entropy of $Y_1$ given $(Y_2,Y_3,\ldots)$. The existence of the limit and the second equality are standard.
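
For example, for an i.i.d. sequence the entropy rate reduces to the entropy of a single coordinate: \begin{align*} h(Y) = \lim_{n\to\infty}\frac{H(Y_1,Y_2,\ldots,Y_n)}{n} = \lim_{n\to\infty}\frac{n\,H(Y_1)}{n} = H(Y_1), \end{align*} since independence and stationarity give $H(Y_1,Y_2,\ldots,Y_n)=\sum_{k=1}^n H(Y_k)=n\,H(Y_1)$. In particular, $h(X)=H(X_1)=H$.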

Since the sequence $X=X^\infty$ is i.i.d., we have \begin{align*} -\log P_X(x_1,x_2,\ldots,x_n) &= -\sum_{k=1}^n\log\mathbb{P}(X_k=x_k) = \sum_{k=1}^n g(x_k) \end{align*} where $g(a):=-\log\mathbb{P}(X_1=a)$ for each $a\in K$. Your law of large numbers argument says that almost surely, \begin{align*} \frac{\sum_{k=1}^n g(X_k)}{n} &\to \mathbb{E}[g(X_1)]=H(X_1)=H \end{align*} as $n\to\infty$. In particular, if $T$ denotes the set of all sequences $x\in K^\infty$ such that $(1/n)\sum_{k=1}^n g(x_k)\to H(X_1)$ (as you defined), then $\mathbb{P}(X\in T)=1$.

Now, the intuitive statement that $X$ is "uniform" on $T$ can be expressed as follows:

Claim. Among all stationary sequences $Y$ such that $\mathbb{P}(Y\in T)=1$, $X$ has the largest entropy rate.

This is a very special case of the Dobrushin-Lanford-Ruelle theorem, and follows from the following variational principle:

Lemma (Gibbs' inequality). Every $K$-valued random variable $U$ satisfies $H(U)\leq\mathbb{E}[g(U)]$ with equality if and only if $U\sim X_1$.

[Added note: More explicitly, \begin{align*} -\sum_{a\in K}\mathbb{P}(U=a)\log\mathbb{P}(U=a) &\leq \sum_{a\in K}\mathbb{P}(U=a) g(a) \end{align*} with equality if and only if $\mathbb{P}(U=a)=\mathbb{P}(X_1=a)$ for each $a\in K$.]
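
Equivalently, the gap in Gibbs' inequality is a relative entropy: \begin{align*} \mathbb{E}[g(U)] - H(U) &= \sum_{a\in K}\mathbb{P}(U=a)\log\frac{\mathbb{P}(U=a)}{\mathbb{P}(X_1=a)} = D\big(P_U \,\big\|\, P_{X_1}\big) \geq 0, \end{align*} which vanishes exactly when $P_U = P_{X_1}$.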

To prove the claim, we first observe that, according to the ergodic theorem, if $Y$ is a stationary sequence satisfying $\mathbb{P}(Y\in T)=1$, then $\mathbb{E}[g(Y_1)]=H(X_1)$.

[Added note: According to Birkhoff's ergodic theorem, if $\overline{g}(x):=\lim_{n\to\infty}(1/n)\sum_{k=1}^n g(x_k)$, then for every stationary sequence $Y$, the value $\overline{g}(Y)$ is almost surely well defined (i.e., the limit exists) and $\mathbb{E}[\overline{g}(Y)]=\mathbb{E}[g(Y_1)]$. Note that $T=\{x: \overline{g}(x)=H(X_1)\}$. Therefore, if $Y$ is such that $\mathbb{P}(Y\in T)=1$, then $\mathbb{E}[g(Y_1)]=\mathbb{E}[\overline{g}(Y)]=H(X_1)$.]

We now show that for every stationary $Y$, $h(Y)\leq\mathbb{E}[g(Y_1)]$ with equality if and only if $Y$ has the same distribution as $X$. Note that this indeed proves the claim.

Writing $\hat{H}(Y_1|Y_2,Y_3,\ldots)$ for the random variable whose value is the entropy of the conditional distribution of $Y_1$ given $(Y_2,Y_3,\ldots)$, we have \begin{align*} h(Y) - \mathbb{E}[g(Y_1)] &= H(Y_1|Y_2,Y_3,\ldots) - \mathbb{E}\big[\mathbb{E}[g(Y_1)|Y_2,Y_3,\ldots]\big] \\ &= \mathbb{E}\Big[ \underbrace{\hat{H}(Y_1|Y_2,Y_3,\ldots) - \mathbb{E}[g(Y_1)|Y_2,Y_3,\ldots]}_{ \text{$\leq 0$ by Gibbs' inequality} } \Big] \\ &\leq 0 \end{align*} with equality if and only if, conditioned on $Y_2,Y_3,\ldots$, the random variable $Y_1$ has the same distribution as $X_1$ (which is independent of $Y_2,Y_3,\ldots$). In particular, the stationarity of $Y$ implies that equality holds if and only if the $Y_i$ are i.i.d. with $Y_1\sim X_1$.
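
As a quick numerical sanity check of the inequality $h(Y)\leq\mathbb{E}[g(Y_1)]$, here is a sketch comparing the entropy rate of random stationary two-state Markov chains $Y$ with $\mathbb{E}[g(Y_1)]$, taking $X_1$ to be Bernoulli(0.3); all concrete choices here are illustrative.

```python
import numpy as np

# Sketch: check h(Y) <= E[g(Y_1)] numerically for random stationary two-state
# Markov chains Y, with g(a) = -log P(X_1 = a) for a fixed reference law of X_1.
# Natural logs throughout; the inequality does not depend on the base.
rng = np.random.default_rng(1)
p_x = np.array([0.7, 0.3])             # assumed distribution of X_1 on K = {0, 1}
g = -np.log(p_x)

for _ in range(5):
    P = rng.random((2, 2))
    P /= P.sum(axis=1, keepdims=True)                  # random transition matrix
    # Stationary distribution pi (left eigenvector of P for eigenvalue 1).
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    pi /= pi.sum()
    h = -np.sum(pi[:, None] * P * np.log(P))           # entropy rate of the chain
    Eg = np.dot(pi, g)                                 # E[g(Y_1)]
    print(f"h(Y) = {h:.4f}  <=  E[g(Y_1)] = {Eg:.4f}  ->  {h <= Eg + 1e-12}")
```

Equality would require every row of the transition matrix to equal the distribution of $X_1$, i.e. the chain to be i.i.d. with $Y_1\sim X_1$.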

  • (Thanks for the detailed response. Will work through it tomorrow.)
    – Elle Najt
    Commented May 12, 2017 at 9:15
  • Some questions: Unless I misunderstand your notation, I think Gibbs' inequality should be $H(X) \leq E [ g(U) ]$, where the expectation is taken with respect to the distribution of $X$ (?). Also, I don't see how the ergodic theorem proves the next claim; can you be more explicit there?
    – Elle Najt
    Commented May 13, 2017 at 18:49
  • In my notation, $X=(X_1,X_2,\ldots)$ and $Y=(Y_1,Y_2,\ldots)$ are entire sequences, while $U$ is a single $K$-valued random variable. The statement of Gibbs' inequality is correct. In comparison with the Wikipedia page, $p$ would be the distribution of $U$ and $q$ the distribution of $X_1$.
    – Blackbird
    Commented May 14, 2017 at 4:07
  • Added some explanation in the answer.
    – Blackbird
    Commented May 14, 2017 at 4:43
