
I just learned about Chernoff bounds and am wondering whether one can prove the Asymptotic Equipartition Property (AEP) using them instead of the Weak Law of Large Numbers (which is a consequence of the Chebyshev inequality). I started with what I thought was a simple case: a sequence of i.i.d. Gaussian random variables $\{X_i\}$ where $X_i\sim\mathcal{N}(0,\sigma^2)$. We are interested in proving:

$$P\left(\left|-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)-H(X)\right|\geq\epsilon\right)\rightarrow 0\quad\text{as }n\to\infty.$$

The first step in applying the Chernoff bound is finding the moment generating function. However, I ran into trouble immediately:

$$\begin{array}{rcl}E\left[\exp(-t\log\phi(X))\right]&=&\int e^{-t\log\phi(x)}\phi(x)\,dx\\ &=&\int\exp\left(-t\log\left(\frac{1}{\sqrt{2\pi}\sigma}e^{-x^2/(2\sigma^2)}\right)\right)\phi(x)\,dx\\ &=&\int\exp\left(t\left(\log\left(\sqrt{2\pi}\sigma\right)+\frac{x^2}{2\sigma^2}\right)\right)\phi(x)\,dx\\ &=&\int\frac{\left[2\pi\sigma^2\right]^{\frac{t}{2}}\exp\left(tx^2/(2\sigma^2)\right)}{\sqrt{2\pi}\sigma}e^{-x^2/(2\sigma^2)}\,dx\\ &=&\int\frac{\left[2\pi\sigma^2\right]^{\frac{t}{2}}}{\sqrt{2\pi}\sigma}\exp\left(\frac{-x^2(1-t)}{2\sigma^2}\right)\,dx\\ &=&\,? \end{array}$$ where $\phi(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-x^2/(2\sigma^2)}$. I am stuck because $t\in\mathbb{R}$ and thus can be greater than one, in which case the last integral diverges...
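(As a side note, a quick Monte Carlo check seems consistent with this concern. The snippet below is only an illustration, with $\sigma=1$ assumed: the sample mean of $e^{-t\log\phi(X_i)}$ settles down for $t<1$, but for $t\geq 1$, where the integral above diverges, it never stabilizes and keeps growing with the sample size.)

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0  # assumed value, purely for illustration

def phi(z):
    """Density of N(0, sigma^2) evaluated at z."""
    return np.exp(-z**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

# Estimate E[exp(-t log phi(X))] = E[phi(X)^(-t)] at growing sample sizes.
# For t < 1 the estimates settle near a finite value; for t >= 1 the true
# expectation is infinite, so the estimates keep drifting upward.
for t in (0.5, 0.9, 1.0, 1.1):
    for n in (10**4, 10**5, 10**6, 10**7):
        z = rng.normal(0.0, sigma, size=n)
        est = np.mean(phi(z) ** (-t))
        print(f"t = {t:3.1f}, n = {n:>8}: sample mean = {est:10.3f}")
    print()
```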

What did I do wrong? Is it even possible to prove AEP using Chernoff Bounds?

  • What happened to this question? Are you not interested anymore?
    – Did
    Commented Jan 19, 2012 at 13:15
  • Sorry, I've been really busy. Still interested, but haven't had the time to properly digest your answer.
    – M.B.M.
    Commented Jan 19, 2012 at 17:12
  • What specific part is less easy to digest?
    – Did
    Commented Jan 19, 2012 at 17:27

2 Answers


Let me abstract the situation a little and show that the inequality you have in mind stems from a rather general result (as mentioned in the comments, the following should be explained in every introductory text on large deviations principles). Consider an integrable random variable $\xi$, i.i.d. copies $(\xi_n)_{n\geqslant1}$ of $\xi$, and, for some $n\geqslant1$ and some positive $x$, the event

$$ A_n=[\xi_1+\cdots+\xi_n\geqslant n\mathrm E(\xi)+nx]. $$

Using first $\xi_i=\log\phi(X_i)$ and then $\xi_i=-\log\phi(X_i)$, the event you consider is the union of two events $A_n$, hence our task is to bound $\mathrm P(A_n)$.

Now comes an additional, non-trivial hypothesis: assume that $\xi$ is exponentially integrable, that is:

$\qquad\qquad$ There exists $t_0\gt0$ such that $\mathrm E(\mathrm e^{t\xi})$ is finite for every $|t|\lt t_0$. $\qquad(\ast)$

Then the proof is simple. Start from the almost sure inequality, valid for every nonnegative $t$: $$ \mathrm e^{nt(\mathrm E(\xi)+x)}\,\mathbf 1_{A_n}\leqslant\mathrm e^{t(\xi_1+\cdots+\xi_n)}=\mathrm e^{t\xi_1}\cdots\mathrm e^{t\xi_n}. $$ Integrate both sides and use the independence assumption. This yields $$ \mathrm e^{nt(\mathrm E(\xi)+x)}\,\mathrm P(A_n)\leqslant\mathrm E(\mathrm e^{t\xi})^n. $$ We proved that, for every nonnegative $t$, $$ \mathrm P(A_n)\leqslant u(t)^n\qquad\text{where}\qquad u(t)=\mathrm E(\mathrm e^{t\xi})\mathrm e^{-t(\mathrm E(\xi)+x)}. $$ Note that a lot of these inequalities are just useless. For example, $u(0)=1$ hence the $t=0$ version is $\mathrm P(A_n)\leqslant 1$... Big deal! Even worse: if $t\gt0$ is such that $u(t)\gt1$, or such that $\mathrm E(\mathrm e^{t\xi})$ is infinite, we painfully proved that $\mathrm P(A_n)$ is bounded by something greater than $1$, or infinite. :-)
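(Purely as an illustrative aside: in the simplest example, standard normal $\xi_i$, one has $\mathrm E(\mathrm e^{t\xi})=\mathrm e^{t^2/2}$, so $u(t)=\mathrm e^{t^2/2-tx}$, which is smallest at $t=x$ where it equals $\mathrm e^{-x^2/2}\lt1$. The quick simulation below, with the assumed values $n=50$ and $x=0.5$, checks that $\mathrm P(A_n)$ indeed sits below $u(x)^n=\mathrm e^{-nx^2/2}$.)

```python
import numpy as np

rng = np.random.default_rng(1)
n, x, reps = 50, 0.5, 200_000   # assumed values, for illustration only

# xi_i ~ N(0, 1): E(xi) = 0 and E(e^{t xi}) = e^{t^2/2}, so u(t) = e^{t^2/2 - t x}.
sums = rng.normal(0.0, 1.0, size=(reps, n)).sum(axis=1)
p_emp = (sums >= n * x).mean()          # empirical P(A_n) = P(xi_1 + ... + xi_n >= n x)

t = x                                   # u(t) is minimized at t = x
bound = np.exp(t**2 / 2 - t * x) ** n   # = exp(-n x^2 / 2)

print(f"empirical P(A_n) = {p_emp:.2e}")
print(f"u(t)^n bound     = {bound:.2e}")
```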

On the other hand, if there exists at least some nonnegative $t$ such that $u(t)\lt1$, the inequality becomes a strong control on $\mathrm P(A_n)$, namely, that $\mathrm P(A_n)\to0$ at least as fast as a geometric sequence. Hence the key question becomes:

$\qquad\qquad\qquad$ Does there exist some $t\gt0$ such that $u(t)\lt1$?

To answer this, let us consider the limit $t\to0$ with $t\gt0$. Then, $$ \mathrm E(\mathrm e^{t\xi})=1+t\mathrm E(\xi)+o(t),\qquad \mathrm e^{-t(\mathrm E(\xi)+x)}=1-t(\mathrm E(\xi)+x)+o(t), $$ hence $u(t)=1-tx+o(t)$. Since $x\gt0$, this first-order expansion implies that $u(t)\lt1$ for (at least) some (small) $t\gt0$. The proof is over.

To sum up, the result stems from a Markov-type inequality plus a first-order expansion of the Laplace transform $t\mapsto\mathrm E(\mathrm e^{t\xi})$ at $t=0$. The Markov-type inequality is powerful because the $\xi_n$ are independent. The first-order expansion is valid because we assumed $(\ast)$.

Note that the proof above uses the finiteness of $\mathrm E(\mathrm e^{t\xi})$ for $0\leqslant t\leqslant t_0$ only. The finiteness for $-t_0\leqslant t\leqslant 0$ is needed when dealing with the event $[\xi_1+\cdots+\xi_n\leqslant n\mathrm E(\xi)-nx]$, hence the control $(\ast)$ must indeed be two-sided.

Coming back at last to the question you asked, if $\xi=\log\phi(X)$ with $X$ centered normal of variance $\sigma^2$ and $\phi$ the density of the distribution of $X$, then $\mathrm E(\mathrm e^{t\xi})$ is finite if and only if $\mathrm E(\phi(X)^{t})$ is finite if and only if $\mathrm E(\mathrm e^{-tX^2/(2\sigma^2)})$ is finite if and only if $t+1\gt0$, hence $(\ast)$ holds with $t_0=1$.
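(Again purely as an illustrative sketch, with $\sigma=1$ and $\epsilon=0.3$ assumed: for $\xi=-\log\phi(X)$ and $x=\epsilon$ one has $\mathrm E(\mathrm e^{t\xi})=(2\pi\sigma^2)^{t/2}/\sqrt{1-t}$ for $t\lt1$ and $\mathrm E(\xi)=H(X)$. The snippet below evaluates $u(t)$ on a grid and confirms that it dips below $1$ for small $t\gt0$, which is all the argument needs.)

```python
import numpy as np

sigma, eps = 1.0, 0.3   # assumed values; sigma in fact cancels out of u(t)
H = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # differential entropy of N(0, sigma^2), in nats

t = np.linspace(0.01, 0.99, 99)   # (*) restricts us to t < t_0 = 1
mgf = (2 * np.pi * sigma**2) ** (t / 2) / np.sqrt(1 - t)   # E[e^{t xi}] with xi = -log phi(X)
u = mgf * np.exp(-t * (H + eps))                           # u(t) = E[e^{t xi}] e^{-t (E(xi) + eps)}

t_star = t[np.argmin(u)]
print(f"min_t u(t) = {u.min():.4f}, attained near t = {t_star:.2f}")
print(f"so P(A_n) <= u(t)^n; e.g. n = 100 gives {u.min()**100:.2e}")
```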

  • I finally was able to properly sit down and read your answer. Thank you, I think it's clear to me now. Basically, one can use Chernoff Bounds to prove AEP for a limited selection of distributions of the random variables (i.e. when (*) holds).
    – M.B.M.
    Commented Feb 7, 2012 at 6:40

I don't think we can use the Chernoff bound to prove the AEP. We actually need to prove $$P\left(\left|-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)-H(X)\right|>\epsilon\right)\rightarrow 0$$ so we need to show both $$P\left(-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)>H(X)+\epsilon\right)\rightarrow 0 \text{ and }P\left(-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)<H(X)-\epsilon\right)\rightarrow 0 $$

The Chernoff bound reads, for any $t>0$, $P(X>x)\le e^{-t x} E(e^{t X})$. Your calculation of $E(e^{-t \log \phi(X)})$ is correct; this expectation is finite for $t<1$.

Now $$P\left(-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)>H(X)+\epsilon\right)\le e^{-nt(H(X)+\epsilon)} \cdot E\left[e^{-t \sum_{i=1}^n\log \phi(X_i)}\right]$$ where, by independence, $$E\left[e^{-t\sum_{i=1}^n\log \phi(X_i)}\right]=E\left(e^{-t \log \phi(X)}\right)^n=\left(\frac{1}{(\sqrt{2\pi}\sigma)^{1-t}}\int_{-\infty}^{\infty}\exp\left(\frac{-x^2(1-t)}{2\sigma^2}\right)dx\right)^n$$ which is finite for $t<1$, and one can check that for small enough $t>0$ the whole right hand side decays geometrically. So $$P\left(-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)>H(X)+\epsilon\right)\rightarrow 0$$ But I don't think we can use the Chernoff bound to show the other probability also goes to $0$.
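(A quick simulation, with the assumed values $\sigma=1$ and $\epsilon=0.3$, is at least consistent with this first claim: the empirical frequency of the event $\{-\frac1n\sum_i\log\phi(X_i)>H(X)+\epsilon\}$ drops off rapidly as $n$ grows.)

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, eps, reps = 1.0, 0.3, 10_000             # assumed values, for illustration only
H = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # differential entropy of N(0, sigma^2), in nats

def neg_log_phi(z):
    """-log phi(z) for the N(0, sigma^2) density phi."""
    return z**2 / (2 * sigma**2) + 0.5 * np.log(2 * np.pi * sigma**2)

for n in (10, 50, 200, 1000):
    z = rng.normal(0.0, sigma, size=(reps, n))
    emp = neg_log_phi(z).mean(axis=1)   # -1/n * sum_i log phi(X_i)
    print(f"n = {n:4d}: empirical P(> H + eps) = {(emp > H + eps).mean():.4f}")
```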

Edit: As Bullmoose requested, I will post the reason why I think the Chernoff bound doesn't work to show that the other probability goes to zero.

The Chernoff bound in the other direction is $P(X<x)\le e^{t x} E(e^{-t X})$ for $t>0$. So what I think we would get in the other case is $$\begin{eqnarray*}P\left(-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)<H(X)-\epsilon\right)&\le& e^{nt(H(X)-\epsilon)} \cdot E\left[e^{t \sum_{i=1}^n\log \phi(X_i)}\right]\\ &=& e^{nt(H(X)-\epsilon)} \cdot\left(\frac{1}{(\sqrt{2\pi}\sigma)^{1+t}}\int_{-\infty}^{\infty}\exp\left(\frac{-x^2(1+t)}{2\sigma^2}\right)dx\right)^n\end{eqnarray*} $$ and I don't see how to choose $t>0$ so that the right hand side goes to $0$.

  • I think we can use it, the trick is keeping $H(X)$ on the LHS in the probability statement. Thank you for inspiring me, I've posted my own answer which I probably wouldn't have figured out without your attempt.
    – M.B.M.
    Commented Jan 4, 2012 at 20:48
  • My comment on Bullmoose's answer applies to this one as well.
    – Did
    Commented Jan 5, 2012 at 7:29
