
In this Wikipedia article, a proof is given for one direction of Shannon's source coding theorem using the asymptotic equipartition property (AEP). I am unable to follow the proof. Here are the relevant definitions. To get the idea of the proof, I wish to see the implication only for a special case of the direction of interest, which is what I state below.

For a finite set $\Sigma$, we write $\Sigma^*$ to mean the set of all finite strings that can be formed using the elements of $\Sigma$.

A binary code on a finite set $S$ is a function $c:S\to \{0, 1\}^*$. The extension of $c$ is the function $c^*:S^*\to \{0, 1\}^*$ defined by $c^*(s_1\cdots s_n)=c(s_1)\cdots c(s_n)$, the concatenation of the strings $c(s_i)$. A binary code is said to be uniquely decodable if its extension is injective. See this for more context.

Define the length map $\ell:\{0, 1\}^*\to \mathbb N$ which maps a binary string to its length.

Source Coding Theorem. Let $S$ be a finite set, let $\mu$ be a probability measure on $S$, and let $c:S\to \{0, 1\}^*$ be a uniquely decodable code. Then $\mathbb E[\ell\circ c] = \int_S \ell\circ c\ d\mu\geq H_\mu$, where $H_\mu = -\sum_{s\in S}\mu(s)\log_2\mu(s)$.
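
To keep a concrete instance in mind, here is a small Python sketch computing both sides of the inequality. The distribution and the prefix-free (hence uniquely decodable) code are my own toy choices, not taken from the article:

```python
import math

# Toy instance: S = {a, b, c, d}, a dyadic measure mu, and a prefix-free
# (hence uniquely decodable) code c; both are illustrative choices.
mu = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

# H_mu = -sum_s mu(s) log2 mu(s)
H = -sum(p * math.log2(p) for p in mu.values())

# E[l o c] = sum_s mu(s) * len(c(s))
E_len = sum(mu[s] * len(code[s]) for s in mu)

print(f"H_mu   = {H}")      # 1.75
print(f"E[len] = {E_len}")  # 1.75; the bound E[len] >= H_mu holds with equality here
```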

For each $n\geq 1$ and $\epsilon>0$, equip $S^n$ with the product measure $\mu^n$ and define the set $$A_n^{\epsilon} = \left\{(s_1, \ldots, s_n)\in S^n:\ \left|-\frac{1}{n}\log_2\mu^n(s_1, \ldots, s_n) - H_\mu\right|< \epsilon\right\}$$

Assume that the following is true (a consequence of the AEP):

Consequence of AEP. For each $\epsilon>0$, there is $n$ large enough so that $\mu^n(A_n^\epsilon)>1-\epsilon$.
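
For what it's worth, a quick Monte Carlo sketch (my own illustration, not part of the article) suggests how $\mu^n(A_n^\epsilon)$ behaves as $n$ grows:

```python
import math
import random

# Monte Carlo sketch of the AEP consequence: estimate mu^n(A_n^eps)
# by sampling n-tuples from mu^n and testing typicality.
mu = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # toy distribution
H = -sum(p * math.log2(p) for p in mu.values())
symbols, weights = zip(*mu.items())

def in_typical_set(n, eps):
    """Draw one n-tuple from mu^n and test membership in A_n^eps."""
    x = random.choices(symbols, weights=weights, k=n)
    # -(1/n) log2 mu^n(x) = -(1/n) * sum_i log2 mu(x_i) by independence
    sample_entropy = -sum(math.log2(mu[s]) for s in x) / n
    return abs(sample_entropy - H) < eps

for n in (10, 100, 1000):
    trials = 5_000
    hits = sum(in_typical_set(n, eps=0.1) for _ in range(trials))
    print(f"n = {n:4d}: estimated mu^n(A_n^eps) ~ {hits / trials:.3f}")
```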

Now the Wikipedia article hints (last line of the section Proof: Source coding theorem) that the source coding theorem follows from the fact that "any set of size smaller than $A_n^\epsilon$ (in the sense of exponent) would cover a set of probability bounded away from 1."

I do not even follow what this means and am unable to see how the coding enters the picture here. Can somebody please elaborate on the hint or provide a reference where more details can be found?

  • that's not the source coding theorem Wikipedia is talking about – Commented Apr 25, 2020 at 15:15
  • @mathworker21 Wikipedia has two headings: Source coding theorem and Source coding theorem for symbol codes. The statement for the former is made informally. I was assuming that the latter (what I am really after) is a special case of the former. Is that not right? – Commented Apr 25, 2020 at 15:25
  • it may be, but my point was you didn't mention the converse in your question, but you're asking about the proof of the converse – Commented Apr 25, 2020 at 15:25
  • @mathworker21 I had mentioned that I was interested in 'one of the directions' and that is the direction I have stated. – Commented Apr 25, 2020 at 15:27
  • yes, sorry, I just had a brain fart. I knew you had it right when writing my answer – Commented Apr 25, 2020 at 15:27

1 Answer


Here's what's going on in the proof Wikipedia gave of "achievability". A very small set contains nearly all of the strings that will be generated, probability-wise. That is, the typical set $A_n^\epsilon$ is an exponentially small proportion of the tuples $(x_1,\dots,x_n)\in S^n$, yet it carries $1-o(1)$ of the probability. Since the typical set is so small, you don't need too many bits to describe it: if a set has $k$ elements, you only need $\lceil\log_2 k\rceil$ bits to index its members.
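
If it helps, here is a brute-force sketch (a toy illustration; the distribution is an arbitrary choice of mine) that enumerates $S^n$ for a small $n$ and compares the size of $A_n^\epsilon$, its probability mass, and the bits needed to index it:

```python
import math
from itertools import product

# Brute-force illustration (small n only): enumerate all of S^n,
# collect the typical set A_n^eps, and compare its size and mass.
mu = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # toy distribution
H = -sum(p * math.log2(p) for p in mu.values())

n, eps = 8, 0.3
typical_size, typical_mass = 0, 0.0
for x in product(mu, repeat=n):
    prob = math.prod(mu[s] for s in x)           # mu^n(x) by independence
    if abs(-math.log2(prob) / n - H) < eps:      # membership in A_n^eps
        typical_size += 1
        typical_mass += prob

print(f"|A_n^eps| = {typical_size}  vs  |S|^n = {len(mu) ** n}")
print(f"mu^n(A_n^eps) = {typical_mass:.3f}")
print(f"bits to index A_n^eps ~ {math.log2(typical_size):.1f}  vs  n*H_mu = {n * H:.1f}")
```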

And the converse holds: to describe a set of $k$ tuples, you need at least $\log_2 k$ bits. So if you could compress into fewer than $H_\mu$ bits per symbol with probability close to $1$, there would be a set of tuples of size less than $2^{nH_\mu}$ (in the sense of exponent) carrying probability close to $1$, but there just isn't such a set.
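
To spell out the counting step behind Wikipedia's hint (this is the standard argument, using only the definitions from the question): by the definition of $A_n^\epsilon$, every $x\in A_n^\epsilon$ satisfies $\mu^n(x)\leq 2^{-n(H_\mu-\epsilon)}$. So for any $B_n\subseteq S^n$, $$\mu^n(B_n)\ \leq\ \mu^n(B_n\cap A_n^\epsilon)+\mu^n\big((A_n^\epsilon)^c\big)\ \leq\ |B_n|\cdot 2^{-n(H_\mu-\epsilon)}+\epsilon$$ for $n$ large enough that $\mu^n(A_n^\epsilon)>1-\epsilon$. In particular, if $|B_n|\leq 2^{n(H_\mu-2\epsilon)}$, then $\mu^n(B_n)\leq 2^{-n\epsilon}+\epsilon$, which is bounded away from $1$. That is precisely "any set of size smaller than $A_n^\epsilon$ (in the sense of exponent) would cover a set of probability bounded away from 1."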

  • For the converse, can you say where you have used the unique decodability? (Without unique decodability the statement is false: if $S=\{a, b, c, d\}$, $\mu$ is uniform, and $c:S\to \{0, 1\}^*$ is defined by $c(a)=0, c(b)=1, c(c)=00, c(d)=01$, then $\mathbb E[\ell\circ c]=3/2$ while $H_\mu=2$. A code sketch checking this counterexample appears after this comment thread.) – Commented Apr 25, 2020 at 15:58
  • @caffeinemachine I'm saying that if a set has size less than $2^{nH_\mu}$ (in the sense of exponent), it has probability bounded away from $1$. If there's no unique decodability, you can fit much more stuff into a small set. Unique decodability gives injectivity, so sizes are preserved. – Commented Apr 25, 2020 at 20:38
  • So I guess your argument is that, writing $L=\ell\circ c$, if $\mathbb E[L]=H_\mu-\epsilon$, then taking $n$ independent samples $X_1, \ldots, X_n$ from $S$ we see that $L(X_1)+\cdots+L(X_n)\leq n(H_\mu-\epsilon/2)$ happens with high probability. So the set $B_n=\{x\in S^n:\ L(x_1)+\cdots+L(x_n)\leq n(H_\mu-\epsilon/2)\}$ has size exponentially small compared to $|S^n|$, but its probability approaches $1$ as $n\to \infty$, and this cannot happen. Am I right? – Commented Apr 25, 2020 at 23:20
  • @caffeinemachine yup! – Commented Apr 25, 2020 at 23:28
  • I am struggling with the 'and this cannot happen' part in my last comment. I have articulated that part as a separate thread here: math.stackexchange.com/questions/3645195 – Commented Apr 26, 2020 at 20:04
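
The counterexample from the first comment above can be checked mechanically. A minimal Python sketch (variable names and layout are mine):

```python
import math

# The code from the comment: not uniquely decodable, and E[len] < H_mu.
mu = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}  # uniform measure
code = {"a": "0", "b": "1", "c": "00", "d": "01"}

H = -sum(p * math.log2(p) for p in mu.values())    # = 2.0
E_len = sum(mu[s] * len(code[s]) for s in mu)      # = 1.5 < H_mu

def extension(word):
    """The extension c* applied to a string over S (here, an iterable of symbols)."""
    return "".join(code[s] for s in word)

print(f"H_mu = {H}, E[len] = {E_len}")
print(extension("aa"), extension("c"))  # both print '00': c* is not injective
```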
