In the following lecture notes chapter 3, page 12-13, they state the following
We begin by introducting some important notation:
- For a set $\mathcal{S},|\mathcal{S}|$ denotes its cardinality (number of elements contained on the set). For example, let $\mathcal{U}=\{1,2, \ldots, M\},$ then $|\mathcal{U}|=M$
- $ u^{n}=\left(u_{1}, \ldots, u_{n}\right)$ is an $n$ -tuple of $u$
- $ \mathcal{U}^{n}=\left\{u^{n} | u_{i} \in \mathcal{U} ; i=1, \ldots, n\right\} .$ It is easy to see that $\left|\mathcal{U}^{n}\right|=|\mathcal{U}|^{n}$
- $ U_{i}$ generated by a memoryless source $U^{'}$ implies $U_{1}, U_{2}, \ldots$ i.i.d. according to $U$ (or $P_{U}$ ). That is.$$p\left(u^{n}\right)=\prod_{i=1}^{n} p\left(u_{i}\right)$$ Definition 12. The sequence $u^{n}$ is $\epsilon$ -typical for a memoryless source U for $\epsilon>0,$ if $$ \left|-\frac{1}{n} \log p\left(u^{n}\right)-H(U)\right| \leq \epsilon $$ or equivalently, $$ 2^{-n(H(U)+\epsilon)} \leq p\left(u^{n}\right) \leq 2^{-n(H(U)-\epsilon)} $$ Let $A_{\epsilon}^{(n)}$ denote the set of all $\epsilon$ -typical sequences, called the typical set. So a length- $n$ typical sequence would assume a probability approximately equal to $2^{-n H(U)}$. Note that this applies to memoryless sources, which will be the focus on this course $^{1}$.
Theorem 13 (AEP). $\forall \epsilon>0, P\left(U^{n} \in A_{\epsilon}^{(n)}\right) \rightarrow 1$ as $n \rightarrow \infty$Proof This is a direct application of the Law of Large Numbers (LLN). $$ \begin{aligned} P\left(U^{n} \in A_{\epsilon}^{(n)}\right) &=P\left(\left|-\frac{1}{n} \log p\left(U^{n}\right)-H(U)\right| \leq \epsilon\right) \\ &=P\left(\left|-\frac{1}{n} \log \prod_{i=1}^{n} p\left(U_{i}\right)-H(U)\right| \leq \epsilon\right) \\ &=P\left(\left|\frac{1}{n}\left[\sum_{i=1}^{n}-\log p\left(U_{i}\right)\right]-H(U)\right| \leq \epsilon\right) \\ & \rightarrow 1 \text { as } n \rightarrow \infty \end{aligned} $$ where the last step is due to the Law of Large Numbers (LLN), in which $-\log p\left(U_{i}\right)$ 's are i.i.d. and hence their arithmetic average converges to their expectation $H(U)$
My question is related to the proof. My understanding is the following; $U^n$ is a sequence of random variables $U^n = (U_1, U_2, \ldots,U_n)$ drawn i.i.d from some distribution $p_U(u) = p(u)$ and $u^n$ is a realization of the sequence.
However in the proof they switch from using $p(u^n)$ to $p(U^n)$ and I am not sure what $p(U^n)$ represents? What would be wrong by doing it as follows?
$$ \begin{aligned} P\left(u^{n} \in A_{\epsilon}^{(n)}\right) &=P\left(\left|-\frac{1}{n} \log p\left(u^{n}\right)-H(U)\right| \leq \epsilon\right) \\ &=P\left(\left|-\frac{1}{n} \log \prod_{i=1}^{n} p\left(u_{i}\right)-H(U)\right| \leq \epsilon\right) \\ &=P\left(\left|\frac{1}{n}\left[\sum_{i=1}^{n}-\log p\left(u_{i}\right)\right]-H(U)\right| \leq \epsilon\right) \\ & \rightarrow 1 \text { as } n \rightarrow \infty \end{aligned} $$