When dealing with the information content of messages, one comes across so-called entropy again and again on the internet. When this is explored further, it is usually described as a measure of the information content or uncertainty of a message. I am referring here to Shannon's definition of entropy, which the German Wikipedia summarises as follows:

Claude Elwood Shannon defined the entropy $\mathrm{H}$ of a discrete memoryless source (discrete random variable) $X$ over a finite alphabet $Z=\left\{z_{1}, z_{2}, \ldots, z_{m}\right\}$ as follows: first, each character $z$ with probability $p_z$ is assigned its information content $I(z)=-\log _{2} p_{z}$. The entropy per character is then defined as the expected value of the information content $$ \mathrm{H}_{1}=E[I]=\sum_{z \in Z} p_{z} I(z)=-\sum_{z \in Z} p_{z} \log _{2} p_{z} $$
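To make the definition concrete, here is a minimal Python sketch (the alphabet and its probabilities are made-up for illustration, not taken from Wikipedia) that computes $I(z)$ and $\mathrm{H}_1$ for a small distribution:

```python
import math

def entropy(probs):
    """Shannon entropy H1 = -sum(p_z * log2(p_z)), in bits per character."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Made-up alphabet Z = {a, b, c, d} with probabilities summing to 1.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

for z, pz in p.items():
    print(f"I({z}) = {-math.log2(pz):.3f} bits")  # information content of each character
print(f"H1 = {entropy(p.values()):.3f} bits")     # expected value: 1.750
```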

What I find problematic is the notion of uncertainty that is also mentioned in this context: I find it hard to see how uncertainty, which usually stems from a lack of information, and entropy, as a measure of information content, go together. How can one picture this in an intuitive way?


1 Answer


We have the following closely related notions:

  • entropy (the information value)
  • probability distribution (which outcomes should we already expect?)
  • uncertainty (are we certain of the outcome, or will we learn something)

Low entropy

When we receive a highly expected piece of information, we were already almost certain of the content and gain hardly any information value. Hence high probability, low uncertainty, low entropy.

Similarly, when we do NOT receive a very unexpected piece of information, we were almost certain not to receive it, so NOT receiving it carries little information value. Hence low probability (of the unexpected event), low uncertainty, low entropy.

High entropy

When we receive a highly unpredictable, coin-flip-like piece of information, we did not know what to expect and were quite uncertain of what it would be. The information value is very high! Hence roughly 50/50 probability, high uncertainty, high entropy.
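To put rough numbers on these two cases, here is a small sketch (the probabilities are made-up illustrations):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Highly expected outcome: we are almost certain in advance,
# so on average the answer carries very little information.
print(f"{entropy([0.99, 0.01]):.3f} bits")  # 0.081 -> low uncertainty, low entropy

# Coin-flip-like outcome: maximally uncertain between two outcomes.
print(f"{entropy([0.5, 0.5]):.3f} bits")    # 1.000 -> high uncertainty, high entropy
```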

Example

Suppose you were to guess an English word, and consider the expected information value of the answers to the following questions:

  • Does it contain the letter "E"?
  • Does it contain the letter "Z"?

You should expect the answers to be "MAYBE" and "NO". Say a randomly chosen English word has around $p\approx1/8=12.5\%$ probability of containing the letter "E", whereas "Z" is quite rare (let us say $p\approx1/64$). If we use those figures, the contributions $p\cdot\log_2(1/p)$ of each possible answer to the entropy are:

$$ \begin{align} I[E]&=-\tfrac{1}{8}\log_2\!\left(\tfrac{1}{8}\right)=\tfrac{3}{8}&&=0.375\\ I[\text{not }E]&=-\tfrac{7}{8}\log_2\!\left(\tfrac{7}{8}\right)&&\approx0.169\\ H[E,\text{not }E]&=I[E]+I[\text{not }E]&&\approx0.544 \end{align} $$ and $$ \begin{align} I[Z]&=-\tfrac{1}{64}\log_2\!\left(\tfrac{1}{64}\right)=\tfrac{6}{64}&&\approx0.094\\ I[\text{not }Z]&=-\tfrac{63}{64}\log_2\!\left(\tfrac{63}{64}\right)&&\approx0.022\\ H[Z,\text{not }Z]&=I[Z]+I[\text{not }Z]&&\approx0.116 \end{align} $$

Hence we will most likely learn something from an answer to the first question (splitting our candidate words into portions of roughly 1/8 and 7/8), whereas we will probably not learn much from confirming that "Z" is not contained in the word, which only excludes a very small set of candidates (1 in 64).
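For what it is worth, here is a short sketch reproducing the figures above (the $1/8$ and $1/64$ probabilities are the assumed values from the example, not measured letter frequencies):

```python
import math

def term(p):
    """Contribution p * log2(1/p) of one answer to the entropy, in bits."""
    return -p * math.log2(p)

for letter, p in (("E", 1 / 8), ("Z", 1 / 64)):
    yes, no = term(p), term(1 - p)
    print(f"{letter}: {yes:.3f} + {no:.3f} = {yes + no:.3f} bits")
# E: 0.375 + 0.169 = 0.544 bits
# Z: 0.094 + 0.022 = 0.116 bits
```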

Ideal type of question

A yes/no or true/false question like this has the potential of bisecting the candidate space into equal parts, so if we could ask the right question and be sure to either include or exclude half of the candidate words, we would gain exactly 1 bit of information. The ideal type of question should therefore have coin-flip 50/50 probability.
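A small sketch to illustrate this point: the expected information of a yes/no question, as a function of the probability $p$ of a "yes", peaks at exactly 1 bit when $p=1/2$ (the probability values below are just illustrative):

```python
import math

def binary_entropy(p):
    """Entropy in bits of a yes/no question answered 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (1 / 64, 1 / 8, 1 / 4, 1 / 2):
    print(f"p = {p:.4f} -> {binary_entropy(p):.3f} bits")
# p = 0.0156 -> 0.116 bits
# p = 0.1250 -> 0.544 bits
# p = 0.2500 -> 0.811 bits
# p = 0.5000 -> 1.000 bits
```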

  • In this connection I can see a basic idea of how the probability of an event and its uncertainty are related. Following up on your last point: for a uniform distribution I have come across the idea that the reciprocal of an event's probability is proportional to its uncertainty, so in a sense it parametrises the uncertainty. Can that also be justified intuitively? And of course, thank you for your answer.
    – Rico1990
    Commented Nov 2, 2021 at 12:00
  • @Rico1990 Sorry, I had to first fix some glaring errors in my example. If we take Shannon entropy to be our chosen measure of uncertainty, then it is not correct to draw that connection, since each term is defined as $p\cdot\log_2(1/p)=-p\cdot\log_2(p)$, whose graph looks almost like a parabola, not like the reciprocal (see the small sketch after these comments). Both high and low probabilities result in low uncertainty and low entropy.
    – String
    Commented Nov 2, 2021 at 12:28
  • Thank you in advance for your answer. In case I still have questions, I will contact you.
    – Rico1990
    Commented Nov 8, 2021 at 16:12
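As a footnote to the comment above, here is a tiny sketch (illustrative values only) showing that a single term $-p\cdot\log_2(p)$ is small for both very low and very high probabilities, so it behaves nothing like the reciprocal $1/p$:

```python
import math

def term(p):
    """A single entropy term -p * log2(p), in bits."""
    return -p * math.log2(p)

for p in (0.01, 0.37, 0.99):
    print(f"p = {p:.2f} -> {term(p):.3f} bits")
# p = 0.01 -> 0.066 bits  (small, whereas 1/p would be huge)
# p = 0.37 -> 0.531 bits  (the maximum lies near p = 1/e)
# p = 0.99 -> 0.014 bits  (small again for near-certain events)
```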
