$\begingroup$

Shannon's "self-information" of a specific outcome $A$ is given by $-\log(\Pr(A))$, and the entropy is the expectation of the self-information over all outcomes of the random variable.
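Concretely, for a discrete random variable $X$ this expectation is
$$H(X) = \mathbb{E}\bigl[-\log \Pr(X)\bigr] = -\sum_{x} \Pr(X = x)\,\log \Pr(X = x).$$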

When the base of the log is 2, the units of information/entropy are called "bits".

What is the best explanation for the following simple question:

Why are these information units called "bits"?

$\endgroup$

2 Answers

$\begingroup$

One good reason to call them bits is that this is the number of bits that you need on average to encode an outcome. Some Wikipedia articles you might want to take a look at are Huffman coding, arithmetic coding, entropy encoding and Shannon's source coding theorem.

To give a simple example, say outcome A has probability $1/2$ and outcomes B and C have probabilities $1/4$ each. Then you can encode A by $0$, B by $10$ and C by $11$. This is an optimal prefix-free code; the expected number of bits required to encode an outcome is $\frac12\cdot1+\frac14\cdot2+\frac14\cdot2=\frac32$, and since the number of bits in each codeword is the self-information (to base $2$) of the outcome it encodes, this expected number of bits is the entropy of the distribution.
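Here is a minimal Python sketch of that calculation (the distribution and code map are just the ones from the example above):

```python
import math

# Distribution from the example: Pr(A) = 1/2, Pr(B) = Pr(C) = 1/4.
probs = {"A": 1/2, "B": 1/4, "C": 1/4}

# The prefix-free code proposed above.
code = {"A": "0", "B": "10", "C": "11"}

# Self-information of each outcome, in bits (log base 2).
self_info = {x: -math.log2(p) for x, p in probs.items()}

# Entropy is the expected self-information; the expected code length
# weights each codeword length by the probability of its outcome.
entropy = sum(p * self_info[x] for x, p in probs.items())
expected_length = sum(p * len(code[x]) for x, p in probs.items())

for x in probs:
    # Each codeword length equals the outcome's self-information.
    print(x, self_info[x], len(code[x]))
print(entropy, expected_length)  # both are 1.5 bits
```

Both the entropy and the expected code length come out to $\frac32$ bits because every probability here is a power of $\frac12$; for other distributions the expected length of an optimal symbol code can exceed the entropy by up to one bit.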

$\endgroup$
  • $\begingroup$ @Michael: You're welcome. $\endgroup$
    – joriki
    Commented Mar 20, 2012 at 19:58
  • $\begingroup$ I'm familiar with information theory and know the source and channel coding approaches. From this point of view I completely understand the usage of the term "bit". But what is the meaning, for example, of some event having a self-information of 10 bits? $\endgroup$
    – Michael
    Commented Mar 20, 2012 at 20:05
  • $\begingroup$ @Michael: I'm not sure I understand the question. As my example illustrates, the self-information of an event is the number of bits in the code used to encode it in an optimal prefix-free code. What other meaning are you looking for? (By the way, it would have been a good idea to mention in the question that you're familiar with information theory.) $\endgroup$
    – joriki
    Commented Mar 20, 2012 at 20:10
  • $\begingroup$ Maybe I'm digging too much into it... But it is not always possible to achieve exactly the self-information with a symbol code (only in the case when the probabilities are powers of $1/2$). It is a nice explanation with symbol codes, but how do you explain to students in the first lesson that the units are "bits"? $\endgroup$
    – Michael
    Commented Mar 20, 2012 at 20:19
$\begingroup$

The term "bit" is a contraction of "binary digit"; Claude Shannon popularized it in his 1948 paper, crediting John Tukey with suggesting the word. If you have a source of random binary digits in which there is no bias or pattern of any sort, then the probability that the next binary digit to come out will be specifically a $1$ (or specifically a $0$, for that matter) is $1/2$, regardless of what any of the preceding digits show. Because the digits are independent, the probability that the next two digits will be something specific is $1/4$, and the probability that the next $n$ digits will be a specific sequence is $2^{-n}$.

Therefore an event of probability $2^{-n}$ is exactly as surprising as seeing one particular sequence of $n$ random binary digits, which is why the surprise (self-information) of any event is naturally measured in bits.
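A tiny Python sketch (assuming fair, independent digits) makes that correspondence concrete:

```python
import math

# A specific sequence of n fair, independent binary digits has probability
# 2**(-n), so its self-information is -log2(2**(-n)) = n bits.
for n in (1, 2, 10):
    p = 2.0 ** (-n)
    print(f"n = {n}: Pr = {p}, self-information = {-math.log2(p)} bits")
```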

$\endgroup$
