Shannon's "self-information" of a specific outcome "A" is defined as -log(Pr(A)), and the entropy is the expectation of the "self-information" over all outcomes of the random variable.
When the base of the log is 2, the units of information/entropy are called "bits".
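For concreteness, here is a minimal sketch (the function name self_information_bits is just my own illustration, not standard) showing that with base-2 logs a fair coin flip carries exactly 1 bit, i.e. the same amount of information as one binary digit:

```python
import math

def self_information_bits(p):
    """Self-information -log2(p) of an outcome with probability p, in bits."""
    return -math.log2(p)

# A fair coin flip: each outcome has probability 1/2, so it carries exactly 1 bit.
print(self_information_bits(0.5))   # 1.0

# Entropy of the fair coin: the expected self-information over both outcomes.
probs = [0.5, 0.5]
entropy = sum(p * self_information_bits(p) for p in probs)
print(entropy)                       # 1.0
```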
What is the best explanation for the following simple question:
Why are these units of information called "bits"?