
I have seen several "maximum entropy distributions" used in the mathematical and statistical literature, often with the justification that they are "minimally informed" beyond the assumptions and data used to construct them.

However, the informativeness of a signal or distribution seems itself to be defined via an appeal to entropy, so the claim that increasing entropy decreases informativeness looks circular: there is no external foundation that would compel us to rationally equate Shannon entropy with information content (or, more precisely, with the lack of it).

What is the foundational science or concept that warrants entropy as a measure of information content?


NOTE: I have read Shannon's original paper, where he discusses an axiomatic derivation of his entropy function -- but he also takes pains to point out that this is not his primary justification (pp. 10-11 and App. 2, Shannon 1948). Instead, he felt its use was warranted by its empirical success in communications engineering. Note also that the work of Rényi and Uffink has shown that Shannon's postulates are not the only way to construct a plausible information measure.

Nowadays, however, we apply his formula to constructs such as uncertainty distributions, which have little hope of experimental verification. Moreover, it is not clear that a maximum-entropy distribution has any advantage over any other distribution when the level of uncertainty is high.
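
To be concrete about the kind of construction I mean, here is a minimal sketch (a toy, Jaynes-style "loaded die" problem; the support, the target mean of 4.5, and the use of scipy are just illustrative choices of mine): among all distributions on {1,...,6} with a prescribed mean, the maximum-entropy one has the exponential (Gibbs) form, with the Lagrange multiplier fixed by the constraint.

    # Maximum-entropy distribution on {1,...,6} subject to a mean constraint.
    # Toy illustration only; the numbers are arbitrary.
    import numpy as np
    from scipy.optimize import brentq

    support = np.arange(1, 7)
    target_mean = 4.5

    def mean_error(lam):
        w = np.exp(lam * support)          # Gibbs form: p_k proportional to exp(lam * k)
        p = w / w.sum()
        return p @ support - target_mean

    lam = brentq(mean_error, -10, 10)      # solve for the Lagrange multiplier
    p = np.exp(lam * support)
    p /= p.sum()
    entropy = -np.sum(p * np.log(p))
    print(p.round(4), entropy.round(4))    # the "minimally informed" distribution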

  • Anybody out there who knows what entropy is?
    Commented Dec 30, 2014 at 14:57
  • @kjetilbhalvorsen Well, in physics it's related to the number of microstates that realize a particular thermodynamic macrostate, and to the long-run distribution of a system's state vector (think of a Markov chain's steady-state distribution). I am not sure of its relevance to mathematics or induction per se -- hence my question.
    – user76844
    Commented Dec 30, 2014 at 15:40
  • Well, but your question was not about physics, so my question stands: what does entropy really mean in information (or probability) theory? One book I read said that while probability measures uncertainty about an event, entropy measures uncertainty in a distribution -- but it did not try to explain what that sentence means. I have no clue!
    Commented Dec 30, 2014 at 17:07
  • @kjetilbhalvorsen I see, I misunderstood your comment. I agree that your book's lack of justification for "entropy = distributional uncertainty" is an example of the hand-wavy use of entropy outside of physics. Note also that the probability of an event can actually be calibrated against data (even a Bayesian probability: presumably, we expect to be right 90% of the time among events to which we subjectively assign 90% probability).
    – user76844
    Commented Dec 30, 2014 at 20:22
  • See my answer here: stats.stackexchange.com/questions/66186/… for a statistical viewpoint.
    Commented Nov 10, 2016 at 16:29

1 Answer


Have you read about Maxwell's demon?

Maxwell realized that if it were possible to track the trajectories of individual particles, and to open and close a shutter at just the right times, one could locally raise a system's temperature without doing any work on the system. The idea is to selectively let the fastest ("hottest") particles through into one chamber and leave the slower ones behind.

This would appear to violate the second law of thermodynamics.

The resolution (worked out later by Szilard, Landauer, and Bennett, rather than by Maxwell himself) is that the demon's information processing is not free. Even though the demon never touches a particle, acquiring, storing, and eventually erasing the measurement results carries an unavoidable thermodynamic cost, so information is effectively another form of energy.

This has been experimentally verified; see http://en.wikipedia.org/wiki/Landauer%27s_principle, for example: erasing one bit of information dissipates at least $kT\ln 2$ of heat. Information is a conserved quantity. There is nothing hand-wavy about this. Indeed, it is of foundational physical importance.
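
To put a number on it, here is a back-of-the-envelope sketch (room temperature of 300 K is my assumed value):

    # Landauer bound: minimum heat dissipated when one bit is erased at temperature T.
    import math

    k_B = 1.380649e-23          # Boltzmann constant, J/K
    T = 300.0                   # assumed room temperature, K

    print(f"Landauer bound at {T} K: {k_B * T * math.log(2):.2e} J per bit erased")
    # ~2.9e-21 J: tiny, but strictly positive, which is the whole point --
    # manipulating information has an unavoidable physical cost.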

Now, having at least informally "established" that information is a physical quantity, consider that every computable[1] probability distribution can be encoded as a sequence of 0's and 1's -- minimally so, in the sense of Kolmogorov complexity -- and that this sequence can be transmitted. At that point the abstract, non-physical probability distribution has been turned into a physical entity (a message) with bona fide physical information content. Since this information content is a conserved quantity, it seems fair to regard it as a property of the abstract, non-physical probability distribution itself. After all, up to a scalar constant, the entropy is a property of the message, not of the encoding.
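
Here is a minimal sketch of that source-coding connection (the four-symbol distribution is an arbitrary choice of mine): the Shannon entropy lower-bounds the expected length, in bits per symbol, of any prefix-free encoding, and a Huffman code comes within one bit of it (hitting it exactly for dyadic probabilities).

    # Entropy vs. average code length of a Huffman code (toy example).
    import heapq
    import itertools
    import math

    p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    entropy = -sum(q * math.log2(q) for q in p.values())

    # Build a Huffman code: repeatedly merge the two least probable nodes.
    tie = itertools.count()                     # tie-breaker so heap tuples always compare
    heap = [(q, next(tie), {s: ""}) for s, q in p.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        q1, _, left = heapq.heappop(heap)
        q2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (q1 + q2, next(tie), merged))
    code = heap[0][2]

    avg_len = sum(p[s] * len(code[s]) for s in p)
    print(f"entropy = {entropy:.3f} bits, Huffman average length = {avg_len:.3f} bits")
    # Both come out to 1.750 bits here: the entropy is a property of the message
    # source, whatever (efficient) encoding we happen to pick.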

'Entropy' is overloaded in a way that makes it confusing, especially with respect to physical versus information entropy. So let's take a step back and note that "information entropy", $H = -\sum_i p_i \log p_i$, is the average information content (the average surprisal $-\log p_i$) per sample, where a sample might be a message, a random variable, or the result of an experiment. In each case there is an "unknown" quantity and an observation that reveals aspects of it. The higher the unknown quantity's entropy, the more informative an observation is, insofar as the sample is 'representative' of the whole. We can call this the outside-the-box view: we are trying to estimate what the inside of the box is like by taking a limited number of samples and analyzing them under "steady state" assumptions.
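
As a minimal sketch of that outside-the-box estimation (the true distribution and the sample size are arbitrary choices of mine):

    # Plug-in estimate of an unknown distribution's entropy from a limited sample.
    import numpy as np

    rng = np.random.default_rng(0)
    true_p = np.array([0.6, 0.3, 0.1])        # "inside the box", unknown to the observer
    true_H = -np.sum(true_p * np.log2(true_p))

    samples = rng.choice(len(true_p), size=200, p=true_p)
    counts = np.bincount(samples, minlength=len(true_p))
    p_hat = counts / counts.sum()
    est_H = -np.sum(p_hat[p_hat > 0] * np.log2(p_hat[p_hat > 0]))

    print(f"true entropy = {true_H:.3f} bits/sample, estimate from 200 samples = {est_H:.3f}")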

And this is partly why information entropy is a decent measure of aggregate information. For a channel, what we really want to know is how many messages we can send in parallel, and to figure that out we need to know how much information each message contains, on average. Throw in some operations research and, boom, you have the tools to manage a modern telecommunications infrastructure.
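
A sketch of that channel-side bookkeeping (the crossover probability and block length are assumed example values; the message count is the usual asymptotic reading of Shannon's noisy-channel theorem):

    # Capacity of a binary symmetric channel: C = 1 - H2(f) bits per use,
    # so roughly 2**(n*C) messages are reliably distinguishable over n uses.
    import math

    def h2(q):
        # binary entropy in bits
        if q in (0.0, 1.0):
            return 0.0
        return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

    f = 0.1            # assumed crossover probability
    n = 1000           # assumed number of channel uses
    C = 1 - h2(f)
    print(f"capacity = {C:.3f} bits/use; about 2^{n * C:.0f} messages over {n} uses")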

On the other hand, we have the inside-the-box view: the claim that entropy is a property of a system. I suggest that we broaden this, in analogy with the outside-the-box view, to probability distributions and ... well, I'm not quite sure how to phrase this ... experimental subjects. In any case, what I really mean is whatever yields the data observed by scientists, engineers, people on the other side of the internet, and so on. These "subjects" are the unknown quantities people are trying to estimate.

These systems have an internal structure, and part of that structure substantively relates to the data they emit. The physical law inside the box is that entropy increases: energy and information are "lost" as heat and signal noise. Energy and information are still conserved quantities: this particular microstate could only have arisen from the initial conditions, so in principle (and ignoring quantum mechanics), if we observed the microstate in sufficient detail we could work backwards to recover the initial condition. But if we want to estimate the initial microstate from outside the box, we are very much stuck.
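
Here is a toy picture of that coarse-grained loss (entirely my own illustrative model): particles start in the left half of a box and hop back and forth at random; the microstate (which particle is where) remains perfectly definite, but the coarse-grained occupancy entropy climbs toward its maximum, and the initial condition becomes invisible from outside.

    # Coarse-grained entropy of a toy "gas": N particles, each flips sides
    # with small probability per step; track the left/right mixing entropy.
    import math
    import random

    random.seed(1)
    N, steps, hop_p = 1000, 30, 0.1
    left = [True] * N                    # initial condition: everyone on the left

    def h2(q):
        if q in (0.0, 1.0):
            return 0.0
        return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

    for t in range(steps + 1):
        frac = sum(left) / N
        if t % 10 == 0:
            print(f"t={t:2d}  fraction left={frac:.2f}  coarse entropy={h2(frac):.3f} bits/particle")
        left = [side ^ (random.random() < hop_p) for side in left]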

Probability distributions also fall under this rubric. I am specifically ignoring stochastic processes, insofar as they are "time-dependent", and am treating distributions as timeless (i.e., eternal) formal objects. Their structure does not change over time, so their entropy is constant.

At a high level, the only difference between the physical and formal structures is time. In both cases, the structure is encoded by information, and that structure is revealed by sampling. The physical system's structure decays, and the more that it decays, the more representative of the whole any given sample will be.

Now, one might notice that as entropy increases in the box, information is effectively lost inside the box, yet at the same time each sample effectively becomes more informative. These facts are clearly related, but I'm not in a position to estimate bounds or make any stronger claim than "look, that's neat". Jensen's inequality comes to mind.

[1]: Every probability distribution can be encoded as a tree of 0's and 1's, even if the distribution isn't computable.

  • +1 Thanks for the post. I am aware of Maxwell's demon and fully appreciate the relevance and reality of information as a physical quantity (David Deutsch is even creating a whole "fundamental algebra" around information). However, my question acknowledges this... it is not aimed at critiquing physicists' conception of information as bits, nor at information theory per se (and the attendant communications theory), as these are grounded in physics. I am skeptical about extending Shannon entropy to non-physical things like "uncertainty" or probability.
    – user76844
    Commented Jan 1, 2015 at 3:49
  • Also, Landauer's principle and other information-thermodynamic theories are primarily about information processing, not about information itself. All they prove is that we cannot manipulate physical systems for "free" to preserve certain states. Thus, creating/destroying information requires energy and changes the entropy, but it's not clear that entropy measures information content so much as irreversible heat exchanges. I don't want to come off as too critical, though: I like your answer, and it is rather compelling as an informal rationale... I'll accept it if nothing else comes along.
    – user76844
    Commented Jan 1, 2015 at 3:59
  • You may find this interesting reading: plato.stanford.edu/entries/information-entropy/#LanPri
    – user76844
    Commented Jan 1, 2015 at 4:00
  • mdpi.com/1099-4300/13/3/595/pdf
    – nomen
    Commented Jan 1, 2015 at 4:43
  • @Eupraxis: check out the answer.
    – nomen
    Commented Jan 1, 2015 at 9:31
