28
$\begingroup$

How does one prove the formula for entropy $S=-\sum p\ln p$? Obviously, systems on the microscopic level are fully determined by the microscopic equations of motion. So if you want to introduce a law on top of that, you have to prove consistency, i.e. entropy cannot simply be a postulate. I can imagine that it is derived from probability theory for a general system. Does anyone know such a line of reasoning?

Once you have such a line of reasoning, what are its assumptions? Can these assumptions be invalid for special systems? Would such systems then fail to obey thermodynamics and statistical mechanics, and have no sort of temperature, however general these theories claim to be?

If thermodynamics/statistical mechanics are completely general, how would you apply them to a system where one point particle orbits another?

$\endgroup$
6
  • 4
    $\begingroup$ You'll probably want to research information theory. This is the Shannon entropy. Interestingly, it's a constant of motion for Hamiltonian systems! You have a very interesting question, yet the answer could fill books. $\endgroup$
    – Kasper
    Commented Sep 7, 2011 at 18:44
  • $\begingroup$ Shannon entropy exists, but that still doesn't answer why it is used in physics. Shannon entropy presumably rests on some presuppositions, so why does physics satisfy those presuppositions? $\endgroup$
    – Gere
    Commented Sep 7, 2011 at 18:55
  • $\begingroup$ How do you define $S$? $\endgroup$
    – Gerben
    Commented Sep 7, 2011 at 20:19
  • $\begingroup$ I guess the correct outline of an answer would be to start with classical thermodynamics and get to Carnot/reversible heat engine, find entropy as a state function, before delving into stat mech to give it a microscopic interpretation... Seems like a big job... $\endgroup$
    – genneth
    Commented Sep 7, 2011 at 23:02
  • $\begingroup$ The common Carnot argument wouldn't help. The question of how it is supposed to be connected with all microscopic processes in general would still be as open as before. Someone must have tried that before? I know the common literature, but it's not in there :( ; $S$ is whatever people use for showing irreversibility. $\endgroup$
    – Gere
    Commented Sep 7, 2011 at 23:36

4 Answers

21
$\begingroup$

The theorem is called the noiseless coding theorem, and it is often proven in clunky ways in information theory books. The point of the theorem is to calculate the minimum number of bits per variable you need to encode the values of $N$ independent, identically distributed random variables taking values in $1\ldots K$, where the probability of value $i$ is $p_i$. The minimum number of bits you need on average per variable, in the large-$N$ limit, is defined to be the information in the random variable. It is the minimum number of bits per variable you need to record in a computer so as to remember the values of the $N$ copies with perfect fidelity.

If the variables are uniformly distributed, the answer is obvious: there are $K^N$ possibilities for $N$ throws, and $2^{CN}$ possibilities for $CN$ bits, so $C=\log_2(K)$ for large $N$. With any fewer than $CN$ bits you will not be able to encode the values of the random variables, because they are all equally likely; with any more, you will have extra room. This is the information in a uniform random variable.
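For concreteness, here is a tiny numerical sketch of this counting (the values of $K$ and $N$ are just illustrative choices, not part of the argument): the number of bits needed to index $K^N$ equally likely sequences, divided by $N$, approaches $\log_2 K$.

```python
import math

# Small numerical check of the uniform counting argument; K and the N values
# below are illustrative choices.
K = 6  # number of equally likely values per throw

for N in (10, 100, 1000):
    # smallest B with 2**B >= K**N, computed exactly with integer arithmetic
    B = (K ** N - 1).bit_length()
    print(f"N = {N:4d}   bits per throw = {B / N:.4f}   log2(K) = {math.log2(K):.4f}")
```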

For a general distribution, you can get the answer with a little bit of the law of large numbers. If you have many copies of the random variable, the probability of getting a particular sequence of values $n_1, n_2, \ldots, n_N$ is the product

$$ P(n_1, n_2, \ldots , n_N) = \prod_{j=1}^N p_{n_j}$$

For large $N$, almost all of the probability is carried by configurations in which the number of occurrences of value $i$ is close to $Np_i$, since this is the mean number of $i$'s. So the value of $P$ on any typical configuration is:

$$ P(n_1,\ldots,n_N) = \prod_{i=1}^K p_i^{Np_i} = e^{N\sum_i p_i \log(p_i)}$$

So for those configurations whose probability is not extremely small, the probability is more or less constant and equal to the above value. The total number $M(N)$ of these not-exceedingly-unlikely configurations is then whatever is needed to make the probabilities sum to 1:

$$M(N) \propto e^{ - N \sum p_i \log(p_i)}$$

To encode which of the $M(N)$ possibilities is realized in a run of $N$ picks, you therefore need a number of bits $B(N)$ that is enough to encode all these possibilities:

$$2^{B(N)} \propto e^{ - N \sum p_i \log(p_i)}$$

which means that

$${B(N)\over N} = - \sum p_i \log_2(p_i)$$

And all subleading constants are washed out by the large N limit. This is the information, and the asymptotic equality above is the Shannon noiseless coding theorem. To make it rigorous, all you need are some careful bounds on the large number estimates.
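As a rough numerical illustration of this estimate (not needed for the argument; the distribution below is a made-up example), you can count the typical configurations directly as a multinomial coefficient and check that its logarithm per variable approaches $-\sum p_i \log_2(p_i)$:

```python
import math

# Rough numerical sketch of the counting above, with a made-up distribution:
# the number of length-N sequences whose letter counts equal the typical values
# N*p_i is the multinomial coefficient N! / prod((N*p_i)!), and log2 of it,
# divided by N, approaches -sum p_i log2(p_i), the bits needed per variable.
p = [0.5, 0.25, 0.125, 0.125]                 # illustrative distribution
H_bits = -sum(q * math.log2(q) for q in p)    # Shannon entropy in bits (1.75 here)

def log2_multinomial(N, counts):
    """log2 of N! / prod(c!) via lgamma, so large N is no problem."""
    log_nat = math.lgamma(N + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_nat / math.log(2)

for N in (80, 8000, 800000):                  # multiples of 8 keep N*p_i integer
    counts = [int(N * q) for q in p]
    print(f"N = {N:6d}   log2(M)/N = {log2_multinomial(N, counts) / N:.4f}   H = {H_bits:.4f}")
```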

Replica coincidences

There is another, interesting interpretation of the Shannon entropy in terms of coincidences. Consider the probability that you pick two values of the random variable and get the same value both times:

$$P_2 = \sum p_i^2$$

This is clearly an estimate of how many different values there effectively are to select from. If you ask for the probability that you get the same value $k$ times in $k$ throws, it is

$$P_k = \sum p_i p_i^{k-1}$$

If you ask how fast this coincidence probability falls off just past one throw, i.e. at $k=1+\epsilon$, the leading behaviour gives exactly the Shannon entropy. This is like the replica trick, so I think it is good to keep in mind.
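A quick numerical check of this limit (with a made-up distribution, purely for illustration): the quantity $-\log(P_{1+\epsilon})/\epsilon$ approaches the Shannon entropy as $\epsilon\to 0$.

```python
import math

# Numerical sketch of the k = 1 + epsilon remark, with a made-up distribution:
# -log(P_k)/(k - 1) tends to the Shannon entropy as k -> 1.
p = [0.4, 0.3, 0.2, 0.1]
S = -sum(q * math.log(q) for q in p)       # Shannon entropy in nats

for eps in (0.1, 0.01, 0.001):
    P_k = sum(q ** (1 + eps) for q in p)   # coincidence probability at k = 1 + eps
    print(f"eps = {eps:5.3f}   -log(P_k)/eps = {-math.log(P_k) / eps:.4f}   S = {S:.4f}")
```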

Entropy from information

To recover statistical mechanics from the Shannon information, you are given:

  • the values of the macroscopic conserved quantities (or their thermodynamic conjugates): energy, momentum, angular momentum, charge, and particle number
  • the macroscopic constraints (or their thermodynamic conjugates): volume, positions of macroscopic objects, etc.

Then the statistical distribution of the microscopic configuration is the maximum-entropy distribution on phase space (the one that assumes as little information as possible beyond what you are given), subject to the constraint that these quantities match their macroscopic values.
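As a minimal sketch of what this prescription looks like in practice (the energy levels and the target mean energy below are hypothetical, chosen only for illustration), maximizing $-\sum p\log p$ subject to a single constraint on $\langle E\rangle$ gives the familiar Gibbs/Boltzmann form $p_i\propto e^{-\beta E_i}$:

```python
import numpy as np

# Minimal maximum-entropy sketch.  The energy levels and target mean energy
# are made-up numbers used only for illustration: maximize -sum p log p
# subject to sum p = 1 and <E> = E_target.  The maximizer has the Gibbs form
# p_i ~ exp(-beta * E_i); beta is found here by bisection.
E = np.array([0.0, 1.0, 2.0, 3.0, 4.0])    # hypothetical energy levels
E_target = 1.2                             # hypothetical macroscopic constraint

def gibbs(beta):
    w = np.exp(-beta * E)
    p = w / w.sum()
    return p, p @ E                        # distribution and its mean energy

lo, hi = -50.0, 50.0                       # <E> decreases monotonically with beta
for _ in range(100):
    mid = 0.5 * (lo + hi)
    _, Ebar = gibbs(mid)
    if Ebar > E_target:
        lo = mid                           # mean energy too high: raise beta
    else:
        hi = mid

beta = 0.5 * (lo + hi)
p, Ebar = gibbs(beta)
print("beta =", beta, " <E> =", Ebar, " S =", -np.sum(p * np.log(p)))
```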

$\endgroup$
12
  • 4
    $\begingroup$ The last section is about equilibrium stat. mech, and it would be nice to explicitly acknowledge that, because there's a lot of literature on using information theory for non-equilibrium stat. mech. I used to be very confused about how the latter works, because it seemed like trying to get something for nothing --- truly non-equilibrium states can be infinitely complex. I finally realised (via Jaynes) that one can replace the equilibrium condition with "reproducible", and in fact one always means the latter anyway. (cont) $\endgroup$
    – genneth
    Commented Sep 8, 2011 at 7:18
  • 1
    $\begingroup$ (cont) This pushes stat. mech to have a more inference-driven flavour, which probably aligns better with the OP's question. The point is that we do experiments and find out that some experimental controls are sufficient for some outcomes --- it is then simply logic that there are sufficient relationships between them to specify the macroscopic behaviour. If we then know the microscopic behaviour, we can play this game of maximum entropy and derive the statistical mechanics of the experiment. $\endgroup$
    – genneth
    Commented Sep 8, 2011 at 7:20
    $\begingroup$ @genneth: I would do that if I thought there was a single example where this description worked. Do you know any system? The only maximum-entropy distributions I know are in equilibrium stat-mech. Everywhere else, it's just a terrible zeroth approximation. $\endgroup$
    – Ron Maimon
    Commented Sep 8, 2011 at 15:49
  • 1
    $\begingroup$ @Gerenuk: Your intuition about this is faulty, because you are used to the situation where you can see the particle, and therefore know where it is and how fast it's going at all times. If you don't know where the particle is, there is an entropy associated with the particle, and the $p_i$ become a probability density $p(x,v)$ for finding it at any position and velocity. The laws of black hole entropy are different, and life has nothing to do with entropy (but it doesn't violate it). $\endgroup$
    – Ron Maimon
    Commented Sep 9, 2011 at 14:44
  • 2
    $\begingroup$ @Gerenuk: I know where you are wrong, and the answer is very easy and well known. It is treated completely in the Jaynes reference, and I have nothing more to add to this. The thing I put above is something which doesn't appear in many places, namely a good simple explanation of noiseless coding, because this justifies $p\log(p)$. Everything else is philosophy, and Jaynes explains it well (in the link above). $\endgroup$
    – Ron Maimon
    Commented Sep 10, 2011 at 3:35
7
$\begingroup$

The best (IMHO) derivation of the $\sum p \log p$ formula from basic postulates is the one given originally by Shannon:

Shannon (1948) A Mathematical Theory of Communication. Bell System Technical Journal. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6773024

However, Shannon was concerned not with physics but with telegraphy, so his proof appears in the context of information transmission rather than statistical mechanics. To see the relevance of Shannon's work to physics, the best references are papers by Edwin Jaynes. He wrote dozens of papers on the subject. My favorite is the admittedly rather long

Jaynes, E. T., 1979, "Where do we Stand on Maximum Entropy?", in The Maximum Entropy Formalism, R. D. Levine and M. Tribus (eds.), MIT Press, Cambridge, MA, p. 15; http://bayes.wustl.edu/etj/articles/stand.on.entropy.pdf

$\endgroup$
1
$\begingroup$

The functional form of the entropy, $S = - \sum p \ln p$, can be understood if one requires that entropy be extensive (additive over independent subsystems) and depend on the microscopic state probabilities $p$.

Consider a system composed of two independent subsystems $A$ and $B$. Then $S_{AB} = S_A + S_B$, and the joint probabilities factorize, $p_{AB} = p_A p_B$, since $A$ and $B$ are decoupled.

$$ S_{AB} = - \sum_{A,B} p_{AB} \ln p_{AB} = -\sum_{A} p_A \ln p_A \sum_B p_B \;-\; \sum_{A} p_{A} \sum_B p_B \ln p_B $$

$$ = -\sum_A p_{A} \ln p_A - \sum_B p_B \ln p_B = S_A + S_B, $$ using $\sum_A p_A = \sum_B p_B = 1$. The argument leaves an overall constant factor undetermined; in statistical mechanics it is the Boltzmann constant $k_B$: $S = - k_B \sum p \ln p$, a form due to Gibbs, long before Shannon.
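A quick numerical sanity check of this additivity (the two distributions below are arbitrary examples):

```python
import numpy as np

# Quick numerical check of the additivity argument above; the two distributions
# are made-up examples.  For independent subsystems the joint probabilities are
# products, and the entropy of the joint distribution equals S_A + S_B.
def S(p):
    return -np.sum(p * np.log(p))

p_A = np.array([0.7, 0.2, 0.1])
p_B = np.array([0.5, 0.3, 0.15, 0.05])
p_AB = np.outer(p_A, p_B)          # p_AB[i, j] = p_A[i] * p_B[j]

print(S(p_AB), S(p_A) + S(p_B))    # the two numbers agree (up to rounding)
```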

$\endgroup$
1
    $\begingroup$ Excellent answer, certainly the best, since it applies to everything (von Neumann entropy, Shannon entropy, Boltzmann-Gibbs entropy). Nevertheless you forgot an important point: you also need $S$ to be bounded and positive (for the same reason that a volume should be bounded and positive in a thermodynamic potential expansion). Otherwise, defining $S\propto \ln p$ already does the job of extensivity; the minus sign and the structure $p\ln p$ come from the positivity requirement and the boundedness, respectively. $\endgroup$
    – FraSchelle
    Commented Jul 14, 2013 at 8:24
0
$\begingroup$

Approaching this from a purely physics perspective: this is the Gibbs entropy of a system. Firstly, although the concept of entropy can be extended, we are usually discussing equilibrium thermodynamics, and that is certainly where the Gibbs entropy is first introduced.

You are of course right that, technically, the dynamics could be fully described by the equations of motion, but then there wouldn't really be much need for the subject of thermodynamics. Thermodynamics is, in some ways, not as "fundamental" as other subjects in physics, in that it does not try to give a complete description of everything about the system you're studying. You're usually discussing large systems (and so looking for macroscopic properties), or small systems interacting with a large environment (for example, it doesn't make a huge amount of sense to talk about the temperature of a single electron). In reality it is entirely impractical to search for a deterministic description of such systems (even without chaos theory and quantum mechanics, the number of equations would just be too enormous), and so you use thermodynamics.

With equilibrium statistical thermodynamics (which looks for a justification of classical thermodynamics based on averages of a microscopic description), you start with the principle of equal a priori probabilities, which says that for an isolated system that has been left alone for a long time (vague, but basically that it's in equilibrium), every microstate available to the system is equally likely to be occupied. This is a big assumption, and there are a lot of people who would like to be able to justify it properly, but it's often argued on symmetry grounds (with the information you have, there is no reason to assume one particular microstate would be more likely than any other). More than that, it just works.

The entropy of an isolated system was then postulated by Boltzmann to be $S=k \ln(\Omega)$, where $\Omega$ is the number of microstates available to the system (it's easier to build this up assuming a discrete number of microstates, especially if you are talking about Boltzmann/Gibbs entropy). It's a postulate, but it needs to be consistent with the classical thermodynamic entropy. The Gibbs entropy $S = -k\sum_i p_i \ln p_i$ is a natural extension of this to systems in thermal contact with an environment, where the microstate probabilities are no longer equal. You can show that it is consistent with the classical thermodynamic entropy for a number of systems, and it really shows how entropy can be considered a measure of uncertainty about the microscopic details of the system.
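As a small illustrative check (with an arbitrary choice of $\Omega$), the Gibbs entropy with equal microstate probabilities indeed reduces to Boltzmann's $k\ln\Omega$:

```python
import numpy as np

# Small sketch: when all Omega microstates of an isolated system are equally
# likely, the Gibbs entropy -k * sum p ln p reduces to Boltzmann's S = k ln(Omega).
# Boltzmann's constant k is set to 1, and Omega is an arbitrary example value.
Omega = 1000
p = np.full(Omega, 1.0 / Omega)    # equal a priori probabilities
S_gibbs = -np.sum(p * np.log(p))

print(S_gibbs, np.log(Omega))      # both equal ln(Omega)
```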

$\endgroup$
