Differential Entropy

Question

I'm a little confused about the concept of differential entropy. Wikipedia says that the differential entropy of a Gaussian is $\log(\sigma\sqrt{2\pi e})$. However, as $\sigma \rightarrow 0^+$, the only intuitive value for an entropy seems to me to be 0: we are then 100% sure that the outcome will equal $\mu$, so nothing is required to store the knowledge of what the outcome will be. Instead, the expression above gives $-\infty$.

So I must be misunderstanding something, right?

Just to clarify, the reason I'm asking is that I'm trying to figure out whether my approach to this question about Empirical Entropy makes any sense.

Edit: "Own work"

Now I have thought a bit about this. Take the simplest distribution, the uniform distribution on $[a, b]$, which (according to Wikipedia) has differential entropy $\log(b-a)$, and say $b-a = 2^k$ with the logarithm taken base 2.

If k = -1, this would be -1 bit and the interval would be length 0.5.

If k = 0, this would be 0 bit and the interval would be length 1.0.

If k = 1, this would be 1 bit and the interval would be length 2.0.

So if the interval has length 0.5, we would "save" one bit compared with storing an interval of length 1 at the same precision. Differential entropy is then, in some sense, the information needed "in excess" of whatever resolution we want to store with. Does this make any sense?
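The three cases above can be checked with a minimal sketch. It assumes base-2 logarithms (so the result is in bits) and an interval that starts at 0; the function name is made up for illustration.

```python
import math

def uniform_differential_entropy_bits(a, b):
    """Differential entropy of Uniform(a, b) in bits, i.e. log2(b - a)."""
    if b <= a:
        raise ValueError("need b > a")
    return math.log2(b - a)

# Interval lengths 2**k for k = -1, 0, 1, as in the question.
for k in (-1, 0, 1):
    width = 2.0 ** k
    print(k, width, uniform_differential_entropy_bits(0.0, width))
```

Halving the interval length lowers the value by one bit, and nothing stops it from going below zero.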

dioid · Accepted Answer · 2015-11-22 12:48:40Z

Yes, the differential entropy disregards resolution (quantization). A continuous random variable can't be represented exactly with a finite number of bits. By introducing an approximate representation through quantization, you can relate it to classical entropy. For example, if you quantize uniformly with intervals of length $\Delta=2^{-n}$, you get a discrete random variable with $p_i=\int_{(i-1/2)\Delta}^{(i+1/2)\Delta} f(x)\,dx = f(\xi_i)\Delta$ for some $\xi_i$ in the $i$-th interval (mean value theorem, assuming $f$ is continuous), and entropy $$\begin{aligned} -\sum_i p_i \log p_i &= -\sum_i \int_{(i-1/2)\Delta}^{(i+1/2)\Delta} f(x)\,dx \, \log (f(\xi_i)\Delta) \\ &= -\sum_i \int_{(i-1/2)\Delta}^{(i+1/2)\Delta} f(x)\log(f(\xi_i))\,dx - \log \Delta, \end{aligned}$$ where the last step uses $\sum_i p_i = 1$. The first term is approximated by the differential entropy $-\int f(x)\log f(x)\,dx$ when the quantization is fine.
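As a rough numerical check of this relation, here is a sketch for a zero-mean Gaussian, using base-2 logarithms so that $-\log \Delta = n$ bits. The function names, the choice of a Gaussian, and the 12-sigma truncation are assumptions made for illustration only.

```python
import math

def gaussian_cdf(x, sigma):
    """CDF of a zero-mean normal distribution with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def quantized_entropy_bits(sigma, delta, span=12.0):
    """Entropy in bits of N(0, sigma^2) quantized to bins
    [(i - 1/2) * delta, (i + 1/2) * delta], truncated at about span * sigma."""
    half = int(math.ceil(span * sigma / delta)) + 1
    total = 0.0
    for i in range(-half, half + 1):
        p = gaussian_cdf((i + 0.5) * delta, sigma) - gaussian_cdf((i - 0.5) * delta, sigma)
        if p > 0.0:  # skip empty bins, since 0 * log 0 is taken as 0
            total -= p * math.log2(p)
    return total

def gaussian_differential_entropy_bits(sigma):
    """log2(sigma * sqrt(2 * pi * e)), the closed form quoted in the question."""
    return math.log2(sigma * math.sqrt(2.0 * math.pi * math.e))

sigma = 1.0
for n in (2, 4, 6):
    delta = 2.0 ** -n
    print(n, quantized_entropy_bits(sigma, delta), gaussian_differential_entropy_bits(sigma) + n)
```

For these values of $n$ the two printed columns should agree closely, which is the fine-quantization regime the derivation relies on.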

You may consider a uniform distribution on $[0, 2^m]$ to see this: its differential entropy is $m$ bits, and quantization with $\Delta = 2^{-n}$ gives entropy $m+n$ bits. So, intuitively, the differential entropy gives the bits needed to cover the "spread", and the resolution adds $n$ bits to cover the "precision". This assumes $m+n \geq 0$, so that a quantization interval is no wider than the range.
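A small sketch of this uniform example, under the assumption that the bins are aligned with the range (bin edges at multiples of $\Delta$ starting from 0) so that every bin is equally likely; the function name is hypothetical.

```python
import math

def quantized_uniform_entropy_bits(m, n):
    """Entropy in bits of Uniform(0, 2**m) quantized into bins of width 2**-n.

    Assumes m + n >= 0 and bins aligned with the range, so there are
    2**(m + n) equally likely bins.
    """
    if m + n < 0:
        raise ValueError("need m + n >= 0 so that a bin is no wider than the range")
    num_bins = 2 ** (m + n)
    p = 1.0 / num_bins
    return -sum(p * math.log2(p) for _ in range(num_bins))

for m, n in ((3, 2), (0, 4), (-1, 3)):
    print(m, n, quantized_uniform_entropy_bits(m, n))
```

In each case the result is $m + n$: $m$ bits from the differential entropy and $n$ bits from the resolution.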

In your initial example, as $\sigma \to 0$ the fine-quantization assumption eventually becomes invalid, but if you go back to the exact expression $-\sum_i p_i \log p_i$, the entropy is 0, as expected.
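The breakdown can be seen numerically by holding the bin width fixed and shrinking $\sigma$. The sketch below is only illustrative: the names, the fixed $n = 4$, and the 12-sigma truncation are assumptions, and it compares the exact quantized entropy with the fine-quantization estimate $h + n$ in bits.

```python
import math

def normal_cdf(x, sigma):
    """CDF of a zero-mean normal distribution with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def exact_quantized_entropy_bits(sigma, n, span=12.0):
    """Exact -sum p_i log2 p_i for N(0, sigma^2) with bins of width 2**-n centered at i * 2**-n."""
    delta = 2.0 ** -n
    half = int(math.ceil(span * sigma / delta)) + 1
    entropy = 0.0
    for i in range(-half, half + 1):
        p = normal_cdf((i + 0.5) * delta, sigma) - normal_cdf((i - 0.5) * delta, sigma)
        if p > 0.0:
            entropy -= p * math.log2(p)
    return entropy

def fine_quantization_estimate_bits(sigma, n):
    """Differential entropy plus n bits; only meaningful when the bins are narrow relative to sigma."""
    return math.log2(sigma * math.sqrt(2.0 * math.pi * math.e)) + n

n = 4
for sigma in (1.0, 0.1, 0.01, 0.001):
    print(sigma, exact_quantized_entropy_bits(sigma, n), fine_quantization_estimate_bits(sigma, n))
```

Once $\sigma$ is much smaller than the bin width, essentially all the probability falls in one bin, so the exact entropy approaches 0 while the estimate keeps decreasing and becomes negative.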

leonbloy · Accepted Answer · 2015-08-15 23:12:49Z

The differential entropy $h(x)$ is not a true generalization of the (discrete, true) entropy $H(X)$; only some of the properties of the latter apply to the former. In particular, the property that $H(X)\ge 0$, with $H(X)=0$ meaning "zero uncertainty" (or full knowledge), does not apply to $h(x)$. The differential entropy can be negative, and $h(x)=0$ has no special significance.
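To illustrate with the Gaussian formula from the question, here is a sketch (natural logarithm, so the values are in nats; the function name is made up) showing that $h$ crosses zero at an unremarkable value of $\sigma$:

```python
import math

def gaussian_differential_entropy_nats(sigma):
    """log(sigma * sqrt(2 * pi * e)), the Gaussian differential entropy in nats."""
    return math.log(sigma * math.sqrt(2.0 * math.pi * math.e))

# The value of sigma at which the formula gives zero: 1 / sqrt(2 * pi * e), roughly 0.242.
sigma_zero = 1.0 / math.sqrt(2.0 * math.pi * math.e)

for sigma in (1.0, sigma_zero, 0.1):
    print(sigma, gaussian_differential_entropy_nats(sigma))
```

A Gaussian with $\sigma = 1/\sqrt{2\pi e}$ is still a genuinely random variable, yet its differential entropy is 0; for smaller $\sigma$ the value is negative.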

edited Aug 15, 2015 at 23:12

answered Aug 15, 2015 at 19:17


Can you recommend a textbook that discusses zero and $-\infty$ values of $h(x)$ versus zero $H(X)$?
– develarist
Commented Oct 8, 2020 at 15:29
