
I know that there is a strong relationship between Shannon entropy and thermodynamic entropy -- they even have the same units and differ only by a constant factor. This suggests that they both intrinsically describe the same fundamental concept.

Wikipedia says that there is a strong relationship between Fisher information and relative entropy (also called Kullback-Leibler divergence), as does an answer to a previous question on Math.SE.

However, looking at the relevant formulas, it does not look like Fisher information would be measured with the same units that relative entropy would. This suggests that they are measuring fundamentally distinct, albeit related, physical concepts.

The formula for the Shannon entropy can be written as follows: $$\int [ - \log p(x) ]\ p(x) \, dx $$ This is usually measured in bits (with a base-$2$ logarithm; with the natural logarithm it comes out in nats).
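For concreteness, here is a small numerical check of that formula (the standard normal density below is just an illustrative choice, not part of the question): with a base-$2$ logarithm the integral indeed comes out in bits.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Differential entropy of an illustrative density (standard normal),
# using a base-2 logarithm so the result is in bits.
# Finite limits avoid evaluating log2(0) where the density underflows.
h_bits, _ = quad(lambda x: -norm.pdf(x) * np.log2(norm.pdf(x)), -30, 30)

print(h_bits)                           # ~ 2.047 bits
print(0.5 * np.log2(2 * np.pi * np.e))  # closed form (1/2)*log2(2*pi*e) ~ 2.047
```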

What are the units of Fisher information (given that Shannon entropy can be measured in bits)?

Fisher information can be written as: $$\int \left(\frac{\partial}{\partial \theta} \log p(x; \theta) \right)^2 p(x;\theta) \, dx $$

My guess, based on comparing the definitions of Shannon entropy and Fisher information, is that the latter would be measured in units something like $$\frac{\text{bit}^2}{\Theta^2} $$ where $\Theta$ is the unit of measurement of the parameter $\theta$ that is to be estimated.

I am not quite sure how to account for the effect of the extra partial differentiation compared to the definition of Shannon entropy. Perhaps the expectation operation $\int ( \cdot)\, p(y) \, dy$ should leave the units unchanged, although I don't know how to justify this suspicion beyond intuition.

Since the Fisher information is the variance of the score, this question might be answered by first deriving the units of the score.
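To make the score's units concrete, here is a small numerical sketch (the normal location model and the specific numbers are my own illustrative assumptions): the score $\frac{\partial}{\partial \theta} \log p(x;\theta)$ carries units $1/\Theta$, so its variance, the Fisher information, scales as $1/\Theta^2$ when the parameter is re-expressed in different units.

```python
import numpy as np

# Illustrative model (an assumption): X ~ N(theta, sigma^2) with sigma known,
# so the score is d/dtheta log p(x; theta) = (x - theta) / sigma^2
# and the Fisher information is its variance, 1/sigma^2.
rng = np.random.default_rng(0)
theta, sigma = 5.0, 2.0
x = rng.normal(theta, sigma, size=1_000_000)

score = (x - theta) / sigma**2
print(score.var())            # ~ 1/sigma^2 = 0.25, in units of 1/Theta^2

# Re-express the parameter in units 1000 times smaller (theta' = c*theta, x' = c*x):
c = 1000.0
score_rescaled = (c * x - c * theta) / (c * sigma)**2
print(score_rescaled.var())   # ~ 0.25 / c^2 -- the information scales as Theta^{-2}
```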

This question might be related, although it was unanswered.

  • $$ \int \left( { \frac{\partial}{ \partial \theta} } \log p(x; \theta) \right)^2 p(x;\theta) \, dx$$ Qwerty's answer is clearly correct, but this raises a question: Normally the units of $p(x)$ would be the reciprocal of those of $x$, so that $p(x)\,dx$ is dimensionless. But how, then, can we take a logarithm of $p(x)$? It must mean the logarithm of the quotient of $p(x)$ by some "constant" (where "constant" would mean not depending on $x$?) and it must turn out that the integral as a whole does not depend on that constant, and perhaps the partial derivative itself does not. Commented Aug 16, 2016 at 12:54
  • @MichaelHardy Maybe Landauer's principle gives a constant with units? I.e. a "bit" actually corresponds to Joules: en.wikipedia.org/wiki/Landauer%27s_principle#Equation. If we assumed that that is the case, then would the units of Fisher information be $$\frac{Joules^2}{ \Theta^2} ?$$ Commented Aug 16, 2016 at 15:23
  • I may return to this thread later. But can I convince you to write $\dfrac{ \text{Joules}^2}{ \Theta^2}$ instead of $\dfrac{Joules^2}{ \Theta^2}$? Commented Aug 16, 2016 at 15:31
  • @MichaelHardy haha of course -- I was just being lazy and didn't put it in \text or \mathrm Commented Aug 16, 2016 at 15:32

1 Answer


The Cramér–Rao lower bound says

$$\operatorname{Var}(T(X)) \ge \frac{1}{I(\theta)},$$ where $T(X)$ is an unbiased estimator of the parameter $\theta$ and $I(\theta)$ is the Fisher information.

Since only quantities with the same units can be compared, and $\operatorname{Var}(T)$ has units $\Theta^2$, it follows that $I(\theta)$ has units $\Theta^{-2}$.
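As a concrete check (the normal model with known $\sigma$ and the sample mean as $T$ are illustrative assumptions, not part of the argument above), the bound and the units line up numerically:

```python
import numpy as np

# Illustrative check: X_1,...,X_n ~ N(theta, sigma^2) with sigma known.
# Here I(theta) = n / sigma^2, so 1/I(theta) = sigma^2 / n, and the sample
# mean is an unbiased estimator that attains the Cramer-Rao bound.
rng = np.random.default_rng(1)
n, theta, sigma = 50, 5.0, 2.0

sample_means = rng.normal(theta, sigma, size=(100_000, n)).mean(axis=1)
print(sample_means.var())   # ~ sigma^2 / n = 0.08, in units of Theta^2
print(sigma**2 / n)         # lower bound 1/I(theta) = 0.08, also Theta^2
```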

  • Thanks so much for your help! What is $T(X)$ or $T$? And why are there no units of bits even though there is a $\log f(x, \theta)$ term? Commented Aug 15, 2016 at 13:10
  • $T(X)$ is an unbiased estimator of the parameter $\theta$. – Qwerty Commented Aug 15, 2016 at 13:13
  • @William Well, my bad: I don't get how you are comparing $\log$ with bits. – Qwerty Commented Aug 15, 2016 at 13:14
  • That's fair, I haven't really been clear. I guess what I'm trying to say is that entropy (at least in the discrete case, which I understand much better) has the formula $-\log(p_i)$ and is measured in bits. I suppose (differential) entropy might be measured in bits as well. So I guess what I was trying to say was that since $f(x,\theta)$ is a probability, seemingly $\log f(x,\theta)$ in the definition of Fisher information should be measured in bits, which is where my unit of bits comes from in my question. Commented Aug 15, 2016 at 13:35
  • Bits are unitless, since, for example, if you change the unit of measurement to one that is $1/3$ as big, you don't get $3^f$ times as many bits for some fixed exponent $f$. Commented Aug 16, 2016 at 12:48
