
If I am correct, Fisher's information at parameter $\theta$ is defined to be the variance of the score function at $\theta$. The score function is defined as the derivative of the log-likelihood function with respect to $\theta$, and therefore measures the sensitivity of the log-likelihood function to $\theta$.

I was wondering how to understand the meaning of Fisher's information?

Especially, why does Wikipedia say:

The Fisher information is a way of measuring the amount of information that an observable random variable $X$ carries about an unknown parameter $\theta$ upon which the probability of $X$ depends.

What kind of information is meant by "the amount of information"? Is it Shannon information?

Why is the "information" said to be carried by $X$ about $\theta$?

Thanks and regards!


2 Answers


"Information" is an abstract concept that may be quantified in a number of different ways. Shannon's approach was to compress the data as much as possible and then to count the number of bits needed in the most compressed form. Fisher's approach is radically different and is closer to what laymen intuitively think. If I give you data on death rate of rats in China and ask you to estimate the population of Cuba based on that, you'll surely say that the data contains no information about the quantity to be estimated. Generalizing this, information may be quantified as follows: Try your "best" to estimate the quantity of interest based on the data, see how "well" you have performed. A natural choice for "best" is maximum likelihood estimation (MLE). A natural choice for "well" is to consider the variance of the MLE. Smaller the variance, more the "information". So consider 1/variance. If sample size is large then its limiting behavior gives you Fisher info.


Just as an example, let's say we have a uniformly distributed random variable $X$. Then $X$ depends on two parameters; max and min, or mean and span, are the usual choices. If you have observed $X$, you can say something about those parameters just from the values of $X$ you have observed.

Say we have observed the following values of the uniformly distributed integer random variable $X$: $$ 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0 $$ Wouldn't you agree that we can with some certainty conclude that $\max(X)=1$ and $\min(X) = 0$?

To show an actual Fisher information example, let's instead say that the random variable $X$ is either $0$ with some probability $\theta$ or $1$ with probability $1-\theta$. Thus $f_X(0;\theta) = \theta$ and $f_X(1;\theta) = 1-\theta$. The Fisher information of $\theta$ is the value $$ \mathcal I(\theta) = E\left[\left(\frac{\partial}{\partial \theta}\ln f_X(X;\theta)\right)^2\,\Bigg|\,\theta\right] = \left(\frac{\partial}{\partial \theta}\ln f_X(0;\theta)\right)^2 f_X(0;\theta) + \left(\frac{\partial}{\partial \theta}\ln f_X(1;\theta)\right)^2 f_X(1;\theta) \\ = \frac{1}{\theta^2}\cdot \theta + \frac{1}{(1-\theta)^2}\cdot(1-\theta) = \frac{1}{\theta(1-\theta)}, $$ and this function measures how much information observations of $X$ give about $\theta$. According to Wikipedia, large values of this function mean that observations give much information. Together with the maximum likelihood estimate $\hat\theta = 0.5$ obtained from the observations above, we get $\mathcal I(0.5) = 4$. I do not have enough experience with Fisher information to tell you whether this specific value is "large".
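As a cross-check of the algebra above, here is a small SymPy sketch (my own addition, not part of the original answer) that performs the same computation symbolically: it differentiates the log of each point mass, squares the result, and takes the expectation over the two possible outcomes.

```python
import sympy as sp

# Fisher information for the two-point model f_X(0; theta) = theta,
# f_X(1; theta) = 1 - theta, computed as the sum of
# (d/dtheta log f)^2 * f over the two outcomes.

theta = sp.symbols('theta', positive=True)

f0 = theta          # P(X = 0)
f1 = 1 - theta      # P(X = 1)

score0 = sp.diff(sp.log(f0), theta)   # score evaluated at X = 0
score1 = sp.diff(sp.log(f1), theta)   # score evaluated at X = 1

fisher = sp.simplify(score0**2 * f0 + score1**2 * f1)

print(fisher)                                   # 1/(theta*(1 - theta)), possibly in an equivalent form
print(fisher.subs(theta, sp.Rational(1, 2)))    # 4
```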

  • How does this explain what Fisher information is?
    – joriki
    Commented May 4, 2013 at 17:00
  • @joriki It doesn't, and frankly, I don't see OP asking what Fisher information is. As I read his question, he wants to know how information about parameters is at all carried by observations of the variables themselves.
    – Arthur
    Commented May 4, 2013 at 17:02
  • Thanks, Arthur. Is Fisher's information used in your example?
    – Tim
    Commented May 4, 2013 at 17:15
  • @Tim It is now.
    – Arthur
    Commented May 4, 2013 at 19:22
  • @Tim This is all part of an analytic way to try to figure out what $\theta$ really is. You can first do what is called a "maximum likelihood estimate", which is basically asking "what value of $\theta$ gives the highest probability of yielding the observed data?" That gives you a function of $\theta$, and the maximum likelihood estimate is the $\theta$ that maximizes that function. Fisher information is a way to measure how sharp that peak is. If $\mathcal I(\theta)$ is large, the peak is sharp, and values of $\theta$ close to the estimate are less likely to be the true value (see the sketch after these comments for a numerical illustration).
    – Arthur
    Commented May 4, 2013 at 19:41
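To make the "sharpness of the peak" reading from the last comment concrete, here is a small numerical sketch (my addition, reusing the 16 observations from the answer and its convention $f_X(0;\theta)=\theta$): it compares the curvature of the log-likelihood at the MLE with $n\,\mathcal I(\hat\theta)$, which should agree.

```python
import numpy as np

# For the 16 observations above (8 zeros and 8 ones, with P(X = 0) = theta),
# the log-likelihood is l(theta) = 8*log(theta) + 8*log(1 - theta).
# The observed curvature -l''(theta_hat) at the MLE should match n * I(theta_hat).

data = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0])
n = data.size
k = np.sum(data == 0)        # number of zeros, with P(X = 0) = theta

theta_hat = k / n            # MLE: fraction of zeros = 0.5

def log_lik(theta):
    return k * np.log(theta) + (n - k) * np.log(1 - theta)

# Numerical second derivative at the MLE (central differences).
h = 1e-4
curvature = -(log_lik(theta_hat + h) - 2 * log_lik(theta_hat) + log_lik(theta_hat - h)) / h**2

fisher_at_mle = 1.0 / (theta_hat * (1 - theta_hat))   # I(theta) = 1/(theta(1-theta))

print(f"observed curvature -l''(theta_hat): {curvature:.3f}")        # about 64
print(f"n * I(theta_hat)                  : {n * fisher_at_mle:.3f}") # 64
```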
