The Shannon entropy of a probability distribution $ \{ x_1 , \dots , x_d\} $ is the average of the negative log-probabilities, $$ H(x)= -\sum\limits_{i=1}^d x_i \log x_i , $$ and it has, of course, many nice interpretations. What about the variance of $ -\log x_i $, $$ \sigma^2 (-\log x)=\sum\limits_i x_i (\log x_i )^2-\left( \sum\limits_i x_i \log x_i \right)^2 ? $$ Does this quantity have any meaning, and has it been used in the literature?
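For concreteness, both moments are straightforward to compute numerically. This is a minimal sketch (the helper name `surprisal_moments` is my own, not from any library); note that for a uniform distribution every outcome is equally surprising, so the variance of the surprisal vanishes:

```python
import math

def surprisal_moments(p):
    """Return the Shannon entropy (mean surprisal, in nats) and the
    variance of the surprisal -log p_i for a probability vector p.
    Terms with zero probability are skipped, as in the entropy convention."""
    mean = -sum(x * math.log(x) for x in p if x > 0)        # H = E[-log X]
    second = sum(x * math.log(x) ** 2 for x in p if x > 0)  # E[(log X)^2]
    return mean, second - mean ** 2

# Uniform: h = log 4, variance ~ 0.  Skewed: strictly positive variance.
h_unif, v_unif = surprisal_moments([0.25] * 4)
h_skew, v_skew = surprisal_moments([0.5, 0.25, 0.25])
print(h_unif, v_unif, h_skew, v_skew)
```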
2 Answers
$\log 1/x_i$ is sometimes called the 'surprise' (e.g. in units of bits) of drawing the symbol $i$, which occurs with probability $x_i$. Since $\log 1/X$ is a random variable, it carries all the operational meanings that come with any random variable: the entropy is the average surprise, and higher moments are simply the higher moments of the surprise of $X$.
There is indeed a literature on the variance of information measures (in this case of a divergence rather than of the surprise itself). Two good places to start on the concept called 'dispersion' are: http://people.lids.mit.edu/yp/homepage/data/gauss_isit.pdf http://arxiv.org/pdf/1109.6310v2.pdf
The application is clear: knowing only the expected value of a random variable pins it down only to first order. When you need tighter bounds, you have to use higher moments.
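As a toy illustration of that point (my own example, not taken from the papers above): knowing the entropy alone says nothing about how far a single draw's surprisal can stray from it, but with the variance in hand, Chebyshev's inequality gives a tail bound, which is the flavor of refinement the dispersion literature sharpens.

```python
import math
import random

# A hypothetical skewed distribution, chosen only for illustration.
p = [0.6, 0.2, 0.1, 0.05, 0.05]

H = -sum(x * math.log(x) for x in p)                 # entropy = mean surprisal
var = sum(x * math.log(x) ** 2 for x in p) - H ** 2  # surprisal variance

# Chebyshev: P(|-log p(X) - H| >= t) <= var / t^2.
t = 1.5
bound = var / t ** 2

# Check the bound empirically on i.i.d. draws from p.
random.seed(0)
n = 100_000
samples = random.choices(range(len(p)), weights=p, k=n)
freq = sum(1 for i in samples if abs(-math.log(p[i]) - H) >= t) / n
print(f"Chebyshev bound {bound:.3f}, empirical frequency {freq:.3f}")
```

The mean alone could not produce such a bound; the second moment is exactly what makes it possible.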
More generally, the following paper discusses higher moments of information, though there does not seem to have been much follow-up work:
H. Jürgensen, D. E. Matthews, "Entropy and Higher Moments of Information", Journal of Universal Computer Science, vol. 16, no. 5 (2010).
Link here: http://www.jucs.org/jucs_16_5/entropy_and_higher_moments/jucs_16_05_0749_0794_juergensen.pdf