
I've been self-studying "Information Geometry" by Ay et al., fascinated by the connection between geometry, probability, and statistics. The proofs are clear to me; nonetheless, even in the first part of the book, which treats the simplest case of probability distributions on a finite set, I'm having trouble understanding the intuition behind Chentsov's characterization of the Fisher metric (p. 36):

Denote by $\mathcal{P}(I)$ the space of probability distributions on the finite set $I$, by $\mathcal{P}_+(I)$ the space of strictly positive probability distributions on $I$, and by $\delta^i$ the Dirac measure at the $i$-th element of $I$. Consider two non-empty and finite sets $I$ and $I'$. A Markov kernel is a map $$ K:I\rightarrow\mathcal{P}(I'),\quad i\mapsto K^i:=\sum_{i'\in I'} K_{i'}^i\delta^{i'}. $$ Each Markov kernel induces a corresponding map between probability distributions $$ K_*:\mathcal{P}(I)\rightarrow \mathcal{P}(I'),\quad \mu=\sum_{i\in I} \mu_i\delta^i\mapsto\sum_{i\in I}\mu_i K^i.$$ Now assume that $|I|\le|I'|$. We call a Markov kernel $K$ congruent if there is a partition $A_i$, $i\in I$, of $I'$ such that the following condition holds: $$K_{i'}^i> 0 \iff i'\in A_i.$$ If $K$ is congruent and $\mu\in\mathcal{P}_+(I)$, then $K_\ast(\mu)\in\mathcal{P}_+(I')$. This yields a differentiable map $$K_\ast:\mathcal{P}_+(I)\rightarrow \mathcal{P}_+(I'),$$ whose differential at $\mu$ is given by $$ d_{\mu}K_\ast:T_{\mu}\mathcal{P}_+(I)\rightarrow T_{K_\ast(\mu)}\mathcal{P}_+(I'),\quad (\mu,\nu-\mu)\mapsto(K_\ast(\mu),K_\ast(\nu)-K_\ast(\mu)). $$

Theorem (Chentsov's Characterization of the Fisher Metric):
We assign to each non-empty and finite set $I$ a metric $h^I$ on $\mathcal{P}_+(I)$. If for each congruent Markov kernel $K:I\rightarrow\mathcal{P}(I')$ we have invariance in the sense $$ h_p^I(A,B)=h_{K_\ast(p)}^{I'}(d_p K_\ast(A),d_p K_\ast(B)) $$ or for short $(K_\ast)^\ast(h^{I'})=h^I$, then there is a constant $\alpha>0$, such that $h^I=\alpha \mathfrak{g}^I$ for all $I$, where $\mathfrak{g}$ is the Fisher metric on $\mathcal{P_+}(I)$.
Recall that the Fisher metric is defined for each pair of vectors $A=(\mu,a),B=(\mu,b)\in T_\mu \mathcal{P}_+(I)$ as $$ \mathfrak{g}_\mu(A,B):=\sum_{i\in I}\frac{a_i b_i}{\mu_i},$$ where $a=\sum_{i\in I} a_i \delta_i$, $b=\sum_{i\in I} b_i \delta_i$, and $\mu=\sum_{i\in I} \mu_i \delta_i$. Here $(e_1,e_2,\dots,e_n)$ is the canonical basis of $C(I,\mathbb{R})$ and $(\delta_1,\delta_2,\dots,\delta_n)$ is its dual basis, spanning the dual space, which can be identified with the space of signed measures on $I$; the space of probability distributions on $I$, the one appearing in the theorem, is a subset of it.
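To make the invariance statement concrete, here is a minimal numerical sketch (the kernel, distribution, and tangent vectors are illustrative choices of mine, not from the book): it builds a congruent Markov kernel from a 2-element set to a 5-element set, pushes a positive distribution and two tangent vectors forward, and checks that the Fisher metric gives the same value before and after.

```python
import numpy as np

# Congruent Markov kernel K: I -> P(I') with |I| = 2, |I'| = 5 and
# partition A_1 = {0, 1}, A_2 = {2, 3, 4} of I'.
# Row i holds the probabilities K^i_{i'}; row i is supported exactly on A_i.
K = np.array([
    [0.3, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.5, 0.3],
])

def pushforward(K, v):
    """K_* sends sum_i v_i delta^i to sum_i v_i K^i: a row-vector-matrix product."""
    return v @ K

def fisher(mu, a, b):
    """Fisher metric g_mu(A, B) = sum_i a_i b_i / mu_i."""
    return np.sum(a * b / mu)

mu = np.array([0.4, 0.6])        # a point of P_+(I)
a = np.array([0.1, -0.1])        # tangent vectors: coefficients sum to zero
b = np.array([-0.05, 0.05])

# The differential d_mu K_* acts on tangent vectors by the same linear map,
# so invariance reads g_mu(a, b) = g_{K_* mu}(K_* a, K_* b).
print(fisher(mu, a, b))
print(fisher(pushforward(K, mu), pushforward(K, a), pushforward(K, b)))
# Both print -0.0208333..., confirming (K_*)^* g^{I'} = g^I for this kernel.
```

The agreement is exact, not approximate: for $i'\in A_i$ one has $K_\ast(a)_{i'}=a_i K^i_{i'}$ and $K_\ast(\mu)_{i'}=\mu_i K^i_{i'}$, so each term $a_i b_i/\mu_i$ is simply redistributed over the block $A_i$ with weights $K^i_{i'}$ summing to $1$.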

I somewhat understand that the Fisher metric is the only metric, up to a scalar factor, for which every map induced by a congruent Markov kernel is an isometric embedding (in the Riemannian sense). I have read online and in other books that this metric is related to statistical concepts, in particular sufficient statistics. I can see that any map $I\rightarrow I'$ can be viewed as a statistic, but that is as far as my understanding of the link with statistics goes.

What is the role of Markov kernels in the statistical interpretation of this theorem? Why is it important in that setting that there exists a unique metric preserved by the induced maps? And what would such a metric even "measure" on manifolds whose points are probability distributions?


1 Answer

We can argue that Čencov's idea was to show that the Fisher-Rao metric tensor is the only one (up to a constant factor) that respects a particular type of symmetry which is connected with the inner structure of probability measures.

Concretely, the manifolds appearing in (classical) information geometry are not "naked manifolds," because their points parameterize probability measures. Therefore, in order to preserve this information, the smooth maps we should be concerned with are those coming from maps between the ambient probability spaces. In turn, probability distributions form convex sets, so it is only reasonable to consider maps between them that preserve convexity. Markov kernels are precisely those maps between probability spaces that preserve convexity, as the sketch below illustrates.
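A minimal sketch of this point, with an arbitrary kernel and points of my choosing: the induced map $K_\ast$ is affine, so it sends convex combinations of distributions to the corresponding convex combinations of their images.

```python
import numpy as np

# An arbitrary Markov kernel from a 2-outcome to a 3-outcome set (rows sum to 1).
K = np.array([[0.3, 0.7, 0.0],
              [0.1, 0.4, 0.5]])

mu  = np.array([0.4, 0.6])
nu  = np.array([0.9, 0.1])
lam = 0.25

# K_* is affine: the image of a convex combination is the convex combination
# of the images.
lhs = (lam * mu + (1 - lam) * nu) @ K
rhs = lam * (mu @ K) + (1 - lam) * (nu @ K)
print(np.allclose(lhs, rhs))  # True
```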

However, not all Markov kernels generate maps we are interested in. Indeed, we would like to be able to somehow invert these maps, even if only in one direction. Therefore, we want Markov kernels admitting a left inverse which is also a Markov kernel. Čencov calls these maps congruent embeddings. Essentially, they are connected with conditional expectations.
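For instance (a toy sketch, reusing the partition from the earlier example), a congruent kernel always admits a deterministic left inverse: the kernel that collapses each block $A_i$ of the partition back to $\delta^i$.

```python
import numpy as np

# The congruent kernel from before: partition A_1 = {0, 1}, A_2 = {2, 3, 4}.
K = np.array([[0.3, 0.7, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.2, 0.5, 0.3]])

# Deterministic left inverse L: I' -> P(I), collapsing each block A_i to delta^i.
L = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0],
              [0.0, 1.0]])

# Pushforwards compose as mu -> mu K -> mu K L, so L_* o K_* = id iff K L = I.
print(np.allclose(K @ L, np.eye(2)))  # True
```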

It is important to note that, if we focus on invertible Markov kernels, that is, Markov kernels admitting left and right inverses which are Markov kernels, then Čencov's theorem fails already in the case of strictly positive probability measures on a 2-outcome set. Indeed, in this case, invertible Markov kernels are permutations, and there are infinitely many metric tensors on the (open interior of the) 2-simplex which are invariant with respect to permutations.
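Here is a minimal sketch of this failure, using an example metric of my own choosing: rescale the Fisher metric by the swap-symmetric factor $1+\mu_1\mu_2$. The result is invariant under the only nontrivial invertible kernel on a 2-outcome set (the permutation), yet it is not a constant multiple of the Fisher metric, since the factor varies with $\mu$.

```python
import numpy as np

def fisher(mu, a, b):
    return np.sum(a * b / mu)

def h(mu, a, b):
    # Fisher metric rescaled by a swap-symmetric conformal factor; the factor
    # depends on mu, so h is NOT a constant multiple of the Fisher metric.
    return (1.0 + mu[0] * mu[1]) * fisher(mu, a, b)

# The only nontrivial invertible Markov kernel on a 2-outcome set: the swap.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

mu = np.array([0.3, 0.7])
a = np.array([0.2, -0.2])
b = np.array([-0.1, 0.1])

# h is preserved by the permutation, so permutation invariance alone cannot
# single out the Fisher metric.
print(np.isclose(h(mu, a, b), h(mu @ P, a @ P, b @ P)))  # True
```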

Concerning what the Fisher-Rao metric tensor "measures," you can have a look at my answer to this question. Bear in mind, however, that extending Čencov's uniqueness result to the case where the outcome space on which the probability measures live is continuous requires a great deal of care.
