I've been self-studying "Information Geometry" by Ay et al., fascinated by the connections between Geometry, Probability, and even Statistics. The proofs are clear to me; nonetheless, even in the first part of the book, which treats the simplest case of probability distributions on a finite set, I'm having trouble understanding the intuition behind Chentsov's Characterization of the Fisher Metric (p. 36):
Denote by $\mathcal{P}(I)$ the space of probability distributions on the finite set $I$, by $\mathcal{P}_+(I)$ the space of strictly positive probability distributions on $I$, and by $\delta^i$ the Dirac measure on the $i^{th}$ element of $I$. Consider two non-empty and finite sets $I$ and $I'$. A Markov kernel is a map $$ K:I\rightarrow\mathcal{P}(I'),\quad i\mapsto K^i:=\sum_{i'\in I'} K_{i'}^i\delta^{i'}. $$ Each Markov kernel induces a corresponding map between probability distributions $$ K_*:\mathcal{P}(I)\rightarrow \mathcal{P}(I'),\quad \mu=\sum_{i\in I} \mu_i\delta^i\mapsto\sum_{i\in I}\mu_i K^i.$$ Now assume that $|I|\le|I'|$. We call a Markov kernel $K$ congruent if there is a partition $A_i$, $i\in I$, of $I'$, such that the following condition holds: $$K_{i'}^i> 0 \iff i'\in A_i.$$ If $K$ is congruent and $\mu\in\mathcal{P}_+(I)$, then $K_\ast(\mu)\in\mathcal{P}_+(I')$. This yields a differentiable map $$K_\ast:\mathcal{P}_+(I)\rightarrow \mathcal{P}_+(I'),$$ and its differential at $\mu$ is given by $$ d_{\mu}K_\ast:T_{\mu}\mathcal{P}_+(I)\rightarrow T_{K_\ast(\mu)}\mathcal{P}_+(I'),\quad (\mu,\nu-\mu)\mapsto(K_\ast(\mu),K_\ast(\nu)-K_\ast(\mu)). $$ Theorem (Chentsov's Characterization of the Fisher Metric):
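To make the definitions above concrete for myself, I wrote a small numerical sketch (my own illustration, not from the book): a Markov kernel on finite sets is just a row-stochastic matrix, the pushforward $K_\ast$ is a vector-matrix product, and a congruent kernel has each row supported exactly on one block of a partition of $I'$. All names here are my own.

```python
import numpy as np

def pushforward(K, mu):
    """Induced map K_*: P(I) -> P(I'), mu |-> sum_i mu_i K^i.

    K is a row-stochastic matrix of shape (|I|, |I'|), with
    K[i, j] = K^i_{i'}; mu is a probability vector on I.
    """
    return mu @ K

# A congruent kernel for I = {0, 1} and I' = {0, 1, 2}, with
# partition A_0 = {0, 1}, A_1 = {2}: row i is strictly positive
# exactly on the block A_i.
K = np.array([
    [0.3, 0.7, 0.0],   # K^0 supported on A_0
    [0.0, 0.0, 1.0],   # K^1 supported on A_1
])

mu = np.array([0.4, 0.6])   # a strictly positive distribution on I
nu = pushforward(K, mu)
# nu is again a probability distribution, and it is strictly
# positive because mu is strictly positive and K is congruent.
```

Intuitively, a congruent kernel "splits" each outcome $i$ into several finer outcomes $A_i$ without mixing mass between different $i$'s, which is why it embeds $\mathcal{P}_+(I)$ into $\mathcal{P}_+(I')$.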
Suppose we assign to each non-empty and finite set $I$ a metric $h^I$ on $\mathcal{P}_+(I)$. If for each congruent Markov kernel $K:I\rightarrow\mathcal{P}(I')$ we have invariance in the sense $$ h_p^I(A,B)=h_{K_\ast(p)}^{I'}(d_p K_\ast(A),d_p K_\ast(B)), $$ or for short $(K_\ast)^\ast(h^{I'})=h^I$, then there is a constant $\alpha>0$ such that $h^I=\alpha \mathfrak{g}^I$ for all $I$, where $\mathfrak{g}^I$ is the Fisher metric on $\mathcal{P}_+(I)$.
Recall that the Fisher metric is defined for each pair of vectors $A=(\mu,a),B=(\mu,b)\in T_\mu \mathcal{P}_+(I)$ as $$ \mathfrak{g}_\mu(A,B):=\sum_{i\in I}\frac{a_i b_i}{\mu_i},$$ where $a=\sum_{i\in I} a_i e_i$, $b=\sum_{i\in I} b_i e_i$, and $\mu=\sum_{i\in I} \mu_i \delta_i$. Here $(e_1,e_2,\dots,e_n)$ is the canonical basis of $C(I,\mathbb{R})$ and $(\delta_1,\delta_2,\dots,\delta_n)$ is its dual basis, spanning the dual space, which can be identified with the space of signed measures on $I$; the theorem I'm interested in concerns a subset of that space, namely the probability distributions on $I$.
I somewhat understand that the Fisher metric is, up to scalar multiplication, the only metric for which every map induced by a congruent Markov kernel is an isometric embedding (in the Riemannian sense). I have read online and in other books that this metric is related to statistical concepts, in particular sufficient statistics. I see that any map $I\rightarrow I'$ can be viewed as a statistic, but that is as far as my understanding of the link with statistics goes.
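To convince myself that the invariance in the theorem actually holds for the Fisher metric, I checked it numerically for one congruent kernel (again my own illustration; the differential $d_\mu K_\ast$ acts on a tangent vector simply by the same matrix multiplication as $K_\ast$, since $K_\ast$ is linear):

```python
import numpy as np

def fisher(mu, a, b):
    """Fisher metric g_mu(a, b) = sum_i a_i * b_i / mu_i."""
    return float(np.sum(a * b / mu))

# Congruent kernel for I = {0, 1}, I' = {0, 1, 2},
# partition A_0 = {0, 1}, A_1 = {2}.
K = np.array([
    [0.3, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])

mu = np.array([0.4, 0.6])
a = np.array([0.1, -0.1])      # tangent vectors: entries sum to 0
b = np.array([-0.05, 0.05])

# Invariance (K_*)^* g^{I'} = g^I: the inner product is unchanged
# after pushing both the base point and the tangent vectors forward.
lhs = fisher(mu, a, b)
rhs = fisher(mu @ K, a @ K, b @ K)
# lhs == rhs up to floating-point error
```

Trying the same check with a non-congruent (mass-mixing) kernel generally makes `lhs` and `rhs` differ, which matches the intuition that congruent kernels are exactly the "lossless" refinements.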
What is the role of Markov kernels in the statistical interpretation of this theorem? Why is it important in that setting that there exists a unique metric preserved by those induced maps? And what would such a metric even "measure" on manifolds made of probability distributions?