
I am interested in a canonical information geometry on spaces of probability distributions that contain distributions with different parameter spaces. Let me give some context and practical motivation from the problem I am considering, along with what might be directions toward a solution.

Motivation: In Bayesian statistics and machine learning, one fundamental problem is to find a generative model for some observed data $d$ (living, say, in $\mathbb R^N$ for simplicity). Such a model is a probability distribution over two variables $o, s$, living in, say, $\mathbb R^N$ and $\mathbb R^D$ respectively. When one fixes $D$, the space of all generative models one could consider is the set of all probability distributions on $\mathbb R^{N}\times \mathbb R^D$. In that case the canonical information geometry on this space is given by the Fisher information metric, because Chentsov's theorem tells us that it is the unique Riemannian metric (on a statistical manifold) that is invariant under sufficient statistics.
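For concreteness (these are standard facts, stated only to fix notation): on a parametric family $p_\theta$, the Fisher information metric is
$$g_{ij}(\theta) = \mathbb{E}_{p_\theta}\!\left[\partial_{\theta_i}\log p_\theta(x)\;\partial_{\theta_j}\log p_\theta(x)\right],$$
and it arises as the second-order expansion of the KL divergence,
$$D_{\mathrm{KL}}\!\left(p_\theta \,\middle\|\, p_{\theta+\mathrm{d}\theta}\right) = \tfrac{1}{2}\, g_{ij}(\theta)\,\mathrm{d}\theta^i\,\mathrm{d}\theta^j + O(\|\mathrm{d}\theta\|^3),$$
which is why, below, problems with the KL divergence translate directly into problems with the Fisher metric.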

The problem: $o$ is the so-called observed variable, living in the space of the data, which is given a priori, and $s$ is the so-called latent or hidden variable, which lives in a space whose nature (dimension etc.) one is, in principle, free to specify. In practice, one often considers various generative models where the space that $s$ lives in may differ from one model to another. For instance, one could consider $p(o^{(i)}, s^{(i)})$, for $i$ ranging over some index set, where $o^{(i)}$ is a random variable living in $\mathbb R^N$ and $s^{(i)}$ is a random variable on $\mathbb R^{D(i)}$. I would be interested to know whether there is a canonical way of computing distances in such a space of probability distributions, as in the case where $D(i)$ is constant above. It seems that the Fisher information metric (which derives from the KL divergence, as above) would be undefined, or infinite, between generative models whose latent dimensions differ.
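To spell out the obstruction (my own reasoning, using nothing beyond the standard definition $D_{\mathrm{KL}}(p\,\|\,q)=\int p\log(p/q)$): if $p_1$ is a density on $\mathbb R^N\times\mathbb R^{D_1}$ and $p_2$ a density on $\mathbb R^N\times\mathbb R^{D_2}$ with $D_1\neq D_2$, then $D_{\mathrm{KL}}(p_1\,\|\,p_2)$ is not defined at all, since the two measures live on different spaces. One can try to embed the smaller model into the larger space, say for $D_1 < D_2$ by padding the latent with a point mass,
$$\tilde p_1(o, s_1, s_2) = p_1(o, s_1)\,\delta_0(s_2), \qquad s_2 \in \mathbb R^{D_2 - D_1},$$
but $\tilde p_1$ is then singular with respect to any $p_2$ that has a density on $\mathbb R^N\times\mathbb R^{D_2}$, so $D_{\mathrm{KL}}(\tilde p_1\,\|\,p_2)=+\infty$ no matter how "close" the two models are.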

The specific application I have in mind is generative models in the form of (acyclic) probabilistic graphical models: in this case, I am interested in a canonical way (if it exists) to compute information distances between graphical models with potentially different numbers of latent factors. I suppose the Fisher information geometry is meant to quantify distances between distributions over the same parameter space, whereas here I am specifically considering distributions over different parameter spaces. Perhaps one needs a "hierarchy" of distances: a "graphical information distance" would quantify how different the information carried by the graphical structure of two models is, regardless of the respective parameterisations (in particular, this distance would be zero for generative models that have the same graphical structure), while the Fisher information distance would quantify the difference in information between the parameters of two generative models that share a graphical structure, and would be infinite for generative models with different graphical structures. A tentative way to formalise this is sketched below. If this is the right approach, then I would be looking for an analogue of the Fisher information distance across different graphical structures. Does this exist, and have hierarchical geometries defined along similar principles been explored in the literature? Another way to frame this question: is there an information distance between different statistical manifolds?
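A naive way to formalise this hierarchy (a sketch of my own, not something I have found in the literature): write each model as a pair $(G,\theta)$ of a graphical structure and a parameter, and set
$$ d\big((G_1,\theta_1),\,(G_2,\theta_2)\big) \;=\; \begin{cases} d^{G}_{\mathrm{Fisher}}(\theta_1,\theta_2), & G_1 = G_2 = G,\\ d_{\mathrm{graph}}(G_1,G_2), & G_1 \neq G_2, \end{cases} $$
where $d^{G}_{\mathrm{Fisher}}$ is the geodesic distance of the Fisher metric on the statistical manifold determined by $G$, and $d_{\mathrm{graph}}$ is the hypothetical "graphical information distance" I am asking about. The question is then whether $d_{\mathrm{graph}}$ can be chosen so that the combined $d$ is canonical in a Chentsov-like sense.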

Of course, in applications one can usually measure an empirical distance between samples from two generative models, using a suitable choice of divergence; however, I suppose this would not reflect the difference in the internal structure of the models, which is what I am primarily interested in quantifying.
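As a toy illustration of this point (my own example): consider linear-Gaussian models $s \sim \mathcal N(0, I_D)$, $o \mid s \sim \mathcal N(Ws, \sigma^2 I_N)$ with $W \in \mathbb R^{N\times D}$, whose observable marginal is
$$o \sim \mathcal N\!\left(0,\; WW^\top + \sigma^2 I_N\right).$$
Taking $D_1 = 1$ with loading $W_1 = w \in \mathbb R^{N\times 1}$, and $D_2 = 2$ with $W_2 = \big(w/\sqrt{2},\; w/\sqrt{2}\big)$, gives $W_2 W_2^\top = W_1 W_1^\top$, hence exactly the same distribution over $o$. Any divergence estimated from samples of $o$ is therefore zero, even though one model has one latent factor and the other has two.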

Addendum: I have read briefly about stratified spaces, which are spaces whose dimension may vary from point to point. A popular stratified space is the space of phylogenetic trees, which, heuristically, looks similar to a space of tree-like acyclic graphical models. The space of phylogenetic trees with a fixed set of leaves has a well-defined geometry [1], though I don't know whether it relates in any way to measuring differences in information content. Perhaps there is more recent research in this area on spaces of phylogenetic trees with varying numbers of leaves, which might then be directly applicable to the problem of computing distances between acyclic probabilistic graphical models with different graphical structures.

[1] L. J. Billera, S. P. Holmes, and K. Vogtmann, "Geometry of the space of phylogenetic trees," Advances in Applied Mathematics 27, 733–767 (2001).

  • (i) What do you mean by "a canonical way"? (ii) Are there any reasons to believe that such a way exists?
    – Iosif Pinelis
    Commented Jun 29, 2023 at 13:49
  • @IosifPinelis I explained how the Fisher information metric is canonical and that I would be looking for something that is canonical in an analogous way. Whether that exists or could exist is part of my question; I simply don't know.
    – Lance
    Commented Jun 29, 2023 at 14:07
