$\begingroup$

In section 5.1 of Amari's book, when discussing statistical manifolds, the author states that, given a manifold whose points are probability distributions, one can identify tangent vectors $\boldsymbol e_i$ with the score functions, that is, $$\boldsymbol e_i \approx \partial_i \log p(x,\boldsymbol\xi).$$ Here $\boldsymbol\xi$ is the parameter defining the manifold, and $x$ labels the possible outcomes. I understand why tangent vectors should be functions of $x$, as in this context points are probability distributions, that is, functions $x\mapsto p(x,\boldsymbol\xi)$. However, I don't quite understand where this particular expression for the tangent vectors comes from. In fact, naively, at least for discrete distributions, I would have guessed tangent vectors to simply have the form $\partial_i p(x,\boldsymbol\xi)$, without the normalization factor arising from the log derivative.

This expression seems to be compatible with the Fisher information metric, introduced shortly thereafter as $$\langle\boldsymbol e_i,\boldsymbol e_j\rangle = \mathbb{E}[\partial_i \log p(x,\boldsymbol \xi)\,\partial_j \log p(x,\boldsymbol\xi)],$$ but if I understand the context correctly, this seems a bit backwards: the metric should come from the expression for the tangent vectors, not the other way around.

A possible solution to the conundrum is that what the author means is that one can choose a chart for the manifold with respect to which tangent vectors have this particular expression. This doesn't seem to be stated explicitly in the text, however. Is this correct, and if so, how can one see it explicitly?

$\endgroup$

1 Answer

$\begingroup$

Your last intuition is indeed correct, but a rigorous treatment of parametric models of probability distributions on a non-discrete space is quite non-trivial, and is seldom discussed in detail in the literature (I suggest you consult this book to get a more rigorous treatment).

I will follow standard practice and be not very rigorous :-)

Roughly speaking, we may think of a tangent vector at a point as the initial velocity of a curve through that point. For a parametric model of probabilities, we may consider a curve $p(x,\xi(t))$ with $\xi(0)=\xi$, regarded as a curve of functions of $x$. Since we want a curve of probabilities, we must ensure that $$ \int_{X}p(x,\xi(t))\,\mathrm{d}\mu(x)=1 $$ for all $t$, so that, taking the derivative w.r.t. $t$ and setting $t=0$ (assuming we may differentiate under the integral sign), we obtain $$ \int_{X}\dot{p}(x,\xi)\,\mathrm{d}\mu(x)=0. $$ A tangent vector is thus identified with a function $\dot{p}(x,\xi)$ satisfying the previous equation. If we now write the curve in the form $$ p(x,\xi(t))=\mathrm{e}^{\ln(p(x,\xi)) + v(x,\xi(t))}, $$ with $v(x,\xi(0))=0$, we obtain $$ \dot{p}(x,\xi)=p(x,\xi)\,\dot{v}(x,\xi). $$ Assuming $p>0$, we obtain $$ \dot{v}(x,\xi)=\frac{\dot{p}(x,\xi)}{p(x,\xi)}=\left.\frac{\mathrm{d}}{\mathrm{d}t}\ln(p(x,\xi(t)))\right|_{t=0}, $$ which is basically the identification mentioned in your question: for the curve along which only the $i$-th coordinate of $\xi$ varies, at unit speed, the right-hand side is $\partial_i\ln p(x,\xi)$. In this representation the constraint on tangent vectors reads $\int_X p(x,\xi)\,\dot{v}(x,\xi)\,\mathrm{d}\mu(x)=0$, that is, $\dot{v}$ has zero expectation with respect to $p$.
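For a discrete sample space the two representations of a tangent vector can be compared numerically. The following is only a rough sketch under my own assumptions: the three-outcome softmax model `p` and the helper names `dp`, `score` and `fisher` are hypothetical choices made for illustration (they are not taken from Amari's book), and the derivatives are approximated by central finite differences.

```python
import numpy as np

def p(xi):
    """Hypothetical 3-outcome model: softmax of the logits (xi_1, xi_2, 0)."""
    logits = np.array([xi[0], xi[1], 0.0])
    w = np.exp(logits - logits.max())
    return w / w.sum()

def dp(xi, i, h=1e-6):
    """Tangent vector as d p(x, xi) / d xi_i (central finite difference)."""
    e = np.zeros(len(xi))
    e[i] = h
    return (p(xi + e) - p(xi - e)) / (2 * h)

def score(xi, i):
    """The same tangent vector as a score function: (d_i p) / p = d_i log p."""
    return dp(xi, i) / p(xi)

def fisher(xi):
    """Fisher metric g_ij = E[score_i * score_j], expectation taken w.r.t. p."""
    P = p(xi)
    S = np.stack([score(xi, i) for i in range(len(xi))])  # shape (2, 3)
    return (S * P) @ S.T

xi = np.array([0.3, -0.7])
P = p(xi)
for i in range(2):
    # sum_x d_i p = 0 and E[d_i log p] = 0 express the same constraint
    print(i, dp(xi, i).sum(), (P * score(xi, i)).sum())
print(fisher(xi))
```

Both printed sums should vanish up to finite-difference error. For this particular model the score is $\partial_i \ln p(x,\xi)=\delta_{ix}-p_i$, so the printed matrix should agree with $g_{ij}=p_i\delta_{ij}-p_ip_j$ for $i,j\in\{1,2\}$.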

$\endgroup$
  • $\begingroup$ so in practice, if I understand correctly, you're parametrising the curves as $p(x,\xi(t))=p(x,\xi) e^{v(x,\xi(t))}$, sorta "off-loading" the time-dependence to a multiplicative factor. I suppose you can always do this, assuming nonzero probabilities. And then you map into an isomorphic manifold whose tangent vectors look like $\dot v$ I guess? Makes sense but I'm still fuzzy about the formal details. I'll check out the book, thanks $\endgroup$
    – glS
    Commented May 27 at 9:24
  • $\begingroup$ Your intuition is correct! $\endgroup$
    – fmc2
    Commented Jun 21 at 15:48
