I've been trying to understand the motivation for the use of the Jeffreys prior in Bayesian statistics. Most texts I've read online make some comment to the effect that the Jeffreys prior is "invariant with respect to transformations of the parameters", and then go on to state its definition in terms of the Fisher information matrix without further motivation. However, none of them goes on to show that such a prior is indeed invariant, or even to define properly what is meant by "invariant" in the first place.
I like to understand things by approaching the simplest example first, so I'm interested in the case of a single Bernoulli trial (a binomial trial with $n=1$), i.e. the case where the support is $\{1,2\}$. In this case the Jeffreys prior is given by $$ \rho(\theta) = \frac{1}{\pi\sqrt{\theta(1-\theta)}}, \qquad\qquad(i) $$ where $\theta$ is the parameter defined by $p_1 = \theta$, $p_2 = 1-\theta$.
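For reference, here is a sketch of how $(i)$ follows from the usual definition $\rho(\theta)\propto\sqrt{I(\theta)}$, where $I$ is the Fisher information. The likelihood symbol $f$ is introduced here only for illustration, with $f(1;\theta)=\theta$ and $f(2;\theta)=1-\theta$ for a single trial. Then $$ I(\theta) = \sum_{x\in\{1,2\}} f(x;\theta)\left(\frac{\partial}{\partial\theta}\log f(x;\theta)\right)^2 = \theta\cdot\frac{1}{\theta^2} + (1-\theta)\cdot\frac{1}{(1-\theta)^2} = \frac{1}{\theta(1-\theta)}, $$ so that $\sqrt{I(\theta)} = 1/\sqrt{\theta(1-\theta)}$. The factor $1/\pi$ in $(i)$ is the normalisation, since $$ \int_0^1 \frac{d\theta}{\sqrt{\theta(1-\theta)}} = B\!\left(\tfrac12,\tfrac12\right) = \pi. $$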
What I would like is to understand the sense in which this is invariant with respect to a coordinate transformation $\theta \to \varphi(\theta)$. To me the term "invariant" would seem to imply something along the lines of $$ \int_{\theta_1}^{\theta_2} \rho(\theta) d \theta = \int_{\varphi(\theta_1)}^{\varphi(\theta_2)} \rho(\varphi(\theta)) d \varphi \qquad\qquad(ii) $$ for any smooth function $\varphi$ -- but it's easy enough to see that this is not satisfied by the distribution $(i)$ above (and indeed, I doubt there can be any density function that does satisfy this kind of invariance for every transformation). So there must be some other sense intended by "invariant" in this context. I would like to understand this sense in the form of a functional equation similar to $(ii)$, so that I can see how it's satisfied by $(i)$.
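A quick numerical check of the claim that $(i)$ does not satisfy $(ii)$ can be sketched as follows. It reads the right-hand side of $(ii)$ as the same function $\rho$ integrated over the transformed interval. The transformation $\varphi(\theta)=\theta^2$, the endpoints $0.1$ and $0.9$, and the midpoint-rule integrator are arbitrary illustrative choices, not part of the question.

```python
import math

def rho(theta):
    """Jeffreys prior density (i) for a single two-outcome trial."""
    return 1.0 / (math.pi * math.sqrt(theta * (1.0 - theta)))

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def phi(theta):
    """Illustrative reparameterisation (an arbitrary choice)."""
    return theta ** 2

theta_1, theta_2 = 0.1, 0.9

# Left-hand side of (ii): rho integrated over [theta_1, theta_2].
lhs = integrate(rho, theta_1, theta_2)
# Right-hand side of (ii): the same rho integrated over [phi(theta_1), phi(theta_2)].
rhs = integrate(rho, phi(theta_1), phi(theta_2))

print(f"left-hand side of (ii):  {lhs:.4f}")
print(f"right-hand side of (ii): {rhs:.4f}")
```

For this choice of $\varphi$ the two printed values differ, which is consistent with the observation that $(i)$ does not satisfy $(ii)$.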
Progress
As the user did points out, the Wikipedia article gives a hint about this, by starting with $$ p(\theta)\propto\sqrt{I(\theta)} $$ and deriving $$ p(\varphi)\propto\sqrt{I(\varphi)} $$ for any smooth function $\varphi(\theta)$. (Note that these equations omit the determinant of $I$ because they refer to the single-variable case.) Clearly something is invariant here, and it seems like it shouldn't be too hard to express this invariance as a functional equation. However, the more I try to do this the more confused I get. Partly this is because a lot is left out of the Wikipedia sketch (e.g. are the constants of proportionality the same in the two equations above, or different? Where is the proof of uniqueness?), but mostly it's because it's unclear exactly what's being sought, which is why I wanted to express it as a functional equation in the first place.
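One way to make the single-variable statement concrete is a small numerical sketch. It assumes the usual change-of-variables rule for densities, $p_\varphi(\varphi) = p_\theta(\theta)\,|d\theta/d\varphi|$, and uses the log-odds $\varphi = \log(\theta/(1-\theta))$ as an arbitrary example. The helper `fisher_information`, the finite-difference step `h`, and the test point $\theta = 0.3$ are illustrative choices, not taken from the Wikipedia article.

```python
import math

def fisher_information(prob_of_one, param, h=1e-5):
    """Fisher information of one two-outcome trial at `param`.

    `prob_of_one` maps the parameter to P(outcome 1); the score
    d/dparam log f is approximated by a central finite difference.
    """
    total = 0.0
    for outcome_prob in (prob_of_one, lambda t: 1.0 - prob_of_one(t)):
        p = outcome_prob(param)
        score = (math.log(outcome_prob(param + h))
                 - math.log(outcome_prob(param - h))) / (2.0 * h)
        total += p * score ** 2
    return total

def sigmoid(phi):
    """Inverse of the log-odds map: theta as a function of phi."""
    return 1.0 / (1.0 + math.exp(-phi))

theta = 0.3
phi = math.log(theta / (1.0 - theta))   # log-odds of theta
dtheta_dphi = theta * (1.0 - theta)     # derivative of sigmoid at phi

# Unnormalised prior computed in theta, then transported to phi
# with the change-of-variables rule for densities.
transported = math.sqrt(fisher_information(lambda t: t, theta)) * abs(dtheta_dphi)

# Unnormalised prior computed directly in the phi parameterisation.
direct = math.sqrt(fisher_information(sigmoid, phi))

print(transported, direct)
```

In this example the two printed values agree up to finite-difference error, i.e. $\sqrt{I(\theta)}\,|d\theta/d\varphi|$ and $\sqrt{I(\varphi)}$ coincide at the chosen point.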
To reiterate my question: I understand the above equations from Wikipedia, and I can see that they demonstrate an invariance property of some kind. However, I can't see how to express this invariance property in the form of a functional equation similar to $(ii)$, which is what I'm looking for as an answer to this question. I want to first understand the desired invariance property, and then see that the Jeffreys prior (hopefully uniquely) satisfies it, but the above equations mix up those two steps in a way that I can't see how to separate.