52

What is an intuitive proof of the multivariable changing of variables formula (Jacobian) without using mapping and/or measure theory?

I think that textbooks overcomplicate the proof.

If possible, use only linear algebra and calculus, since that would be the simplest for me to understand.

  • 2
If there was a simpler proof, don't you think the books would use it?
    – Potato
    Commented Dec 29, 2012 at 20:47
  • 1
@Potato - Couldn't the author also give the intuitions?
    – Victor
    Commented Dec 29, 2012 at 20:49
What exactly do you want? A different proof, or an intuitive explanation of the standard proof (say, the one in Folland)?
    – Potato
    Commented Dec 29, 2012 at 20:50
  • 3
A lengthy proof of the change of variables formula for Riemann integrals in $\mathbb R^n$ (that does not use measure theory) is given in Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach by Hubbard and Hubbard. A discussion of the intuition behind it is given on page 493.
    – Potato
    Commented Dec 29, 2012 at 21:04
  • 2
@Tim A proof for Lebesgue integrals can be found in any standard book on measure theory and integration, including Folland's book.
    – Potato
    Commented Dec 29, 2012 at 21:14

5 Answers

58

Doing it for a particular number of variables is very easy to follow. Consider what you do when you integrate a function of $x$ and $y$ over some region. Basically, you chop up the region into boxes of area ${\rm d}x~{\rm d}y$, evaluate the function at a point in each box, multiply it by the area of the box, and add everything up. This can be notated a bit sloppily as:

$$\sum_{b \in \text{Boxes}} f(x,y) \cdot \text{Area}(b)$$

What you do when changing variables is to chop the region into boxes that are not rectangular: instead you chop it along curves on which some function, call it $u(x,y)$, is constant. So say $u=x+y^2$; these would be all the parabolas $x+y^2=c$. You then do the same thing for another function $v$, say $v=y+3$. Now in order to evaluate the expression above, you need to find the area of each of the new boxes - it's not ${\rm d}x~{\rm d}y$ anymore.

As the boxes are infinitesimal, the edges cannot be curved, so they must be parallelograms (adjacent curves of constant $u$ or constant $v$ are parallel). The parallelograms are defined by two vectors - the vector resulting from a small change in $u$, and the one resulting from a small change in $v$. In component form, these vectors are ${\rm d}u\left\langle\frac{\partial x}{\partial u}, ~\frac{\partial y}{\partial u}\right\rangle $ and ${\rm d}v\left\langle\frac{\partial x}{\partial v}, ~\frac{\partial y}{\partial v}\right\rangle $. To see this, imagine moving a small distance ${\rm d}u$ along a line of constant $v$. What's the change in $x$ when you change $u$ but hold $v$ constant? The partial of $x$ with respect to $u$, times ${\rm d}u$. Same with the change in $y$. (Notice that this involves writing $x$ and $y$ as functions of $u$ and $v$, rather than the other way round. The main condition of a change of variables is that both ways round are possible.)

The area of a parallelogram bounded by $\langle x_0,~ y_0\rangle $ and $\langle x_1,~ y_1\rangle $ is $\vert y_0x_1-y_1x_0 \vert$ (or the absolute value of the determinant of the $2 \times 2$ matrix formed by writing the two column vectors next to each other).* So the area of each box is

$$\left\vert\frac{\partial x}{\partial u}{\rm d}u\frac{\partial y}{\partial v}{\rm d}v - \frac{\partial y}{\partial u}{\rm d}u\frac{\partial x}{\partial v}{\rm d}v\right\vert$$

or

$$\left\vert \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial y}{\partial u}\frac{\partial x}{\partial v}\right\vert~{\rm d}u~{\rm d}v$$

which you will recognise as being $\mathbf J~{\rm d}u~{\rm d}v$, where $\mathbf J$ is the absolute value of the Jacobian determinant.

So, to go back to our original expression

$$\sum_{b \in \text{Boxes}} f(x,y) \cdot \text{Area}(b)$$

becomes

$$\sum_{b \in \text{Boxes}} f(u, v) \cdot \mathbf J \cdot {\rm d}u{\rm d}v$$

where $f(u, v)$ means $f$ evaluated at the corresponding point $(x(u,v), ~y(u,v))$; this makes sense because $u$ and $v$ can be written in terms of $x$ and $y$, and vice versa. As the number of boxes goes to infinity, this becomes an integral in the $uv$ plane.

To generalize to $n$ variables, all you need is that the area/volume/equivalent of the $n$-dimensional box that you integrate over equals the absolute value of the determinant of an $n \times n$ matrix of partial derivatives. This is hard to prove, but easy to intuit.
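For concreteness, here is a quick numerical sketch of the two-variable case (my own addition, not part of the original argument), using polar coordinates $x = u\cos v$, $y = u\sin v$, for which the factor $\mathbf J$ works out to $u$. Summing $f$ over $(u,v)$ boxes weighted by $\mathbf J~{\rm d}u~{\rm d}v$ matches the direct sum over $(x,y)$ boxes:

```python
import numpy as np

# Sketch (my own illustration): the sum over (u, v) boxes, weighted by
# the Jacobian factor, matches the direct sum over (x, y) boxes.
# Transformation: polar coordinates x = u cos v, y = u sin v, |J| = u.
f = lambda x, y: x**2 + y**2              # integrand, chosen arbitrarily

n = 1000
# Direct Riemann sum over small squares covering the unit disk.
xs = np.linspace(-1, 1, n)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs)
inside = X**2 + Y**2 <= 1
direct = np.sum(f(X, Y)[inside]) * dx * dx

# Riemann sum over (u, v) boxes, each weighted by |J| du dv = u du dv.
us = np.linspace(0, 1, n)
vs = np.linspace(0, 2 * np.pi, n)
du, dv = us[1] - us[0], vs[1] - vs[0]
U, V = np.meshgrid(us, vs)
changed = np.sum(f(U * np.cos(V), U * np.sin(V)) * U) * du * dv

print(direct, changed)                    # both approach pi/2 ~ 1.5708
```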


*To prove this, take two vectors of magnitudes $A$ and $B$, with angle $\theta$ between them. Then write them in a basis such that one of them points along a specific direction, e.g.:

$$A\left\langle \frac{1}{\sqrt 2}, \frac{1}{\sqrt 2}\right\rangle \text{ and } B\left\langle \frac{1}{\sqrt 2}(\cos(\theta)+\sin(\theta)),~ \frac{1}{\sqrt 2} (\cos(\theta)-\sin(\theta))\right\rangle $$

Now perform the operation described above and you get $$\begin{align} & AB\cdot \frac12 \cdot (\cos(\theta) - \sin(\theta)) - AB \cdot \frac12 \cdot (\cos(\theta) + \sin(\theta)) \\ = & \frac 12 AB(\cos(\theta)-\sin(\theta)-\cos(\theta)-\sin(\theta)) \\ = & -AB\sin(\theta) \end{align}$$

The absolute value of this, $AB\sin(\theta)$, is how you find the area of a parallelogram - the product of the lengths of the sides times the sine of the angle between them.
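A quick numerical check of this footnote, with randomly chosen vectors (this sketch is mine, not the answer's):

```python
import numpy as np

# Sketch: |det[v0 v1]| equals A * B * sin(theta) for two plane vectors.
rng = np.random.default_rng(0)
v0, v1 = rng.normal(size=2), rng.normal(size=2)
det_area = abs(np.linalg.det(np.column_stack([v0, v1])))
A, B = np.linalg.norm(v0), np.linalg.norm(v1)
sin_theta = np.sqrt(1 - (v0 @ v1 / (A * B))**2)
print(det_area, A * B * sin_theta)        # the two values agree
```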

  • 1
Welcome to math stackexchange. I liked your answer so I marked it up in latex, but please learn latex for future posts. You can see the latex by right clicking on a formula and selecting "show math as", "tex commands".
    – DanielV
    Commented Jan 18, 2016 at 23:02
Best intuitive explanation on the subject I've met
    – John
    Commented Jul 16, 2017 at 11:44
There is a proof with pictures of the two regions (domains) in the textbook Larson R., Edwards B. H., Calculus: Early Transcendentals, 5th ed., p. 1047, but it is not a proof that both areas (or integrals) are the same (remember continuum mechanics, where they can differ because of deformation). Commented Oct 30, 2018 at 7:43
49

The multivariable change of variables formula is nicely intuitive, and it's not too hard to imagine how somebody might have derived the formula from scratch. However, it seems that proving the theorem rigorously is not as easy as one might hope.

Here's my attempt at explaining the intuition -- how you would derive or discover the formula.

The first thing to understand is that if $A$ is an $N \times N$ matrix with real entries and $S \subset \mathbb R^N$, then $$ \tag{1} m(AS) = |\det A| \, m(S). $$ Here $m(S)$ is the area of $S$ (if $N=2$) or the volume of $S$ (if $N=3$) or more generally the Lebesgue measure of $S$. Technically I should assume that $S$ is measurable. The above equation (1) is intuitively clear from the SVD of $A$: \begin{equation} A = U \Sigma V^T \end{equation} where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with nonnegative diagonal entries. Multiplying by $V^T$ doesn't change the measure of $S$. Multiplying by $\Sigma$ scales along each axis, so the measure gets multiplied by $\det \Sigma = | \det A|$. Multiplying by $U$ doesn't change the measure.
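For a sanity check of equation (1), here is a small Monte Carlo sketch (my own; the matrix $A$ is an arbitrary choice, and $S$ is the unit square, so $m(S) = 1$):

```python
import numpy as np

# Sketch of equation (1): estimate m(AS) for S = [0,1]^2 by Monte Carlo
# and compare with |det A| * m(S). A is an arbitrary choice of mine.
rng = np.random.default_rng(1)
A = np.array([[2.0, 1.0], [0.5, 1.5]])    # |det A| = 2.5

# Sample a bounding box of AS; a point y lies in AS iff A^{-1} y is in S.
corners = np.array([[0, 0], [1, 0], [0, 1], [1, 1]]) @ A.T
lo, hi = corners.min(0), corners.max(0)
y = rng.random((1_000_000, 2)) * (hi - lo) + lo
x = y @ np.linalg.inv(A).T
hits = np.all((x >= 0) & (x <= 1), axis=1)

est = hits.mean() * np.prod(hi - lo)      # Monte Carlo estimate of m(AS)
print(est, abs(np.linalg.det(A)))         # both close to 2.5
```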

Next suppose $\Omega$ and $\Theta$ are open subsets of $\mathbb R^N$ and suppose $g:\Omega \to \Theta$ is $1-1$ and onto. We should probably assume $g$ and $g^{-1}$ are $C^1$ just to be safe. (Since we're just seeking an intuitive derivation of the change of variables formula, we aren't obligated to worry too much about what assumptions we make on $g$.) Suppose also that $f:\Theta \to \mathbb R$ is, say, continuous (or whatever conditions we need for the theorem to actually be true).

Partition $\Theta$ into tiny subsets $\Theta_i$. For each $i$, let $u_i$ be a point in $\Theta_i$. Then \begin{equation} \int_{\Theta} f(u) \, du \approx \sum_i f(u_i) m(\Theta_i). \end{equation}

Now let $\Omega_i = g^{-1}(\Theta_i)$ and $x_i = g^{-1}(u_i)$ for each $i$. The sets $\Omega_i$ are tiny and they partition $\Omega$. Then \begin{align} \sum_i f(u_i) m(\Theta_i) &= \sum_i f(g(x_i)) m(g(\Omega_i)) \\ &\approx \sum_i f(g(x_i)) m(g(x_i) + Jg(x_i) (\Omega_i - x_i)) \\ &= \sum_i f(g(x_i)) m(Jg(x_i) \Omega_i) \\ &\approx \sum_i f(g(x_i)) |\det Jg(x_i)| m(\Omega_i) \\ &\approx \int_{\Omega} f(g(x)) |\det Jg(x)| \, dx. \end{align}

We have discovered that \begin{equation} \int_{g(\Omega)} f(u) \, du \approx \int_{\Omega} f(g(x)) |\det Jg(x)| \, dx. \end{equation} By using even tinier subsets $\Theta_i$, the approximation would be even better -- so we see by a limiting argument that we actually have equality.

At a key step in the above argument, we used the approximation \begin{equation} g(x) \approx g(x_i) + Jg(x_i)(x - x_i) \end{equation} which is a good approximation when $x$ is close to $x_i$.
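To see numerically how good that approximation is, here is a short sketch (mine; I take $g$ to be the polar-coordinates map, an arbitrary choice). The error of the linearization shrinks like the square of the distance to $x_i$, faster than the diameter of the tiny sets $\Omega_i$:

```python
import numpy as np

# Sketch: the error of g(x) ~ g(x_i) + Jg(x_i)(x - x_i) shrinks
# quadratically in |x - x_i|. Here g is the polar-coordinate map.
def g(x):
    return np.array([x[0] * np.cos(x[1]), x[0] * np.sin(x[1])])

def Jg(x):  # Jacobian matrix of g, computed by hand
    return np.array([[np.cos(x[1]), -x[0] * np.sin(x[1])],
                     [np.sin(x[1]),  x[0] * np.cos(x[1])]])

xi = np.array([1.0, 0.5])
for h in (1e-1, 1e-2, 1e-3):
    x = xi + h                            # move distance ~h away from xi
    err = np.linalg.norm(g(x) - (g(xi) + Jg(xi) @ (x - xi)))
    print(h, err)                         # err scales like h**2
```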

  • 3
Can you comment on what makes the rigorous proof more difficult?
    – Miheer
    Commented Jul 25, 2017 at 20:32
  • 3
“If you can't explain it to a six-year-old, you don't understand it yourself.” – Albert Einstein. Commented Oct 30, 2018 at 6:59
This is lovely. Thank you.
    – ashman
    Commented Nov 5, 2020 at 3:18
This becomes cleaner if one takes $f=1_E$, i.e. the characteristic function of a measurable set. Note that if you have the C.of.V. formula for characteristic functions, then you get it for all nonnegative measurable (or integrable) functions $f$, via approximation by step functions. Commented Dec 31, 2020 at 14:40
  • 1
@user3180 If you think of the mapping $x \mapsto V^Tx$ as changing basis to a different orthonormal basis (the basis consisting of the columns of $V$), then I think it seems intuitive or plausible that applying this mapping should not change the measure of a region. For example, when computing the area of a region in the plane, it should not matter which orthonormal coordinate system you use.
    – littleO
    Commented Jul 5, 2021 at 5:19
5

A lengthy proof of the change of variables formula for Riemann integrals in $\mathbb R^n$ (that does not use measure theory) is given in Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach by Hubbard and Hubbard. A discussion of the intuition behind it is given on page 493.

4

The answers here are good, but I want to add a point which I think is quite important and which the others haven't addressed: namely, why we are allowed to use, in the linearized limit, parallelograms (and hence Jacobian determinants) to approximate areas in the first place.

In fact, whenever you have a general coordinate transformation $(u,v) \to (x(u,v),y(u,v))$ of the plane, you find that you are in general forced to sum over quadrilaterals instead of parallelograms. One can see this by partitioning the $uv$ plane into discrete values (i.e., $(u_i,v_j)$ where $i,j$ run from, say, $1$ to $n$) and looking at the corresponding images $(x(u_i,v_j),y(u_i,v_j))$: connecting these images by straight lines forces you to sum over quadrilaterals instead of parallelograms (you can try this yourself for polar coordinates explicitly!). However, one can show that the area of such a quadrilateral differs from that of the approximating parallelogram only at second order, and hence the difference doesn't matter in the limit as the partition shrinks to zero. This leads us directly to the Jacobian determinant and the exterior algebra of differential forms.
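Here is a small numerical sketch of that claim (my own, using polar coordinates as suggested): the shoelace area of the image quadrilateral of a tiny $(u,v)$ cell agrees with the parallelogram area $|\det J|\,h^2$ to leading order as the cell size $h$ shrinks:

```python
import numpy as np

# Sketch: for the polar map (u, v) -> (u cos v, u sin v), compare the
# shoelace area of the image quadrilateral of a small (u, v) cell with
# the parallelogram area |det J| h^2 = u h^2. The ratio tends to 1.
def g(u, v):
    return np.array([u * np.cos(v), u * np.sin(v)])

def shoelace(p):                          # polygon area from its vertices
    x, y = p[:, 0], p[:, 1]
    return 0.5 * abs(x @ np.roll(y, -1) - y @ np.roll(x, -1))

u0, v0 = 1.0, 0.7
for h in (1e-1, 1e-2, 1e-3):
    quad = np.array([g(u0, v0), g(u0 + h, v0),
                     g(u0 + h, v0 + h), g(u0, v0 + h)])
    print(h, shoelace(quad) / (u0 * h * h))   # -> 1 as h -> 0
```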

3

Let there be some vector function $f(x) = x'$, which can be interpreted as remapping points or changing coordinates. For example, $f(x) = \sqrt{x \cdot x}\, e_1 + \arctan \frac{x^2}{x^1} e_2$ remaps the Cartesian coordinates $x^1, x^2$ to polar coordinates on the basis vectors $e_1, e_2$.

Now, let $c(\tau)$ be a path parameterized by the scalar parameter $\tau$, and let $c'(\tau) = f(c(\tau))$ be the image of this path under the transformation. The chain rule tells us that

$$\frac{dc'}{d\tau} = \Big(\frac{dc}{d\tau} \cdot \nabla \Big) f$$

Define $a \cdot \nabla f \equiv \underline f(a)$ as the Jacobian operator acting on a vector $a$, and the equation can be rewritten as

$$\frac{dc}{d\tau} = \underline f^{-1} \Big(\frac{dc'}{d\tau} \Big)$$

(Note that the primes have switched, so we use the inverse Jacobian.)

This is all we need to show that a line integral in the original coordinates is related to a line integral in the new coordinates by using the Jacobian. For some scalar field $\phi$, if $\phi(x) = \phi'(x')$, then

$$\int_c \phi \, d\ell = \int_{c'} \phi' \, \underline f^{-1}(d\ell')$$

because $d\ell'$ can be converted to $\frac{d\ell'}{d\tau} \, d\tau$.
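As a quick numerical check of the chain-rule step above (my own sketch; the sample path and the Cartesian-to-polar map are arbitrary choices):

```python
import numpy as np

# Sketch: verify dc'/dtau = Jf(c) dc/dtau for the cartesian-to-polar map
# f(x) = (sqrt(x.x), arctan(x2/x1)) used in the answer.
def f(x):
    return np.array([np.hypot(x[0], x[1]), np.arctan2(x[1], x[0])])

def Jf(x):  # Jacobian of f, worked out by hand
    r2 = x[0]**2 + x[1]**2
    r = np.sqrt(r2)
    return np.array([[x[0] / r,   x[1] / r],
                     [-x[1] / r2, x[0] / r2]])

c = lambda t: np.array([1.0 + t, 2.0 * t])    # a sample path c(tau)
t0, h = 0.3, 1e-6
dc  = (c(t0 + h) - c(t0)) / h                 # dc/dtau (finite difference)
dcp = (f(c(t0 + h)) - f(c(t0))) / h           # dc'/dtau (finite difference)
print(dcp, Jf(c(t0)) @ dc)                    # the two vectors agree
```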

Edit: I didn't see the word intuitive. As far as intuitive explanations go, you can think of a coordinate transformation like so. Imagine the lines of a polar coordinate system being warped and stretched so that they become rectangular instead. This makes working with them easier, but because the shapes of coordinate lines, paths, and areas have changed (and because changing coordinates should not change the result), the naive errors introduced must be corrected for with a factor of the Jacobian operator.

  • $\begingroup$ "Let there be some vector function f(x)=x′, interpreted as remapping points or changing coordinates" can you elaborate on this. Clearly x and x ' are vectors, but when you say changing coordinates are you fixing the point and changing the coordinate axes, (or fixing the axes and moving the points). And what do you mean by " remaps" and the basis vectors e1,e2. A picture might be illustrative here. $\endgroup$
    – john
    Commented Jan 8, 2020 at 10:49
