1
$\begingroup$

I think it's clear that \begin{equation} \frac{d(\mathbf{A} \mathbf{x})}{d \mathbf{x}}=\mathbf{A}, \quad \text{where $\mathbf{A}$ is a matrix and $\mathbf{x}$ is a vector.} \end{equation}

But if we have a vector-valued function $f: \mathbb{R}^n \rightarrow \mathbb{R}^n$, what can we say about the following derivative? \begin{equation} \frac{d\left(f(\mathbf{A} \mathbf{x})\right)}{d \mathbf{x}}= \text{?} \end{equation}

For scalar-valued univariate functions we know that \begin{equation} \frac{d(g(ax))}{dx} = \frac{d(g(ax))}{d(ax)}\frac{d(ax)}{dx} = g'(ax)\, a. \end{equation}

In other words, what is the chain rule for vector-by-vector derivatives? Is it something like the following? \begin{equation} \frac{d\left(f(\mathbf{A} \mathbf{x})\right)}{d \mathbf{x}}= \left(\frac{d(\mathbf{Ax})}{d\mathbf{x}}\right)^T \frac{d\left(f(\mathbf{A} \mathbf{x})\right)}{d (\mathbf{Ax})} = \mathbf{A}^T f'(\mathbf{Ax}) \end{equation}

$\endgroup$
7
  • $\begingroup$ Where did the transpose come from? In the case $f(x)=x$ you should recover your previous formula. $\endgroup$ Commented Jun 28, 2023 at 22:42
  • $\begingroup$ It's just a guess, because the scalar-by-vector chain rule has a transpose in it. In any case, I'm looking for an answer; I'm not saying that this is the correct one. $\endgroup$
    – Nyquist-er
    Commented Jun 28, 2023 at 22:44
  • 1
    $\begingroup$ It's just $f'(Ax)A$. The chain rule is the same as in single-variable calculus: $\big(f(g(x))\big)'=f'(g(x))\,g'(x)$ $\endgroup$
    – Andrew
    Commented Jun 28, 2023 at 23:19
  • 1
    $\begingroup$ Any book will have it, though the one I learned from was Spivak's Calculus on Manifolds. $\endgroup$
    – Andrew
    Commented Jun 28, 2023 at 23:44
  • 2
    $\begingroup$ Note that there are different layouts for matrix calculus. One gives $A^Tf'(Ax)$ and the other gives $f'(Ax)A$. Consistency is the only thing that matters, so if you say $d(Ax)/dx = A$ and not $A^T$, then @AndrewZhang's comment is correct. $\endgroup$ Commented Jun 28, 2023 at 23:55

1 Answer

2
$\begingroup$

If you have functions $f: \mathbb{R}^{n} \longrightarrow \mathbb{R}^{m}$ and $g: \mathbb{R}^{k} \longrightarrow \mathbb{R}^{n}$, the chain rule behaves just as in the scalar case, as mentioned in the comments: the derivative of the function $f\circ g: \mathbb{R}^{k} \longrightarrow \mathbb{R}^{m}$ is given by $$(f \circ g)'(x) = f'\big(g(x)\big) \cdot g'(x).$$ The only difference is that $\cdot$ now denotes composition of the respective derivatives, which are linear transformations: $$(f \circ g)'(x): \mathbb{R}^{k} \longrightarrow \mathbb{R}^{m}$$ $$f'\big(g(x)\big): \mathbb{R}^{n} \longrightarrow \mathbb{R}^{m}$$ $$g'(x): \mathbb{R}^{k} \longrightarrow \mathbb{R}^{n}$$ You can also fix bases and think of these derivatives as matrices, in which case $(f \circ g)'(x)$ is $m\times k$, $f'\big(g(x)\big)$ is $m\times n$, and $g'(x)$ is $n\times k$; as you can verify, the inner dimensions match, so the matrix product is well defined.
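
To make the matrix product concrete, here is the same rule written entry by entry. This is only a sketch under two assumptions: the standard bases are used, and the matrix of a derivative has entry $(i,j)$ equal to the partial derivative of the $i$-th component with respect to the $j$-th variable. The symbol $y = g(x)$ is introduced here just to name the variables of $f$: $$\frac{\partial (f\circ g)_i}{\partial x_j}(x) = \sum_{l=1}^{n} \frac{\partial f_i}{\partial y_l}\big(g(x)\big)\, \frac{\partial g_l}{\partial x_j}(x), \qquad 1\le i\le m,\quad 1\le j\le k.$$ The sum over $l$ is exactly the row-by-column product of the $m\times n$ matrix $f'\big(g(x)\big)$ with the $n\times k$ matrix $g'(x)$.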

Now, in your case, $n = m = k$ and $g(x) = Ax$ is a linear transformation. For a linear transformation, $g'(x) = g$ for every $x\in \mathbb{R}^{n}$. This says that the best linear approximation of a linear transformation near $x$ is the linear transformation itself, which is quite intuitive. Back to the chain rule: for $g(x)=Ax$, we have $$(f \circ g)'(x) = f'(Ax) \cdot A.$$
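
If you want to convince yourself numerically, the following is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than part of the question: the matrix `A`, the point `x`, the test function `f` with its hand-computed Jacobian `f_prime`, and the helper `jacobian_fd`, which approximates a Jacobian by central differences with entry $(i,j)$ equal to $\partial f_i/\partial x_j$. It compares the numerical derivative of $x \mapsto f(Ax)$ against both candidates, $f'(Ax)A$ and $A^T f'(Ax)$.

```python
import math


def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def matmul(P, Q):
    """Matrix-matrix product for lists of lists."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def transpose(M):
    return [list(col) for col in zip(*M)]


def jacobian_fd(func, x, h=1e-6):
    """Central-difference Jacobian: entry (i, j) approximates d func_i / d x_j."""
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


# Hypothetical example data (not from the question).
A = [[1.0, 2.0], [0.5, -1.0]]
x = [0.3, -0.7]


def f(y):
    """An arbitrary smooth map R^2 -> R^2 chosen for illustration."""
    return [math.sin(y[0]) * y[1], y[0] ** 2 + math.exp(y[1])]


def f_prime(y):
    """Jacobian of f computed by hand, rows indexed by output component."""
    return [
        [math.cos(y[0]) * y[1], math.sin(y[0])],
        [2 * y[0], math.exp(y[1])],
    ]


numeric = jacobian_fd(lambda z: f(matvec(A, z)), x)
Jf = f_prime(matvec(A, x))

err_right = max_abs_diff(numeric, matmul(Jf, A))                 # f'(Ax) A
err_transposed = max_abs_diff(numeric, matmul(transpose(A), Jf))  # A^T f'(Ax)

print("error of f'(Ax) A   :", err_right)
print("error of A^T f'(Ax) :", err_transposed)
```

With this layout, the first error should be at the level of finite-difference noise, while the second is not small, since $A^T f'(Ax)$ is a different matrix unless $A$ and $f'(Ax)$ happen to have special structure.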

I am not sure how you would get a transpose, but some references use different conventions and perhaps your $\mathrm{d}/\mathrm{d}x$ notation means something else (some kind of gradient?). For instance, for $m=1$ and $f: \mathbb{R}^{n} \longrightarrow \mathbb{R}$, the gradient $\nabla f(x)$ is defined as the unique vector that satisfies $$f'(x)[v] = \langle v, \nabla f(x) \rangle.$$ Here, we denote by $f'(x)[v]$ the linear functional $f'(x)$ applied to the vector $v \in \mathbb{R}^n$, which gives a number. In this case, $$(f \circ g)'(x) = f'(Ax) \cdot A$$ while $$\nabla(f \circ g)(x) = A^T \nabla f (Ax).$$
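
For completeness, here is a short derivation of that last identity, assuming $\langle \cdot, \cdot \rangle$ is the standard inner product on $\mathbb{R}^n$, so that $\langle Av, w \rangle = \langle v, A^T w \rangle$. For every $v \in \mathbb{R}^n$, $$(f \circ g)'(x)[v] = f'(Ax)[Av] = \langle Av, \nabla f(Ax) \rangle = \langle v, A^T \nabla f(Ax) \rangle.$$ Since the gradient is the unique vector representing the derivative in this way, $\nabla (f \circ g)(x) = A^T \nabla f(Ax)$. So the transpose appears when you pass from the derivative (a row vector, in matrix terms) to the gradient (a column vector), not in the chain rule itself.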

$\endgroup$
