
Suppose I have differentiable functions (in the sense of the Fréchet derivative) $f\colon \mathbb{R} \to \mathbb{R}^{n\times n} $ and $g\colon \mathbb{R}^{n} \to \mathbb{R}$, where $f$ is a linear operator, and I want to compute the (Fréchet) derivative of their composition $f \circ g\colon \mathbb{R}^{n} \to \mathbb{R}^{n, n} $. Using the chain rule for normed spaces I obtain \begin{align*} \mathrm{D}(f \circ g)(\mathbf{x})h = \underbrace{ (\mathrm{D}f)(g(\mathbf{x}))}_{\in \mathbb{R}^{n\times n} } \cdot \underbrace{ \mathrm{D}g(\mathbf{x})}_{\in \mathbb{R}^{1\times n} }h, \quad h \in \mathbb{R}^{n} .\end{align*} How can this hold, since the dimensions of the product do not match up?

Edit: Consider the function \begin{align*} f(\mathbf{x}) = \mathbf{A}(\mathbf{x})\mathbf{x} \end{align*} with \begin{align*} \mathbf{A}\colon \mathbb{R}^{n} \to \mathbb{R}^{n, n} , \quad \mathbf{A}(\mathbf{x}) = \begin{bmatrix} \alpha(\mathbf{x}) & 1 & 0& 0 & \cdots & 0 \\ 1 & \alpha(\mathbf{x}) & 1 & 0 & \cdots & 0 \\ 0 & 1 & \alpha(\mathbf{x}) & 1 & \cdots & 0 \\ \vdots & \cdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & 1 & \alpha(\mathbf{x})& 1 \\ 0 & 0 & 0 & 0 & 1 & \alpha(\mathbf{x}) \end{bmatrix} \end{align*} where $\alpha(\mathbf{x}) = \left\|\mathbf{x}\right\|_{2} $. My professor now rewrites the above as \begin{align*} \mathbf{A}(\mathbf{x})\mathbf{x} = \mathbf{T}\mathbf{x} + \mathbf{x}\left\|\mathbf{x}\right\|_{2} , \quad \mathbf{T}:=\left[\begin{array}{cccccc} 0 & 1 & & & & \\ 1 & 0 & 1 & & & \\ & 1 & 0 & 1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & 1 & 0 & 1 \\ & & & & 1 & 0 \end{array}\right] ,\end{align*} so that $\mathbf{A}(\mathbf{x}) = \mathbf{T} + \left\|\mathbf{x}\right\|_{2}\mathbf{I}$. He then finds \begin{align*} \mathrm{D}f(\mathbf{x}) \mathbf{h}=\mathbf{T h}+\|\mathbf{x}\|_{2} \mathbf{h}+\mathbf{x} \frac{\mathbf{x}^{\top} \mathbf{h}}{\|\mathbf{x}\|_{2}} =\left(\mathbf{A}(\mathbf{x})+\frac{\mathbf{x} \mathbf{x}^{\top}}{\|\mathbf{x}\|_{2}}\right) \mathbf{h} ,\end{align*}

from which I concluded that \begin{align*} \mathrm{D}(\mathbf{A}(\mathbf{x}))= \frac{\mathbf{x}\mathbf{x}^{\mathsf{T}}}{\left\|\mathbf{x}\right\|_{2} } .\end{align*}
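For what it's worth, I also checked the professor's formula numerically (a quick NumPy sketch of my own; the finite-difference comparison is not from the lecture):

```python
# Sanity check of Df(x) h = (A(x) + x x^T / ||x||_2) h for f(x) = A(x) x.
# A(x) is tridiagonal with alpha(x) = ||x||_2 on the diagonal, ones next to it.
import numpy as np

n = 6
rng = np.random.default_rng(0)

def A(x):
    off = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return off + np.linalg.norm(x) * np.eye(n)

def f(x):
    return A(x) @ x

x = rng.standard_normal(n)
h = rng.standard_normal(n)

claimed = (A(x) + np.outer(x, x) / np.linalg.norm(x)) @ h   # professor's formula
eps = 1e-6
finite_diff = (f(x + eps * h) - f(x - eps * h)) / (2 * eps)  # central difference

print(np.max(np.abs(claimed - finite_diff)))  # agrees up to finite-difference error
```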

  • $(\textrm{D}f)(g(\mathbf{x}))$ is in $\mathbb{R}^{n^2 \times 1}$
    – Jeff Cheng
    Commented Feb 7, 2022 at 20:28
  • @JeffCheng see my edit.
    – Richard
    Commented Feb 7, 2022 at 20:35
  • How do you conclude that $\mathrm{D}(\mathbf{A}(\mathbf{x}))= \dfrac{\mathbf{x}\mathbf{x}^{\mathsf{T}}}{\left\|\mathbf{x}\right\|_{2} }$?
    – Arthur
    Commented Feb 8, 2022 at 9:00
  • What is $\mathbb R^{n \times n}$ in contrast to $\mathbb R^{n, n}$?
    – Paul Frost
    Commented Feb 8, 2022 at 17:54
  • You should write $\mathbf{A} : \mathbb{R}^{n} \to \mathbb{R}^{n, n}$. $\mathbf{A}(x)$ is an element of $\mathbb{R}^{n, n}$.
    – Paul Frost
    Commented Feb 10, 2022 at 10:54

3 Answers


For the purposes of this type of multivariate calculus (e.g. Fréchet derivatives), a domain or codomain of $m\times n$ real matrices is identified with $\Bbb R^{mn}$, not with $\Bbb R^{m\times n}$. You "flatten" your matrices before you do derivatives and chain rules on them, at least if you want the derivative at a given point to be represented by a standard rectangular grid of numbers.

If you don't want to flatten your matrices before doing calculus on them, then your derivatives will be cuboids of higher dimension. You're now venturing into what I would call tensor calculus territory.
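To make the flattening concrete, here is a small numerical sketch (the map $F$ below is made up purely for illustration): a map $\Bbb R^2 \to \Bbb R^{2\times 2}$, with its codomain identified with $\Bbb R^4$, has a $4\times 2$ Jacobian, exactly the rectangular grid of numbers described above.

```python
# Sketch: after flattening, the derivative of a matrix-valued map is an
# ordinary rectangular Jacobian. F is a made-up example map.
import numpy as np

def F(x):                                  # F: R^2 -> R^{2x2}
    a, b = x
    return np.array([[a,     b],
                     [a * b, a**2]])

def jacobian_flat(F, x, eps=1e-6):
    """Finite-difference Jacobian of the flattened map x -> F(x).ravel()."""
    m, k = F(x).size, x.size
    J = np.zeros((m, k))
    for j in range(k):
        e = np.zeros(k)
        e[j] = eps
        J[:, j] = (F(x + e).ravel() - F(x - e).ravel()) / (2 * eps)
    return J

J = jacobian_flat(F, np.array([1.0, 2.0]))
print(J.shape)   # (4, 2): a plain rectangular grid of numbers
```

Reshaping `J` to `(2, 2, 2)` instead gives the "cuboid" point of view.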

Edit: After looking at your example, here is what I think happens: $\mathbf A$ is a composition $\Bbb R^n\to \Bbb R\to \Bbb R^{n\times n}$, the way you describe. But $f$ is a function $\Bbb R^n\to \Bbb R^n$, and as such its Fréchet derivative may be realized as an $n\times n$ matrix. You have been given the matrix $Df(\mathbf{x}) = \mathbf{A}(\mathbf{x})+ \dfrac{\mathbf{x}\mathbf{x}^{\mathsf{T}}}{\left\|\mathbf{x}\right\|_{2} }$ as this derivative.

  • Hm, my professor said that the resulting derivative of the above will be in $\mathbb{R}^{n\times n}$; however, with the flattened approach this would yield an element of $\mathbb{R}^{n^2 \times n}$.
    – Richard
    Commented Feb 7, 2022 at 21:09
  • @Richard Well, the composition yields a map $\Bbb R^n\to\Bbb R^{n^2}$. The Fréchet derivative at any point needs to be a linear map with the same domain and codomain. I don't see how you can get out of that.
    – Arthur
    Commented Feb 7, 2022 at 21:11
  • I will edit my question with an example; it would be very kind if you could then take a look at it.
    – Richard
    Commented Feb 7, 2022 at 21:12
  • OK, I quickly typed up an example from the lecture.
    – Richard
    Commented Feb 7, 2022 at 21:19

$ \def\bbR#1{{\mathbb R}^{#1}} \def\a{\alpha}\def\o{{\tt1}}\def\p{\partial} \def\E{{\cal E}} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $A matrix-by-vector gradient generates a third-order tensor, so it does not fit comfortably in standard matrix notation.

However, the differential of a matrix has the shape of a matrix and obeys all of the rules of matrix algebra. Likewise, the differential of a vector obeys the familiar rules of vector algebra.

First, consider the differentials of the constituent functions. $$\eqalign{ \a^2 &= x^Tx \\ 2\a\,d\a &= 2x^Tdx &\qiq d\a = \frac{x^Tdx}{\a} \\ A &= B + I\a &\qiq dA = I\,d\a \\ }$$ With these we can differentiate the composite function using back-substitution. $$\eqalign{ f &= Ax \\ df &= A\,dx + dA\,x \\ &= A\,dx + x\,d\a \\ &= \LR{A + \frac{xx^T}{\a}}dx \\ \grad{f}{x} &= \LR{A + \frac{xx^T}{\a}} \\ }$$ If you really need the third-order tensor, it is $$\eqalign{ \grad{A}{x} &= \LR{\frac{I\star x}{\a}} \\ }$$ where $\star$ denotes the dyadic (aka tensor) product.
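These differentials can be checked numerically. The sketch below (assuming NumPy; the constant part $B$ of $A$ is taken as a concrete tridiagonal matrix, matching the example in the question) compares $dA = I\,d\a$ against a finite difference:

```python
# Check dA = I * dalpha with dalpha = x^T dx / alpha, alpha = ||x||_2.
# B is the constant (x-independent) part of A, so it drops out of dA.
import numpy as np

n, eps = 4, 1e-6
rng = np.random.default_rng(1)
x = rng.standard_normal(n)
h = rng.standard_normal(n)

B = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

def A(x):
    return B + np.linalg.norm(x) * np.eye(n)   # A = B + I * alpha

dA_fd = (A(x + eps * h) - A(x - eps * h)) / (2 * eps)   # directional dA
dA_formula = np.eye(n) * (x @ h / np.linalg.norm(x))    # I * dalpha
print(np.max(np.abs(dA_fd - dA_formula)))               # tiny
```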

  • Thanks for the answer. I haven't really learned about differentials yet; are they rigorously introduced in the subject of differential forms?
    – Richard
    Commented Mar 2, 2022 at 11:49
  • @Richard That might help you understand the vector case, but the differential of a matrix cannot be expressed using differential forms. To learn more, I would recommend the latest edition of Magnus and Neudecker's Matrix Differential Calculus, or perhaps Hjorungnes's Complex-Valued Matrix Derivatives.
    – greg
    Commented Mar 2, 2022 at 15:23

The Fréchet derivative can in fact be defined for maps between arbitrary normed spaces, in particular for maps into the space of $(n \times n)$-matrices. Given $\phi : V \to W$, the Fréchet derivative of $\phi$ at $x \in V$ is a linear map $D\phi(x) = D\phi \mid_x : V \to W$. The chain rule says that $$D(\psi \circ \phi)(x) = D(\psi)(\phi(x)) \circ D\phi(x). $$ Applying this linear map to $h \in V$ gives $$D(\psi \circ \phi)(x)(h) = D(\psi)(\phi(x))(D\phi(x)(h)).$$ It seems that you do not properly distinguish between linear maps and their matrix representations, and this may be confusing.

There is no dimension problem in your question: $Dg(x)$ is a linear map $\mathbb R^n \to \mathbb R$, hence $Dg(x)(h) \in \mathbb R$. Also $Df(g(x))$ is a linear map $\mathbb R \to \mathbb R^{n \times n}$, and of course you insert $Dg(x)(h)$ as an argument. In terms of matrices: $Dg(x)$ is a $(1 \times n)$-matrix, $Df(g(x))$ is an $(n^2 \times 1)$-matrix (where we identify $\mathbb R^{n \times n}$ with $\mathbb R^{n^2}$ to obtain a "standard" matrix with real entries), and their product is an $(n^2 \times n)$-matrix. You write $Df(g(x)) \in \mathbb R^{n \times n}$, but this is wrong since it suggests that it is an $(n \times n)$-matrix.
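These matrix shapes can be verified with a toy example (a sketch; the particular $f$ and $g$ below are hypothetical stand-ins, not the functions from the question):

```python
# Shape check of the flattened chain rule: with R^{n x n} identified with
# R^{n^2}, Dg(x) is 1 x n, Df(g(x)) is n^2 x 1, and their product is n^2 x n.
# Toy choices for illustration: g(x) = x^T x and f(t) = t * I.
import numpy as np

n = 3
x = np.array([1.0, 2.0, 3.0])

Dg = (2.0 * x).reshape(1, n)                # derivative of g at x:  1 x n
Df = np.eye(n).ravel().reshape(n * n, 1)    # derivative of flattened f:  n^2 x 1

D_composition = Df @ Dg                     # chain-rule product
print(Dg.shape, Df.shape, D_composition.shape)  # (1, 3) (9, 1) (9, 3)
```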

Concerning your example, I do not see the connection to the chain rule: $f$ is not the composition of two functions as needed in the chain rule. But certainly your conclusion $$D(\mathbf{A}(\mathbf{x}))= \frac{\mathbf{x}\mathbf{x}^{\mathsf{T}}}{\left\|\mathbf{x}\right\|_{2} }$$ fails: $D \mathbf A(\mathbf x)$ is a linear map $D \mathbf A(\mathbf x) : \mathbb R^n \to \mathbb R^{n \times n}$, whereas $\frac{\mathbf{x} \mathbf{x}^{\top}}{\|\mathbf{x}\|_{2}}$ is a single $(n \times n)$-matrix.
