
Suppose I have differentiable functions (in the sense of the Fréchet derivative) $f\colon \mathbb{R} \to \mathbb{R}^{n\times n} $ and $g\colon \mathbb{R}^{n} \to \mathbb{R}$, where $f$ is a linear operator, and I want to compute the (Fréchet) derivative of their composition $f \circ g\colon \mathbb{R}^{n} \to \mathbb{R}^{n, n} $. Using the chain rule for normed spaces I obtain \begin{align*} \mathrm{D}(f \circ g)(\mathbf{x})h = \underbrace{ (\mathrm{D}f)(g(\mathbf{x}))}_{\in \mathbb{R}^{n\times n} } \cdot \underbrace{ \mathrm{D}g(\mathbf{x})}_{\in \mathbb{R}^{1\times n} }h, \quad h \in \mathbb{R}^{n} .\end{align*} How can this hold, since the dimensions of the product do not match up?

Edit: Consider the function \begin{align*} f(\mathbf{x}) = \mathbf{A}(\mathbf{x})\mathbf{x} \end{align*} with \begin{align*} \mathbf{A}\colon \mathbb{R}^{n} \to \mathbb{R}^{n, n} , \quad \mathbf{A}(\mathbf{x}) = \begin{bmatrix} \alpha(\mathbf{x}) & 1 & 0& 0 & \cdots & 0 \\ 1 & \alpha(\mathbf{x}) & 1 & 0 & \cdots & 0 \\ 0 & 1 & \alpha(\mathbf{x}) & 1 & \cdots & 0 \\ \vdots & \cdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & 1 & \alpha(\mathbf{x})& 1 \\ 0 & 0 & 0 & 0 & 1 & \alpha(\mathbf{x}) \end{bmatrix} \end{align*} where $\alpha(\mathbf{x}) = \left\|\mathbf{x}\right\|_{2} $. My professor now rewrites the above as \begin{align*} \mathbf{A}(\mathbf{x})\mathbf{x} = \mathbf{T}\mathbf{x} + \mathbf{x}\left\|\mathbf{x}\right\|_{2} , \quad \mathbf{T}:=\left[\begin{array}{cccccc} 0 & 1 & & & & \\ 1 & 0 & 1 & & & \\ & 1 & 0 & 1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & 1 & 0 & 1 \\ & & & & 1 & 0 \end{array}\right] ,\end{align*} so that $\mathbf{A}(\mathbf{x}) = \mathbf{T} + \left\|\mathbf{x}\right\|_{2}\mathbf{I}$. He then finds \begin{align*} \mathrm{D}f(\mathbf{x}) \mathbf{h}=\mathbf{T h}+\|\mathbf{x}\|_{2} \mathbf{h}+\mathbf{x} \frac{\mathbf{x}^{\top} \mathbf{h}}{\|\mathbf{x}\|_{2}} =\left(\mathbf{A}(\mathbf{x})+\frac{\mathbf{x} \mathbf{x}^{\top}}{\|\mathbf{x}\|_{2}}\right) \mathbf{h} ,\end{align*}

from which I concluded that \begin{align*} \mathrm{D}(\mathbf{A}(\mathbf{x}))= \frac{\mathbf{x}\mathbf{x}^{\mathsf{T}}}{\left\|\mathbf{x}\right\|_{2} } .\end{align*}
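For what it's worth, I also checked the professor's formula numerically (a quick NumPy sketch of my own; the finite-difference comparison is not from the lecture):

```python
# Sanity check of Df(x) h = (A(x) + x x^T / ||x||_2) h for f(x) = A(x) x.
# A(x) is tridiagonal with alpha(x) = ||x||_2 on the diagonal, ones next to it.
import numpy as np

n = 6
rng = np.random.default_rng(0)

def A(x):
    off = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return off + np.linalg.norm(x) * np.eye(n)

def f(x):
    return A(x) @ x

x = rng.standard_normal(n)
h = rng.standard_normal(n)

claimed = (A(x) + np.outer(x, x) / np.linalg.norm(x)) @ h   # professor's formula
eps = 1e-6
finite_diff = (f(x + eps * h) - f(x - eps * h)) / (2 * eps)  # central difference

print(np.max(np.abs(claimed - finite_diff)))  # agrees up to finite-difference error
```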

  • $(\textrm{D}f)(g(\mathbf{x}))$ is in $\mathbb{R}^{n^2 \times 1}$
    – Jeff Cheng
    Commented Feb 7, 2022 at 20:28
  • @JeffCheng see my edit.
    – Richard
    Commented Feb 7, 2022 at 20:35
  • How do you conclude that $\mathrm{D}(\mathbf{A}(\mathbf{x}))= \dfrac{\mathbf{x}\mathbf{x}^{\mathsf{T}}}{\left\|\mathbf{x}\right\|_{2} }$?
    – Arthur
    Commented Feb 8, 2022 at 9:00
  • What is $\mathbb R^{n \times n}$ in contrast to $\mathbb R^{n, n}$?
    – Paul Frost
    Commented Feb 8, 2022 at 17:54
  • You should write $\mathbf{A} : \mathbb{R}^{n} \to \mathbb{R}^{n, n}$. $\mathbf{A}(x)$ is an element of $\mathbb{R}^{n, n}$.
    – Paul Frost
    Commented Feb 10, 2022 at 10:54

3 Answers


For the purposes of this type of multivariate calculus (e.g. Fréchet derivatives), a domain or codomain of $m\times n$ real matrices is identified with $\Bbb R^{mn}$, not with $\Bbb R^{m\times n}$. You "flatten" your matrices before you do derivatives and chain rules on them, at least if you want the derivative at a given point to be represented by a standard rectangular grid of numbers.

If you don't want to flatten your matrices before doing calculus on them, then your derivatives will be cuboids of higher dimension. You're now venturing into what I would call tensor calculus territory.
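To make the flattening concrete, here is a small numerical sketch (the map $F$ below is made up purely for illustration): a map $\Bbb R^2 \to \Bbb R^{2\times 2}$, with its codomain identified with $\Bbb R^4$, has a $4\times 2$ Jacobian, exactly the rectangular grid of numbers described above.

```python
# Sketch: after flattening, the derivative of a matrix-valued map is an
# ordinary rectangular Jacobian. F is a made-up example map.
import numpy as np

def F(x):                                  # F: R^2 -> R^{2x2}
    a, b = x
    return np.array([[a,     b],
                     [a * b, a**2]])

def jacobian_flat(F, x, eps=1e-6):
    """Finite-difference Jacobian of the flattened map x -> F(x).ravel()."""
    m, k = F(x).size, x.size
    J = np.zeros((m, k))
    for j in range(k):
        e = np.zeros(k)
        e[j] = eps
        J[:, j] = (F(x + e).ravel() - F(x - e).ravel()) / (2 * eps)
    return J

J = jacobian_flat(F, np.array([1.0, 2.0]))
print(J.shape)   # (4, 2): a plain rectangular grid of numbers
```

Reshaping `J` to `(2, 2, 2)` instead gives the "cuboid" point of view.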

Edit: After looking at your example, here is what I think happens: $\mathbf A$ is a composition $\Bbb R^n\to \Bbb R\to \Bbb R^{n\times n}$, the way you describe. But $f$ is a function $\Bbb R^n\to \Bbb R^n$, and as such its Fréchet derivative may be realized as an $n\times n$ matrix. You have been given the matrix $Df(\mathbf{x}) = \mathbf{A}(\mathbf{x})+ \dfrac{\mathbf{x}\mathbf{x}^{\mathsf{T}}}{\left\|\mathbf{x}\right\|_{2} }$ as this derivative.

  • Hm, my professor said that the resulting derivative of the above will be in $\mathbb{R}^{n\times n}$; however, with the flattened approach this would yield an element of $\mathbb{R}^{n^2 \times n}$.
    – Richard
    Commented Feb 7, 2022 at 21:09
  • @Richard Well, the composition yields a map $\Bbb R^n\to\Bbb R^{n^2}$. The Fréchet derivative at any point needs to be a linear map with the same domain and codomain. I don't see how you can get out of that.
    – Arthur
    Commented Feb 7, 2022 at 21:11
  • I will edit my question with an example; it would be very kind if you could then take a look at it.
    – Richard
    Commented Feb 7, 2022 at 21:12
  • OK, I quickly typed up an example from the lecture.
    – Richard
    Commented Feb 7, 2022 at 21:19

$ \def\bbR#1{{\mathbb R}^{#1}} \def\a{\alpha}\def\o{{\tt1}}\def\p{\partial} \def\E{{\cal E}} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $A matrix-by-vector gradient generates a third-order tensor, so it does not fit comfortably in standard matrix notation.

However, the differential of a matrix has the shape of a matrix and obeys all of the rules of matrix algebra. Likewise, the differential of a vector obeys the familiar rules of vector algebra.

First, consider the differentials of the constituent functions. $$\eqalign{ \a^2 &= x^Tx \\ 2\a\,d\a &= 2x^Tdx &\qiq d\a = \frac{x^Tdx}{\a} \\ A &= B + I\a &\qiq dA = I\,d\a \\ }$$ With these we can differentiate the composite function using back-substitution. $$\eqalign{ f &= Ax \\ df &= A\,dx + dA\,x \\ &= A\,dx + x\,d\a \\ &= \LR{A + \frac{xx^T}{\a}}dx \\ \grad{f}{x} &= \LR{A + \frac{xx^T}{\a}} \\ }$$ If you really need the third-order tensor, it is $$\eqalign{ \grad{A}{x} &= \LR{\frac{I\star x}{\a}} \\ }$$ where $\star$ denotes the dyadic (aka tensor) product.
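These differentials can be checked numerically. The sketch below (assuming NumPy; the constant part $B$ of $A$ is taken as a concrete tridiagonal matrix, matching the example in the question) compares $dA = I\,d\a$ against a finite difference:

```python
# Check dA = I * dalpha with dalpha = x^T dx / alpha, alpha = ||x||_2.
# B is the constant (x-independent) part of A, so it drops out of dA.
import numpy as np

n, eps = 4, 1e-6
rng = np.random.default_rng(1)
x = rng.standard_normal(n)
h = rng.standard_normal(n)

B = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

def A(x):
    return B + np.linalg.norm(x) * np.eye(n)   # A = B + I * alpha

dA_fd = (A(x + eps * h) - A(x - eps * h)) / (2 * eps)   # directional dA
dA_formula = np.eye(n) * (x @ h / np.linalg.norm(x))    # I * dalpha
print(np.max(np.abs(dA_fd - dA_formula)))               # tiny
```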

  • Thanks for the answer. I haven't really learned about differentials yet; are they rigorously introduced in the subject of differential forms?
    – Richard
    Commented Mar 2, 2022 at 11:49
  • @Richard That might help you understand the vector case, but the differential of a matrix cannot be expressed using differential forms. To learn more, I would recommend the latest edition of Magnus and Neudecker's Matrix Differential Calculus, or perhaps Hjorungnes's Complex-Valued Matrix Derivatives.
    – greg
    Commented Mar 2, 2022 at 15:23

The Fréchet derivative can in fact be defined for maps between arbitrary normed spaces, in particular for maps into the space of $(n \times n)$-matrices. Given $\phi : V \to W$, the Fréchet derivative of $\phi$ at $x \in V$ is a linear map $D\phi(x) = D\phi \mid_x : V \to W$. The chain rule says that $$D(\psi \circ \phi)(x) = D(\psi)(\phi(x)) \circ D\phi(x). $$ Applying this linear map to $h \in V$ gives $$D(\psi \circ \phi)(x)(h) = D(\psi)(\phi(x))(D\phi(x)(h)).$$ It seems that you do not properly distinguish between linear maps and their matrix representations, and this may be confusing.

There is no dimension problem in your question: $Dg(x)$ is a linear map $\mathbb R^n \to \mathbb R$, hence $Dg(x)(h) \in \mathbb R$. Also $Df(g(x))$ is a linear map $\mathbb R \to \mathbb R^{n \times n}$, and of course you insert $Dg(x)(h)$ as an argument. In terms of matrices: $Dg(x)$ is a $(1 \times n)$-matrix, $Df(g(x))$ is an $(n^2 \times 1)$-matrix (where we identify $\mathbb R^{n \times n}$ with $\mathbb R^{n^2}$ to obtain a "standard" matrix with real entries), and their product is an $(n^2 \times n)$-matrix. You write $Df(g(x)) \in \mathbb R^{n \times n}$, but this is wrong since it suggests that it is an $(n \times n)$-matrix.
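These matrix shapes can be verified with a toy example (a sketch; the particular $f$ and $g$ below are hypothetical stand-ins, not the functions from the question):

```python
# Shape check of the flattened chain rule: with R^{n x n} identified with
# R^{n^2}, Dg(x) is 1 x n, Df(g(x)) is n^2 x 1, and their product is n^2 x n.
# Toy choices for illustration: g(x) = x^T x and f(t) = t * I.
import numpy as np

n = 3
x = np.array([1.0, 2.0, 3.0])

Dg = (2.0 * x).reshape(1, n)                # derivative of g at x:  1 x n
Df = np.eye(n).ravel().reshape(n * n, 1)    # derivative of flattened f:  n^2 x 1

D_composition = Df @ Dg                     # chain-rule product
print(Dg.shape, Df.shape, D_composition.shape)  # (1, 3) (9, 1) (9, 3)
```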

Concerning your example, I do not see the connection to the chain rule: $f$ is not the composition of two functions as needed in the chain rule. But certainly your conclusion $$D(\mathbf{A}(\mathbf{x}))= \frac{\mathbf{x}\mathbf{x}^{\mathsf{T}}}{\left\|\mathbf{x}\right\|_{2} }$$ fails: $D \mathbf A(\mathbf x)$ is a linear map $D \mathbf A(\mathbf x) : \mathbb R^n \to \mathbb R^{n \times n}$, whereas $\frac{\mathbf{x} \mathbf{x}^{\top}}{\|\mathbf{x}\|_{2}}$ is a single $(n \times n)$-matrix.
