On the derivative of $\mathbf{A}\mathbf{x}$ w.r.t the vector $\mathbf{x}$, where $\mathbf{A}$ is a matrix, but without using entry-wise derivatives.

Question

I think it's clear that \begin{equation} \frac{d\left(\mathbf{A}\mathbf{x} \right)}{d\mathbf{x}} = \mathbf{A}, \quad \text{where $\mathbf{A}$ is a matrix and $\mathbf{x}$ is a vector.} \end{equation} What I'm wondering is whether the following proof is correct, and specifically whether I can apply the operator $d/d\mathbf{x}$ in an element-wise manner. (I know that $\mathbf{A}$ does not need to be a square matrix, but I believe that doesn't change much.)

My proof:
Let $\mathbf{A}$ be an $n \times n$ matrix and $\mathbf{x}$ an $n \times 1$ column vector. Then $\mathbf{A}$ can be rewritten in the following way: \begin{equation} \mathbf{A} = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1^T \\ \hdashline \vdots \\ \hdashline \mathbf{a}_n^T \end{pmatrix}, \end{equation} where $\mathbf{a}_i$ is the column vector whose entries are the elements of the $i$-th row of $\mathbf{A}$, that is: \begin{equation} \mathbf{a}_i = \left[a_{i1},\; \ldots, \; a_{in} \right]^T, \quad \text{for $i = 1, \ldots, n$.} \end{equation} Therefore, we can write: \begin{equation} \tag{1} \mathbf{A}\mathbf{x} = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix}\mathbf{x} = \begin{pmatrix} \mathbf{a}_1^T\mathbf{x}\\ \hdashline \vdots \\ \hdashline \mathbf{a}_n^T\mathbf{x} \end{pmatrix}=\begin{pmatrix} \mathbf{x}^T\mathbf{a}_1 \\ \hdashline \vdots \\ \hdashline \mathbf{x}^T\mathbf{a}_n \end{pmatrix} \end{equation} Applying the $d/d\mathbf{x}$ operator to equation (1), we have: \begin{equation}\label{2} \implies \frac{d\left(\mathbf{A}\mathbf{x} \right)}{d\mathbf{x}} = \frac{d}{d\mathbf{x}} \cdot \begin{pmatrix} \mathbf{x}^T\mathbf{a}_1 \\ \hdashline \vdots \\ \hdashline \mathbf{x}^T\mathbf{a}_n \end{pmatrix} \tag{2} \end{equation} We also know that, given two column vectors $\mathbf{a}$ and $\mathbf{x}$ of the same dimensions, the following is true: \begin{equation} \frac{d\left(\mathbf{a}^T \mathbf{x} \right)}{d\mathbf{x}} = \frac{d\left(\mathbf{x}^T \mathbf{a} \right)}{d\mathbf{x}} = \mathbf{a} \tag{3} \end{equation} Equation (2) then becomes (assuming I'm allowed to apply the derivative operator in an element-wise manner): \begin{equation} \overset{(2)}{\implies} \frac{d\left(\mathbf{A}\mathbf{x} \right)}{d\mathbf{x}} = \frac{d}{d\mathbf{x}} \cdot \begin{pmatrix} \mathbf{x}^T\mathbf{a}_1 \\ \hdashline \vdots \\ \hdashline \mathbf{x}^T\mathbf{a}_n \end{pmatrix} = \begin{pmatrix} \left( \frac{d}{d\mathbf{x}} \mathbf{x}^T\mathbf{a}_1 \right)^T\\ \hdashline \vdots \\ \hdashline \left( \frac{d}{d\mathbf{x}}\mathbf{x}^T\mathbf{a}_n \right)^T \end{pmatrix} \overset{(3)}{=} \begin{pmatrix} \mathbf{a}_1^T \\ \hdashline \vdots \\ \hdashline \mathbf{a}_n^T \end{pmatrix} = \mathbf{A} \end{equation} Notice how I resorted to taking the transpose of the derivative $\frac{d}{d\mathbf{x}} \mathbf{x}^T\mathbf{a}_i$; this is the part that confuses me. If I hadn't taken the transpose of the derivative, then my result would be: \begin{equation} \begin{pmatrix} \mathbf{a}_1 \\ \hdashline \vdots \\ \hdashline \mathbf{a}_n \end{pmatrix} , \quad \text{which is NOT a matrix, but rather an $(n^2 \times 1)$ vector, due to $\mathbf{a}_i$ being a column vector}. \end{equation}

What I'm wondering is whether the result of equation (3) is a column vector or a row vector. If it's a row vector, then I do not need to take the transpose of the derivative. Obviously I must be wrong somewhere.

The RHS of (3) should be $a^T$, not $a.$ Your "$\frac{dL}{dx}$" is usually written $DL(x)$ or $dL(x)$ or $DL_x$ or $dL_x$, and when $L$ is a linear map on $\Bbb R^n,$ $DL_x=L.$ In other words, when $L(x)=a^Tx,$ the Jacobian (row) matrix of $L$ at $x$ is $a^T.$ — Anne Bauval, Commented Jan 28, 2023 at 14:05
Thanks for clearing that up. The thing is, in Bishop's book, Pattern Recognition and Machine Learning, p. 697, equation (C.19), he writes that \begin{equation} \frac{\partial}{\partial \mathbf{x}}\left(\mathbf{x}^{\mathrm{T}} \mathbf{a}\right)=\frac{\partial}{\partial \mathbf{x}}\left(\mathbf{a}^{\mathrm{T}} \mathbf{x}\right)=\mathbf{a} \end{equation} You can check that for yourself. This is why I was confused. EDIT: giving it more thought, your point makes sense, since I've heard about the convention that when you take the derivative w.r.t. a column vector, the result is a row vector. — Nyquist-er, Commented Jan 28, 2023 at 14:19
I looked at the book and I understand your confusion. I think he is just wrong. The result is indeed a row vector, hence $a^T.$ — Anne Bauval, Commented Jan 28, 2023 at 14:29

Nyquist-er · Accepted Answer · 2023-02-04 13:53:38Z

After a lot of searching I found where I went wrong. I used the denominator layout for one derivative, that is: \begin{equation} \frac{d\left(\mathbf{a}^T \mathbf{x}\right)}{d \mathbf{x}}=\frac{d\left(\mathbf{x}^T \mathbf{a}\right)}{d \mathbf{x}}=\mathbf{a} \end{equation} and the numerator layout for the other one, that is: \begin{equation} \frac{d(\mathbf{A} \mathbf{x})}{d \mathbf{x}}=\mathbf{A} \end{equation} Using a denominator (numerator) layout means that the output matrix has a number of rows equal to the size of the denominator (numerator). From my understanding, it's important to use the same layout consistently. It also helps to take $\mathbf{A}$ to be an $m \times n$ matrix, so that the difference between the two layouts is visible. If we consistently use the denominator layout, we have: \begin{equation} \tag{1} \frac{d\overbrace{(\mathbf{A} \mathbf{x})}^{\in \mathbb{R}^m}}{d \underbrace{(\mathbf{x})}_{\in \mathbb{R}^n}} = \begin{bmatrix} \frac{d\left(\mathbf{Ax} \right)_1}{d\mathbf{x}} & \ldots & \frac{d\left(\mathbf{Ax} \right)_m}{d\mathbf{x}} \end{bmatrix}_{n \times m}, \end{equation}
where, as you can see, the number of rows of the above matrix equals the size of its denominator (i.e. $\mathbf{x}$). From my question above we also have that: \begin{equation} \mathbf{A} \mathbf{x}=\left(\begin{array}{ccc} a_{11} & \ldots & a_{1 n} \\ \vdots & \ddots & \vdots \\ a_{m 1} & \ldots & a_{m n} \end{array}\right) \mathbf{x}=\left(\begin{array}{c} \mathbf{a}_1^T \mathbf{x} \\ \hdashline \vdots \\ \hdashline \mathbf{a}_m^T \mathbf{x} \end{array}\right) \end{equation} It follows that the $i$-th element of the vector $\mathbf{Ax}$ is: \begin{equation} (\mathbf{Ax})_i = \mathbf{a}^T_i \mathbf{x}, \quad \text{for $i = 1, \ldots, m$} \end{equation} Therefore \begin{equation} \frac{d(\mathbf{Ax})_i}{d\mathbf{x}} = \frac{d\overbrace{(\mathbf{a}^T_i \mathbf{x})}^{\in \mathbb{R}^{1\times 1}}}{d\underbrace{(\mathbf{x})}_{\in \mathbb{R}^n}} = \mathbf{a}_i \in \mathbb{R}^{n \times 1}, \quad \text{again, assuming denominator layout.} \end{equation} Hence, \begin{equation} \overset{(1)}{ \implies} \frac{d\overbrace{(\mathbf{A} \mathbf{x})}^{\in \mathbb{R}^m}}{d \underbrace{(\mathbf{x})}_{\in \mathbb{R}^n}} = \begin{bmatrix} \frac{d\left(\mathbf{Ax} \right)_1}{d\mathbf{x}} & \ldots & \frac{d\left(\mathbf{Ax} \right)_m}{d\mathbf{x}} \end{bmatrix}_{n \times m} = \begin{bmatrix} \mathbf{a}_1 & \ldots & \mathbf{a}_m \end{bmatrix} = \left(\begin{array}{ccc} a_{11} & \ldots & a_{m 1} \\ \vdots & \ddots & \vdots \\ a_{1 n} & \ldots & a_{m n} \end{array}\right) = \mathbf{A}^T \end{equation}
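As a concrete illustration of the denominator-layout result, here is a small worked case with $m = 2$ and $n = 3$. The matrix entries are arbitrary values chosen for this sketch; they do not come from the question. \begin{equation} \mathbf{A} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \quad \mathbf{A}\mathbf{x} = \begin{pmatrix} x_1 + 2x_2 + 3x_3 \\ 4x_1 + 5x_2 + 6x_3 \end{pmatrix}, \quad \mathbf{a}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \mathbf{a}_2 = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix} \end{equation} Each column of the denominator-layout derivative is the gradient of one component of $\mathbf{Ax}$: \begin{equation} \frac{d(\mathbf{A}\mathbf{x})}{d\mathbf{x}} = \begin{bmatrix} \frac{d\left(\mathbf{Ax}\right)_1}{d\mathbf{x}} & \frac{d\left(\mathbf{Ax}\right)_2}{d\mathbf{x}} \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 \end{bmatrix} = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix} = \mathbf{A}^T \in \mathbb{R}^{3 \times 2} \end{equation}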
If we were to consistently use the numerator layout, we'd get $\frac{d(\mathbf{A} \mathbf{x})}{d \mathbf{x}}=\mathbf{A}$, because in that layout $\frac{d\left(\mathbf{a}^T \mathbf{x}\right)}{d \mathbf{x}} = \mathbf{a}^T$.
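
The two layouts can also be checked numerically. The following is a minimal sketch in plain Python (standard library only), assuming a central-difference approximation of the partial derivatives. The helper names (`matvec`, `jacobian_numerator`, `transpose`, `close`) and the values of `A` and `x0` are made up for this illustration and are not part of the original answer.

```python
# Finite-difference check of the two layout conventions for f(x) = A x.
# A and x0 are arbitrary illustrative values, not taken from the post.

def matvec(A, x):
    """Return the matrix-vector product A x as a list of length m."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def jacobian_numerator(f, x, h=1e-6):
    """Central-difference Jacobian, entry [i][j] = d f_i / d x_j (m x n)."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def transpose(M):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]

def close(M, N, tol=1e-6):
    """True if M and N have the same shape and entries within tol."""
    return len(M) == len(N) and all(
        len(r) == len(s) and all(abs(p - q) <= tol for p, q in zip(r, s))
        for r, s in zip(M, N)
    )

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # m = 2, n = 3
x0 = [0.5, -1.0, 2.0]

J_num = jacobian_numerator(lambda x: matvec(A, x), x0)  # numerator layout, m x n
J_den = transpose(J_num)                                # denominator layout, n x m

print(close(J_num, A))             # compares the numerator layout with A
print(close(J_den, transpose(A)))  # compares the denominator layout with A^T
```

The partial derivatives are the same numbers in both cases; the layout only decides whether they are arranged as an $m \times n$ or an $n \times m$ array, which is why a non-square $\mathbf{A}$ makes the difference visible.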

@AnneBauval Maybe you'll find this interesting. So yes, Bishop is not wrong; he just uses the denominator layout throughout his book. — Nyquist-er, Commented Feb 4, 2023 at 13:58
