I think it's clear that
\begin{equation} \frac{d\left(\mathbf{A}\mathbf{x} \right)}{d\mathbf{x}} = \mathbf{A}, \quad \text{where $\mathbf{A}$ is a matrix and $\mathbf{x}$ is a vector.} \end{equation}
What I'm wondering is whether the following proof is correct and, specifically, whether I can apply the operator $d/d\mathbf{x}$ element-wise. (I know that $\mathbf{A}$ does not need to be a square matrix, but I believe that assuming it is square does not change the argument much.)
My proof:
Let $\mathbf{A}$ be an $n \times n$ matrix and $\mathbf{x}$ an $n \times 1$ column vector. Then $\mathbf{A}$ can be rewritten in the following way:
\begin{equation}
\mathbf{A} = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1^T \\ \hdashline \vdots \\ \hdashline \mathbf{a}_n^T \end{pmatrix},
\end{equation}
where $\mathbf{a}_i$ is the column vector whose entries are the elements of the $i$-th row of $\mathbf{A}$, that is:
\begin{equation}
\mathbf{a}_i = \left[a_{i1},\; \ldots, \; a_{in} \right]^T, \quad \text{for $i = 1, \ldots, n$}
\end{equation}
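Therefore, since each entry $\mathbf{a}_i^T\mathbf{x}$ is a scalar and hence equal to its own transpose $\mathbf{x}^T\mathbf{a}_i$, we can write: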
\begin{equation} \tag{1}
\mathbf{A}\mathbf{x} = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix}\mathbf{x} = \begin{pmatrix} \mathbf{a}_1^T\mathbf{x}\\ \hdashline \vdots \\ \hdashline \mathbf{a}_n^T\mathbf{x} \end{pmatrix}=\begin{pmatrix} \mathbf{x}^T\mathbf{a}_1 \\ \hdashline \vdots \\ \hdashline \mathbf{x}^T\mathbf{a}_n \end{pmatrix}
\end{equation}
Applying the $d/d\mathbf{x}$ operator to equation (1), we have:
\begin{equation}\label{2}
\implies
\frac{d\left(\mathbf{A}\mathbf{x} \right)}{d\mathbf{x}} = \frac{d}{d\mathbf{x}} \cdot \begin{pmatrix} \mathbf{x}^T\mathbf{a}_1 \\ \hdashline \vdots \\ \hdashline \mathbf{x}^T\mathbf{a}_n \end{pmatrix} \tag{2}
\end{equation}
We also know that, for two column vectors $\mathbf{a}$ and $\mathbf{x}$ of the same dimension, the following holds:
\begin{equation}
\frac{d\left(\mathbf{a}^T \mathbf{x} \right)}{d\mathbf{x}} = \frac{d\left(\mathbf{x}^T \mathbf{a} \right)}{d\mathbf{x}} = \mathbf{a} \tag{3}
\end{equation}
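As a component-wise check of (3), here is a short sketch. It assumes that, for a scalar function $f$, the symbol $df/d\mathbf{x}$ means the column vector of partial derivatives $\partial f/\partial x_j$; this convention is my assumption and is not fixed anywhere above. Writing $a_k$ and $x_k$ for the entries of $\mathbf{a}$ and $\mathbf{x}$:
\begin{equation}
\mathbf{a}^T\mathbf{x} = \sum_{k=1}^{n} a_k x_k \quad \implies \quad \frac{\partial\left(\mathbf{a}^T\mathbf{x}\right)}{\partial x_j} = a_j, \quad \text{for $j = 1, \ldots, n$}
\end{equation}
Stacking these $n$ partial derivatives in a column gives $\mathbf{a}$, whereas stacking them in a row would give $\mathbf{a}^T$ instead.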
Equation (2) then becomes (assuming that I'm allowed to apply the derivative operator element-wise):
\begin{equation}
\overset{(2)}{\implies}
\frac{d\left(\mathbf{A}\mathbf{x} \right)}{d\mathbf{x}} = \frac{d}{d\mathbf{x}} \cdot \begin{pmatrix} \mathbf{x}^T\mathbf{a}_1 \\ \hdashline \vdots \\ \hdashline \mathbf{x}^T\mathbf{a}_n \end{pmatrix} = \begin{pmatrix} \left( \frac{d}{d\mathbf{x}} \mathbf{x}^T\mathbf{a}_1 \right)^T\\ \hdashline \vdots \\ \hdashline \left( \frac{d}{d\mathbf{x}}\mathbf{x}^T\mathbf{a}_n \right)^T \end{pmatrix} \overset{(3)}{=} \begin{pmatrix} \mathbf{a}_1^T \\ \hdashline \vdots \\ \hdashline \mathbf{a}_n^T \end{pmatrix} = \mathbf{A}
\end{equation}
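The same result can be checked entry by entry. The sketch below assumes that $d\left(\mathbf{A}\mathbf{x}\right)/d\mathbf{x}$ denotes the matrix whose $(i,j)$ entry is $\partial\left(\mathbf{A}\mathbf{x}\right)_i/\partial x_j$, where $\left(\mathbf{A}\mathbf{x}\right)_i$ is the $i$-th entry of $\mathbf{A}\mathbf{x}$; this arrangement is an assumption about the notation, not something stated above:
\begin{equation}
\left(\mathbf{A}\mathbf{x}\right)_i = \sum_{k=1}^{n} a_{ik} x_k \quad \implies \quad \frac{\partial \left(\mathbf{A}\mathbf{x}\right)_i}{\partial x_j} = a_{ij}, \quad \text{for $i, j = 1, \ldots, n$}
\end{equation}
Under that assumption, the $i$-th row of the derivative is $\left[a_{i1},\; \ldots, \; a_{in} \right] = \mathbf{a}_i^T$, which agrees with the rows obtained above.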
Notice that I resorted to taking the transpose of each derivative $\frac{d}{d\mathbf{x}} \mathbf{x}^T\mathbf{a}_i$; this is the part that confuses me. If I hadn't taken the transpose of the derivative, my result would be:
\begin{equation}
\begin{pmatrix} \mathbf{a}_1 \\ \hdashline \vdots \\ \hdashline \mathbf{a}_n \end{pmatrix}
, \quad \text{which is NOT a matrix, but rather an $(n^2 \times 1)$ vector, because each $\mathbf{a}_i$ is a column vector.}
\end{equation}
What I'm wondering is whether the result of equation (3) is a column vector or a row vector. If it's a row vector, then I do not need to take the transpose of the derivative. I must be going wrong somewhere.