In addition to all of the above, you can think of a matrix as jamming together a bunch of responses. Take some transformation $\mathcal T$ and the "basis vectors" $\hat e_1, \hat e_2, \dots, \hat e_n$, columns with a $1$ in the $k^\text{th}$ spot and zeroes everywhere else; the transformation has "response vectors" $\mathcal T(\hat e_k) = \vec v_k$, which can themselves be represented as columns in the same basis.
If we jam these vectors together horizontally, we get the matrix. If the transformation is linear, meaning $\mathcal T(\vec a + \vec b) = \mathcal T(\vec a) + \mathcal T(\vec b)$ and $\mathcal T(k~\vec a) = k~\mathcal T(\vec a),$ then matrix multiplication describes the entire transformation, because $\mathcal T(\vec u) = \mathcal T\big(\sum_k u_k ~\hat e_k\big) = \sum_k u_k ~\vec v_k:$ you just combine the columns weighted by the respective components.
Just as an example before I go further, the matrix $$\begin{bmatrix}5&6&7\\1&2&7\\0&1&7\end{bmatrix}$$maps the input column $\hat e_1 = [1,0,0]^T$ to the output column $\vec v_1 = [5,1,0]^T$, and so on; you can therefore see that the matrix takes the form $[\vec v_1 ~~\vec v_2~~\vec v_3]$, with all of these basic responses jammed together.
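If you want to see this numerically, here is a quick NumPy sketch (my own check, using the example matrix above and an arbitrary test vector $\vec u$):

```python
import numpy as np

# The example matrix: its columns are the responses v_1, v_2, v_3.
A = np.array([[5, 6, 7],
              [1, 2, 7],
              [0, 1, 7]])

# Feeding in a basis vector picks out the corresponding column.
e1 = np.array([1, 0, 0])
print(A @ e1)  # [5 1 0], the first column

# Linearity: A u is the combination of columns weighted by u's components.
u = np.array([2, -1, 3])
combo = 2 * A[:, 0] - 1 * A[:, 1] + 3 * A[:, 2]
print(np.array_equal(A @ u, combo))  # True
```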
Now, with this background, the key to an orthogonal matrix ($A^{-1} = A^T$) is that its columns are orthogonal unit vectors which span the space. You can see this because $\mathcal T^{-1}(\vec v_k) = \hat e_k$ by definition, while $A^T~\vec v_k = \big(\vec v_k^T ~A\big)^T.$ As a matrix, the row vector $\vec v_k^T$ represents the function from vectors to dot products, $(\vec v_k \cdot),$ while matrix multiplication implements function composition (so you feed every output column of the matrix on the right through the transform on the left). Equating $\vec v_k^T~A$ with $\hat e_k^T$ you get $$\hat e_k^T = \begin{bmatrix}\big(\vec v_k^T~\vec v_1\big)&\big(\vec v_k^T~\vec v_2\big)&\dots&\big(\vec v_k^T~\vec v_n\big)\end{bmatrix},$$and that is how you get to the idea that $\vec v_k^T ~ \vec v_\ell$ must be $0$ if $k \ne \ell$ and $1$ if $k = \ell$: the columns form an orthonormal basis.
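As a sanity check (again my own sketch; the $30°$ rotation is just a convenient concrete orthogonal matrix), the $(k,\ell)$ entry of $A^T A$ is exactly the dot product $\vec v_k^T~\vec v_\ell$, so column orthonormality is the same statement as $A^T A = I$:

```python
import numpy as np

# A concrete orthogonal matrix: rotation by 30 degrees.
theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Entry (k, l) of Q^T Q is the dot product of columns k and l,
# so Q^T Q = I says the columns are an orthonormal basis.
print(np.allclose(Q.T @ Q, np.eye(2)))      # True

# Equivalently, the transpose really is the inverse.
print(np.allclose(np.linalg.inv(Q), Q.T))   # True
```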
An orthogonal matrix can therefore be thought of as a "coordinate transformation" from your usual orthonormal basis $\{\hat e_i\}$ to some new orthonormal basis $\{\hat v_i\}.$ You can view other matrices as "coordinate transformations" too (as long as they're nondegenerate square matrices), but in general they will mess with your formula for the "dot product" of vectors. The dot product takes a different form in skewed coordinates: you have to define something called a "dual basis" and keep track of the "contravariant" and "covariant" components of vectors, and by the time you get to that point we will be generalizing you from your familiar "dot product" to a more general "metric tensor". If you want to avoid all of these headaches, you must restrict your notion of "coordinate transformation" to precisely these orthogonal matrices.
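To make the last point concrete, here is a small sketch of my own (the particular matrices are arbitrary choices): an orthogonal $Q$ leaves dot products alone, since $(Q\vec a)\cdot(Q\vec b) = \vec a^T Q^T Q\, \vec b = \vec a \cdot \vec b,$ while a generic invertible "skewed" matrix $M$ does not.

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])  # perpendicular inputs: a . b = 0

# An orthogonal matrix, extracted from the QR factorization of a fixed matrix.
Q, _ = np.linalg.qr(np.array([[1.0, 1.0, 0.0],
                              [0.0, 1.0, 1.0],
                              [1.0, 0.0, 1.0]]))

# A generic invertible ("skewed") matrix.
M = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 3.0]])

print(np.isclose((Q @ a) @ (Q @ b), a @ b))  # True: Q preserves the dot product
print((M @ a) @ (M @ b))                     # 2.0, not 0: M distorted the dot product
```

In skewed coordinates the inner product of components picks up the matrix $M^T M$ in the middle, and that matrix is exactly what gets promoted to the "metric tensor" in the more general story.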