First of all, the "best" definition of a determinant depends on what you are doing. If you are trying to prove the change of variables formula for Riemann integration, then the volume definition is precisely what you need. If you are doing algebraic geometry, it is useful to know that the determinant is a polynomial in the matrix entries. If you are trying to compute a determinant, it might be useful to consider Laplace expansion.
I don't know of any linear algebra books that define the determinant as a volume, and I doubt there are many out there. If a book focuses on the computational side of linear algebra, it may motivate the determinant this way, but the actual definition will likely be something that is easy to compute directly. If the book deals with abstract vector spaces (as in Axler's case), it isn't even possible to define the determinant as a volume, because there is no intrinsic notion of "volume" in a vector space. (Though you may be able to define a notion of volume by using the determinant.)
That doesn't mean interpreting the determinant as volume isn't useful. Actually, I think it can be extremely useful for developing intuition. To demonstrate, here are some properties you can quickly "prove" using the volume definition:
- $\det(AB) = \det(A)\det(B)$. Think of $AB$ as the composition of two linear transformations. If we apply $AB$ to the unit cube, $B$ transforms the cube into a parallelepiped with its volume scaled by $\det(B)$, and $A$ transforms this parallelepiped into another parallelepiped with its volume scaled by $\det(A)$. So $AB$ transforms the unit cube into a parallelepiped with volume $\det(A)\det(B)$.
- $\det(A) = 0$ iff $A$ is not invertible. Think of $A$ as the linear transformation that sends the standard basis of $\mathbb{R}^n$ to the columns of $A$. The determinant is just the (signed) volume of the parallelepiped spanned by the columns of $A$. It is intuitively clear that this volume is $0$ if and only if the columns of $A$ are linearly dependent, i.e. if and only if $A$ is not invertible.
- The determinant is multilinear and alternating. If you scale a single column of $A$, then the volume of the corresponding parallelepiped scales by the same factor. If you swap two columns, then the sign of the determinant flips.
- If a matrix is diagonalizable, then the determinant is the product of its eigenvalues. Think of how the matrix transforms the parallelepiped formed by its eigenbasis.
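All of these properties are easy to sanity-check numerically. Here is a quick sketch using NumPy (the random matrices and the helper names are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# det(AB) = det(A) det(B)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# A matrix with linearly dependent columns has determinant 0
S = A.copy()
S[:, 2] = 2 * S[:, 0] + 3 * S[:, 1]
assert np.isclose(np.linalg.det(S), 0)

# Scaling one column scales the determinant by the same factor
C = A.copy()
C[:, 1] *= 5
assert np.isclose(np.linalg.det(C), 5 * np.linalg.det(A))

# Swapping two columns flips the sign
D = A[:, [1, 0, 2]]
assert np.isclose(np.linalg.det(D), -np.linalg.det(A))

# The determinant is the product of the eigenvalues
assert np.isclose(np.prod(np.linalg.eigvals(A)), np.linalg.det(A))
```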
Though these arguments are not rigorous, there is a way to make this idea of "volume" rigorous. We define the exterior power $\bigwedge^k V$ as the vector space such that any multilinear, alternating map $V^k \to W$ factors through $\bigwedge^k V$. The elements of $\bigwedge^k V$ are written as products $v_1 \wedge \dots \wedge v_k$ of vectors $v_1, \dots, v_k$. The idea is that the element $v_1 \wedge \dots \wedge v_k$ somehow represents the "volume" of the parallelepiped formed by the vectors $v_1, \dots, v_k$, taken after projecting onto some $k$-dimensional subspace of $V$. The upshot: if we take $n = \dim V$, then $\bigwedge^n V$ is one-dimensional and is spanned by products $v_1 \wedge \dots \wedge v_n$, where such a product represents the volume of the associated parallelepiped. Since a linear map on a one-dimensional space is just multiplication by a scalar, we can define the determinant of a linear transformation $T$ as the unique scalar such that
$$
Tv_1 \wedge \dots \wedge Tv_n = (\det T)(v_1 \wedge \dots \wedge v_n)
$$
for all vectors $v_i$. See this answer for more details.
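For concreteness, suppose $\dim V = 2$ and $T$ has matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with respect to a basis $e_1, e_2$. Using $e_i \wedge e_i = 0$ and $e_2 \wedge e_1 = -e_1 \wedge e_2$, the definition recovers the familiar formula:
$$
Te_1 \wedge Te_2 = (ae_1 + ce_2) \wedge (be_1 + de_2)
= ad\,(e_1 \wedge e_2) + cb\,(e_2 \wedge e_1)
= (ad - bc)(e_1 \wedge e_2),
$$
so $\det T = ad - bc$.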
The above discussion is purely algebraic, and the only idea we have really used is that the determinant is the unique multilinear and alternating map on matrices (up to scaling). But the exterior algebra is somehow the "nicest" context in which to think about these things. For example, the multiplicativity of the determinant is pretty much immediate:
$$
ABv_1 \wedge \dots \wedge ABv_n = (\det A)(Bv_1 \wedge \dots \wedge Bv_n)
= (\det A)(\det B)(v_1 \wedge \dots \wedge v_n).
$$
All the other properties above are just as easy to prove, which isn't surprising because the elements of $\bigwedge^n V$ can be thought of as volumes. In fact, the volume operation is the unique alternating and multilinear map on matrices (up to scaling), which is why there is a purely algebraic way of encoding it.
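This uniqueness is also what justifies the Leibniz formula: expanding $Ae_1 \wedge \dots \wedge Ae_n$ and collecting signs of permutations gives $\det A = \sum_\sigma \operatorname{sgn}(\sigma) \prod_i a_{i\sigma(i)}$. As an illustration (the helper name `det_leibniz` is mine), here is that sum computed directly:

```python
import itertools
import numpy as np

def det_leibniz(A):
    """Determinant via the Leibniz sum:
    det(A) = sum over permutations sigma of sgn(sigma) * prod_i A[i, sigma(i)]."""
    n = A.shape[0]
    total = 0.0
    for perm in itertools.permutations(range(n)):
        # The sign of a permutation is (-1)^(number of inversions)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        sign = -1 if inversions % 2 else 1
        total += sign * np.prod([A[i, perm[i]] for i in range(n)])
    return total

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(det_leibniz(A))  # -2.0
```

Of course, this takes $n!$ terms, which is why it is a definition rather than an algorithm; but it makes the "unique alternating multilinear map" characterization concrete.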
If you haven't seen exterior powers before, the previous two paragraphs may not have made much sense. I haven't read it, but I believe a good book that takes this approach is *Linear Algebra via Exterior Products* by Sergei Winitzki.