2
$\begingroup$

The derivation in my textbook for the covariant derivative of a vector field $\vec{u}$ in curvilinear coordinates $\xi^k$ is the following: $$\frac{\partial \vec{u}}{\partial\xi^j}=\frac{\partial (u^i\vec{a}_i)}{\partial\xi^j}=\frac{\partial u^i}{\partial\xi^j}\vec{a}_i+\frac{\partial \vec{a}_i}{\partial\xi^j}u^i=\vec{a}_i\left(\frac{\partial u^i}{\partial\xi^j}+u^k\Gamma_{jk}^i\right)$$ So the covariant derivative is defined as: $$\nabla_ku^i=\left(\frac{\partial u^i}{\partial\xi^k}+u^j\Gamma_{jk}^i\right)$$ The analogous formula for a covector field is derived in the same way: $$\nabla_ku_i=\left(\frac{\partial u_i}{\partial\xi^k}-u_j\Gamma_{ik}^j\right)$$ Then, out of nowhere, it says that for higher-rank tensors it must be defined as: $$\nabla_k\Phi^{i_1 i_2 ...i_p}_{j_1j_2...j_q}=\frac{\partial \Phi^{i_1 i_2 ...i_p}_{j_1j_2...j_q}}{\partial\xi^k}+\Phi^{s i_2 ...i_p}_{j_1j_2...j_q}\Gamma_{sk}^{i_1}+...+\Phi^{i_1 i_2 ...s}_{j_1j_2...j_q}\Gamma_{sk}^{i_p}-\Phi^{i_1 i_2 ...i_p}_{sj_2...j_q}\Gamma_{j_1k}^{s}-...-\Phi^{i_1 i_2 ...i_p}_{j_1j_2...s}\Gamma_{j_qk}^{s}$$ I just can't understand why the covariant derivative of a higher-rank tensor field must look like this; the book gives no intuition or proof showing that it follows from the first two examples. Can you please help me understand it?

$\endgroup$

2 Answers

6
$\begingroup$

Let's look first at the case where $p=1$ and $q=2$, for concreteness. Then $$\nabla_k\Phi^i_{j_1j_2} = \partial_k\Phi^i_{j_1j_2} + \varGamma^i_{sk}\Phi^s_{j_1j_2} - \varGamma^s_{j_1k} \Phi^i_{sj_2} - \varGamma^s_{j_2k}\Phi^i_{j_1s}.$$

How to unpack what is going on here? Each index $i,j_1,j_2$ in $\Phi^i_{j_1j_2}$ will provide a correction term, but whether this correction term will come with a $+$ or $-$ sign will depend on whether the index is covariant or contravariant. Based on the first two examples, we know that the correction term coming from $i$ will be $+\varGamma^i_{sk}\Phi^s_{j_1j_2}$, while the ones coming from $j_1$ and $j_2$ are $- \varGamma^s_{j_1k} \Phi^i_{sj_2}$ and $- \varGamma^s_{j_2k}\Phi^i_{j_1s}$.

In general, it is somewhat of an elimination process: when you're computing $\nabla_k\Phi^{i_1\ldots i_p}_{j_1\ldots j_q}$ and considering the correction term corresponding to, say, $i_1$, the summation index $s$ replaces $i_1$ in $\Phi^{i_1\ldots i_p}_{j_1\ldots j_q}$, so you have $\Phi^{s\ldots i_p}_{j_1\ldots j_q}$, and that gets multiplied by a $\varGamma$ term. What is this $\varGamma$ term? It should have $k$, $s$, and whatever index $s$ replaced in the $\Phi$-term (in this case, $i_1$). Einstein's index balance leaves no choice but $\varGamma_{sk}^{i_1}$, and this comes with a $+$ sign because $i_1$ is a contravariant index.

Similarly, consider the correction term coming from any covariant index, let's say $j_q$. Then $s$ replaces $j_q$ and we have $\Phi^{i_1\ldots i_p}_{j_1\ldots s}$, and the $\varGamma$ term multiplying it must involve $k$, $s$, and $j_q$. Now Einstein's index balance forces $\varGamma_{kj_q}^{s}$, and this comes with a $-$ sign because $j_q$ is a covariant index.

(If $\nabla$ is not a symmetric connection, $\varGamma_{kj_q}^s$ and $\varGamma_{j_qk}^s$ differ, and the correct placement has $k$ in the first lower slot, $\varGamma_{k\bullet}^{\bullet}$. For a symmetric connection, such as the one in the question, the two agree, so the order of the lower indices does not matter.)
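As a sanity check of the sign pattern, here is a minimal Python sketch (not taken from the textbook) that applies the rule for two covariant indices, $\nabla_k\Phi_{ij} = \partial_k\Phi_{ij} - \varGamma^s_{ik}\Phi_{sj} - \varGamma^s_{jk}\Phi_{is}$, to the Euclidean metric written in polar coordinates $(r,\theta)$. The function names are hypothetical, and the Christoffel symbols are the standard polar ones, $\varGamma^r_{\theta\theta} = -r$ and $\varGamma^\theta_{r\theta} = \varGamma^\theta_{\theta r} = 1/r$. Since the metric has constant components in Cartesian coordinates, every component of its covariant derivative should vanish up to rounding.

```python
# Polar coordinates on the plane: index 0 is r, index 1 is theta.

def christoffel(r):
    """Return G with G[s][i][k] = Gamma^s_{ik} (symmetric in i, k here)."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r        # Gamma^r_{theta theta}
    G[1][0][1] = 1.0 / r   # Gamma^theta_{r theta}
    G[1][1][0] = 1.0 / r   # Gamma^theta_{theta r}
    return G

def metric(r):
    """Components g_{ij} of the Euclidean metric in polar coordinates."""
    return [[1.0, 0.0], [0.0, r * r]]

def d_metric(r):
    """Return dg with dg[k][i][j] = partial_k g_{ij}."""
    dg = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    dg[0][1][1] = 2.0 * r  # only d/dr of g_{theta theta} = r^2 is nonzero
    return dg

def cov_deriv_02(phi, dphi, G):
    """Covariant derivative of a (0,2) tensor; result[k][i][j] = nabla_k Phi_{ij}.

    One minus-sign correction term per covariant index, as in the formula above.
    """
    n = len(phi)
    out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                val = dphi[k][i][j]
                for s in range(n):
                    val -= G[s][i][k] * phi[s][j]  # correction for index i
                    val -= G[s][j][k] * phi[i][s]  # correction for index j
                out[k][i][j] = val
    return out

r = 1.7  # arbitrary sample radius
nabla_g = cov_deriv_02(metric(r), d_metric(r), christoffel(r))
max_abs = max(abs(x) for plane in nabla_g for row in plane for x in row)
print(max_abs)  # expected to be zero up to floating-point rounding
```

Flipping either minus sign to a plus makes the $\nabla_r g_{\theta\theta}$ component nonzero, which is a quick way to see that the sign attached to a covariant index is forced rather than chosen.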


As a response to OP's comment, let's discuss what happens in a coordinate-free manner. Let's say that $T$ is a tensor field of type $(p,q)$, so that if $\omega^1,\ldots,\omega^p$ are $1$-forms and $X_1,\ldots, X_q$ are vector fields, $T(\omega^1,\ldots, \omega^p,X_1,\ldots, X_q)$ is a smooth function. Therefore, if $Z$ is another vector field, it makes sense to take the directional derivative $Z(T(\omega^1,\ldots, \omega^p,X_1,\ldots, X_q))$. If we wish to define what is the tensor field $\nabla_ZT$, we should at least require the product rule $$\begin{align} Z(T(\omega^1,\ldots, \omega^p, X_1,\ldots, X_q)) &= (\nabla_ZT)(\omega^1,\ldots, \omega^p,X_1,\ldots, X_q) \\ &\quad +\sum_{i=1}^p T(\omega^1,\ldots, \nabla_Z\omega^i,\ldots, \omega^p,X_1,\ldots, X_q) \\ &\quad + \sum_{j=1}^q T(\omega^1,\ldots, \omega^p, X_1,\ldots, \nabla_ZX_j,\ldots, X_q).\end{align}$$ This means that we have no choice but to define $(\nabla_ZT)(\omega^1,\ldots, \omega^p,X_1,\ldots, X_q)$ by subtracting the two summations on the right side from the left side above! In addition, we see from this requirement that to define the covariant derivative of a tensor field of any type, we only need to know how to define covariant derivatives of vector fields and of 1-forms. Covariant derivatives of vector fields are pretty much axiomatic, and for 1-forms $\omega$ we repeat the discussion above and set $(\nabla_Z\omega)(X) = Z(\omega(X)) - \omega(\nabla_ZX)$. The negative sign here is the one from the correction factors before.

Now fix a coordinate system, let $\omega^i = {\rm d}x^i$, $X_j = \partial/\partial x^j$, and $Z = \partial/\partial x^k$ in the displayed formula above, and see the magic happen.
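To make this last step explicit, here is a sketch for a tensor field of type $(1,1)$, assuming the convention $\nabla_{\partial_k}\partial_j = \varGamma^s_{kj}\partial_s$, where $\partial_k$ abbreviates $\partial/\partial x^k$ (both the abbreviation and the choice of convention are assumptions made here for illustration). Applying the rule for 1-forms to $\omega = {\rm d}x^i$ gives $$(\nabla_{\partial_k}{\rm d}x^i)(\partial_s) = \partial_k(\delta^i_s) - {\rm d}x^i(\varGamma^r_{ks}\partial_r) = -\varGamma^i_{ks},$$ that is, $\nabla_{\partial_k}{\rm d}x^i = -\varGamma^i_{ks}\,{\rm d}x^s$. Writing $T^i_j = T({\rm d}x^i,\partial_j)$, the product rule then reads $$\begin{align} \partial_kT^i_j &= (\nabla_kT)^i_j + T(\nabla_{\partial_k}{\rm d}x^i,\partial_j) + T({\rm d}x^i,\nabla_{\partial_k}\partial_j) \\ &= (\nabla_kT)^i_j - \varGamma^i_{ks}T^s_j + \varGamma^s_{kj}T^i_s,\end{align}$$ so that $$(\nabla_kT)^i_j = \partial_kT^i_j + \varGamma^i_{ks}T^s_j - \varGamma^s_{kj}T^i_s.$$ The upper index picks up a $+$ term and the lower index a $-$ term, and each additional slot of a higher-rank tensor contributes one more term of the same shape.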


Alternatively, here's another discussion in terms of parallel translations: if $p\in M$ and $v\in T_pM$, let $\gamma$ be the geodesic starting at $p$ with initial velocity $v$ (in fact, any curve with the correct initial conditions will do, but you can think of geodesics for pedagogical reasons). For each $t$, let $P_t$ denote parallel translation from $p$ to $\gamma(t)$; it is a linear isomorphism $T_pM \to T_{\gamma(t)}M$.

Now take $\theta^1,\ldots, \theta^p \in (T_pM)^*$ and $v_1,\ldots, v_q \in T_pM$. We want to define $(\nabla_vT)_p(\theta^1,\ldots, \theta^p, v_1,\ldots, v_q)$. Use parallel translation to transfer everything to $T_{\gamma(t)}M$, that is, consider $$T_{\gamma(t)}(\theta^1 \circ P_t^{-1},\ldots, \theta^p\circ P_t^{-1}, P_t(v_1),\ldots, P_t(v_q)).$$ This is a smooth function of $t$. What should be your knee-jerk reaction here? Taking a derivative: $$\begin{align} &(\nabla_vT)_p(\theta^1,\ldots, \theta^p, v_1,\ldots, v_q) \\ &\quad = \frac{{\rm d}}{{\rm d}t}\bigg|_{t=0}T_{\gamma(t)}(\theta^1 \circ P_t^{-1},\ldots, \theta^p\circ P_t^{-1}, P_t(v_1),\ldots, P_t(v_q)).\end{align}$$

Actually evaluating this derivative in coordinates, using the product rule, will yield several terms. Taking derivatives of terms involving $P_t^{-1}$ will produce $-$ signs, since in general the derivative of the inversion mapping $A\mapsto A^{-1}$ in ${\rm GL}(n)$ is $-A^{-1}\dot{A}A^{-1}$.
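The inversion formula can be checked in one line (a sketch, assuming $A(t)$ is a differentiable curve of invertible matrices): differentiating $A(t)A(t)^{-1} = I$ with the product rule gives $$\dot{A}A^{-1} + A\,\frac{{\rm d}}{{\rm d}t}\big(A^{-1}\big) = 0 \quad\Longrightarrow\quad \frac{{\rm d}}{{\rm d}t}\big(A^{-1}\big) = -A^{-1}\dot{A}A^{-1}.$$ Since $P_0$ is the identity, this specializes to $$\frac{{\rm d}}{{\rm d}t}\bigg|_{t=0}P_t^{-1} = -\frac{{\rm d}}{{\rm d}t}\bigg|_{t=0}P_t,$$ so the slots carrying $P_t^{-1}$ and the slots carrying $P_t$ contribute correction terms of opposite sign.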

$\endgroup$
3
  • $\begingroup$ I appreciate that, but my question is more general. I'm asking how we concluded that for higher-rank tensors the correction is going to look anything like the rank-1 version. Is there a way to derive it from the first two equations, or is it just something that we postulate? Because if it's the latter, I'm not really comfortable with it, especially when we call that operator a derivative, which has a concrete intuitive meaning, but it's really hard to visualize when thinking of tensor fields of rank 2 or higher. $\endgroup$ Commented Oct 18, 2023 at 23:13
  • $\begingroup$ @KrumKutsarov I have edited my answer to address your comment, hope it clarifies things. $\endgroup$
    – Ivo Terek
    Commented Oct 19, 2023 at 0:06
  • $\begingroup$ @KrumKutsarov I have also added a brief explanation involving parallel translations. Now you can pick your poison... $\endgroup$
    – Ivo Terek
    Commented Oct 19, 2023 at 0:19
3
$\begingroup$

Ivo's answer is very good already, but there is another aspect to this (which I first saw in John Lee's book) that I find very satisfying and enough of a motivator. Suppose you have an ordinary affine connection $\nabla$ (i.e., it takes pairs of vector fields and outputs vector fields) defined on your manifold $M$. Denote by $\mathscr{T}^k_{\ell}(M)$ the bundle of $(k, \ell)$ tensors on $M$. Then there is a unique connection (which, by a slight abuse of notation, we'll also denote by $\nabla$) $$\nabla: \Gamma(T M) \times \Gamma(\mathscr{T}^k_{\ell}(M)) \to \Gamma(\mathscr{T}^k_{\ell}(M)) $$ such that

  • in $\Gamma\left(\mathscr{T}^0_{1}(M)\right) = \Gamma(T M)$, $\nabla$ coincides with the original given connection.

  • in $\Gamma\left(\mathscr{T}^0_{0}(M)\right) = \mathcal{C}^{\infty}(M)$, $\nabla$ satisfies $\nabla_X f = X(f)$

  • $\nabla$ satisfies a Leibniz rule with respect to tensorial products, i.e. $$\nabla_X\left( F \otimes G \right) = \left( \nabla_X F \right) \otimes G + F \otimes \left(\nabla_X G \right)$$

  • $\nabla$ commutes with all traces: if $\operatorname{tr}$ denotes the trace with respect to any pair of indices (one upper and one lower), then $$\nabla_X \left( \operatorname{tr} F \right) = \operatorname{tr} \left( \nabla_X F \right)$$

The proof involves pretty long computations and I could include it here later if you want it, but the result by itself is beautiful. All of the properties involved are very natural to ask for, and in coordinates, they produce your exact formula.
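As a small illustration of how these properties already force the signs (a sketch of the simplest case, not the full proof), take a 1-form $\omega$ and a vector field $Y$. The function $\omega(Y)$ is the trace of $\omega\otimes Y$, so the four properties give $$\begin{align} X(\omega(Y)) &= \nabla_X\big(\operatorname{tr}(\omega\otimes Y)\big) = \operatorname{tr}\big(\nabla_X(\omega\otimes Y)\big) \\ &= \operatorname{tr}\big((\nabla_X\omega)\otimes Y + \omega\otimes\nabla_XY\big) = (\nabla_X\omega)(Y) + \omega(\nabla_XY),\end{align}$$ and therefore $$(\nabla_X\omega)(Y) = X(\omega(Y)) - \omega(\nabla_XY).$$ This is where the minus sign in the covector formula comes from; the Leibniz rule for $\otimes$ then extends the pattern to one correction term per index.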

$\endgroup$
