$\begingroup$

One possible Lagrangian for a point particle moving in (possibly curved) spacetime is

$$L = -m \sqrt{-g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu},$$

where a dot is a derivative with respect to a parameter $\lambda$. This Lagrangian gives an action proportional to proper time, and it is reparametrization invariant ($\lambda$ need not be an affine parameter).

If we try to go to the Hamiltonian picture, we have the momenta

$$p_\mu = \frac{m}{\sqrt{-\dot{x}^2}} g_{\mu\nu} \dot{x}^\nu,$$

which obey the relation $p^2+m^2=0$. We then get that the Hamiltonian $$H = p_\mu \dot{x}^\mu - L$$ is identically zero.

I understand that this is not a problem because, since we have a constraint $\phi(x,p) = p^2 + m^2 = 0$, according to the Dirac method we should really use the Hamiltonian $H' = H + c \phi$, as explained for example in this post. But what I would like to know is, why do we get a zero Hamiltonian? I suspect that this is due to the reparametrization invariance, and the fact that we don't have a preferred notion of time. Will this always happen? Why?

$\endgroup$

6 Answers

$\begingroup$

...what I would like to know is why we get a zero Hamiltonian. I suspect that this is due to the reparametrization invariance... Will this always happen? Why?

Yes, it is due to reparameterization invariance. In other words, the zero-Hamiltonian result holds for any reparameterization-invariant action, not just for the relativistic particle. In this sense, the answer to "Will this always happen" is yes. And one way to answer the "Why?" question is to give a general proof. That's what I'll do here.

I'll denote the parameter as $t$ instead of $\lambda$, because it's easier to type.

Consider any model with an action of the form $$ S=\int dt\ L(t) \hskip2cm L(t) = L\big(\phi(t),\dot\phi(t)\big) \tag{1} $$ where $\phi_1(t),\phi_2(t),...$ is a collection of dynamic variables. If the action is invariant under rigid translations in $t$, then Noether's theorem gives us a corresponding conserved quantity: the Hamiltonian. If the action is invariant under reparameterizations in $t$, then we might expect to get a stronger result because of the more extreme symmetry, and we do: the conservation law still holds, but the conserved quantity is identically zero (and therefore useless). The goal is to prove that the larger symmetry leads to this stronger result.

Suppose that the action is invariant under all transformations of the form $$ \phi_n(t)\rightarrow\phi_n(t+\epsilon) \tag{2} $$ where $\epsilon(t)$ is allowed to be any smooth function for which the map $t\rightarrow t+\epsilon(t)$ is invertible. This is reparameterization invariance. For infinitesimal $\epsilon$, \begin{equation} \delta\phi_n(t) = \dot\phi_n(t)\epsilon. \tag{3} \end{equation} Take the derivative of this with respect to $t$ to get \begin{equation} \delta\dot\phi_n(t) = \frac{d}{dt}\Big(\dot\phi_n(t)\epsilon\Big). \tag{4} \end{equation}

Now consider the identity \begin{equation} \delta S = \int dt\ \delta L \tag{5} \end{equation} with \begin{equation} \delta L = \sum_n\left( \frac{\partial L}{\partial \phi_n}\delta\phi_n + \frac{\partial L}{\partial \dot\phi_n}\delta\dot\phi_n \right), \tag{6} \end{equation} which is valid for any transformation of the $\phi$s. For the particular transformation (3)-(4), equations (5)-(6) become \begin{equation} \delta S = \sum_n\int dt\ \left(\frac{\partial L}{\partial \phi_n}\dot\phi_n\epsilon + \frac{\partial L}{\partial \dot\phi_n}\frac{d}{dt}(\dot\phi_n\epsilon) \right). \tag{7} \end{equation}

Compare this to the identity $$ \frac{d}{dt}(L\epsilon) = \sum_n\left(\frac{\partial L}{\partial \phi_n}\dot\phi_n + \frac{\partial L}{\partial \dot\phi_n}\frac{d}{dt}\dot\phi_n\right)\epsilon + L\frac{d}{dt}\epsilon \tag{8} $$ to see that (7) may also be written \begin{equation} \delta S = \int dt\ \left(\frac{d}{dt}(L\epsilon) + \left[\sum_n\frac{\partial L}{\partial \dot\phi_n}\dot\phi_n-L\right] \frac{d}{dt}\epsilon \right). \tag{9} \end{equation}

For any finite integration interval, the first term is zero if $\epsilon(t)$ is zero at the endpoints of the integration interval. Since $d\epsilon/dt$ is arbitrary within this interval, and since this holds for any interval, the invariance of the action ($\delta S=0$) implies that the quantity in square brackets must be zero. The quantity in square brackets is the Hamiltonian, so this completes the proof that the Hamiltonian is identically zero in this class of models.
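
As a numerical illustration of the reparameterization invariance assumed in (2), here is a minimal Python sketch for the relativistic particle from the question. It assumes flat 1+1-dimensional spacetime with signature $(-,+)$; the sample curve `path`, the map `reparam`, and the mass value are arbitrary choices made for this illustration and are not taken from the post.

```python
import math

M = 1.0  # particle mass (illustrative value)

def path(lam):
    """A timelike curve in 1+1 flat spacetime: (t, x) = (lam, 0.3*sin(lam))."""
    return (lam, 0.3 * math.sin(lam))

def reparam(s):
    """Monotonic map of [0, 1] onto itself that keeps both endpoints fixed."""
    return s + 0.2 * math.sin(math.pi * s)

def lagrangian(vel):
    """L = -m*sqrt(-eta_{mu nu} v^mu v^nu) with signature (-, +)."""
    dt, dx = vel
    return -M * math.sqrt(dt * dt - dx * dx)

def action(curve, a=0.0, b=1.0, n=20000, h=1e-6):
    """Midpoint-rule action, with velocities from central differences."""
    total = 0.0
    step = (b - a) / n
    for i in range(n):
        s = a + (i + 0.5) * step
        hi, lo = curve(s + h), curve(s - h)
        vel = tuple((p - q) / (2 * h) for p, q in zip(hi, lo))
        total += lagrangian(vel) * step
    return total

S_original = action(path)
S_reparam = action(lambda s: path(reparam(s)))
print(S_original, S_reparam)  # the two values agree up to discretization error
```

The same curve traversed with a different parameter gives the same action, which is the symmetry the proof starts from.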

$\endgroup$
  • 1
    $\begingroup$ Beautiful answer! Is this result generalisable to field theory with reparametrization invariance in all coordinates, not only in time? $\endgroup$
    – Nikita
    Commented Oct 1, 2020 at 23:45
  • 3
    $\begingroup$ @Nikita Thank you for the kind words! And yes, that's correct. In fact, the answer is written in a way that can be applied directly to the metric field in general relativity. Just interpret $\phi_n(t)$ as an abbreviation for $g_{ab}(t,x)$ where the indices $a$ and $b$, and the spatial coordinates $x$ are all implicit in the single "index" $n$. $\endgroup$ Commented Oct 1, 2020 at 23:52
  • 1
    $\begingroup$ @Cham (continued) The benefit of a conserved quantity is that the conservation law tells us something about the objects' behavior. But if the conserved quantity has the same value (zero in this case) for all behaviors, then it's not informative. That's the point: in a model with time reparameterization invariance, the fact that the Hamiltonian is conserved is not useful, because it has the same value for all behaviors. $\endgroup$ Commented Jun 30, 2021 at 13:32
  • 1
    $\begingroup$ This is very well said. Thanks! $\endgroup$
    – Cham
    Commented Jun 30, 2021 at 13:47
  • 1
    $\begingroup$ @Cham An alternative perspective is to think in terms of equations of motion, instead of thinking in terms of the action/lagrangian formulation. Given a set of equations of motion, we often want to know how to get new solutions from old solutions. In this case, the premise is that if $\phi_n(t)$ (with $n\in\{1,2,...,N\}$) is one solution, then $\phi_n'(t)\equiv \phi_n(t+\epsilon(t))$ is another solution of the same equations of motion with the same coefficients as before. Those coefficients may themselves be functions of $t$, but we're not changing those coefficients. $\endgroup$ Commented May 8 at 23:58
$\begingroup$

Here's another way:

Suppose that your lagrangian has the following property for any $\theta > 0$ (which may itself be a function of time $t$): \begin{equation}\tag{1} L(q, \, \theta \, \dot{q}) = \theta \, L(q, \, \dot{q}). \end{equation} This implies that the action \begin{equation}\tag{2} S = \int_{t_1}^{t_2}L(q, \, \dot{q}) \, dt \end{equation} is invariant under a reparametrisation of time, $t \, \rightarrow \, \tau(t)$, which doesn't change the integration limits: $\tau(t_1) = t_1$ and $\tau(t_2) = t_2$. Then, differentiating (1) with respect to $\theta$ at $\theta = 1$, you can write the following: \begin{equation}\tag{3} \frac{d\,}{d\theta} \, L(q, \, \theta \, \dot{q}) \Big|_{\theta = 1} = \dot{q} \, \frac{\partial L}{\partial \dot{q}} \equiv L(q, \, \dot{q}), \end{equation} which implies a vanishing hamiltonian: \begin{equation}\tag{4} H = \dot{q} \, \frac{\partial L}{\partial \dot{q}} -L = 0. \end{equation} This applies to your lagrangian for a relativistic particle, with $q \rightarrow x^{\mu}$ and $t \rightarrow \lambda$.
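
Equations (1), (3) and (4) can be checked numerically. The following is a minimal sketch, assuming flat 3+1-dimensional spacetime with metric $\mathrm{diag}(-1,1,1,1)$; the velocity vector, the mass value, and the helper names are illustrative choices, and the momenta are approximated by central finite differences rather than computed exactly.

```python
import math

M = 2.0                        # particle mass (illustrative value)
ETA = (-1.0, 1.0, 1.0, 1.0)    # diagonal of the flat metric, signature (-,+,+,+)

def lagrangian(v):
    """L = -m*sqrt(-eta_{mu nu} v^mu v^nu) for a timelike velocity v."""
    return -M * math.sqrt(-sum(e * c * c for e, c in zip(ETA, v)))

def momenta(v, h=1e-6):
    """p_mu = dL/dv^mu, approximated by central finite differences."""
    p = []
    for i in range(len(v)):
        up, dn = list(v), list(v)
        up[i] += h
        dn[i] -= h
        p.append((lagrangian(up) - lagrangian(dn)) / (2 * h))
    return p

v = [1.5, 0.2, -0.4, 0.7]      # an arbitrary timelike velocity
p = momenta(v)

# Property (1) with theta = 3: homogeneity of degree one.
homogeneity_gap = lagrangian([3.0 * c for c in v]) - 3.0 * lagrangian(v)

# Equation (4): the naive Hamiltonian p.v - L.
H = sum(pi * vi for pi, vi in zip(p, v)) - lagrangian(v)

# Mass-shell constraint p^2 + m^2, using the inverse metric (here 1/eta = eta).
mass_shell = sum(pi * pi / e for pi, e in zip(p, ETA)) + M * M

print(homogeneity_gap, H, mass_shell)  # all three vanish up to numerical error
```

Any other timelike `v` should give the same result, since the identity holds at every point of the tangent cone, not only on solutions.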

$\endgroup$
$\begingroup$
  1. Infinitesimal world-line (WL) reparametrization transformations $$ t^{\prime} - t ~=:~\delta t ~=~-\varepsilon(t), \qquad \text{(horizontal variation)}\tag{A}$$ $$ q^{\prime j}(t) - q^j(t)~=:~\delta_0 q^j(t) ~=~\varepsilon(t)\dot{q}^j(t), \qquad \text{(vertical variation)}\tag{B}$$ $$ q^{\prime j}(t^{\prime}) - q^j(t)~=:~\delta q^j(t) ~=~0 \qquad \text{(full variation)}\tag{C}$$ are gauge/local/$t$-dependent transformations, which are, strictly speaking, the realm of Noether's second theorem. This leads to an off-shell Noether identity (L).

  2. In contrast, Noether's first theorem in its basic formulation considers global/$t$-independent transformations. (For the related proof of on-shell energy conservation via global time translation symmetry, see e.g. my Phys.SE answer here.) However in OP's case, there is a $t$-dependent trick. Straightforward standard calculations reveal that the infinitesimal variation of the action $$S~=~\int_I\!dt~ L\tag{D}$$ is of the form $$ \delta S~=~\int_I\!dt~ (\varepsilon k + h \dot{\varepsilon}), \tag{E}$$ for some function $k$, where the energy function $$h~:=~p_j\dot{q}^j-L, \qquad p_j~:=~\frac{\partial L}{\partial \dot{q}^j}, \tag{F}$$ is the Noether charge.

  3. Case where the transformation (A)-(C) is a strict off-shell symmetry: If the infinitesimal variation (E) has no boundary contributions, we must have $$\varepsilon k + h \dot{\varepsilon}~\equiv~0\tag{G}$$ off-shell. Taking $\varepsilon$ to be $t$-independent we see that $$k~\equiv~ 0.\tag{H}$$ Comparing with eq. (G), we get OP's sought-for conclusion

    $$h~\equiv~ 0.\tag{I}$$

    In other words, the Lagrangian $L$ is a homogeneous function of the generalized velocities $\dot{q}$ of weight 1, cf. Cham's answer. We shall later see via eq. (L) that eq. (I) also implies that the Lagrangian $L$ has no explicit time dependence. In this case the action (D) is manifestly WL reparametrization invariant.

  4. Case where the transformation (A)-(C) is an off-shell quasi-symmetry: It turns out that $$k~\equiv~ \frac{\delta S}{\delta q^j}\dot{q}^j + \dot{h}, \tag{J}$$ so that the infinitesimal variation (E) is $$\delta S~=~\int_I\!dt~ \left(\varepsilon \frac{\delta S}{\delta q^j}\dot{q}^j + \frac{d(h\varepsilon)}{dt}\right).\tag{K}$$ Even if we allow possible total time derivative contributions in the infinitesimal variation (K), we still get an off-shell Noether identity

    $$0~\equiv~-\frac{\delta S}{\delta q^j}\dot{q}^j~\equiv~(\dot{p}_j-\frac{\partial L}{\partial q^j})\dot{q}^j~\equiv~\frac{d(p_j\dot{q}^j)} {dt}-(\frac{\partial L}{\partial q^j}\dot{q}^j+\frac{\partial L}{\partial \dot{q}^j}\ddot{q}^j)$$ $$~\equiv~\frac{d(p_j\dot{q}^j)} {dt}-(\frac{dL}{dt}-\frac{\partial L}{\partial t})~\equiv~\frac{dh}{dt}+\frac{\partial L}{\partial t}.\tag{L} $$

    Example 1: If $L(q,\dot{q})$ has no explicit time-dependence, then the energy (F) has no explicit time-dependence. From the off-shell identity (L) $$ 0~\equiv~\frac{dh(q,\dot{q})}{dt}~=~\frac{\partial h(q,\dot{q})}{\partial q^j}\dot{q}^j+\frac{\partial h(q,\dot{q})}{\partial \dot{q}^j}\ddot{q}^j,\tag{M}$$ we deduce that the energy $h$ must be a global constant independent of all the variables $(q,\dot{q},t)$.

    Example 2: If $L(t)$ does not depend on $q$ and $\dot{q}$, then the action $S$ has a quasi-symmetry under the transformation (A)-(C), and the energy is $h(t)=-L(t)$.
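
The off-shell character of the Noether identity (L) can be illustrated numerically. The sketch below assumes the relativistic-particle Lagrangian from the question in flat 1+1-dimensional spacetime with signature $(-,+)$, where $\partial L/\partial q^j = 0$ and $\partial L/\partial t = 0$, so (L) reduces to $\dot{p}_j\dot{q}^j \equiv 0$. The curve is deliberately chosen not to be a geodesic; its form, the mass value, and the function names are illustrative assumptions.

```python
import math

M = 1.0  # particle mass (illustrative value)

def velocity(lam):
    """Velocity (dt/dlam, dx/dlam) of an arbitrary timelike, non-geodesic curve."""
    return (1.0 + 0.5 * lam * lam, 0.4 * math.cos(3.0 * lam))

def lagrangian(lam):
    """L = -m*sqrt(tdot^2 - xdot^2) evaluated along the curve."""
    tdot, xdot = velocity(lam)
    return -M * math.sqrt(tdot * tdot - xdot * xdot)

def momentum(lam):
    """p_j = dL/d(qdot^j) for the Lagrangian above."""
    tdot, xdot = velocity(lam)
    norm = math.sqrt(tdot * tdot - xdot * xdot)
    return (-M * tdot / norm, M * xdot / norm)

def momentum_rate(lam, step=1e-5):
    """dp_j/dlam by central finite differences."""
    hi, lo = momentum(lam + step), momentum(lam - step)
    return tuple((a - b) / (2 * step) for a, b in zip(hi, lo))

lam0 = 0.7
v = velocity(lam0)
p_dot = momentum_rate(lam0)

# The equations of motion are NOT satisfied: p_dot is nonzero on this curve.
off_shell_size = max(abs(c) for c in p_dot)

# Noether identity (L): the contraction with the velocity still vanishes.
noether_identity = sum(a * b for a, b in zip(p_dot, v))

# Energy function (F): vanishes identically, cf. eq. (I).
energy = sum(a * b for a, b in zip(momentum(lam0), v)) - lagrangian(lam0)

print(off_shell_size, noether_identity, energy)
```

The contraction vanishes even though the Euler-Lagrange expressions themselves do not, which is what distinguishes an off-shell identity from a conservation law.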

$\endgroup$
$\begingroup$

Edit. As Cham has already answered, homogeneity of the Lagrangian is to blame: $$L(x,\theta \, \dot{x}) = \theta\,L(x, \dot{x})$$ Whenever you have this property, then you have $$\dot{x}^T\frac{\partial L}{\partial \dot{x}}(x, \dot{x}) = L(x, \dot{x})$$ This is true not only for Lagrangians, this is true for any homogeneous function: if $f(\theta\, x) = \theta\,f(x)$ then $x^T\, \nabla f(x) = f(x)$. Even when people do convex optimization and they end up with such homogeneity, the Legendre transform is ill defined because your homogeneous function is not strictly convex. The solution is to eliminate the homogeneity by restricting to a lower dimensional subspace. The same thing happens in general relativity.

In order to perform the Legendre transform successfully, the Legendre map $p = \frac{\partial L}{\partial \dot{x}}(x, \dot{x})$ from the tangent bundle of 4-velocities to the cotangent bundle of generalized 4-momenta should be bijective (invertible). That's what usually happens in classical mechanics. Since the Lagrangian in the case of general relativity is homogeneous of degree one under the action of the group $\mathbb{R}_+$ acting by rescaling, the Legendre transform is identically zero and the Legendre map is not invertible, i.e. it maps the time-like tangent cone bundle (dimension 4) into the time-like momenta of constant magnitude (dimension 3), where the orbits of the scaling group action are the fibers that get smashed by the Legendre map. The solution to this issue is to eliminate the scaling group action by restricting the Lagrangian to the time-like unit tangent bundle. Then the Legendre map is invertible and things start to work out fine, as explained below.

There is a non-zero Hamiltonian; it is just not constructed as naively and directly as it is in classical mechanics.

Let $\dot{x} = \cfrac{dx}{d\lambda},$ where $\lambda$ is any arbitrary parameter.

Fundamentally, in the philosophy of General Relativity, the parameter $\lambda$ is not of any importance to the theory. Only the shape of the curve $$\gamma = \{x(\lambda) \, : \, \lambda \in [\lambda_1, \lambda_2]\}$$ matters and not the specific parametrization. After all, this curve $\gamma = \{x(\lambda)\}$ is supposed to be a space-time time-like geodesic, which is a geometric property independent of any parametrization $\lambda$, so we really care about the geodesic $\gamma$ as a geometric curve and not as a parametrized curve.

I am going to use a bit of matrix notations, to skip all the indexing. So $$x = \begin{bmatrix} x^i\end{bmatrix} = \begin{bmatrix} x^0\\x^1\\x^2\\x^3\end{bmatrix} \, \text{ and } \, g(x) = \big[g_{ij}(x)\big]_{i,j = 0}^{3} \, \text{ is the 4 by 4 metric tensor} $$

Take your Lagrangian $$L = -m\,\sqrt{- \, \dot{x}^T\,g(x)\, \dot{x}}$$ and define the action $$S[\gamma] = -m\, \int_{\lambda_1}^{\lambda_2} \, \sqrt{- \, \dot{x}^T(\lambda)\,g\big(x(\lambda)\big)\, \dot{x}(\lambda)}\,d\lambda $$ and look for the critical (non-parametrized!!!) curves
$$\delta S[\gamma] = 0$$ In coordinates $[x^i]$ and with respect to a generic parametrization, the equation $\delta S[\gamma] = 0$ is equivalent to the Euler-Lagrange differential equations $$\frac{d}{d\lambda}\left(\frac{m}{\sqrt{-\, \dot{x}^T\,g(x)\, \dot{x}}} \,\, g(x)\, \dot{x}\right) \, = \, \frac{m}{2\, \sqrt{-\, \dot{x}^T\,g(x)\,\dot{x}\,}\,}\, \left(\, \dot{x}^T\,\frac{\partial g}{\partial x}(x)\,\dot{x}\, \right)$$ where $$ \dot{x}^T\, \frac{\partial g}{\partial x}(x)\, \dot{x}\, = \, \begin{bmatrix} \frac{\partial g_{ij}}{\partial x^0}(x)\,\dot{x}^i\,\dot{x}^j \\ \frac{\partial g_{ij}}{\partial x^1}(x)\,\dot{x}^i\,\dot{x}^j \\ \frac{\partial g_{ij}}{\partial x^2}(x)\,\dot{x}^i\,\dot{x}^j \\ \frac{\partial g_{ij}}{\partial x^3}(x)\,\dot{x}^i\,\dot{x}^j \end{bmatrix}$$ for short.

Take a (time-like) solution $\gamma = \{ x(\lambda)\, : \, \lambda \}$ of the Euler-Lagrange equations above. As I have already emphasized, the parametrization of $\gamma$ with respect to $\lambda$ is not important for us. Therefore, I can define the function $$\tau = \tau(\lambda) = \int_{\lambda_0}^{\lambda}\, \sqrt{-\, \dot{x}(\zeta)^T \, g\big(\, x(\zeta)\,\big)\, \dot{x}(\zeta)\,}\, d\zeta$$ with derivative $$\frac{d\tau}{d\lambda} = \sqrt{-\, \dot{x}(\lambda)^T \, g\big(\, x(\lambda)\,\big)\, \dot{x}(\lambda)\,} \, > \,0$$ Thus the function $\tau = \tau(\lambda)$ is strictly increasing and therefore invertible, i.e. there is $\lambda = \lambda(\tau)$. Consequently, we can re-parametrize our solution curve $\gamma$ as $$\gamma = \{ \, x(\tau) \, : \, \tau \, \} \, \text{ where } \, x(\tau)= x\big(\lambda(\tau)\big)$$ Observe that $$\gamma = \{\,x(\tau)\, : \, \tau \,\} = \{\, x(\lambda)\, : \, \lambda \, \}$$ in other words, this is the same curve in space-time, but parametrized in two different ways.

Denote $x' = \frac{dx}{d\tau}$. Furthermore, $$x' = \frac{dx}{d\tau} =\frac{d\lambda}{d\tau} \frac{dx}{d\lambda} = \left( \frac{d\tau}{d\lambda}\right)^{-1} \frac{dx}{d\lambda} = \frac{1}{\sqrt{- \, \dot{x}^T \, g(x) \, \dot{x}}\,}\, \frac{dx}{d\lambda}$$ and in particular $$\frac{d}{d\tau} = \frac{1}{\sqrt{- \, \dot{x}^T \, g(x) \, \dot{x}}\,}\, \frac{d}{d\lambda} $$ Recall that the curve $\gamma$ is a critical curve for the action $S[\gamma]$, i.e. $\delta S[\gamma] = 0$. When $\gamma$ is parametrized with respect to $\lambda$, its coordinate parametrization $\gamma = \{\, x(\lambda) \, : \, \lambda\}$ solves the Euler-Lagrange equations $$\frac{d}{d\lambda}\left(\frac{m}{\sqrt{-\, \dot{x}^T\,g(x)\, \dot{x}}} \,\, g(x)\, \dot{x}\right) \, = \, \frac{m}{2\, \sqrt{-\, \dot{x}^T\,g(x)\,\dot{x}\,}\,}\, \left(\, \dot{x}^T\,\frac{\partial g}{\partial x}(x)\,\dot{x}\, \right)$$ I can multiply both sides by $\frac{1}{\sqrt{-\, \dot{x}^T\, g(x) \, \dot{x}}\,}$ and obtain the equivalent equations $$\frac{1}{\sqrt{-\, \dot{x}^T\, g(x) \, \dot{x}}\,} \, \frac{d}{d\lambda}\left(\frac{m}{\sqrt{-\, \dot{x}^T\,g(x)\, \dot{x}}} \,\, g(x)\, \dot{x}\right) \, = \, \frac{m}{-\, 2\, \dot{x}^T\,g(x)\,\dot{x}\,}\, \left(\, \dot{x}^T\,\frac{\partial g}{\partial x}(x)\,\dot{x}\, \right)$$

It is easy to check that with the new parametrization $\gamma = \{\, x(\tau) \, : \, \tau\, \}$ $$\sqrt{-\, \frac{dx}{d\tau}^T \, g(x) \, \frac{dx}{d\tau}} =\sqrt{-\, (x')^T \, g(x) \, x'} = 1$$ Consequently, after the reparametrization $\lambda = \lambda(\tau)$ the Euler-Lagrange equations turn into the equivalent simplified equations $$\frac{d}{d\tau}\left(\, m\, g(x)\, \frac{dx}{d\tau}\right) \, = \, \frac{m}{2}\,\left( \frac{dx}{d\tau}^T\,\frac{\partial g}{\partial x}(x)\, \frac{dx}{d\tau}\,\right) $$ which $\gamma = \{\, x(\tau)\, : \, \tau\,\}$ solves.

In other words, we have proven that any solution $\gamma$ to the original Euler-Lagrange equations, after the appropriate reparametrization, solves the simplified Euler-Lagrange equations. That is, a curve $\gamma$ is a critical curve of the action $S[\gamma]$, i.e. $\delta S[\gamma] = 0$, if and only if it solves the simplified Euler-Lagrange differential equations $$\frac{d}{d\tau}\left(\,m\, g(x)\, \frac{dx}{d\tau}\right) \, = \, \frac{m}{2}\, \left(\, \frac{dx}{d\tau}^T\,\frac{\partial g}{\partial x}(x)\, \frac{dx}{d\tau}\,\right) $$ where the resulting parametrized solution $\gamma = \{\, x(\tau)\, : \, \tau\,\}$ is parametrized with respect to proper time, i.e. $\sqrt{ - \, x'(\tau)^T\, g\big(x(\tau)\big)\, x'(\tau)} = 1$ for any $\tau$.

Now, if you set the generalized momenta $$p = m\, g(x) \frac{dx}{d\tau}$$ you get the following doubled system of differential equations \begin{align} &p = m\, g(x) \frac{dx}{d\tau}\\ &\frac{dp}{d\tau}\, = \, \frac{m}{2}\, \left(\, \frac{dx}{d\tau}^T\,\frac{\partial g}{\partial x}(x)\, \frac{dx}{d\tau}\,\right) \end{align} and when you solve the first half for $\frac{dx}{d\tau}$, which is possible because $g(x)$ is an invertible symmetric matrix, and substitute into the second half of the equations, you obtain the system of differential equations \begin{align} &\frac{dx}{d\tau} = \frac{1}{m}\, g(x)^{-1} \,p\\ &\frac{dp}{d\tau}\, = \, \frac{1}{2m}\, \left(\, p^T\, g(x)^{-1}\,\frac{\partial g}{\partial x}(x)\,g(x)^{-1}\,p\,\right) \end{align} These are Hamiltonian equations where the Hamiltonian function is $$H(x, p) = \frac{1}{2m}\big(p^T\, g(x)^{-1}\, p\,\big)$$ Thus, we have proved that a curve $\gamma$ is a critical curve of the action $S[\gamma]$, i.e. $\delta S[\gamma] = 0$, if and only if it solves the Hamiltonian differential equations with Hamiltonian function $H(x, p) = \frac{1}{2m}\big(p^T\, g(x)^{-1}\, p\,\big)$ where $p = m\, g(x)\, \frac{dx}{d\tau}$.
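
The Hamiltonian system above can be integrated numerically as a sanity check. The following is a minimal sketch under loudly stated assumptions: a toy 1+1-dimensional metric $g = \mathrm{diag}(-f(x), 1)$ with $f(x) = 1 + x^2$ in coordinates $(t, x)$, a hand-written fourth-order Runge-Kutta step, and arbitrary initial data normalized to proper time. None of these choices come from the answer; they only serve to show that $H$ stays at $-m/2$ along a solution.

```python
import math

M = 1.0  # particle mass (illustrative value)

def f(x):
    """Assumed metric function: g = diag(-f(x), 1) in coordinates (t, x)."""
    return 1.0 + x * x

def df(x):
    """Derivative of f with respect to x."""
    return 2.0 * x

def hamiltonian(state):
    """H = p^T g^{-1} p / (2m) for the diagonal metric above."""
    t, x, pt, px = state
    return (-pt * pt / f(x) + px * px) / (2.0 * M)

def rhs(state):
    """Hamilton's equations: dx/dtau = dH/dp, dp/dtau = -dH/dx."""
    t, x, pt, px = state
    return (
        -pt / (M * f(x)),                          # dt/dtau
        px / M,                                    # dx/dtau
        0.0,                                       # dp_t/dtau (metric is t-independent)
        -pt * pt * df(x) / (2.0 * M * f(x) ** 2),  # dp_x/dtau
    )

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))
    k1 = rhs(state)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return tuple(
        s + h * (a + 2 * b + 2 * c + d) / 6
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

# Initial data normalized to proper time: -f*t'^2 + x'^2 = -1.
x0, xdot0 = 0.5, 0.3
tdot0 = math.sqrt((1.0 + xdot0 ** 2) / f(x0))
state = (0.0, x0, -M * f(x0) * tdot0, M * xdot0)  # p = m g(x) dx/dtau

H_start = hamiltonian(state)
for _ in range(2000):
    state = rk4_step(state, 1e-3)
H_end = hamiltonian(state)
print(H_start, H_end)
```

With proper-time normalization the conserved value of $H$ is $-m/2$, which is the mass-shell constraint $p^2 + m^2 = 0$ in disguise.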

$\endgroup$
  • $\begingroup$ Well, I mean, I understand all of this, I say as much in the question. I also know about the quadratic Hamiltonian. My question is why the "naive" Hamiltonian ends up being zero. $\endgroup$
    – Javier
    Commented Dec 19, 2018 at 15:22
$\begingroup$

$\let\lam=\lambda \def\dx{\dot x}$ As you've already seen, the reason for the hamiltonian being zero is in having a lagrangian which is a homogeneous function of degree 1 in the $\dx$'s.

There is a simpler way out, resulting in a hamiltonian equal to the lagrangian: take $$L = {\textstyle{1 \over 2}}\,g_{\mu\nu}\,\dx^\mu \dx^\nu$$ (I'm using a sign convention opposite to yours). Here $\dx^\mu$ means $dx^\mu/d\lam$, with $\lam$ an arbitrary parameter with $\lam\in[0,1]$. The action is $$S = \int_0^1\!L\>d\lam.$$ The hamiltonian is the same as the one given by @Futurologist.
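
To spell out why this works, here is a short worked computation. It uses only the standard definition $p_\mu = \partial L/\partial \dot x^\mu$ and assumes $g_{\mu\nu}$ is symmetric and invertible: $$p_\mu = g_{\mu\nu}\,\dot x^\nu, \qquad H = p_\mu \dot x^\mu - L = g_{\mu\nu}\,\dot x^\mu \dot x^\nu - \tfrac{1}{2}\,g_{\mu\nu}\,\dot x^\mu \dot x^\nu = \tfrac{1}{2}\,g_{\mu\nu}\,\dot x^\mu \dot x^\nu = \tfrac{1}{2}\,g^{\mu\nu}\,p_\mu p_\nu.$$ Because this lagrangian is homogeneous of degree 2 rather than degree 1 in the velocities, Euler's theorem gives $p_\mu \dot x^\mu = 2L$, so $H = L$ instead of $H = 0$.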

This choice for $S$ is (unfortunately) called "energy" by mathematicians.

$\endgroup$
$\begingroup$

The answer is simple: $m$ is a Lagrange multiplier ... and (as you're about to see) Relativity is a red-herring. The same can be made true for non-relativistic theory, if it's done in just the right way.

Consider the geometry that has the following as its invariants: $$dx^2 + dy^2 + dz^2 + 2 dt du + α du^2,\quad dt + α du,$$ and the following action integral: $$\int L ds, \quad L = m\frac{|\dot{𝐫}|^2 + 2 \dot{t}\dot{u} + α\dot{u}^2}{2} - U\left(\dot{t} + α\dot{u} - 1\right),$$ where $$𝐫 = (x,y,z),\quad \dot{(\_)} = \frac{d}{ds}(\_),$$ and where $m$ and $U$ are Lagrange multipliers. This will force the conditions: $$\left|\frac{d𝐫}{ds}\right|^2 + 2\frac{dt}{ds}\frac{du}{ds} + α\left(\frac{du}{ds}\right)^2 ≃ 0,\quad \frac{dt}{ds} + α\frac{du}{ds} ≃ 1\quad⇒\quad \left(\frac{dt}{ds}\right)^2 - α\left|\frac{d𝐫}{ds}\right|^2 ≃ 1,$$ where "$≃$" denotes "on-shell" equality.

(Actually, I'm not sure if imposing constraints counts as "on-shell" or "off-shell". They're "on-shell" in the sense that they come out as Euler-Lagrange equations from varying with respect to the Lagrange multipliers.)

Defining $$𝐩 = \frac{∂L}{∂\dot{𝐫}},\quad -h = \frac{∂L}{∂\dot{t}},\quad μ = \frac{∂L}{∂\dot{u}},$$ we have $$𝐩 = m\dot{𝐫},\quad h = -m\dot{u} + U,\quad μ = m(\dot{t} + α\dot{u}) - αU,$$ and in addition: $$M ≡ μ + αh = m\dot{t}.$$ The Euler-Lagrange equations are (and imply): $$\frac{d𝐩}{ds} ≃ 𝟬,\quad \frac{dμ}{ds} ≃ 0,\quad \frac{dh}{ds} ≃ 0\quad⇒\quad \frac{dM}{ds} ≃ 0.$$ In addition, from the Lagrange multipliers we also get: $$|\dot{𝐫}|^2 + 2\dot{t}\dot{u} + α\dot{u}^2 ≃ 0,\quad 1 ≃ \dot{t} + α\dot{u}\quad⇒\quad\dot{t}^2 - α|\dot{𝐫}|^2 = 1,$$ as already noted, and also: $$μ ≃ m - αU,\quad |𝐩|^2 - 2μh - αh^2 = |𝐩|^2 - 2Mh + αh^2 ≃ -2mU + αU^2\quad⇒\quad M^2 - α|𝐩|^2 ≃ m^2.$$

The Hamiltonian is $$H = 𝐩·\dot{𝐫} - h\dot{t} + μ\dot{u} - L = m\frac{|\dot{𝐫}|^2 + 2\dot{t}\dot{u} + α\dot{u}^2}{2} - U ≃ -U.$$
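
For readers who want the intermediate step, the Legendre transform can be expanded using only the momenta defined above: $$𝐩·\dot{𝐫} - h\dot{t} + μ\dot{u} = m|\dot{𝐫}|^2 + (m\dot{u} - U)\dot{t} + \big(m(\dot{t} + α\dot{u}) - αU\big)\dot{u} = m\left(|\dot{𝐫}|^2 + 2\dot{t}\dot{u} + α\dot{u}^2\right) - U\left(\dot{t} + α\dot{u}\right).$$ Subtracting $L$ removes half of the first term and the whole velocity-dependent part of the $U$ term, which leaves the expression for $H$ quoted above; the constraint obtained by varying $m$ then sets the remaining quadratic term to zero on-shell.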

Now, let's look at the different cases.

If $α = 0$, then $$\dot{𝐫} ≃ 𝐯 ≡ \frac{d𝐫}{dt},\quad \dot{t} ≃ 1,\quad \dot{u} ≃ -\frac{|𝐯|^2}{2},$$ thus $$𝐩 = m\dot{𝐫} ≃ m𝐯,\quad h = -m\dot{u} + U ≃ m\frac{|𝐯|^2}{2} + U,\quad μ = M = m\dot{t} ≃ m.$$ This is non-relativistic physics. The extra quantity $U$ is the internal energy, while $h$ plays the role of energy. Both $μ$ and $M$ coincide and are equal to the mass $m$, while $𝐩$ is the momentum. The invariants are $$\frac{|𝐩|^2}{2m} - h = -U,\quad m,$$ if $m ≠ 0$.

The symmetry group that leaves the following $$dx^2 + dy^2 + dz^2 + 2 dt du,\quad dt$$ invariant is the central extension of the Galilei group: the Bargmann group. The invariant quantity $μ$ is the Noether charge associated with the central charge. Reducing to the Galilei group corresponds to setting $μ = 0$. The Galilei group supports only the homogeneous representations and the zero-mass, instantaneous/infinite-speed representations of the Bargmann group. The homogeneous family includes the "vacuum". In all cases, $U ≠ 0$ is still allowed. For the vacuum, it provides a place for "vacuum energy".

If you set the internal energy to zero, then the Hamiltonian is on-shell equal to zero, as well. So, it can be done for non-relativistic theory too, although there's actually no reason to impose the condition $U = 0$.

If $α > 0$, then $$\dot{𝐫} ≃ γ𝐯,\quad \dot{t} ≃ γ ≡ \frac{1}{\sqrt{1 - α|𝐯|^2}},\quad \dot{u} ≃ \frac{1 - γ}{α} = -γw,\quad w ≡ \frac{|𝐯|^2}{1 + \sqrt{1 - α|𝐯|^2}},$$ and $$𝐩 ≃ γm𝐯 = \frac{m𝐯}{\sqrt{1 - α|𝐯|^2}},\quad h ≃ γmw + U = \frac{m}{\sqrt{1 - α|𝐯|^2}}\frac{|𝐯|^2}{1 + \sqrt{1 - α|𝐯|^2}} + U,$$ $$μ ≃ m - αU,\quad M ≃ \frac{m}{\sqrt{1 - α|𝐯|^2}}.$$ We also have the identity: $$1 - αw = \sqrt{1 - α|𝐯|^2} = \frac{1}{γ}.$$
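
These relations are easy to check with numbers. The sketch below plugs in arbitrary illustrative values for $α$, the speed $|𝐯|$, $m$ and $U$ (none of them from the answer), treats the motion as one-dimensional so that $𝐩$ is a single number, and evaluates both constraints and the derived identities.

```python
import math

# Illustrative values only; any alpha > 0 with alpha*speed^2 < 1 works.
alpha, speed, m, U = 1.0, 0.6, 2.0, 0.5

gamma = 1.0 / math.sqrt(1.0 - alpha * speed ** 2)
w = speed ** 2 / (1.0 + math.sqrt(1.0 - alpha * speed ** 2))

# On-shell velocities with respect to the parameter s.
r_dot, t_dot, u_dot = gamma * speed, gamma, (1.0 - gamma) / alpha

# Momenta as defined from the Lagrangian.
p = m * r_dot
h = -m * u_dot + U
mu = m * (t_dot + alpha * u_dot) - alpha * U
M = mu + alpha * h

# Constraints from varying the multipliers m and U.
null_constraint = r_dot ** 2 + 2 * t_dot * u_dot + alpha * u_dot ** 2
time_constraint = t_dot + alpha * u_dot

# Derived relations quoted in the text.
mass_shell = M ** 2 - alpha * p ** 2           # should equal m^2
invariant = p ** 2 - 2 * mu * h - alpha * h ** 2  # should equal -2mU + alpha*U^2

print(null_constraint, time_constraint, mass_shell, invariant)
```

Note that $h$ evaluates to $γmw + U$, so the internal energy enters additively, exactly as in the $α = 0$ case.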

These coincide with the formulae for Special Relativity, where the in vacuo light-speed is given by $c = 1/\sqrt{α}$ ... except for the inclusion of the internal energy $U$.

The symmetry group that leaves the following invariant $$dx^2 + dy^2 + dz^2 + 2 dt du + α du^2,\quad dt + α du,$$ is not the Poincaré group, but is a deformation of the Bargmann group which comprises a one-parameter extension of the Poincaré group. The Noether charge corresponding to the central charge is still $μ$, but this no longer coincides with the mass $m$ ... unless you set $U = 0$.

The reduction from this group to the Poincaré group leaves $μ$ intact, and actually corresponds to setting $U = 0$, instead. That reduction makes the Hamiltonian $H$ for the Lagrangian $L$ originally posed on-shell equal to zero, and we've returned full circle to the original note about $m$ being a Lagrange multiplier.

$\endgroup$