$\begingroup$

Let me start with the definitions I'm used to. Let $I[\Phi^i]$ be the action for some collection of fields. A variation of the fields about the field configuration $\Phi^i_0(x)$ is a one-parameter family of field configurations $\Phi^i(\lambda,x)$ such that $\Phi^i(0,x)=\Phi^i_0(x)$ where $\lambda\in (-\epsilon,\epsilon)$. We take the map $\lambda\mapsto \Phi^i(\lambda,x)$ to be differentiable. In that case the first variation is defined by $$\delta \Phi^i(x) \equiv \dfrac{\partial}{\partial \lambda}\bigg|_{\lambda =0}\Phi^i(\lambda,x)\tag{1}.$$

Likewise the first variation of the action is defined to be $$\delta I[\Phi^i]\equiv\dfrac{d}{d\lambda}\bigg|_{\lambda=0}I[\Phi^i_\lambda],\quad \Phi^i_\lambda\equiv \Phi^i(\lambda,\cdot)\tag{2}.$$

Now, as I understand, the variational principle is the statement that the physical classical field configuration should be $\Phi^i$ such that $\delta I[\Phi^i]=0$ for any first variation $\delta \Phi^i$.

It so happens that most of the time $I[\Phi^i]$ is the integral over spacetime of some Lagrangian density $d$-form $\mathcal{L}[\Phi^i]$. Then if $M$ has some sort of boundary $\partial M$ it may happen that $\delta I[\Phi^i]$ has boundary terms contributing to it.

Now, in this paper the authors say that such boundary terms make the variational principle ill-defined (c.f. page 61):

As stated by Regge and Teitelboim, the action must possess well defined functional derivatives: this must be of the form $\delta I[\phi]=\int(\text{something})\delta \phi$ with no extra boundary terms spoiling the derivative. The action must be differentiable in order for the extremum principle to make sense.

This is also alluded to in the WP page about the Gibbons-Hawking-York term in gravity:

The Einstein–Hilbert action is the basis for the most elementary variational principle from which the field equations of general relativity can be defined. However, the use of the Einstein–Hilbert action is appropriate only when the underlying spacetime manifold ${\mathcal {M}}$ is closed, i.e., a manifold which is both compact and without boundary. In the event that the manifold has a boundary $\partial\mathcal{M}$, the action should be supplemented by a boundary term so that the variational principle is well-defined.

The boundary term alluded to above is introduced exactly to cancel one boundary term appearing when one varies the Einstein-Hilbert action. So again I take this as saying that if the variation of the EH action had such boundary term the variational principle wouldn't be well-defined.

Now, although this seems such a basic thing I must confess I still didn't get it:

  1. Regarding the discussion in the linked paper, by repeated application of the Leibniz rule, the variation of the Lagrangian density $\cal L$ may always be written as $${\delta \cal L} = E_i\delta \Phi^i +d\Theta\tag{3},$$ where $E_i$ are the equations of motion and $\Theta$ is the presymplectic potential. The variation of the action is thus of the form $$\delta I[\Phi^i]=\int_M E_i \delta \Phi^i + \int_{\partial M}\Theta\tag{4}.$$ I don't see how the presence of $\Theta$ stops us from defining $E_i$ as the functional derivatives.

    Moreover, for me the most reasonable notion of differentiability for the action is to say that $\lambda\mapsto I[\Phi^i_\lambda]$ is a differentiable mapping. I don't see how boundary terms affect this.

    So why do boundary terms in $\delta I[\Phi^i]$ yield ill-defined functional derivatives? And in what sense does this make $I$ not differentiable?

  2. More importantly, both the paper and the WP page on the GHY term allude to the variational principle being ill-defined if $\delta I[\Phi^i]$ contains boundary terms. We have a mapping $\lambda\mapsto I[\Phi^i_\lambda]$ and we seek an extremum of such map. I don't see how the fact that $\delta I[\Phi^i]$ has boundary terms would make this optimization problem ill-defined.

    So why do boundary terms make the variational principle ill-defined? In other words, why does a well-defined variational principle demand $\delta I[\Phi^i]$ to be of the form $\delta I[\Phi^i]=\int({\text{something}})\delta \Phi^i$ as the authors of the paper seem to claim?

$\endgroup$

3 Answers

$\begingroup$

If we have non-vanishing boundary terms, then the map $\lambda \mapsto I[\Phi_\lambda^i]$ is not differentiable in the following sense. Using somewhat less sophisticated notation, let

$$I[\Phi^i_\lambda:\eta] := \int_{\mathcal M} \mathcal L\left(\Phi^i_0(x)+\lambda\cdot \eta(x),\partial\Phi_0^i(x)+\lambda\cdot\partial\eta(x)\right) d^4x$$

for some arbitrary differentiable function $\eta$. This map is certainly differentiable, and we find that $$\left.\frac{d}{d\lambda}I[\Phi^i_\lambda:\eta]\right|_{\lambda=0} = \int_{\mathcal M}\left(\frac{\partial \mathcal L}{\partial \Phi_0^i}-\partial_\mu \left[\frac{\partial \mathcal L}{\partial(\partial_\mu \Phi_0^i)}\right]\right)\cdot \eta(x) \ d^4x+ \oint_{\partial\mathcal M} n_\mu\frac{\partial \mathcal L}{\partial (\partial_\mu \Phi_0^i)}\eta(x) \ dS$$

where $n_\mu$ are the components of the surface normal vector. This is differentiability in the sense of Gateaux. However, this Gateaux derivative generically depends on which $\eta$ we choose.

The ultimate goal is to demand that the variation in the action functional vanish regardless of our choice of $\eta$. Assuming that the boundary term vanishes, this implies that

$$\int_{\mathcal M}E[\Phi_0^i]\eta(x) d^4x = 0 \implies E[\Phi_0^i] = 0$$

However, in the presence of the boundary terms, no such implication is possible. For any particular field configuration, the variation in the action integral becomes

$$\left.\frac{d}{d\lambda}I[\Phi^i_\lambda:\eta]\right|_{\lambda=0} = \int_{\mathcal M} f(x) \eta(x) d^4x + \oint_{\partial \mathcal M} n_\mu g^\mu(x)\eta(x) dS$$

For this to vanish for arbitrary $\eta$, either both integrals need to vanish or they need to cancel each other. In the former case, the boundary terms are not present after all, while the latter case doesn't actually work. To see this, imagine that

$$\int_{\mathcal M} f(x) \eta(x) d^4x =- \oint_{\partial \mathcal M} n_\mu g^\mu(x)\eta(x) dS = C \neq 0$$

for some choice of $\eta$, and note that we can always add to $\eta$ a smooth function which vanishes on the boundary but has support at any region of the bulk that we choose. This would change the first integral but not the second, thus breaking the equality. Consequently, though the two integrals may cancel for some choices of $\eta$, they cannot possibly cancel for all choices of $\eta$ (again, unless they both vanish in the first place).
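The cancellation argument can be checked numerically. The following is a minimal sketch under assumptions that are not part of the answer: a toy one-dimensional model with $\mathcal M=[0,1]$, $\mathcal L=\frac12(\partial_x\Phi)^2$ and $\Phi_0(x)=x^2$, so that $f(x)=-\Phi_0''=-2$ and the boundary term is $[\Phi_0'\,\eta]_0^1$. All function names are hypothetical. For $\eta=1$ the bulk and boundary pieces happen to cancel; adding the bump $\sin^2(\pi x)$, which vanishes on the boundary, changes only the bulk piece.

```python
import math

def integrate(f, a=0.0, b=1.0, n=2000):
    """Composite Simpson rule on [a, b] (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3

def dphi0(x):
    # Assumed background field phi0(x) = x**2, so phi0'(x) = 2x.
    return 2.0 * x

def action(lam, deta):
    # I[phi0 + lam*eta] = integral of 1/2 (phi0' + lam*eta')^2 over [0, 1].
    return integrate(lambda x: 0.5 * (dphi0(x) + lam * deta(x)) ** 2)

def gateaux(deta, h=1e-5):
    # Central difference in lambda at lambda = 0.
    return (action(h, deta) - action(-h, deta)) / (2 * h)

def bulk(eta):
    # Integral of E[phi0] * eta with E = -phi0'' = -2.
    return integrate(lambda x: -2.0 * eta(x))

def boundary(eta):
    # [phi0' * eta] evaluated at x = 1 minus its value at x = 0.
    return dphi0(1.0) * eta(1.0) - dphi0(0.0) * eta(0.0)

# Variation A: eta = 1 everywhere (nonzero on the boundary).
eta_a = lambda x: 1.0
deta_a = lambda x: 0.0

# Variation B: eta_a plus a bump that vanishes at x = 0 and x = 1.
eta_b = lambda x: 1.0 + math.sin(math.pi * x) ** 2
deta_b = lambda x: math.pi * math.sin(2 * math.pi * x)

for name, eta, deta in [("A", eta_a, deta_a), ("B", eta_b, deta_b)]:
    print(name, gateaux(deta), bulk(eta), boundary(eta))
```

In this toy model the boundary contribution is identical for both variations, while the bulk contribution (and hence the total) is not, which is the mechanism described above.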

Even worse in a certain sense, the presence of the non-vanishing boundary terms implies, for reasons which follow immediately from those above, that the variation can be made to take any value in $\mathbb R$ by appropriate scaling of $\eta$.

One can think of this as rather analogous to multivariable calculus. The existence of partial (Gateaux) derivatives of some function (the action functional) along any particular direction (for arbitrary choice of $\eta$) is not sufficient to guarantee that the map is differentiable. In this case, with an eye toward our ultimate goal of having a vanishing functional derivative which is independent of $\eta$, we define a functional as differentiable if its Frechet derivative can be put in the form

$$\left.\frac{d}{d\lambda}I[\Phi^i_\lambda:\eta]\right|_{\lambda=0} = \int_{\mathcal M} E[\Phi_0^i] \ \eta(x) d^4x$$

and define its functional derivative to be $E[\Phi_0^i]$.


I'd like to make a quick note on your statement

I don't see how the presence of $\Theta$ stops us from defining $E_i$ as the functional derivatives.

There's a good bit of truth in what you say. Indeed, if all you want is the Euler-Lagrange equations for the field, then you could argue that the correct formal prescription is to vary the action, throw away any boundary terms, and then demand that the variation vanish. It seems a bit inelegant, but it would give you the equations you're looking for.

One runs into problems, however, when one moves to the Hamiltonian framework. Ambiguity in boundary terms leads to ambiguity when trying to define e.g. notions of total energy of a particular spacetime. In the absence of surface terms, the Hamiltonian vanishes for $g_{ij}, \pi^{ij}$ which obey the equations of motion; choosing a boundary term amounts to choosing a value for the integral of the Hamiltonian over all of spacetime, and the GHY term yields the ADM energy.

Such boundary terms are apparently also quite important for quantum gravity, but this is an area with which I am wholly unfamiliar, so I cannot possibly comment intelligently on it.


Let me ask something, you say "However, in the presence of the boundary terms, no such implication is possible". If we demand $\delta I[\Phi_0^i]=0$ wrt any variation, then in particular this would hold for compactly supported $\eta(x)$. This would not imply $$\int_{\mathcal M}E[\Phi_0^i] \eta(x) d^4x = 0$$ for all compactly supported $\eta(x)$ and in turn imply $E[\Phi_0^i]=0$ even in the presence of boundary terms? What goes wrong here?

It sounds like you are weakening the requirement that the action be stationary under arbitrary variation to the requirement that the action only be stationary under variations with compact support. If you do this, then you get the implication (and therefore the EL equations) back. However, this means that you are shrinking the space of "candidate" field configurations to those which are identical to the initial one at the boundary.

If you are not interested in any kind of time evolution at the boundary, then this is fine; in general, this is too restrictive. One could imagine, for instance, a combination of initial condition and evolution equations which would necessarily change the field at the boundary. Imposing fixed (Dirichlet) boundary conditions in addition to the evolution equations and this particular initial condition would lead to no solutions at all.

To make matters worse, in the particular case of gravity, the Lagrangian density actually contains second derivatives of the metric by way of a total derivative

$$\partial_\mu (h^{\mu\nu} \partial_\nu \Phi_0^i)$$ which is a possibility I did not consider in the work I did above. In this case it follows that the boundary term becomes

$$ \oint_{\partial M} n_\mu \big[g^\mu(x) \eta(x) + h^{\mu \nu}(x)\partial_\nu \eta(x)\big] dS$$

In this case, it would not suffice to hold the variation fixed at the boundary - we would also need to hold its derivatives fixed as well. This is unacceptable, as the equations of motion are themselves second-order; fixing both $\Phi_0^i$ and $\partial_\nu \Phi_0^i$ at the boundary would generically overdetermine the system, except in those serendipitous cases in which $n_\mu h^{\mu\nu} \rightarrow 0$.

$\endgroup$
  • $\begingroup$ Thanks for the great answer ! Let me ask something, you say "However, in the presence of the boundary terms, no such implication is possible". If we demand $\delta I [\Phi_0^i]=0$ wrt any variation, then in particular this would hold for compactly supported $\eta(x)$. This would not imply $$\int_M E[\Phi_0^i]\eta(x)d^4x = 0,$$ for all compactly supported $\eta(x)$ and in turn imply $E[\Phi_0^i] = 0$ even in the presence of boundary terms? What goes wrong here? $\endgroup$
    – Gold
    Commented Apr 8, 2020 at 13:55
  • $\begingroup$ @user1620696 I've edited my answer to address your question. $\endgroup$
    – J. Murray
    Commented Apr 8, 2020 at 15:58
$\begingroup$

I do not completely agree with either answer, so here's another one. It appears that OP's questions essentially boil down to two reasonably self-contained questions:

Question 1: What is the definition of the functional derivative of an action and do boundary terms and conditions affect this definition?

Question 2: What makes a variational principle well-posed or ill-defined and how do boundary terms affect that?


I. On the functional derivative: I don't like the notion of the "functional derivative" because the way it appears in most physics texts and publications, it is not a rigorously defined mathematical object or operator. Let's distinguish, then, between the functional derivative and the Euler-Lagrange operator (EL operator).

Suppose that $X$ is a smooth $n$-manifold, $\pi:Y\rightarrow X$ is a smooth fibered manifold over $X$ whose (possibly local) sections are the fields which appear in the variational problem, and let $L:J^\infty(\pi)\rightarrow \Lambda^nX$ be a Lagrangian $n$-form. In this answer I want to avoid using jet spaces as much as possible, so the definition of a Lagrangian $n$-form will be that for each local section $\phi\in\Gamma_\pi(U)$ of $\pi$ over $U\subseteq X$ it associates a smooth $n$-form $L[\phi]\in\Omega^n(U)$ over $U$ and has the property that there is a nonnegative integer $r\in\mathbb N$ (called the order of $L$) such that if two sections $\phi,\psi$ both defined near $x$ have the same derivatives up to and including order $r$ at $x$ (derivatives are taken with respect to any fibered chart of the fibration $\pi$), then $L[\phi]_x=L[\psi]_x$. Working in a fibered chart this then allows us to write the familiar form $$ L[\phi]_x=\mathcal L(x,\phi(x),\phi_{(1)}(x),\dots,\phi_{(r)}(x))dx^1\wedge\dots\wedge dx^n $$ for the Lagrangian $n$-form. Exterior derivatives are then total exterior derivatives, i.e. they differentiate through the functional dependencies of the field $\phi$ and its derivatives.

The first variation formula for the Lagrangian is then $$ \delta L[\phi,\delta\phi]=E(L)[\phi]\cdot\delta\phi+d\Theta[\phi,\delta\phi], $$where in general $E(L)[\phi]$ is order $2r$ in $\phi$ and algebraic in $\delta\phi$ (signified by the "dot notation") while $\Theta$ is order $2r-1$ in $\phi$ and order $r-1$ and linear in $\delta\phi$.

This formula is also true globally, but if $r>2$ and $n>1$ then the globally existing $(n-1)$-form $\Theta$ is not constructed solely out of the coefficients of the Lagrangian; it needs some additional data, like a partition of unity or a connection. Needless to say, it is non-unique. The zeroth order operator $\delta\phi\mapsto E(L)\cdot\delta\phi$ is however globally defined and unique. We call $L\mapsto E(L)$ the EL operator.

Note that no boundary conditions are needed here whatsoever. A different way of looking at things is to define $$ E(L)=[\delta L]=\delta L\mod \text{exact terms}. $$ It turns out that each class $[\delta L]$ has a unique representative which is algebraic, rather than differential, in the field variation $\delta\phi$. This canonical representative of this class is precisely $$E(L)\cdot\delta\phi=\sum_{k=0}^r(-1)^k d_{\mu_1}\dots d_{\mu_k}\frac{\partial\mathcal L}{\partial \phi^i_{,\mu_1...\mu_k}}\delta\phi^i d^nx.$$

So, this is essentially consistent with what OP wrote and has also been alluded to by J.Murray's answer. There is nothing wrong with defining the EL operator this way and in fact this is how it is done (at least in spirit) in eg. the theory of the variational bicomplex.

By contrast, for a suitable notion of functional derivative $\mathfrak d$ we would want the following:

  1. The space $\mathcal F:=\Gamma_\pi(X)$ of smooth sections can be equipped with some generalized differentiable structure.
  2. Functionals (i.e. basically functions on $\mathcal F$) of the form $$S[\phi]=\int_XL[\phi]$$ where $L$ is a smooth order $r$ Lagrangian, are smooth.
  3. The functional derivative $\mathfrak d$ is some sort of well-defined differential operator on smooth functions on $\mathcal F$ which in a sense reproduces the EL operator, eg. $\mathfrak dS=0 \Leftrightarrow E(L)=0$, whenever $S$ is an "action-type" functional.
  4. The functional derivative can in principle be applied to functionals more general than action functionals, i.e. those which are "less local".

This can be done and I will do it in the Appendix at the end of this answer. However it is somewhat subtle. To illustrate some of the subtleties, the space $\Gamma_\pi(X)$ might be empty, i.e. the fibration $\pi$ might not have global sections. The previous "formal" approach which gave us the definition of the EL operator is a local formulation that essentially operates with sheaves (well, actually, jets) of sections, so if the set of global sections is empty, we can always restrict more. It is less apparent how to take into account this locality in a purely functional formulation, since the function space should be fixed once and for all. Furthermore, even if global sections exist, the integral $S[\phi]=\int_X L[\phi]$ might fail to converge, although as a "formal integral" (cf. formal power series) it still carries valid information.

Since the formal approach operates with no integrals, this is not a problem.

The point is that OP is essentially correct with

I don't see how the presence of $\Theta$ stops us from defining $E_i$ as the functional derivatives.

although according to my tastes, I'd replace the term "functional derivative" with "EL operator" here.

Nonetheless, at the end of this answer a rigorous definition of the functional derivative will be given which hopefully illustrates how boundary conditions relate to the definition.

II. On the well-posedness of variational principles: In this section I will deal with the variational formulation of ordinary differential equations (ODE) only. The reason for that is that unlike ODEs, where the Picard-Lindelöf existence and uniqueness theorem provides a very general set of criteria for the well-posedness of differential equations, PDE systems have no analogous theorems, at least not any whose generality is comparable.

So let's consider the following data:

  1. A closed and compact interval $I=[t_0,t_1]$.
  2. A smooth $n$-manifold $Q$ (configuration space). I want to work with coordinates so suppose that $Q\subseteq\mathbb R^n$ is an open set. The generalization to the case when $Q$ is a more general manifold is immediate.
  3. A smooth Lagrangian function$$L[q](t)=L(t,q(t),q_{(1)}(t),\dots,q_{(r)}(t))$$of order $r$.
  4. The function space $\mathcal F=C^\infty(I,Q)$ of smooth functions from $I$ to $Q$.

We need a general first variation formula for the Lagrangian. It is $$\delta L=E_i\delta q^i+\frac{d}{dt}\left(\sum_{k=0}^{r-1}P^{(k+1)}_i\delta q^i_{(k)}\right),$$ where of course $f_{(k)}=d^k f/dt^k$, and the canonical momenta are $$ P^{(k)}_i=\sum_{l=0}^{r-k}(-1)^l\frac{d^l}{dt^l}\frac{\partial L}{\partial q^i_{(k+l)}},\quad 1\le k\le r . $$ Note that continuing formally to $k=0$ we have $P^{(0)}_i=E_i$.

II. A. Boundary conditions:

Associated with a variational problem are two kinds of boundary conditions: imposed boundary conditions and natural boundary conditions. These are just the two extremes; in practice, one can use a mixture of the two.

Let $$ \mathcal B=\left\{a^i_0, a^i_1,\dots,a^i_{r-1},b^i_0,b^i_1,\dots,b^i_{r-1}\right\} $$be a set of $2nr$ numbers, which we call boundary conditions. Having imposed boundary conditions means that we consider only those (smooth) trajectories $q:I\rightarrow Q$ which satisfy $$ q^i_{(k)}(t_0)=a^i_k,\quad q^i_{(k)}(t_1)=b^i_k,\quad 0\le k\le r-1. $$ Let $$ \mathcal F_{\mathcal B}=\{q\in \mathcal F:\ q \text{ satisfies the BCs }\mathcal B\}. $$

Thus, if the boundary conditions $\mathcal B$ are imposed, we consider the variational principle in the reduced function space $\mathcal F_{\mathcal B}$. Since we vary in this class, the variations of the trajectories are smooth and satisfy $$ \delta q^i_{(k)}(t_0)=\delta q^i_{(k)}(t_1)=0,\quad 0\le k\le r-1. $$

It then follows that the first variation of the action is $$ \delta S=\int_{t_0}^{t_1}E_i\delta q^i\,dt, $$as the boundary terms vanish due to the boundary conditions. Hence the stationarity condition $\delta S=0$ leads to the differential equation $E_i[q]=0$.

To obtain natural boundary conditions, we instead consider the full space $\mathcal F$ as the arena for the variational problem. The first variation of the action becomes $$ \delta S=\int_{t_0}^{t_1}E_i\delta q^i\,dt+\left.\sum_{k=0}^{r-1}P^{(k+1)}_i\delta q^i_{(k)}\right|_{t_0}^{t_1}, $$with nonzero boundary terms. If $\delta S=0$ is to be valid on some trajectory $q$ for any variation $\delta q$, it must also be true for those variations that eg. satisfy the boundary condition $\mathcal B$ or have support strictly within $I$, hence the EL equation $E_i[q]=0$ must still apply. However putting this back into the first variation formula for the action gives a pure boundary term: $$ \delta S[q]=\sum_{k=0}^{r-1}P^{(k+1)}_i[q](t_1)\delta q^i_{(k)}(t_1)-\sum_{k=0}^{r-1}P^{(k+1)}_i[q](t_0)\delta q^i_{(k)}(t_0). $$

As the variations and their derivatives may in principle take any possible value at the endpoints, we obtain that all coefficients must vanish separately from one another, hence $$ P^{(k)}_i(t_1)=P^{(k)}_i(t_0)=0,\quad 1\le k\le r. $$In other words, the canonical momenta have to vanish at the endpoints. This is then again a set of $2nr$ boundary conditions on the functions $q^i(t)$, and since they appeared dynamically, we call them "natural".
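As a simple illustration (an assumed example, with $r=1$, $n=1$, $k\neq 0$ and the interval $I=[0,T]$), take $L=\frac{1}{2}\dot q^2-\frac{1}{2}k^2q^2$. The only canonical momentum is $$ P^{(1)}=\frac{\partial L}{\partial \dot q}=\dot q, $$so the natural boundary conditions read $$ \dot q(0)=\dot q(T)=0. $$The EL equation is $\ddot q+k^2q=0$ with general solution $q(t)=c_1\cos(kt)+c_2\sin(kt)$, and the two conditions become $$ kc_2=0,\quad -kc_1\sin(kT)=0. $$Hence, as long as $\sin(kT)\neq 0$, the only extremal in the full space $\mathcal F$ is the trivial one, $q\equiv 0$.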

II. B. Well-posed variational principles:

The common definition for a well-posed variational problem is the following: The variational principle $\delta S[q]=0$ is well-posed if - given the pertinent boundary conditions (imposed or natural) - there is one and only one extremal of the action.

If the Lagrangian $L$ is order $r$, there are roughly three sufficient conditions for the variational principle to be well-posed. I don't dare claim they are also necessary, since I guess even if some are violated some freak accidents can happen, but for most intents and purposes, these conditions are also necessary:

  1. The Lagrangian $L$ must be regular, i.e. $$\det\left(\frac{\partial^2 L}{\partial q^i_{(r)}\partial q^j_{(r)}}\right)\neq 0 $$.
  2. The EL equations $E_i[q]=0$ are order $2r$ (actually, 1. implies this, but not vice versa).
  3. No "unfortunate choices of endpoint data" are made.

Point 3. is the most mysterious here, but it will be elaborated later. The EL equations have the form $$ E_i[q]=(-1)^r\frac{\partial^2 L}{\partial q^i_{(r)}\partial q^j_{(r)}}q^j_{(2r)}+\text{Lower order terms}, $$thus if condition 1. is met and the matrix $W_{ij}=\frac{\partial^2 L}{\partial q^i_{(r)}\partial q^j_{(r)}}$ is invertible, then multiplying by the inverse we get $$ q^i_{(2r)}=f^i(t,q,q_{(1)},\dots,q_{(2r-1)}) $$for the EL equation, which is in standard form, and the Picard-Lindelöf (PL) theorem applies. We know that, once the initial time $t_0$ is fixed, this equation has a unique solution provided the initial positions, velocities, accelerations, ..., up to the derivatives of order $2r-1$, $q^i(t_0),q^i_{(1)}(t_0),\dots,q^i_{(2r-1)}(t_0)$, are specified. This is $2nr$ initial data.
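To see the "standard form plus initial data" picture in action, here is a minimal sketch for the simplest case $r=1$, $n=1$. It assumes the equation $\ddot q=-k^2q$ as the example, rewrites it as a first-order system in $(q,\dot q)$, and integrates it with a classical fourth-order Runge-Kutta step. The function name and parameter values are hypothetical.

```python
import math

def rk4_oscillator(q0, v0, k, T, steps=1000):
    """Integrate q'' = -k^2 q from t = 0 to t = T, written as the
    first-order system (q, v)' = (v, -k^2 q), using classical RK4.
    The pair (q0, v0) is the 2nr = 2 pieces of initial data."""
    def rhs(q, v):
        return v, -k * k * q

    h = T / steps
    q, v = q0, v0
    for _ in range(steps):
        k1q, k1v = rhs(q, v)
        k2q, k2v = rhs(q + 0.5 * h * k1q, v + 0.5 * h * k1v)
        k3q, k3v = rhs(q + 0.5 * h * k2q, v + 0.5 * h * k2v)
        k4q, k4v = rhs(q + h * k3q, v + h * k3v)
        q += h * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return q, v

# Initial data q(0) = 1, q'(0) = 0 with k = 2; the exact solution is cos(2t).
q_end, v_end = rk4_oscillator(1.0, 0.0, 2.0, 1.0)
print(q_end, math.cos(2.0))
print(v_end, -2.0 * math.sin(2.0))
```

The point of the sketch is only that two numbers at $t_0$ fix the whole trajectory; nothing here depends on data at the final time.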

But we have seen that the number of boundary conditions (imposed or natural) is also $2nr$, so from a purely "numerological" perspective, the boundary conditions just contain enough data to uniquely specify a solution of the EL equation.

However, the variational principle wants boundary conditions, and the PL theorem wants initial data. "Most of the time", there is a bijective map between the two, but for "bad" choices of endpoints, this correspondence might break down. A typical example is the harmonic oscillator $$ \ddot q+k^2q=0, $$whose general solution is $$ q(t)=c_1\cos(kt)+c_2\sin(kt), $$where $c_1$ and $c_2$ can be straightforwardly related to initial conditions at eg. $t=0$. Consider however the boundary value problem $q(0)=a,\ q(T)=b$ for some final time $T$. The relationship is $$ a=c_1,\quad b=c_1\cos(kT)+c_2\sin(kT), $$which is unsolvable for $c_2$ in terms of $a$ and $b$ when $T=n\pi/k$ for $n\in\mathbb Z$. So for example if we choose the interval $I=[0,\pi/k]$ for the domain of the dynamics, then the variational principle for the harmonic oscillator becomes ill-defined even though conditions 1. (and thus 2.) are satisfied.
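The breakdown of the map from boundary data to initial data can be made explicit with a short sketch. The function name `bvp_coefficients` and the tolerance are assumptions for illustration; the formulas are just the two relations for $a$ and $b$ solved for $c_1$ and $c_2$.

```python
import math

def bvp_coefficients(a, b, k, T, tol=1e-12):
    """Solve q(0) = a, q(T) = b for q(t) = c1 cos(kt) + c2 sin(kt).

    Returns (c1, c2), or None when sin(kT) is numerically zero,
    i.e. when the boundary data do not determine c2 uniquely."""
    s = math.sin(k * T)
    if abs(s) < tol:
        return None
    c1 = a
    c2 = (b - a * math.cos(k * T)) / s
    return c1, c2

# Generic endpoint: kT = pi/2, a unique extremal exists.
print(bvp_coefficients(1.0, 0.0, 1.0, math.pi / 2))

# "Bad" endpoint: kT = pi, so sin(kT) vanishes up to rounding.
print(bvp_coefficients(1.0, 0.0, 2.0, math.pi / 2))
```

When $\sin(kT)=0$ the second relation reduces to $b=a\cos(kT)$: if the data violate it there is no solution, and if they satisfy it $c_2$ is left completely free, so uniqueness fails either way.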

On the other hand, if conditions 1., 2., and 3. are all satisfied, then 1) the EL equation is in standard form and thus the PL theorem applies, 2) the boundary conditions (imposed or natural) produce exactly $2nr$ pieces of data for the differential equation, 3) these data can be mapped bijectively to initial data, hence there is a unique extremal for the variational problem and thus the variational principle is well-posed.

II. C. Ok, but what does this have to do with boundary terms?

Begin with an example: The Lagrangian $$ L=-\frac{1}{2}q\ddot q-\frac{1}{2}k^2q^2. $$ This is a second-order Lagrangian for the harmonic oscillator. It can be obtained from the usual one by adding a total time derivative. The variation is $$ \delta L=-\left(\ddot q+k^2q\right)\delta q+\frac{d}{dt}\left(\frac{1}{2}\dot q\delta q-\frac{1}{2}q\delta\dot q\right). $$Assuming an interval $I=[0,T]$ and for simplicity $T\neq n\pi/k$, the imposed boundary conditions are $$ q(0)=a,\quad q(T)=b \\ \dot q(0)=a^\prime,\quad \dot q(T)=b^\prime, $$where $a,a^\prime,b,b^\prime$ are independent data. This is four pieces of data for a second order equation, so by specifying them appropriately, one can create an unsolvable system. Since now there are boundary conditions for which there are no extremals, this variational principle is no longer well-posed.
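The overdetermination can be checked directly. In the sketch below (function names and tolerance are hypothetical), the position data $a,b$ already fix the solution of $\ddot q+k^2q=0$ when $\sin(kT)\neq 0$, so the endpoint velocities are implied and a generic choice of $a^\prime,b^\prime$ cannot be met.

```python
import math

def implied_velocities(a, b, k, T):
    """Endpoint velocities of the unique solution of q'' + k^2 q = 0
    with q(0) = a and q(T) = b, assuming sin(kT) != 0."""
    s, c = math.sin(k * T), math.cos(k * T)
    c1 = a
    c2 = (b - a * c) / s
    v0 = k * c2                    # q'(0)
    vT = k * (-c1 * s + c2 * c)    # q'(T)
    return v0, vT

def is_consistent(a, a_prime, b, b_prime, k, T, tol=1e-9):
    """True if some solution satisfies all four imposed conditions."""
    v0, vT = implied_velocities(a, b, k, T)
    return abs(v0 - a_prime) < tol and abs(vT - b_prime) < tol

k, T = 1.0, math.pi / 2
# a = 1, b = 0 selects q(t) = cos(t), whose endpoint velocities are 0 and -1.
print(is_consistent(1.0, 0.0, 0.0, -1.0, k, T))
# Asking instead for zero velocity at both ends cannot be satisfied.
print(is_consistent(1.0, 0.0, 0.0, 0.0, k, T))
```

Only a two-parameter subset of the four-parameter family of imposed data admits an extremal, which is the sense in which this variational principle fails to be well-posed.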

It should be remarked however that the "usual" Lagrangian $L^\prime=\frac{1}{2}\dot q^2-\frac{1}{2}k^2q^2$ does give a well-posed variational problem if $T\neq n\pi/k$. Furthermore the boundary value problem given by the natural boundary conditions on $L$ is actually solvable, although it gives the trivial zero solution. However one could easily cook up a Lagrangian for the harmonic oscillator where even the natural boundary conditions are bad.

More generally, if $L$ is an order $r$ Lagrangian with canonical momenta $P_i^{(k)}$ ($1\le k\le r$), and the Lagrangian is changed as $$ L^\prime=L+\frac{df}{dt}, $$where $f$ is an order $r-1$ function (hence $L$ and $L^\prime$ have the same order), then the number of boundary conditions doesn't change, but the canonical momenta change as $$ P^{\prime (k)}_i=P^{(k)}_i+\frac{\partial f}{\partial q^i_{(k-1)}}. $$It might have been noticed that the imposed boundary conditions are flexible (can be set arbitrarily), but the natural boundary conditions are not, i.e. they are always that the canonical momenta must vanish at the endpoints. The ability to tune the natural boundary conditions then appears in the equivalence transformation $L^\prime=L+df/dt$, which then allows one to set the natural boundary conditions arbitrarily.
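For instance (an illustrative choice, with $c$ a hypothetical constant), take the first-order Lagrangian $L=\frac{1}{2}\dot q^2-\frac{1}{2}k^2q^2$ and $f=\frac{1}{2}cq^2$, which is of order $0=r-1$. Then $$ L^\prime=L+cq\dot q,\quad P^{\prime(1)}=\frac{\partial L^\prime}{\partial \dot q}=\dot q+cq=P^{(1)}+\frac{\partial f}{\partial q}, $$while the EL expression is unchanged: $$ E^\prime=\frac{\partial L^\prime}{\partial q}-\frac{d}{dt}\frac{\partial L^\prime}{\partial \dot q}=\left(-k^2q+c\dot q\right)-\left(\ddot q+c\dot q\right)=-\left(\ddot q+k^2q\right)=E. $$The natural boundary conditions thus change from $\dot q=0$ to the Robin-type condition $\dot q+cq=0$ at both endpoints, with the same equation of motion in the interior.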

However the transformation $L^\prime=L+df/dt$ also preserves the EL equations if say $L$ is order $r$ but $f$ is order $s-1$ with $s>r$. Now $L^\prime$ is a Lagrangian of order $s>r$ and comes with $2ns>2nr$ boundary conditions, but the EL equations are still order $2r$ and thus require $2nr$ pieces of data. The additional boundary conditions obtained this way are in essence arbitrary and they overdetermine the boundary value problem. Therefore if the order $s$ of the Lagrangian from which an order $2r$ equation is derived is such that $2s>2r$, then this variational principle is necessarily ill-defined since there are boundary conditions (including the natural ones) to which no extremal corresponds.

Finally OP's question

So why do boundary terms make the variational principle ill-defined? In other words, why does a well-defined variational principle demand $\delta I[\phi]$ to be of the form $\delta I[\phi]=\int(\text{something})\delta\phi$ as the authors of the paper seem to claim?

can be answered: It is strictly speaking not necessary for the variation to be of the form $\delta S[\phi]=\int(\dots)\cdot\delta\phi$ for the variational principle to be well-defined, since whatever boundary terms remain simply become natural boundary conditions. However if the natural boundary conditions are themselves inappropriate (for example, there are too many of them), then this can result in an ill-defined variational principle.

II. D. Gauge symmetries, PDE systems and all that jazz:

I don't want to go into this in a very detailed manner, but for systems which do not satisfy $\det(W_{ij})\neq 0$ or PDE systems, the above analysis is much more complicated.

The singularity condition $\det(W_{ij})=0$ signals the presence of gauge symmetries, i.e. the general solution of the system contains arbitrary functions of time, thus the variational principle and any possible initial or boundary value problem is ill-defined since any given solution may always be gauge transformed into a new solution that preserves the initial or boundary value problem. To handle these cases, some sort of reduction scheme needs to be used (gauge fixing, the Dirac-Bergmann process, symplectic reduction, BV/BRST etc.) to essentially reformulate the problem without gauge symmetries.

For PDE systems, the closest analogue of the PL theorem is the Cauchy-Kovalevskaya theorem, but that only works for evolutionary systems with analytic coefficients. So the above analysis is often also applied to field theories by analogy, but for rigorous results, a case-by-case analysis is necessary.

Appendix. A rigorous model for functional derivatives:

We use the formulation of diffeological spaces (a good source for that is the book by Patrick Iglesias-Zemmour). I state here only the basics. For $p\in\mathbb N$ a $p$-domain is an open subset $U$ of $\mathbb R^p$. A domain is then a $p$-domain for some $p$. Given a set $Z$ a $p$-parametrization of $Z$ is a set map $\varphi:U\rightarrow Z$, where $U$ is a $p$-domain, and a parametrization of $Z$ is a $p$-parametrization for some $p$.

A diffeology on $Z$ is a collection $\mathscr D$ of parametrizations, called plots satisfying the following axioms:

  1. Covering: Every constant parametrization is a plot.
  2. Locality: If $\varphi:U\rightarrow Z$ is a parametrization such that in some neighborhood of each $r\in U$ the restriction to that neighborhood is a plot, then $\varphi$ is a plot.
  3. Smooth compatibility: If $\varphi:U\rightarrow Z$ is a plot and $f:V\rightarrow U$ is a smooth map ($V$ is also a domain) then $\varphi\circ f$ is also a plot.

Then the pair $(Z,\mathscr D)$ is a diffeological space, but we will shorten it to $Z$ if the diffeology is clear from context. Given diffeological spaces $Z,W$ a map $\phi:Z\rightarrow W$ is smooth if for any plot $\varphi:U\rightarrow Z$, the map $\phi\circ\varphi:U\rightarrow W$ is also a plot. Diffeological spaces are cool because the category $\mathsf{Diff}$ whose objects are diffeological spaces and whose morphisms are smooth maps is basically closed under every set or categorical operation under the sun (sums, products, quotients, mapping spaces/exponentials, limits, colimits etc.).

A differential $k$-form on $Z$ is a rule which associates to each plot $\varphi:U\rightarrow Z$ an ordinary smooth $k$-form $\omega[\varphi]\in\Omega^k(U)$ on the plot's domain such that for any smooth map $f:V\rightarrow U$ ($V$ is also a domain) $$ \omega[\varphi\circ f]=f^\ast\omega[\varphi]. $$ Then the exterior product, exterior derivative and pullback of differential forms are defined naturally, all having the usual properties (by having them commute with evaluations on plots).

We need a couple of more things about diffeological spaces:

  • The D-topology on $Z$ is the finest topology that makes all plots continuous.

  • A diffeological space is connected (w.r.t the D-topology) if and only if it is smoothly path-connected, i.e. any two points can be connected by a smooth curve.

  • If $\omega\in\Omega^k(Z)$ is a differential $k$-form, then $\omega=0$ if and only if $\omega[\varphi]=0$ for any $k$-plot $\varphi$ (in other words, differential forms are uniquely determined by the $k$-plots).


Rather than working with general fibered manifolds, let's consider a simplified model which makes some things more transparent. Let $X\subseteq\mathbb R^n$ be an $n$ dimensional compact submanifold with boundary of $\mathbb R^n$ and let $Y\subseteq\mathbb R^m$ be a convex open subset.

Consider the function space $$\mathcal F=C^\infty(X,Y)$$ of smooth maps from $X$ to $Y$. We can diffeologize $\mathcal F$ in a multitude of ways.

  • The standard functional diffeology is defined as follows. Given a $p$-domain $U\subseteq\mathbb R^p$, a map $\varphi:U\rightarrow \mathcal F$ is a plot if and only if the joint map $\varphi:U\times X\rightarrow Y,\ (s,x)\mapsto\varphi(s)(x)$ is smooth.
  • Fix a number $r\in\mathbb N\cup\{\infty,\omega\}$. A parametrization $\varphi:U\rightarrow\mathcal F$ is a plot of the variational diffeology of order $r$ if 1) $\varphi$ is a plot of the standard functional diffeology, and 2) for finite $r$, for each $0\le k\le r$ the derivatives $$\frac{\partial^k\varphi^i}{\partial x^{\mu_1}...\partial x^{\mu_k}}(s)(x) $$are constant functions of $s\in U$ whenever $x\in \partial X$ is a boundary point. For $r=\infty$ the same condition is required of all partial derivatives.
  • For $r=\omega$, condition 2) is instead that the boundary $\partial X$ has some neighborhood $N\subseteq X$ such that $\varphi(s)(x)$ is a constant function of $s$ when $x\in N$.
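
To make the difference concrete, here is a minimal example (the choices $X=[0,1]$, $Y=\mathbb R$ and the fields $\phi_0,\eta$ are illustrative assumptions, not part of the general setup). For any $\phi_0,\eta\in C^\infty([0,1],\mathbb R)$ and finite $r$, consider the two one-parameter families $$ \varphi_1(s)(x)=\phi_0(x)+s,\qquad \varphi_2(s)(x)=\phi_0(x)+s\,x^{r+1}(1-x)^{r+1}\eta(x),\qquad s\in\mathbb R. $$ Both are plots of the standard functional diffeology. The family $\varphi_2$ is also a plot of the variational diffeology of order $r$: the perturbation has zeros of order $r+1$ at $x=0$ and $x=1$, so all $x$-derivatives of order $\le r$ at the boundary equal those of $\phi_0$ for every $s$. The family $\varphi_1$ is not a plot of any of the variational diffeologies, because its boundary values $\phi_0(0)+s$ and $\phi_0(1)+s$ depend on $s$.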

On $\mathbb N\cup\{\infty,\omega\}$ set up the ordering $n<\infty<\omega$ for any $n\in\mathbb N$.

As before, a function $S:\mathcal F\rightarrow\mathbb R$ is action-type if there is a finite order Lagrangian $L[\phi]$ on $X$ associated with $\mathcal F$ such that $$ S[\phi]=\int_X L[\phi]=\int_X dx\,\mathcal L(x,\phi(x),\dots,\phi_{(r)}(x)), $$where here $dx=d^nx$. This integral converges because $X$ is compact. If $L$ is order $r$, then the action-type function $S$ is also said to be order $r$. Then:

  1. an action-type function $S$ specified by a smooth Lagrangian is smooth with respect to the standard functional diffeology and all of the variational diffeologies on $\mathcal F$;
  2. if $S$ is an order $r$ action-type functional, and $\mathcal F$ is equipped with the variational diffeology of order $k\ge r-1$, then its exterior derivative may be identified with the usual functional derivative.

We specifically verify this last point on an order $r$ action-type function. To distinguish the exterior derivative on $\mathcal F$ and on any domain $U$ from differentials on $X$, we use $\mathfrak d$ for the former. The exterior derivative $\mathfrak dS$ is a well-defined $1$-form on $\mathcal F$, so it is sufficient to evaluate it on a $1$-plot. On any $1$-plot $s\mapsto\varphi(s)=\phi_s$ we have $$\begin{aligned} \mathfrak dS[\phi_s]&=\frac{\partial S[\phi_s]}{\partial s}\mathfrak ds=\left[\int_X dx\,\sum_{k=0}^r\frac{\partial \mathcal L}{\partial\phi^i_{,\mu_1...\mu_k}}[\phi_s]\partial_{\mu_1}\dots\partial_{\mu_k}\frac{\partial\phi^i_s}{\partial s}\right]\mathfrak ds \\ &=\left[\int_X dx\,\sum_{k=0}^{r}(-1)^k d_{\mu_1}...d_{\mu_k}\frac{\partial\mathcal L}{\partial \phi^i_{,\mu_1...\mu_k}}[\phi_s]\frac{\partial\phi^i_s}{\partial s}+\int_{\partial X}(dx)_\mu\sum_{k=0}^{r-1}P^{\mu\mu_1...\mu_k}_i[\phi_s]\partial_{\mu_1}\dots\partial_{\mu_k}\frac{\partial\phi_s^i}{\partial s}\right]\mathfrak ds. \end{aligned}$$ We may write $$\partial_{\mu_1}\dots\partial_{\mu_k}\frac{\partial\phi^i_s}{\partial s}=\frac{\partial}{\partial s}\left(\partial_{\mu_1}\dots\partial_{\mu_k}\phi^i_s\right) $$and by the definition of the variational diffeology (assuming the order of the diffeology is at least $r-1$), the quantities $\partial_{\mu_1}\dots\partial_{\mu_k}\phi^i_s$ with $0\le k\le r-1$ are independent of $s$ on $\partial X$, so their $s$-derivatives vanish there. Hence the boundary terms are zero and we get $$ \mathfrak dS[\phi_s]=\left[\int_X dx\,E_i[\phi_s]\frac{\partial\phi^i_s}{\partial s}\right]\mathfrak ds, $$where $E_i$ denotes the Euler-Lagrange expression appearing in the bulk integral above. This indeed contains the same information as the functional/EL derivative in the usual sense.
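
As a worked instance of this computation (a sketch; the one-dimensional domain $X=[a,b]$, the single real field $\phi$ and the potential $V$ are illustrative assumptions), take the first-order Lagrangian $\mathcal L=\tfrac12(\phi')^2-V(\phi)$. On a $1$-plot $s\mapsto\phi_s$, differentiating under the integral and integrating by parts once gives $$ \mathfrak dS[\phi_s]=\left[\int_a^b dx\,\left(-\phi_s''-V'(\phi_s)\right)\frac{\partial\phi_s}{\partial s}+\left.\phi_s'\frac{\partial\phi_s}{\partial s}\right|_{x=a}^{x=b}\right]\mathfrak ds. $$ Here $r=1$, so the variational diffeology of order $0$ already suffices: $\phi_s(a)$ and $\phi_s(b)$ are independent of $s$, the endpoint term drops out, and one reads off $E[\phi]=-\phi''-V'(\phi)$.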

So if we equip $\mathcal F$ with the variational diffeology of order $\infty$ or $\omega$, then all action-type functionals are differentiable and the exterior derivative is basically the functional derivative. The space $\mathcal F$ still contains all smooth functions from $X$ to $Y$, so no boundary conditions had to be imposed and the functional derivative is nonetheless well-defined and has the classical form.

However, boundary restrictions are still encoded into the space through its diffeology. Under the assumptions made on $Y$, e.g. that it is convex (which is not, strictly speaking, necessary, but simplifies the proof), we find the following facts about the connectivity of the space $\mathcal F$.

First, suppose that $\mathcal F$ is equipped with the variational $r$-diffeology ($r\in\mathbb N\cup\{\infty,\omega\}$). Two fields $\phi,\psi\in\mathcal F$ are $b$-equivalent if for each $x\in\partial X$ it is true that $$ \partial_{\mu_1}\dots\partial_{\mu_k}\phi^i(x)=\partial_{\mu_1}\dots\partial_{\mu_k}\psi^i(x),\quad 0\le k\le r, $$(with all $k\ge 0$ when $r=\infty$), and for the $\omega$-diffeology the condition of $b$-equivalence is instead that there is a neighborhood of the boundary $\partial X$ on which they agree. Let $b\phi$ denote the equivalence class to which $\phi$ belongs under $b$-equivalence.

Then:

  1. If $\mathcal F$ is equipped with the standard functional diffeology, the space $\mathcal F$ is connected.
  2. If $\mathcal F$ is equipped with the variational $r$-diffeology ($r\in\mathbb N\cup\{\infty,\omega\}$) then $\mathcal F$ is disconnected and the connected components of $\mathcal F$ are in bijection with the set of all $b$-equivalence classes $b\phi$.

[I might give a proof of these later, but it's straightforward and I'm tired]
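
In the meantime, here is a possible sketch of how the argument might go (a reading of the setup above under its stated assumptions, not a complete proof; the cutoff function $\chi$ is an auxiliary choice). Pick a smooth $\chi:\mathbb R\rightarrow[0,1]$ with $\chi(s)=0$ for $s\le 0$ and $\chi(s)=1$ for $s\ge 1$, and for $\phi,\psi\in\mathcal F$ set $$ \phi_s(x)=\left(1-\chi(s)\right)\phi(x)+\chi(s)\psi(x),\qquad s\in\mathbb R. $$ Convexity of $Y$ guarantees $\phi_s(x)\in Y$, and the joint map is smooth, so this is a smooth curve from $\phi$ to $\psi$ in the standard functional diffeology, which gives the first claim. If $\phi$ and $\psi$ are $b$-equivalent, then on $\partial X$ $$ \partial_{\mu_1}\dots\partial_{\mu_k}\phi_s^i=\partial_{\mu_1}\dots\partial_{\mu_k}\phi^i+\chi(s)\,\partial_{\mu_1}\dots\partial_{\mu_k}\left(\psi^i-\phi^i\right)=\partial_{\mu_1}\dots\partial_{\mu_k}\phi^i,\quad 0\le k\le r, $$ so the same curve is a plot of the variational $r$-diffeology (for $r=\omega$, $\phi_s=\phi$ on the neighborhood where $\phi$ and $\psi$ agree). Conversely, along any $1$-plot of a variational diffeology defined on an interval, the boundary data are constant in $s$ by definition, so a smooth curve cannot leave its $b$-equivalence class.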

The net effect of this is that essentially $$ \mathcal F=\bigcup_{b\phi}\mathcal F_{b\phi} $$is a union of connected components such that each member $\mathcal F_{b\phi}$ is a set of fields which do satisfy appropriate prescribed boundary conditions (which may differ from the usual imposed/natural BCs; in particular the boundary condition corresponding to the $\omega$-diffeology is rather different).

It also has some effect on the exactness properties of the differential $\mathfrak d$. For example it is true for diffeological spaces as well that if a smooth function is closed (i.e. $\mathfrak df=0$) then it is locally constant, viz. constant on each connected component separately.

It is known from the calculus of variations that for an action-type function $S$, $\mathfrak dS=0$ does not mean that $S$ is constant, but rather (assuming that $X$ and $Y$ are contractible) that its integrand (Lagrangian) is a total divergence, hence the values of $S$ are determined by the values the field and a number of its derivatives take on the boundary. The same result is obtained qualitatively from the above analysis: if $\mathfrak d S=0$ then $S$ must be constant on each connected component $\mathcal F_{b\phi}$ separately, and (assuming the $\infty$ or $\omega$ diffeologies) since the spaces $\mathcal F_{b\phi}$ are specified by the boundary values of the fields in this component, this shows that $S[\phi]$ (for $\mathfrak d S=0$) factors through the class $b\phi$ as expected.
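
A minimal example of this (the domain $X=[a,b]$ and the single real field $\phi$ are illustrative assumptions): the Lagrangian $\mathcal L=\phi\,\phi'=\tfrac{d}{dx}\left(\tfrac12\phi^2\right)$ gives $$ E[\phi]=\frac{\partial\mathcal L}{\partial\phi}-\frac{d}{dx}\frac{\partial\mathcal L}{\partial\phi'}=\phi'-\phi'=0,\qquad S[\phi]=\tfrac12\left(\phi(b)^2-\phi(a)^2\right). $$ So $\mathfrak dS=0$ for any of the variational diffeologies, while $S$ is not constant on $\mathcal F$: it depends only on the boundary values $\phi(a),\phi(b)$, which are fixed on each component $\mathcal F_{b\phi}$.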


The one thing to take away from this Appendix is that functional derivatives in the calculus of variations can be made well-defined independently of any set of boundary conditions imposed on the fields or any surface terms that appear in the action. However, under a rigorous framework, boundary conditions do appear in an essential way in the definition of smoothness itself and affect the topological properties of the function space.

$\endgroup$
1
$\begingroup$

Here is one comment. If we adopt OP's definition $$\delta I[\Phi^i]=\int_M E_i \delta \Phi^i + \int_{\partial M}\Theta_i \delta \Phi^i\tag{4},$$
then in order for the bulk term $E_i$ and the boundary term $\Theta_i$ to be uniquely defined, we must for starters impose that they are not differential operators of non-zero order (acting on $\delta \Phi^i$) but just functions (i.e. differential operators of zero order), because otherwise we could use tricks à la integration by parts to redistribute what belongs to the bulk and what belongs to the boundary. It turns out that for the EH action on a manifold with a boundary this is not possible without the GHY boundary term (because of the higher spacetime derivatives in the EH action).
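
To illustrate the ambiguity with a sketch (the one-dimensional domain $M=[a,b]$, the single field $\Phi$ and the arbitrary smooth function $f$ are assumptions made for the example): since $\int_a^b dx\,\frac{d}{dx}(f\,\delta\Phi)=\left.f\,\delta\Phi\right|_a^b$, one can rewrite $$ \int_a^b dx\,E\,\delta\Phi+\left.\Theta\,\delta\Phi\right|_a^b=\int_a^b dx\,\left(E+f'+f\frac{d}{dx}\right)\delta\Phi+\left.(\Theta-f)\,\delta\Phi\right|_a^b. $$ The bulk operator on the right-hand side is of first order acting on $\delta\Phi$, and the boundary coefficient has shifted by the arbitrary $f$, so the split is only unique once both are required to be of zero order.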

$\endgroup$