You are currently browsing the category archive for the ‘update’ category.

The purpose of this post is to report an erratum to the 2012 paper “An inverse theorem for the Gowers {U^{s+1}[N]}-norm” of Ben Green, myself, and Tamar Ziegler (previously discussed in this blog post). The main results of this paper have been superseded with stronger quantitative results, first in work of Manners (using somewhat different methods), and more recently in a remarkable paper of Leng, Sah, and Sawhney which combined the methods of our paper with several new innovations to obtain quite strong bounds (of quasipolynomial type); see also an alternate proof of our main results (again by quite different methods) by Candela and Szegedy. In the course of their work, they discovered some fixable but nontrivial errors in our paper. These (rather technical) issues were already implicitly corrected in this followup work which supersedes our own paper, but for the sake of completeness we are also providing a formal erratum for our original paper, which can be found here. We thank Leng, Sah, and Sawhney for bringing these issues to our attention.

Excluding some minor (mostly typographical) issues which we also have reported in this erratum, the main issues stemmed from a conflation of two notions of a degree {s} filtration

\displaystyle  G = G_0 \geq G_1 \geq \dots \geq G_s \geq G_{s+1} = \{1\}

of a group {G}, which is a nested sequence of subgroups that obey the relation {[G_i,G_j] \leq G_{i+j}} for all {i,j}. The weaker notion (sometimes known as a prefiltration) permits the group {G_1} to be strictly smaller than {G_0}, while the stronger notion requires {G_0} and {G_1} to equal. In practice, one can often move between the two concepts, as {G_1} is always normal in {G_0}, and a prefiltration behaves like a filtration on every coset of {G_1} (after applying a translation and perhaps also a conjugation). However, we did not clarify this issue sufficiently in the paper, and there are some places in the text where results that were only proven for filtrations were applied for prefiltrations. The erratum fixes this issues, mostly by clarifying that we work with filtrations throughout (which requires some decomposition into cosets in places where prefiltrations are generated). Similar adjustments need to be made for multidegree filtrations and degree-rank filtrations, which we also use heavily on our paper.

In most cases, fixing this issue only required minor changes to the text, but there is one place (Section 8) where there was a non-trivial problem: we used the claim that the final group {G_s} was a central group, which is true for filtrations, but not necessarily for prefiltrations. This fact (or more precisely, a multidegree variant of it) was used to claim a factorization for a certain product of nilcharacters, which is in fact not true as stated. In the erratum, a substitute factorization for a slightly different product of nilcharacters is provided, which is still sufficient to conclude the main result of this part of the paper (namely, a statistical linearization of a certain family of nilcharacters in the shift parameter {h}).

Again, we stress that these issues do not impact the paper of Leng, Sah, and Sawhney, as they adapted the methods in our paper in a fashion that avoids these errors.

Ben Green and I have updated our paper “An arithmetic regularity lemma, an associated counting lemma, and applications” to account for a somewhat serious issue with the paper that was pointed out to us recently by Daniel Altman. This paper contains two core theorems:

  • An “arithmetic regularity lemma” that, roughly speaking, decomposes an arbitrary bounded sequence {f(n)} on an interval {\{1,\dots,N\}} as an “irrational nilsequence” {F(g(n) \Gamma)} of controlled complexity, plus some “negligible” errors (where one uses the Gowers uniformity norm as the main norm to control the neglibility of the error); and
  • An “arithmetic counting lemma” that gives an asymptotic formula for counting various averages {{\mathbb E}_{{\bf n} \in {\bf Z}^d \cap P} f(\psi_1({\bf n})) \dots f(\psi_t({\bf n}))} for various affine-linear forms {\psi_1,\dots,\psi_t} when the functions {f} are given by irrational nilsequences.

The combination of the two theorems is then used to address various questions in additive combinatorics.

There are no direct issues with the arithmetic regularity lemma. However, it turns out that the arithmetic counting lemma is only true if one imposes an additional property (which we call the “flag property”) on the affine-linear forms {\psi_1,\dots,\psi_t}. Without this property, there does not appear to be a clean asymptotic formula for these averages if the only hypothesis one places on the underlying nilsequences is irrationality. Thus when trying to understand the asymptotics of averages involving linear forms that do not obey the flag property, the paradigm of understanding these averages via a combination of the regularity lemma and a counting lemma seems to require some significant revision (in particular, one would probably have to replace the existing regularity lemma with some variant, despite the fact that the lemma is still technically true in this setting). Fortunately, for most applications studied to date (including the important subclass of translation-invariant affine forms), the flag property holds; however our claim in the paper to have resolved a conjecture of Gowers and Wolf on the true complexity of systems of affine forms must now be narrowed, as our methods only verify this conjecture under the assumption of the flag property.

In a bit more detail: the asymptotic formula for our counting lemma involved some finite-dimensional vector spaces {\Psi^{[i]}} for various natural numbers {i}, defined as the linear span of the vectors {(\psi^i_1({\bf n}), \dots, \psi^i_t({\bf n}))} as {{\bf n}} ranges over the parameter space {{\bf Z}^d}. Roughly speaking, these spaces encode some constraints one would expect to see amongst the forms {\psi^i_1({\bf n}), \dots, \psi^i_t({\bf n})}. For instance, in the case of length four arithmetic progressions when {d=2}, {{\bf n} = (n,r)}, and

\displaystyle  \psi_i({\bf n}) = n + (i-1)r

for {i=1,2,3,4}, then {\Psi^{[1]}} is spanned by the vectors {(1,1,1,1)} and {(1,2,3,4)} and can thus be described as the two-dimensional linear space

\displaystyle  \Psi^{[1]} = \{ (a,b,c,d): a-2b+c = b-2c+d = 0\} \ \ \ \ \ (1)

while {\Psi^{[2]}} is spanned by the vectors {(1,1,1,1)}, {(1,2,3,4)}, {(1^2,2^2,3^2,4^2)} and can be described as the hyperplane

\displaystyle  \Psi^{[2]} = \{ (a,b,c,d): a-3b+3c-d = 0 \}. \ \ \ \ \ (2)

As a special case of the counting lemma, we can check that if {f} takes the form {f(n) = F( \alpha n, \beta n^2 + \gamma n)} for some irrational {\alpha,\beta \in {\bf R}/{\bf Z}}, some arbitrary {\gamma \in {\bf R}/{\bf Z}}, and some smooth {F: {\bf R}/{\bf Z} \times {\bf R}/{\bf Z} \rightarrow {\bf C}}, then the limiting value of the average

\displaystyle  {\bf E}_{n, r \in [N]} f(n) f(n+r) f(n+2r) f(n+3r)

as {N \rightarrow \infty} is equal to

\displaystyle  \int_{a_1,b_1,c_1,d_1 \in {\bf R}/{\bf Z}: a_1-2b_1+c_1=b_1-2c_1+d_1=0} \int_{a_2,b_2,c_2,d_2 \in {\bf R}/{\bf Z}: a_2-3b_2+3c_2-d_2=0}

\displaystyle  F(a_1,a_2) F(b_1,b_2) F(c_1,c_2) F(d_1,d_2)

which reflects the constraints

\displaystyle  \alpha n - 2 \alpha(n+r) + \alpha(n+2r) = \alpha(n+r) - 2\alpha(n+2r)+\alpha(n+3r)=0

and

\displaystyle  (\beta n^2 + \gamma n) - 3 (\beta(n+r)^2+\gamma(n+r))

\displaystyle + 3 (\beta(n+2r)^2 +\gamma(n+2r)) - (\beta(n+3r)^2+\gamma(n+3r))=0.

These constraints follow from the descriptions (1), (2), using the containment {\Psi^{[1]} \subset \Psi^{[2]}} to dispense with the lower order term {\gamma n} (which then plays no further role in the analysis).

The arguments in our paper turn out to be perfectly correct under the assumption of the “flag property” that {\Psi^{[i]} \subset \Psi^{[i+1]}} for all {i}. The problem is that the flag property turns out to not always hold. A counterexample, provided by Daniel Altman, involves the four linear forms

\displaystyle  \psi_1(n,r) = r; \psi_2(n,r) = 2n+2r; \psi_3(n,r) = n+3r; \psi_4(n,r) = n.

Here it turns out that

\displaystyle  \Psi^{[1]} = \{ (a,b,c,d): d-c=3a; b-2a=2d\}

and

\displaystyle  \Psi^{[2]} = \{ (a,b,c,d): 24a+3b-4c-8d=0 \}

and {\Psi^{[1]}} is no longer contained in {\Psi^{[2]}}. The analogue of the asymptotic formula given previously for {f(n) = F( \alpha n, \beta n^2 + \gamma n)} is then valid when {\gamma} vanishes, but not when {\gamma} is non-zero, because the identity

\displaystyle  24 (\beta \psi_1(n,r)^2 + \gamma \psi_1(n,r)) + 3 (\beta \psi_2(n,r)^2 + \gamma \psi_2(n,r))

\displaystyle - 4 (\beta \psi_3(n,r)^2 + \gamma \psi_3(n,r)) - 8 (\beta \psi_4(n,r)^2 + \gamma \psi_4(n,r)) = 0

holds in the former case but not the latter. Thus the output of any purported arithmetic regularity lemma in this case is now sensitive to the lower order terms of the nilsequence and cannot be described in a uniform fashion for all “irrational” sequences. There should still be some sort of formula for the asymptotics from the general equidistribution theory of nilsequences, but it could be considerably more complicated than what is presented in this paper.

Fortunately, the flag property does hold in several key cases, most notably the translation invariant case when {\Psi^{[1]}} contains {(1,\dots,1)}, as well as “complexity one” cases. Nevertheless non-flag property systems of affine forms do exist, thus limiting the range of applicability of the techniques in this paper. In particular, the conjecture of Gowers and Wolf (Theorem 1.13 in the paper) is now open again in the non-flag property case.

Peter Denton, Stephen Parke, Xining Zhang, and I have just uploaded to the arXiv a completely rewritten version of our previous paper, now titled “Eigenvectors from Eigenvalues: a survey of a basic identity in linear algebra“. This paper is now a survey of the various literature surrounding the following basic identity in linear algebra, which we propose to call the eigenvector-eigenvalue identity:

Theorem 1 (Eigenvector-eigenvalue identity) Let {A} be an {n \times n} Hermitian matrix, with eigenvalues {\lambda_1(A),\dots,\lambda_n(A)}. Let {v_i} be a unit eigenvector corresponding to the eigenvalue {\lambda_i(A)}, and let {v_{i,j}} be the {j^{th}} component of {v_i}. Then

\displaystyle |v_{i,j}|^2 \prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A)) = \prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M_j))

where {M_j} is the {n-1 \times n-1} Hermitian matrix formed by deleting the {j^{th}} row and column from {A}.

When we posted the first version of this paper, we were unaware of previous appearances of this identity in the literature; a related identity had been used by Erdos-Schlein-Yau and by myself and Van Vu for applications to random matrix theory, but to our knowledge this specific identity appeared to be new. Even two months after our preprint first appeared on the arXiv in August, we had only learned of one other place in the literature where the identity showed up (by Forrester and Zhang, who also cite an earlier paper of Baryshnikov).

The situation changed rather dramatically with the publication of a popular science article in Quanta on this identity in November, which gave this result significantly more exposure. Within a few weeks we became informed (through private communication, online discussion, and exploration of the citation tree around the references we were alerted to) of over three dozen places where the identity, or some other closely related identity, had previously appeared in the literature, in such areas as numerical linear algebra, various aspects of graph theory (graph reconstruction, chemical graph theory, and walks on graphs), inverse eigenvalue problems, random matrix theory, and neutrino physics. As a consequence, we have decided to completely rewrite our article in order to collate this crowdsourced information, and survey the history of this identity, all the known proofs (we collect seven distinct ways to prove the identity (or generalisations thereof)), and all the applications of it that we are currently aware of. The citation graph of the literature that this ad hoc crowdsourcing effort produced is only very weakly connected, which we found surprising:

The earliest explicit appearance of the eigenvector-eigenvalue identity we are now aware of is in a 1966 paper of Thompson, although this paper is only cited (directly or indirectly) by a fraction of the known literature, and also there is a precursor identity of Löwner from 1934 that can be shown to imply the identity as a limiting case. At the end of the paper we speculate on some possible reasons why this identity only achieved a modest amount of recognition and dissemination prior to the November 2019 Quanta article.

I have just uploaded to the arXiv my paper “Commutators close to the identity“, submitted to the Journal of Operator Theory. This paper resulted from some progress I made on the problem discussed in this previous post. Recall in that post the following result of Popa: if {D,X \in B(H)} are bounded operators on a Hilbert space {H} whose commutator {[D,X] := DX-XD} is close to the identity in the sense that

\displaystyle  \| [D,X] - I \|_{op} \leq \varepsilon \ \ \ \ \ (1)

for some {\varepsilon > 0}, then one has the lower bound

\displaystyle  \| X \|_{op} \|D \|_{op} \geq \frac{1}{2} \log \frac{1}{\varepsilon}. \ \ \ \ \ (2)

In the other direction, for any {0 < \varepsilon < 1}, there are examples of operators {D,X \in B(H)} obeying (1) such that

\displaystyle  \| X \|_{op} \|D \|_{op} \ll \varepsilon^{-2}. \ \ \ \ \ (3)

In this paper we improve the upper bound to come closer to the lower bound:

Theorem 1 For any {0 < \varepsilon < 1/2}, and any infinite-dimensional {H}, there exist operators {D,X \in B(H)} obeying (1) such that

\displaystyle  \| X \|_{op} \|D \|_{op} \ll \log^{16} \frac{1}{\varepsilon}. \ \ \ \ \ (4)

One can probably improve the exponent {16} somewhat by a modification of the methods, though it does not seem likely that one can lower it all the way to {1} without a substantially new idea. Nevertheless I believe it plausible that the lower bound (2) is close to optimal.

We now sketch the methods of proof. The construction giving (3) proceeded by first identifying {B(H)} with the algebra {M_2(B(H))} of {2 \times 2} matrices that have entries in {B(H)}. It is then possible to find two matrices {D, X \in M_2(B(H))} whose commutator takes the form

\displaystyle  [D,X] = \begin{pmatrix} I & u \\ 0 & I \end{pmatrix}

for some bounded operator {u \in B(H)} (for instance one can take {u} to be an isometry). If one then conjugates {D, X} by the diagonal operator {\mathrm{diag}(\varepsilon,1)}, one can eusure that (1) and (3) both hold.

It is natural to adapt this strategy to {n \times n} matrices {D,X \in M_n(B(H))} rather than {2 \times 2} matrices, where {n} is a parameter at one’s disposal. If one can find matrices {D,X \in M_n(B(H))} that are almost upper triangular (in that only the entries on or above the lower diagonal are non-zero), whose commutator {[D,X]} only differs from the identity in the top right corner, thus

\displaystyle  [D, X] = \begin{pmatrix} I & 0 & 0 & \dots & 0 & S \\ 0 & I & 0 & \dots & 0 & 0 \\ 0 & 0 & I & \dots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & I & 0 \\ 0 & 0 & 0 & \dots & 0 & I \end{pmatrix}.

for some {S}, then by conjugating by a diagonal matrix such as {\mathrm{diag}( \mu^{n-1}, \mu^{n-2}, \dots, 1)} for some {\mu} and optimising in {\mu}, one can improve the bound {\varepsilon^{-2}} in (3) to {O_n( \varepsilon^{-\frac{2}{n-1}} )}; if the bounds in the implied constant in the {O_n(1)} are polynomial in {n}, one can then optimise in {n} to obtain a bound of the form (4) (perhaps with the exponent {16} replaced by a different constant).

The task is then to find almost upper triangular matrices {D, X} whose commutator takes the required form. The lower diagonals of {D,X} must then commute; it took me a while to realise then that one could (usually) conjugate one of the matrices, say {X} by a suitable diagonal matrix, so that the lower diagonal consisted entirely of the identity operator, which would make the other lower diagonal consist of a single operator, say {u}. After a lot of further lengthy experimentation, I eventually realised that one could conjugate {X} further by unipotent upper triangular matrices so that all remaining entries other than those on the far right column vanished. Thus, without too much loss of generality, one can assume that {X} takes the normal form

\displaystyle  X := \begin{pmatrix} 0 & 0 & 0 & \dots & 0 & b_1 \\ I & 0 & 0 & \dots & 0 & b_2 \\ 0 & I & 0 & \dots & 0 & b_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & 0 & b_{n-1} \\ 0 & 0 & 0 & \dots & I & b_n \end{pmatrix}.

\displaystyle  D := \begin{pmatrix} v & I & 0 & \dots & 0 & b_1 u \\ u & v & 2 I & \dots & 0 & b_2 u \\ 0 & u & v & \dots & 0 & b_3 u \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & v & (n-1) I + b_{n-1} u \\ 0 & 0 & 0 & \dots & u & v + b_n u \end{pmatrix}

for some {u,v \in B(H)}, solving the system of equations

\displaystyle  [v, b_i] + [u, b_{i-1}] + i b_{i+1} + b_i [u, b_n] = 0 \ \ \ \ \ (5)

for {i=2,\dots,n-1}, and also

\displaystyle  [v, b_n] + [u, b_{n-1}] + b_n [u, b_n] = n \cdot 1_{B(H)}. \ \ \ \ \ (6)

It turns out to be possible to solve this system of equations by a contraction mapping argument if one takes {u,v} to be a “Hilbert’s hotel” pair of isometries as in the previous post, though the contraction is very slight, leading to polynomial losses in {n} in the implied constant.

There is a further question raised in Popa’s paper which I was unable to resolve. As a special case of one of the main theorems (Theorem 2.1) of that paper, the following result was shown: if {A \in B(H)} obeys the bounds

\displaystyle  \|A \| = O(1)

and

\displaystyle  \| A \| = O( \mathrm{dist}( A, {\bf C} + K(H) )^{2/3} ) \ \ \ \ \ (7)

(where {{\bf C} + K(H)} denotes the space of all operators of the form {\lambda I + T} with {\lambda \in {\bf C}} and {T} compact), then there exist operators {D,X \in B(H)} with {\|D\|, \|X\| = O(1)} such that {A = [D,X]}. (In fact, Popa’s result covers a more general situation in which one is working in a properly infinite {W^*} algebra with non-trivial centre.) We sketch a proof of this result as follows. Suppose that {\mathrm{dist}(A, {\bf C} + K(H)) = \varepsilon} and {\|A\| = O( \varepsilon^{2/3})} for some {0 < \varepsilon \ll 1}. A standard greedy algorithm argument (see this paper of Brown and Pearcy) allows one to find orthonormal vectors {e_n, f_n, g_n} for {n=1,2,\dots} such that for each {n}, one has {A e_n = \varepsilon_n f_n + v_n} for some {\varepsilon_n} comparable to {\varepsilon}, and some {v_n} orthogonal to all of the {e_n,f_n,g_n}. After some conjugation (and a suitable identification of {B(H)} with {M_2(B(H))}, one can thus place {A} in a normal form

\displaystyle  A = \begin{pmatrix} \varepsilon^{2/3} x & \varepsilon v^* \\ \varepsilon^{2/3} y & \varepsilon^{2/3} z \end{pmatrix}

where {v \in B(H)} is a isometry with infinite deficiency, and {x,y,z \in B(H)} have norm {O(1)}. Setting {\varepsilon' := \varepsilon^{1/3}}, it then suffices to solve the commutator equation

\displaystyle  [D,X] = \begin{pmatrix} x & \varepsilon' v^* \\ y & z \end{pmatrix}

with {\|D\|_{op} \|X\|_{op} \ll (\varepsilon')^{-2}}; note the similarity with (3).

By the usual Hilbert’s hotel construction, one can complement {v} with another isometry {u} obeying the “Hilbert’s hotel” identity

\displaystyle  uu^* + vv^* = I

and also {u^* u = v^* v = I}, {u^* v = v^* u = 0}. Proceeding as in the previous post, we can try the ansatz

\displaystyle  D = \begin{pmatrix} \frac{1}{2} u^* & 0 \\ a & \frac{1}{2} u^* - v^* \end{pmatrix}, X = \begin{pmatrix} b & \varepsilon' I \\ c & d \end{pmatrix}

for some operators {a,b,c,d \in B(H)}, leading to the system of equations

\displaystyle  [\frac{1}{2} u^*, b] + [\frac{1}{2} u^* - v^*, c] = x+z

\displaystyle  \varepsilon' a = [\frac{1}{2} u^*, b] - x

\displaystyle  \frac{1}{2} u^* c + c (\frac{1}{2} u^* - v^*) + ab-da = y.

Using the first equation to solve for {b,c}, the second to then solve for {a}, and the third to then solve for {c}, one can obtain matrices {D,X} with the required properties.

Thus far, my attempts to extend this construction to larger matrices with good bounds on {D,X} have been unsuccessful. A model problem would be to express

\displaystyle  \begin{pmatrix} I & 0 & \varepsilon v^* \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix}

as a commutator {[D,X]} with {\|D\| \|X\|} significantly smaller than {O(\varepsilon^{-2})}. The construction in my paper achieves something like this, but with {v^*} replaced by a more complicated operator. One would also need variants of this result in which one is allowed to perturb the above operator by an arbitrary finite rank operator of bounded operator norm.

Apoorva Khare and I have updated our paper “On the sign patterns of entrywise positivity preservers in fixed dimension“, announced at this post from last month. The quantitative results are now sharpened using a new monotonicity property of ratios {s_{\lambda}(u)/s_{\mu}(u)} of Schur polynomials, namely that such ratios are monotone non-decreasing in each coordinate of {u} if {u} is in the positive orthant, and the partition {\lambda} is larger than that of {\mu}. (This monotonicity was also independently observed by Rachid Ait-Haddou, using the theory of blossoms.) In the revised version of the paper we give two proofs of this monotonicity. The first relies on a deep positivity result of Lam, Postnikov, and Pylyavskyy, which uses a representation-theoretic positivity result of Haiman to show that the polynomial combination

\displaystyle s_{(\lambda \wedge \nu) / (\mu \wedge \rho)} s_{(\lambda \vee \nu) / (\mu \vee \rho)} - s_{\lambda/\mu} s_{\nu/\rho} \ \ \ \ \ (1)

of skew-Schur polynomials is Schur-positive for any partitions {\lambda,\mu,\nu,\rho} (using the convention that the skew-Schur polynomial {s_{\lambda/\mu}} vanishes if {\mu} is not contained in {\lambda}, and where {\lambda \wedge \nu} and {\lambda \vee \nu} denotes the pointwise min and max of {\lambda} and {\nu} respectively). It is fairly easy to derive the monotonicity of {s_\lambda(u)/s_\mu(u)} from this, by using the expansion

\displaystyle s_\lambda(u_1,\dots, u_n) = \sum_k u_1^k s_{\lambda/(k)}(u_2,\dots,u_n)

of Schur polynomials into skew-Schur polynomials (as was done in this previous post).

The second proof of monotonicity avoids representation theory by a more elementary argument establishing the weaker claim that the above expression (1) is non-negative on the positive orthant. In fact we prove a more general determinantal log-supermodularity claim which may be of independent interest:

Theorem 1 Let {A} be any {n \times n} totally positive matrix (thus, every minor has a non-negative determinant). Then for any {k}-tuples {I_1,I_2,J_1,J_2} of increasing elements of {\{1,\dots,n\}}, one has

\displaystyle \det( A_{I_1 \wedge I_2, J_1 \wedge J_2} ) \det( A_{I_1 \vee I_2, J_1 \vee J_2} ) - \det(A_{I_1,J_1}) \det(A_{I_2,J_2}) \geq 0

where {A_{I,J}} denotes the {k \times k} minor formed from the rows in {I} and columns in {J}.

For instance, if {A} is the matrix

\displaystyle A = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}

for some real numbers {a,\dots,p}, one has

\displaystyle a h - de\geq 0

(corresponding to the case {k=1}, {I_1 = (1), I_2 = (2), J_1 = (4), J_2 = (1)}), or

\displaystyle \det \begin{pmatrix} a & c \\ i & k \end{pmatrix} \det \begin{pmatrix} f & h \\ n & p \end{pmatrix} - \det \begin{pmatrix} e & h \\ i & l \end{pmatrix} \det \begin{pmatrix} b & c \\ n & o \end{pmatrix} \geq 0

(corresponding to the case {k=2}, {I_1 = (2,3)}, {I_2 = (1,4)}, {J_1 = (1,4)}, {J_2 = (2,3)}). It turns out that this claim can be proven relatively easy by an induction argument, relying on the Dodgson and Karlin identities from this previous post; the difficulties are largely notational in nature. Combining this result with the Jacobi-Trudi identity for skew-Schur polynomials (discussed in this previous post) gives the non-negativity of (1); it can also be used to directly establish the monotonicity of ratios {s_\lambda(u)/s_\mu(u)} by applying the theorem to a generalised Vandermonde matrix.

(Log-supermodularity also arises as the natural hypothesis for the FKG inequality, though I do not know of any interesting application of the FKG inequality in this current setting.)

Fifteen years ago, I wrote a paper entitled Global regularity of wave maps. II. Small energy in two dimensions, in which I established global regularity of wave maps from two spatial dimensions to the unit sphere, assuming that the initial data had small energy. Recently, Hao Jia (personal communication) discovered a small gap in the argument that requires a slightly non-trivial fix. The issue does not really affect the subsequent literature, because the main result has since been reproven and extended by methods that avoid the gap (see in particular this subsequent paper of Tataru), but I have decided to describe the gap and its fix on this blog.

I will assume familiarity with the notation of my paper. In Section 10, some complicated spaces {S[k] = S[k]({\bf R}^{1+n})} are constructed for each frequency scale {k}, and then a further space {S(c) = S(c)({\bf R}^{1+n})} is constructed for a given frequency envelope {c} by the formula

\displaystyle  \| \phi \|_{S(c)({\bf R}^{1+n})} := \|\phi \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})} + \sup_k c_k^{-1} \| \phi_k \|_{S[k]({\bf R}^{1+n})} \ \ \ \ \ (1)

where {\phi_k := P_k \phi} is the Littlewood-Paley projection of {\phi} to frequency magnitudes {\sim 2^k}. Then, given a spacetime slab {[-T,T] \times {\bf R}^n}, we define the restrictions

\displaystyle  \| \phi \|_{S(c)([-T,T] \times {\bf R}^n)} := \inf \{ \| \tilde \phi \|_{S(c)({\bf R}^{1+n})}: \tilde \phi \downharpoonright_{[-T,T] \times {\bf R}^n} = \phi \}

where the infimum is taken over all extensions {\tilde \phi} of {\phi} to the Minkowski spacetime {{\bf R}^{1+n}}; similarly one defines

\displaystyle  \| \phi_k \|_{S_k([-T,T] \times {\bf R}^n)} := \inf \{ \| \tilde \phi_k \|_{S_k({\bf R}^{1+n})}: \tilde \phi_k \downharpoonright_{[-T,T] \times {\bf R}^n} = \phi_k \}.

The gap in the paper is as follows: it was implicitly assumed that one could restrict (1) to the slab {[-T,T] \times {\bf R}^n} to obtain the equality

\displaystyle  \| \phi \|_{S(c)([-T,T] \times {\bf R}^n)} = \|\phi \|_{L^\infty_t L^\infty_x([-T,T] \times {\bf R}^n)} + \sup_k c_k^{-1} \| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)}.

(This equality is implicitly used to establish the bound (36) in the paper.) Unfortunately, (1) only gives the lower bound, not the upper bound, and it is the upper bound which is needed here. The problem is that the extensions {\tilde \phi_k} of {\phi_k} that are optimal for computing {\| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)}} are not necessarily the Littlewood-Paley projections of the extensions {\tilde \phi} of {\phi} that are optimal for computing {\| \phi \|_{S(c)([-T,T] \times {\bf R}^n)}}.

To remedy the problem, one has to prove an upper bound of the form

\displaystyle  \| \phi \|_{S(c)([-T,T] \times {\bf R}^n)} \lesssim \|\phi \|_{L^\infty_t L^\infty_x([-T,T] \times {\bf R}^n)} + \sup_k c_k^{-1} \| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)}

for all Schwartz {\phi} (actually we need affinely Schwartz {\phi}, but one can easily normalise to the Schwartz case). Without loss of generality we may normalise the RHS to be {1}. Thus

\displaystyle  \|\phi \|_{L^\infty_t L^\infty_x([-T,T] \times {\bf R}^n)} \leq 1 \ \ \ \ \ (2)

and

\displaystyle  \|P_k \phi \|_{S[k]([-T,T] \times {\bf R}^n)} \leq c_k \ \ \ \ \ (3)

for each {k}, and one has to find a single extension {\tilde \phi} of {\phi} such that

\displaystyle  \|\tilde \phi \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})} \lesssim 1 \ \ \ \ \ (4)

and

\displaystyle  \|P_k \tilde \phi \|_{S[k]({\bf R}^{1+n})} \lesssim c_k \ \ \ \ \ (5)

for each {k}. Achieving a {\tilde \phi} that obeys (4) is trivial (just extend {\phi} by zero), but such extensions do not necessarily obey (5). On the other hand, from (3) we can find extensions {\tilde \phi_k} of {P_k \phi} such that

\displaystyle  \|\tilde \phi_k \|_{S[k]({\bf R}^{1+n})} \lesssim c_k; \ \ \ \ \ (6)

the extension {\tilde \phi := \sum_k \tilde \phi_k} will then obey (5) (here we use Lemma 9 from my paper), but unfortunately is not guaranteed to obey (4) (the {S[k]} norm does control the {L^\infty_t L^\infty_x} norm, but a key point about frequency envelopes for the small energy regularity problem is that the coefficients {c_k}, while bounded, are not necessarily summable).

This can be fixed as follows. For each {k} we introduce a time cutoff {\eta_k} supported on {[-T-2^{-k}, T+2^{-k}]} that equals {1} on {[-T-2^{-k-1},T+2^{-k+1}]} and obeys the usual derivative estimates in between (the {j^{th}} time derivative of size {O_j(2^{jk})} for each {j}). Later we will prove the truncation estimate

\displaystyle  \| \eta_k \tilde \phi_k \|_{S[k]({\bf R}^{1+n})} \lesssim \| \tilde \phi_k \|_{S[k]({\bf R}^{1+n})}. \ \ \ \ \ (7)

Assuming this estimate, then if we set {\tilde \phi := \sum_k \eta_k \tilde \phi_k}, then using Lemma 9 in my paper and (6), (7) (and the local stability of frequency envelopes) we have the required property (5). (There is a technical issue arising from the fact that {\tilde \phi} is not necessarily Schwartz due to slow decay at temporal infinity, but by considering partial sums in the {k} summation and taking limits we can check that {\tilde \phi} is the strong limit of Schwartz functions, which suffices here; we omit the details for sake of exposition.) So the only issue is to establish (4), that is to say that

\displaystyle  \| \sum_k \eta_k(t) \tilde \phi_k(t) \|_{L^\infty_x({\bf R}^n)} \lesssim 1

for all {t \in {\bf R}}.

For {t \in [-T,T]} this is immediate from (2). Now suppose that {t \in [T+2^{k_0-1}, T+2^{k_0}]} for some integer {k_0} (the case when {t \in [-T-2^{k_0}, -T-2^{k_0-1}]} is treated similarly). Then we can split

\displaystyle  \sum_k \eta_k(t) \tilde \phi_k(t) = \Phi_1 + \Phi_2 + \Phi_3

where

\displaystyle  \Phi_1 := \sum_{k < k_0} \tilde \phi_k(T)

\displaystyle  \Phi_2 := \sum_{k < k_0} \tilde \phi_k(t) - \tilde \phi_k(T)

\displaystyle  \Phi_3 := \eta_{k_0}(t) \tilde \phi_{k_0}(t).

The contribution of the {\Phi_3} term is acceptable by (6) and estimate (82) from my paper. The term {\Phi_1} sums to {P_{<k_0} \phi(T)} which is acceptable by (2). So it remains to control the {L^\infty_x} norm of {\Phi_2}. By the triangle inequality and the fundamental theorem of calculus, we can bound

\displaystyle  \| \Phi_2 \|_{L^\infty_x} \leq (t-T) \sum_{k < k_0} \| \partial_t \tilde \phi_k \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})}.

By hypothesis, {t-T \leq 2^{-k_0}}. Using the first term in (79) of my paper and Bernstein’s inequality followed by (6) we have

\displaystyle  \| \partial_t \tilde \phi_k \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})} \lesssim 2^k \| \tilde \phi_k \|_{S[k]({\bf R}^{1+n})} \lesssim 2^k;

and then we are done by summing the geometric series in {k}.

It remains to prove the truncation estimate (7). This estimate is similar in spirit to the algebra estimates already in my paper, but unfortunately does not seem to follow immediately from these estimates as written, and so one has to repeat the somewhat lengthy decompositions and case checkings used to prove these estimates. We do this below the fold.

Read the rest of this entry »

Two quick updates with regards to polymath projects.  Firstly, given the poll on starting the mini-polymath4 project, I will start the project at Thu July 12 2012 UTC 22:00.  As usual, the main research thread on this project will be held at the polymath blog, with the discussion thread hosted separately on this blog.

Second, the Polymath7 project, which seeks to establish the “hot spots conjecture” for acute-angled triangles, has made a fair amount of progress so far; for instance, the first part of the conjecture (asserting that the second Neumann eigenfunction of an acute non-equilateral triangle is simple) is now solved, and the second part (asserting that the “hot spots” (i.e. extrema) of that second eigenfunction lie on the boundary of the triangle) has been solved in a number of special cases (such as the isosceles case).  It’s been quite an active discussion in the last week or so, with almost 200 comments across two threads (and a third thread freshly opened up just now).  While the problem is still not completely solved, I feel optimistic that it should fall within the next few weeks (if nothing else, it seems that the problem is now at least amenable to a brute force numerical attack, though personally I would prefer to see a more conceptual solution).

Just a quick update on my previous post on gamifying the problem-solving process in high school algebra. I had a little time to spare (on an airplane flight, of all things), so I decided to rework the mockup version of the algebra game into something a bit more structured, namely as 12 progressively difficult levels of solving a linear equation in one unknown.  (Requires either Java or Flash.)   Somewhat to my surprise, I found that one could create fairly challenging puzzles out of this simple algebra problem by carefully restricting the moves available at each level. Here is a screenshot of a typical level:


After completing each level, an icon appears which one can click on to proceed to the next level.  (There is no particular rationale, by the way, behind my choice of icons; these are basically selected arbitrarily from the default collection of icons (or more precisely, “costumes”) available in Scratch.)

The restriction of moves made the puzzles significantly more artificial in nature, but I think that this may end up ultimately being a good thing, as to solve some of the harder puzzles one is forced to really start thinking about how the process of solving for an unknown actually works. (One could imagine that if one decided to make a fully fledged game out of this, one could have several modes of play, ranging from a puzzle mode in which one solves some carefully constructed, but artificial, puzzles, to a free-form mode in which one can solve arbitrary equations (including ones that you input yourself) using the full set of available algebraic moves.)

One advantage to gamifying linear algebra, as opposed to other types of algebra, is that there is no need for disjunction (i.e. splitting into cases). In contrast, if one has to solve a problem which involves at least one quadratic equation, then at some point one may be forced to divide the analysis into two disjoint cases, depending on which branch of a square root one is taking. I am not sure how to gamify this sort of branching in a civilised manner, and would be interested to hear of any suggestions in this regard. (A similar problem also arises in proving propositions in Euclidean geometry, which I had thought would be another good test case for gamification, because of the need to branch depending on the order of various points on a line, or rays through a point, or whether two lines are parallel or intersect.)

My graduate text on measure theory (based on these lecture notes) is now published by the AMS as part of the Graduate Studies in Mathematics series.  (See also my own blog page for this book, which among other things contains a draft copy of the book in PDF format.)

A few days ago, I released a preprint entitled “Localisation and compactness properties of the Navier-Stokes global regularity problem“, discussed in this previous blog post.  As it turns out, I was somewhat impatient to finalise the paper and move on to other things, and the original preprint was still somewhat rough in places (contradicting my own advice on this matter), with a number of typos of minor to moderate severity.  But a bit more seriously, I discovered on a further proofreading that there was a subtle error in a component of the argument that I had believed to be routine – namely the persistence of higher regularity for mild solutions.   As a consequence, some of the implications stated in the first version were not exactly correct as stated; but they can be repaired by replacing a “bad” notion of global regularity for a certain class of data with a “good” notion.   I have completed (and proofread) an updated version of the ms, which should appear at the arXiv link of the paper in a day or two (and which I have also placed at this link).  (In the meantime, it is probably best not to read the original ms too carefully, as this could lead to some confusion.)   I’ve also added a new section that shows that, due to this technicality, one can exhibit smooth H^1 initial data to the Navier-Stokes equation for which there are no smooth solutions, which superficially sounds very close to a negative solution to the global regularity problem, but is actually nothing of the sort.

Let me now describe the issue in more detail (and also to explain why I missed it previously).  A standard principle in the theory of evolutionary partial differentiation equations is that regularity in space can be used to imply regularity in time.  To illustrate this, consider a solution u to the supercritical nonlinear wave equation

-\partial_{tt} u + \Delta u = u^7  (1)

for some field u: {\bf R} \times {\bf R}^3 \to {\bf R}.   Suppose one already knew that u had some regularity in space, and in particular the C^0_t C^2_x \cap C^1_t C^1_x norm of u was bounded (thus u and up to two spatial derivatives of u were bounded).  Then, by (1), we see that two time derivatives of u were also bounded, and one then gets the additional regularity of C^2_t C^0_x.

In a similar vein, suppose one initially knew that u had the regularity C^0_t C^3_x \cap C^1_t C^2_x.  Then (1) soon tells us that u also has the regularity C^2_t C^1_x; then, if one differentiates (1) in time to obtain

-\partial_{ttt} u + \Delta \partial_t u = 7 u^6 \partial_t u

one can conclude that u also has the regularity of C^3_t C^0_x.  One can continue this process indefinitely; in particular, if one knew that u \in C^0_t C^\infty_x \cap C^1_t C^\infty_x, then these sorts of manipulations show that u is infinitely smooth in both space and time.

The issue that caught me by surprise is that for the Navier-Stokes equations

\partial_t u + (u \cdot \nabla) u =\Delta u -\nabla p  (2)

\nabla \cdot u = 0

(setting the forcing term f equal to zero for simplicity), infinite regularity in space does not automatically imply infinite regularity in time, even if one assumes the initial data lies in a standard function space such as the Sobolev space H^1_x({\bf R}^3).  The problem lies with the pressure term p, which is recovered from the velocity via the elliptic equation

\Delta p = -\nabla^2 \cdot (u \otimes u) (3)

that can be obtained by taking the divergence of (2).   This equation is solved by a non-local integral operator:

\displaystyle p(t,x) = \int_{{\bf R}^3} \frac{\nabla^2 \cdot (u \otimes u)(t,y)}{4\pi |x-y|}\ dy.

If, say, u lies in H^1_x({\bf R}^3), then there is no difficulty establishing a bound on p in terms of u (for instance, one can use singular integral theory and Sobolev embedding to place p in L^3_x({\bf R}^3).  However, one runs into difficulty when trying to compute time derivatives of p.  Differentiating (3) once, one gets

\Delta \partial_t p = -2\nabla^2 \cdot (u \otimes \partial_t u).

At the regularity of H^1, one can still (barely) control this quantity by using (2) to expand out \partial_t u and using some integration by parts.  But when one wishes to compute a second time derivative of the pressure, one obtains (after integration by parts) an expansion of the form

\Delta \partial_{tt} p = -4\nabla^2 \cdot (\Delta u \otimes \Delta u) + \ldots

and now there is not enough regularity on u available to get any control on \partial_{tt} p, even if one assumes that u is smooth.   Indeed, following this observation, I was able to show that given generic smooth H^1 data, the pressure p will instantaneously fail to be C^2 in time, and thence (by (2)) the velocity will instantaneously fail to be C^3 in time.  (Switching to the vorticity formulation buys one further degree of time differentiability, but does not fully eliminate the problem; the vorticity \omega will fail to be C^4 in time.  Switching to material coordinates seems to makes things very slightly better, but I believe there is still a breakdown of time regularity in these coordinates also.)

For later times t>0 (and assuming homogeneous data f=0 for simplicity), this issue no longer arises, because of the instantaneous smoothing effect of the Navier-Stokes flow, which for instance will upgrade H^1_x regularity to H^\infty_x regularity instantaneously.  It is only the initial time at which some time irregularity can occur.

This breakdown of regularity does not actually impact the original formulation of the Clay Millennium Prize problem, though, because in that problem the initial velocity is required to be Schwartz class (so all derivatives are rapidly decreasing).  In this class, the regularity theory works as expected; if one has a solution which already has some reasonable regularity (e.g. a mild H^1 solution) and the data is Schwartz, then the solution will be smooth in spacetime.   (Another class where things work as expected is when the vorticity is Schwartz; in such cases, the solution remains smooth in both space and time (for short times, at least), and the Schwartz nature of the vorticity is preserved (because the vorticity is subject to fewer non-local effects than the velocity, as it is not directly affected by the pressure).)

This issue means that one of the implications in the original paper (roughly speaking, that global regularity for Schwartz data implies global regularity for smooth H^1 data) is not correct as stated.  But this can be fixed by weakening the notion of global regularity in the latter setting, by limiting the amount of time differentiability available at the initial time.  More precisely, call a solution u: [0,T] \times {\bf R}^3 \to {\bf R}^3 and p: [0,T] \times {\bf R}^3 \to {\bf R} almost smooth if

  • u and p are smooth on the half-open slab (0,T] \times {\bf R}^3; and
  • For every k \geq 0, \nabla^k_x u, \nabla^k_x p, \nabla^x_u \partial_t u exist and are continuous on the full slab [0,T] \times {\bf R}^3.

Thus, an almost smooth solution is the same concept as a smooth solution, except that at time zero, the velocity field is only C^1_t C^\infty_x, and the pressure field is only C^0_t C^\infty_x.  This is still enough regularity to interpret the Navier-Stokes equation (2) in a classical manner, but falls slightly short of full smoothness.

(I had already introduced this notion of almost smoothness in the more general setting of smooth finite energy solutions in the first draft of this paper, but had failed to realise that it was also necessary in the smooth H^1 setting also.)

One can now “fix” the global regularity conjectures for Navier-Stokes in the smooth H^1 or smooth finite energy setting by requiring the solutions to merely be almost smooth instead of smooth.  Once one does so, the results in my paper then work as before: roughly speaking, if one knows that Schwartz data produces smooth solutions, one can conclude that smooth H^1 or smooth finite energy data produces almost smooth solutions (and the paper now contains counterexamples to show that one does not always have smooth solutions in this category).

The diagram of implications between conjectures has been adjusted to reflect this issue, and now reads as follows:

Archives