Monthly Archive

You are currently browsing the monthly archive for May 2015.

A differentiation identity

30 May, 2015 in math.CA, question | by Terence Tao | 48 comments

Here’s a cute identity I discovered by accident recently. Observe that

$\displaystyle \frac{d}{dx} (1+x^2)^{0/2} = 0$

$\displaystyle \frac{d^2}{dx^2} (1+x^2)^{1/2} = \frac{1}{(1+x^2)^{3/2}}$

$\displaystyle \frac{d^3}{dx^3} (1+x^2)^{2/2} = 0$

$\displaystyle \frac{d^4}{dx^4} (1+x^2)^{3/2} = \frac{9}{(1+x^2)^{5/2}}$

$\displaystyle \frac{d^5}{dx^5} (1+x^2)^{4/2} = 0$

$\displaystyle \frac{d^6}{dx^6} (1+x^2)^{5/2} = \frac{225}{(1+x^2)^{7/2}}$

and so one can conjecture that one has

$\displaystyle \frac{d^{k+1}}{dx^{k+1}} (1+x^2)^{k/2} = 0$

when $k$ is even, and

$\displaystyle \frac{d^{k+1}}{dx^{k+1}} (1+x^2)^{k/2} = \frac{(1 \times 3 \times \dots \times k)^2}{(1+x^2)^{(k+2)/2}}$

when $k$ is odd. This is obvious in the even case since $(1+x^2)^{k/2}$ is a polynomial of degree $k$ , but I struggled for a while with the odd case before finding a slick three-line proof. (I was first trying to prove the weaker statement that $\frac{d^{k+1}}{dx^{k+1}} (1+x^2)^{k/2}$ was non-negative, but for some strange reason I was only able to establish this by working out the derivative exactly, rather than by using more analytic methods, such as convexity arguments.) I thought other readers might like the challenge (and also I’d like to see some other proofs), so rather than post my own proof immediately, I’ll see if anyone would like to supply their own proofs or thoughts in the comments. Also I am curious to know if this identity is connected to any other existing piece of mathematics.

Cancellation for the multilinear Hilbert transform

25 May, 2015 in math.CA, math.CO, paper | Tags: arithmetic regularity lemma, multilinear Hilbert transform | by Terence Tao | 10 comments

I’ve just uploaded to the arXiv my paper “Cancellation for the multilinear Hilbert transform“, submitted to Collectanea Mathematica. This paper uses methods from additive combinatorics (and more specifically, the arithmetic regularity and counting lemmas from this paper of Ben Green and myself) to obtain a slight amount of progress towards the open problem of obtaining ${L^p}$ bounds for the trilinear and higher Hilbert transforms (as discussed in this previous blog post). For instance, the trilinear Hilbert transform

$\displaystyle H_3( f_1, f_2, f_3 )(x) := p.v. \int_{\bf R} f_1(x+t) f_2(x+2t) f_3(x+3t)\ \frac{dt}{t}$

is not known to be bounded for any ${L^{p_1}({\bf R}) \times L^{p_2}({\bf R}) \times L^{p_3}({\bf R})}$ to ${L^p({\bf R})}$ , although it is conjectured to do so when ${1/p =1/p_1 +1/p_2+1/p_3}$ and ${1 < p_1,p_2,p_3,p < \infty}$ . (For ${p}$ well below ${1}$ , one can use additive combinatorics constructions to demonstrate unboundedness; see this paper of Demeter.) One can approach this problem by considering the truncated trilinear Hilbert transforms

$\displaystyle H_{3,r,R}( f_1, f_2, f_3 )(x) := \int_{r \leq |t| \leq R} f_1(x+t) f_2(x+2t) f_3(x+3t)\ \frac{dt}{t}$

for ${0 < r < R}$ . It is not difficult to show that the boundedness of ${H_3}$ is equivalent to the boundedness of ${H_{3,r,R}}$ with bounds that are uniform in ${R}$ and ${r}$ . On the other hand, from Minkowski’s inequality and Hölder’s inequality one can easily obtain the non-uniform bound of ${2 \log \frac{R}{r}}$ for ${H_{3,r,R}}$ . The main result of this paper is a slight improvement of this trivial bound to ${o( \log \frac{R}{r})}$ as ${R/r \rightarrow \infty}$ . Roughly speaking, the way this gain is established is as follows. First there are some standard time-frequency type reductions to reduce to the task of obtaining some non-trivial cancellation on a single “tree”. Using a “generalised von Neumann theorem”, we show that such cancellation will happen if (a discretised version of) one or more of the functions ${f_1,f_2,f_3}$ (or a dual function ${f_0}$ that it is convenient to test against) is small in the Gowers ${U^3}$ norm. However, the arithmetic regularity lemma alluded to earlier allows one to represent an arbitrary function ${f_i}$ , up to a small error, as the sum of such a “Gowers uniform” function, plus a structured function (or more precisely, an irrational virtual nilsequence). This effectively reduces the problem to that of establishing some cancellation in a single tree in the case when all functions ${f_0,f_1,f_2,f_3}$ involved are irrational virtual nilsequences. At this point, the contribution of each component of the tree can be estimated using the “counting lemma” from my paper with Ben. The main term in the asymptotics is a certain integral over a nilmanifold, but because the kernel ${\frac{dt}{t}}$ in the trilinear Hilbert transform is odd, it turns out that this integral vanishes, giving the required cancellation.

The same argument works for higher order Hilbert transforms (and one can also replace the coefficients in these transforms with other rational constants). However, because the quantitative bounds in the arithmetic regularity and counting lemmas are so poor, it does not seem likely that one can use these methods to remove the logarithmic growth in ${R/r}$ entirely, and some additional ideas will be needed to resolve the full conjecture.

Failure of the L^1 pointwise and maximal ergodic theorems for the free group

18 May, 2015 in math.CA, math.DS, paper | Tags: free group, maximal ergodic theorem, pointwise ergodic theorem | by Terence Tao | 3 comments

I’ve just uploaded to the arXiv my paper “Failure of the ${L^1}$ pointwise and maximal ergodic theorems for the free group“, submitted to Forum of Mathematics, Sigma. This paper concerns a variant of the pointwise ergodic theorem of Birkhoff, which asserts that if one has a measure-preserving shift map ${T: X \rightarrow X}$ on a probability space ${X = (X,\mu)}$ , then for any ${f \in L^1(X)}$ , the averages ${\frac{1}{N} \sum_{n=1}^N f \circ T^{-n}}$ converge pointwise almost everywhere. (In the important case when the shift map ${T}$ is ergodic, the pointwise limit is simply the mean ${\int_X f\ d\mu}$ of the original function ${f}$ .)

The pointwise ergodic theorem can be extended to measure-preserving actions of other amenable groups, if one uses a suitably “tempered” Folner sequence of averages; see this paper of Lindenstrauss for more details. (I also wrote up some notes on that paper here, back in 2006 before I had started this blog.) But the arguments used to handle the amenable case break down completely for non-amenable groups, and in particular for the free non-abelian group ${F_2}$ on two generators.

Nevo and Stein studied this problem and obtained a number of pointwise ergodic theorems for ${F_2}$ -actions ${(T_g)_{g \in F_2}}$ on probability spaces ${(X,\mu)}$ . For instance, for the spherical averaging operators

$\displaystyle {\mathcal A}_n f := \frac{1}{4 \times 3^{n-1}} \sum_{g \in F_2: |g| = n} f \circ T_g^{-1}$

(where ${|g|}$ denotes the length of the reduced word that forms ${g}$ ), they showed that ${{\mathcal A}_{2n} f}$ converged pointwise almost everywhere provided that ${f}$ was in ${L^p(X)}$ for some ${p>1}$ . (The need to restrict to spheres of even radius can be seen by considering the action of ${F_2}$ on the two-element set ${\{0,1\}}$ in which both generators of ${F_2}$ act by interchanging the elements, in which case ${{\mathcal A}_n}$ is determined by the parity of ${n}$ .) This result was reproven with a different and simpler proof by Bufetov, who also managed to relax the condition ${f \in L^p(X)}$ to the weaker condition ${f \in L \log L(X)}$ .

The question remained open as to whether the pointwise ergodic theorem for ${F_2}$ -actions held if one only assumed that ${f}$ was in ${L^1(X)}$ . Nevo and Stein were able to establish this for the Cesáro averages ${\frac{1}{N} \sum_{n=1}^N {\mathcal A}_n}$ , but not for ${{\mathcal A}_n}$ itself. About six years ago, Assaf Naor and I tried our hand at this problem, and was able to show an associated maximal inequality on ${\ell^1(F_2)}$ , but due to the non-amenability of ${F_2}$ , this inequality did not transfer to ${L^1(X)}$ and did not have any direct impact on this question, despite a fair amount of effort on our part to attack it.

Inspired by some recent conversations with Lewis Bowen, I returned to this problem. This time around, I tried to construct a counterexample to the ${L^1}$ pointwise ergodic theorem – something Assaf and I had not seriously attempted to do (perhaps due to being a bit too enamoured of our ${\ell^1(F_2)}$ maximal inequality). I knew of an existing counterexample of Ornstein regarding a failure of an ${L^1}$ ergodic theorem for iterates ${P^n}$ of a self-adjoint Markov operator – in fact, I had written some notes on this example back in 2007. Upon revisiting my notes, I soon discovered that the Ornstein construction was adaptable to the ${F_2}$ setting, thus settling the problem in the negative:

Theorem 1 (Failure of ${L^1}$ pointwise ergodic theorem) There exists a measure-preserving ${F_2}$ -action on a probability space ${X}$ and a non-negative function ${f \in L^1(X)}$ such that ${\sup_n {\mathcal A}_{2n} f(x) = +\infty}$ for almost every ${x}$ .

To describe the proof of this theorem, let me first briefly sketch the main ideas of Ornstein’s construction, which gave an example of a self-adjoint Markov operator ${P}$ on a probability space ${X}$ and a non-negative ${f \in L^1(X)}$ such that ${\sup_n P^n f(x) = +\infty}$ for almost every ${x}$ . By some standard manipulations, it suffices to show that for any given ${\alpha > 0}$ and ${\varepsilon>0}$ , there exists a self-adjoint Markov operator ${P}$ on a probability space ${X}$ and a non-negative ${f \in L^1(X)}$ with ${\|f\|_{L^1(X)} \leq \alpha}$ , such that ${\sup_n P^n f \geq 1-\varepsilon}$ on a set of measure at least ${1-\varepsilon}$ . Actually, it will be convenient to replace the Markov chain ${(P^n f)_{n \geq 0}}$ with an ancient Markov chain ${(f_n)_{n \in {\bf Z}}}$ – that is to say, a sequence of non-negative functions ${f_n}$ for both positive and negative ${f}$ , such that ${f_{n+1} = P f_n}$ for all ${n \in {\bf Z}}$ . The purpose of requiring the Markov chain to be ancient (that is, to extend infinitely far back in time) is to allow for the Markov chain to be shifted arbitrarily in time, which is key to Ornstein’s construction. (Technically, Ornstein’s original argument only uses functions that go back to a large negative time, rather than being infinitely ancient, but I will gloss over this point for sake of discussion, as it turns out that the ${F_2}$ version of the argument can be run using infinitely ancient chains.)

For any ${\alpha>0}$ , let ${P(\alpha)}$ denote the claim that for any ${\varepsilon>0}$ , there exists an ancient Markov chain ${(f_n)_{n \in {\bf Z}}}$ with ${\|f_n\|_{L^1(X)} = \alpha}$ such that ${\sup_{n \in {\bf Z}} f_n \geq 1-\varepsilon}$ on a set of measure at least ${1-\varepsilon}$ . Clearly ${P(1)}$ holds since we can just take ${f_n=1}$ for all ${n}$ . Our objective is to show that ${P(\alpha)}$ holds for arbitrarily small ${\alpha}$ . The heart of Ornstein’s argument is then the implication

$\displaystyle P(\alpha) \implies P( \alpha (1 - \frac{\alpha}{4}) ) \ \ \ \ \ (1)$

for any ${0 < \alpha \leq 1}$ , which upon iteration quickly gives the desired claim.

Let’s see informally how (1) works. By hypothesis, and ignoring epsilons, we can find an ancient Markov chain ${(f_n)_{n \in {\bf Z}}}$ on some probability space ${X}$ of total mass ${\|f_n\|_{L^1(X)} = \alpha}$ , such that ${\sup_n f_n}$ attains the value of ${1}$ or greater almost everywhere. Assuming that the Markov process is irreducible, the ${f_n}$ will eventually converge as ${n \rightarrow \infty}$ to the constant value of ${\|f_n\|_{L^1(X)}}$ , in particular its final state will essentially stay above ${\alpha}$ (up to small errors).

Now suppose we duplicate the Markov process by replacing ${X}$ with a double copy ${X \times \{1,2\}}$ (giving ${\{1,2\}}$ the uniform probability measure), and using the disjoint sum of the Markov operators on ${X \times \{1\}}$ and ${X \times \{2\}}$ as the propagator, so that there is no interaction between the two components of this new system. Then the functions ${f'_n(x,i) := f_n(x) 1_{i=1}}$ form an ancient Markov chain of mass at most ${\alpha/2}$ that lives solely in the first half ${X \times \{1\}}$ of this copy, and ${\sup_n f'_n}$ attains the value of ${1}$ or greater on almost all of the first half ${X \times \{1\}}$ , but is zero on the second half. The final state of ${f'_n}$ will be to stay above ${\alpha}$ in the first half ${X \times \{1\}}$ , but be zero on the second half.

Now we modify the above example by allowing an infinitesimal amount of interaction between the two halves ${X \times \{1\}}$ , ${X \times \{2\}}$ of the system (I mentally think of ${X \times \{1\}}$ and ${X \times \{2\}}$ as two identical boxes that a particle can bounce around in, and now we wish to connect the boxes by a tiny tube). The precise way in which this interaction is inserted is not terribly important so long as the new Markov process is irreducible. Once one does so, then the ancient Markov chain ${(f'_n)_{n \in {\bf Z}}}$ in the previous example gets replaced by a slightly different ancient Markov chain ${(f''_n)_{n \in {\bf Z}}}$ which is more or less identical with ${f'_n}$ for negative times ${n}$ , or for bounded positive times ${n}$ , but for very large values of ${n}$ the final state is now constant across the entire state space ${X \times \{1,2\}}$ , and will stay above ${\alpha/2}$ on this space.

Finally, we consider an ancient Markov chain ${F_n}$ which is basically of the form

$\displaystyle F_n(x,i) \approx f''_n(x,i) + (1 - \frac{\alpha}{2}) f_{n-M}(x) 1_{i=2}$

for some large parameter ${M}$ and for all ${n \leq M}$ (the approximation becomes increasingly inaccurate for ${n}$ much larger than ${M}$ , but never mind this for now). This is basically two copies of the original Markov process in separate, barely interacting state spaces ${X \times \{1\}, X \times \{2\}}$ , but with the second copy delayed by a large time delay ${M}$ , and also attenuated in amplitude by a factor of ${1-\frac{\alpha}{2}}$ . The total mass of this process is now ${\frac{\alpha}{2} + \frac{\alpha}{2} (1 -\frac{\alpha}{2}) = \alpha (1 - \alpha/4)}$ . Because of the ${f''_n}$ component of ${F_n}$ , we see that ${\sup_n F_n}$ basically attains the value of ${1}$ or greater on the first half ${X \times \{1\}}$ . On the second half ${X \times \{2\}}$ , we work with times ${n}$ close to ${M}$ . If ${M}$ is large enough, ${f''_n}$ would have averaged out to about ${\alpha/2}$ at such times, but the ${(1 - \frac{\alpha}{2}) f_{n-M}(x)}$ component can get as large as ${1-\alpha/2}$ here. Summing (and continuing to ignore various epsilon losses), we see that ${\sup_n F_n}$ can get as large as ${1}$ on almost all of the second half of ${X \times \{2\}}$ . This concludes the rough sketch of how one establishes the implication (1).

It was observed by Bufetov that the spherical averages ${{\mathcal A}_n}$ for a free group action can be lifted up to become powers ${P^n}$ of a Markov operator, basically by randomly assigning a “velocity vector” ${s \in \{a,b,a^{-1},b^{-1}\}}$ to one’s base point ${x}$ and then applying the Markov process that moves ${x}$ along that velocity vector (and then randomly changing the velocity vector at each time step to the “reduced word” condition that the velocity never flips from ${s}$ to ${s^{-1}}$ ). Thus the spherical average problem has a Markov operator interpretation, which opens the door to adapting the Ornstein construction to the setting of ${F_2}$ systems. This turns out to be doable after a certain amount of technical artifice; the main thing is to work with ${F_2}$ -measure preserving systems that admit ancient Markov chains that are initially supported in a very small region in the “interior” of the state space, so that one can couple such systems to each other “at the boundary” in the fashion needed to establish the analogue of (1) without disrupting the ancient dynamics of such chains. The initial such system (used to establish the base case ${P(1)}$ ) comes from basically considering the action of ${F_2}$ on a (suitably renormalised) “infinitely large ball” in the Cayley graph, after suitably gluing together the boundary of this ball to complete the action. The ancient Markov chain associated to this system starts at the centre of this infinitely large ball at infinite negative time ${n=-\infty}$ , and only reaches the boundary of this ball at the time ${n=0}$ .

A remark on the lonely runner conjecture

13 May, 2015 in expository, math.CA, question | Tags: arithmetic progressions, lonely runner conjecture | by Terence Tao | 58 comments

The lonely runner conjecture is the following open problem:

Conjecture 1 Suppose one has ${n \geq 1}$ runners on the unit circle ${{\bf R}/{\bf Z}}$ , all starting at the origin and moving at different speeds. Then for each runner, there is at least one time ${t}$ for which that runner is “lonely” in the sense that it is separated by a distance at least ${1/n}$ from all other runners.

One can normalise the speed of the lonely runner to be zero, at which point the conjecture can be reformulated (after replacing ${n}$ by ${n+1}$ ) as follows:

Conjecture 2 Let ${v_1,\dots,v_n}$ be non-zero real numbers for some ${n \geq 1}$ . Then there exists a real number ${t}$ such that the numbers ${tv_1,\dots,tv_n}$ are all a distance at least ${\frac{1}{n+1}}$ from the integers, thus ${\|tv_1\|_{{\bf R}/{\bf Z}},\dots,\|tv_n\|_{{\bf R}/{\bf Z}} \geq \frac{1}{n+1}}$ where ${\|x\|_{{\bf R}/{\bf Z}}}$ denotes the distance of ${x}$ to the nearest integer.

This conjecture has been proven for ${n \leq 7}$ , but remains open for larger ${n}$ . The bound ${\frac{1}{n+1}}$ is optimal, as can be seen by looking at the case ${v_i=i}$ and applying the Dirichlet approximation theorem. Note that for each non-zero ${v}$ , the set ${\{ t \in {\bf R}: \|vt\|_{{\bf R}/{\bf Z}} \leq r \}}$ has (Banach) density ${2r}$ for any ${0 < r < 1/2}$ , and from this and the union bound we can easily find ${t \in {\bf R}}$ for which

$\displaystyle \|tv_1\|_{{\bf R}/{\bf Z}},\dots,\|tv_n\|_{{\bf R}/{\bf Z}} \geq \frac{1}{2n}-\varepsilon$

for any ${\varepsilon>0}$ , but it has proven to be quite challenging to remove the factor of ${2}$ to increase ${\frac{1}{2n}}$ to ${\frac{1}{n+1}}$ . (As far as I know, even improving ${\frac{1}{2n}}$ to ${\frac{1+c}{2n}}$ for some absolute constant ${c>0}$ and sufficiently large ${n}$ remains open.)

The speeds ${v_1,\dots,v_n}$ in the above conjecture are arbitrary non-zero reals, but it has been known for some time that one can reduce without loss of generality to the case when the ${v_1,\dots,v_n}$ are rationals, or equivalently (by scaling) to the case where they are integers; see e.g. Section 4 of this paper of Bohman, Holzman, and Kleitman.

In this post I would like to remark on a slight refinement of this reduction, in which the speeds ${v_1,\dots,v_n}$ are integers of bounded size, where the bound depends on ${n}$ . More precisely:

Proposition 3 In order to prove the lonely runner conjecture, it suffices to do so under the additional assumption that the ${v_1,\dots,v_n}$ are integers of size at most ${n^{Cn^2}}$ , where ${C}$ is an (explicitly computable) absolute constant. (More precisely: if this restricted version of the lonely runner conjecture is true for all ${n \leq n_0}$ , then the original version of the conjecture is also true for all ${n \leq n_0}$ .)

In principle, this proposition allows one to verify the lonely runner conjecture for a given ${n}$ in finite time; however the number of cases to check with this proposition grows faster than exponentially in ${n}$ , and so this is unfortunately not a feasible approach to verifying the lonely runner conjecture for more values of ${n}$ than currently known.

One of the key tools needed to prove this proposition is the following additive combinatorics result. Recall that a generalised arithmetic progression (or ${GAP}$ ) in the reals ${{\bf R}}$ is a set of the form

$\displaystyle P = \{ n_1 v_1 + \dots + n_d v_d: n_1,\dots,n_d \in {\bf Z}; |n_1| \leq N_1, \dots, |n_d| \leq N_d \}$

for some ${v_1,\dots,v_d \in {\bf R}}$ and ${N_1,\dots,N_d > 0}$ ; the quantity ${d}$ is called the rank of the progression. If ${t>0}$ , the progression ${P}$ is said to be ${t}$ -proper if the sums ${n_1 v_1 + \dots + n_d v_d}$ with ${|n_i| \leq t N_i}$ for ${i=1,\dots,d}$ are all distinct. We have

Lemma 4 (Progressions lie inside proper progressions) Let ${P}$ be a GAP of rank ${d}$ in the reals, and let ${t \geq 1}$ . Then ${P}$ is contained in a ${t}$ -proper GAP ${Q}$ of rank at most ${d}$ , with

$\displaystyle |Q| \leq (2t)^d d^{6d^2} \prod_{i=1}^d (2N_i+1).$

Proof: See Theorem 2.1 of this paper of Bilu. (Very similar results can also be found in Theorem 3.40 of my book with Van Vu, or Theorem 1.10 of this paper of mine with Van Vu.) $\Box$

Now let ${n \geq 1}$ , and assume inductively that the lonely runner conjecture has been proven for all smaller values of ${n}$ , as well as for the current value of ${n}$ in the case that ${v_1,\dots,v_n}$ are integers of size at most ${n^{Cn^2}}$ for some sufficiently large ${C}$ . We will show that the lonely runner conjecture holds in general for this choice of ${n}$ .

let ${v_1,\dots,v_n}$ be non-zero real numbers. Let ${C_0}$ be a large absolute constant to be chosen later. From the above lemma applied to the GAP ${\{ n_1 v_1 + \dots + n_d v_d: n_1,\dots,n_d \in \{-1,0,1\}\}}$ , one can find a ${n^{C_0n}}$ -proper GAP ${Q}$ of rank at most ${n}$ containing ${\{v_1,\dots,v_n\}}$ such that

$\displaystyle |Q| \leq (6n^{C_0 n})^n n^{6n^2};$

in particular ${|Q| \leq n^{Cn^2}}$ if ${C}$ is large enough depending on ${C_0}$ .

We write

$\displaystyle Q = \{ n_1 w_1 + \dots + n_d w_d: n_1,\dots,n_d \in {\bf Z}; |n_1| \leq N_1,\dots,|n_d| \leq N_d \}$

for some ${d \leq n}$ , ${w_1,\dots,w_d}$ , and ${N_1,\dots,N_d \geq 0}$ . We thus have ${v_i = \phi(a_i)}$ for ${i=1,\dots,n}$ , where ${\phi: {\bf R}^d \rightarrow {\bf R}}$ is the linear map ${\phi(n_1,\dots,n_d) := n_1 w_1 + \dots + n_d w_d}$ and ${a_1,\dots,a_n \in {\bf Z}^d}$ are non-zero and lie in the box ${\{ (n_1,\dots,n_d) \in {\bf R}^d: |n_1| \leq N_1,\dots,|n_d| \leq N_d \}}$ .

We now need an elementary lemma that allows us to create a “collision” between two of the ${a_1,\dots,a_n}$ via a linear projection, without making any of the ${a_i}$ collide with the origin:

Lemma 5 Let ${a_1,\dots,a_n \in {\bf R}^d}$ be non-zero vectors that are not all collinear with the origin. Then, after replacing one or more of the ${a_i}$ with their negatives ${-a_i}$ if necessary, there exists a pair ${a_i,a_j}$ such that ${a_i-a_j \neq 0}$ , and such that none of the ${a_1,\dots,a_n}$ is a scalar multiple of ${a_i-a_j}$ .

Proof: We may assume that ${d \geq 2}$ , since the ${d \leq 1}$ case is vacuous. Applying a generic linear projection to ${{\bf R}^2}$ (which does not affect collinearity, or the property that a given ${a_k}$ is a scalar multiple of ${a_i-a_j}$ ), we may then reduce to the case ${d=2}$ .

By a rotation and relabeling, we may assume that ${a_1}$ lies on the negative ${x}$ -axis; by flipping signs as necessary we may then assume that all of the ${a_2,\dots,a_n}$ lie in the closed right half-plane. As the ${a_i}$ are not all collinear with the origin, one of the ${a_i}$ lies off of the ${x}$ -axis, by relabeling, we may assume that ${a_2}$ lies off of the ${x}$ axis and makes a minimal angle with the ${x}$ -axis. Then the angle of ${a_2-a_1}$ with the ${x}$ -axis is non-zero but smaller than any non-zero angle that any of the ${a_i}$ make with this axis, and so none of the ${a_i}$ are a scalar multiple of ${a_2-a_1}$ , and the claim follows. $\Box$

We now return to the proof of the proposition. If the ${a_1,\dots,a_n}$ are all collinear with the origin, then ${\phi(a_1),\dots,\phi(a_n)}$ lie in a one-dimensional arithmetic progression ${\{ mv: |m| \leq |Q| \}}$ , and then by rescaling we may take the ${v_1,\dots,v_n}$ to be integers of magnitude at most ${|Q| \leq n^{Cn}}$ , at which point we are done by hypothesis. Thus, we may assume that the ${a_1,\dots,a_n}$ are not all collinear with the origin, and so by the above lemma and relabeling we may assume that ${a_n-a_1}$ is non-zero, and that none of the ${a_i}$ are scalar multiples of ${a_n-a_1}$ .

We write

$\displaystyle a_n-a_1 = (c_1,\dots,c_d) \ \ \ \ \ (1)$

with ${|c_i| \leq 2 N_i}$ for ${i=1,\dots,d}$ ; by relabeling we may assume without loss of generality that ${c_d}$ is non-zero, and furthermore that

$\displaystyle \frac{|c_i|}{N_i} \leq \frac{|c_d|}{N_d}$

for ${i=1,\dots,d}$ . We can also factor

$\displaystyle (c_1,\dots,c_d) = q (c'_1,\dots,c'_d) \ \ \ \ \ (2)$

where ${q}$ is a natural number and ${c'_1,\dots,c'_d}$ have no common factor.

We now define a variant ${\tilde \phi: {\bf R}^d \rightarrow {\bf R}}$ of ${\phi: {\bf R}^d \rightarrow {\bf R}}$ by the map

$\displaystyle \tilde \phi(n_1,\dots,n_d) := n_1 \tilde w_1 + \dots + n_{d-1} \tilde w_{d-1} - \frac{n_d}{c_d} (c_1 \tilde w_1 + \dots + c_{d-1} \tilde w_{d-1}),$

where the ${\tilde w_1,\dots,\tilde w_{d-1}}$ are real numbers that are linearly independent over ${{\bf Q}}$ , whose precise value will not be of importance in our argument. This is a linear map with the property that ${\tilde \phi(a_n-a_1)=0}$ , so that ${\tilde \phi(a_1),\dots,\tilde \phi(a_n)}$ consists of at most ${n-1}$ distinct real numbers, which are non-zero since none of the ${a_i}$ are scalar multiples of ${a_n-a_1}$ , and the ${\tilde w_i}$ are linearly independent over ${{\bf Q}}$ . As we are assuming inductively that the lonely runner conjecture holds for ${n-1}$ , we conclude (after deleting duplicates) that there exists at least one real number ${\tilde t}$ such that

$\displaystyle \| \tilde t \tilde \phi(a_1) \|_{{\bf R}/{\bf Z}}, \dots, \| \tilde t \tilde \phi(a_n) \|_{{\bf R}/{\bf Z}} \geq \frac{1}{n}.$

We would like to “approximate” ${\phi}$ by ${\tilde \phi}$ to then conclude that there is at least one real number ${t}$ such that

$\displaystyle \| t \phi(a_1) \|_{{\bf R}/{\bf Z}}, \dots, \| t \phi(a_n) \|_{{\bf R}/{\bf Z}} \geq \frac{1}{n+1}.$

It turns out that we can do this by a Fourier-analytic argument taking advantage of the ${n^{C_0 n}}$ -proper nature of ${Q}$ . Firstly, we see from the Dirichlet approximation theorem that one has

$\displaystyle \| \tilde t \tilde \phi(a_1) \|_{{\bf R}/{\bf Z}}, \dots, \| \tilde t \tilde \phi(a_n) \|_{{\bf R}/{\bf Z}} \leq \frac{1}{10 n^2}$

for a set ${\tilde t}$ of reals of (Banach) density ${\gg n^{-O(n)}}$ . Thus, by the triangle inequality, we have

$\displaystyle \| \tilde t \tilde \phi(a_1) \|_{{\bf R}/{\bf Z}}, \dots, \| \tilde t \tilde \phi(a_n) \|_{{\bf R}/{\bf Z}} \geq \frac{1}{n} - \frac{1}{10n^2}$

for a set ${\tilde t}$ of reals of density ${\gg n^{-O(n)}}$ .

Applying a smooth Fourier multiplier of Littlewood-Paley type, one can find a trigonometric polynomial

$\displaystyle \eta(x) = \sum_{m: |m| \leq n^{C_0 n/10}} b_m e^{2\pi i mx}$

which takes values in ${[0,1]}$ , is ${\gg 1}$ for ${\|x\|_{{\bf R}/{\bf Z}} \geq \frac{1}{n} - \frac{1}{10n^2}}$ , and is no larger than ${O( n^{-100 C_0n} )}$ for ${\|x\|_{{\bf R}/{\bf Z}} \leq \frac{1}{n+1}}$ . We then have

$\displaystyle \mathop{\bf E}_t \prod_{j=1}^n \eta( t \tilde \phi(a_j) ) \gg n^{-O(n)}$

where ${\mathop{\bf E}_t f(t)}$ denotes the mean value of a quasiperiodic function ${f}$ on the reals ${{\bf R}}$ . We expand the left-hand side out as

$\displaystyle \sum_{m_1,\dots,m_n: m_1 \tilde \phi(a_1) + \dots + m_n \tilde \phi(a_n) = 0} b_{m_1} \dots b_{m_n}.$

From the genericity of ${\tilde w_1,\dots,\tilde w_n}$ , we see that the constraint

$\displaystyle m_1 \tilde \phi(a_1) + \dots + m_n \tilde \phi(a_n) = 0$

occurs if and only if ${m_1 a_1 + \dots + m_n a_n}$ is a scalar multiple of ${a_n-a_1}$ , or equivalently (by (1), (2)) an integer multiple of ${(c'_1,\dots,c'_d)}$ . Thus

$\displaystyle \sum_{m_1,\dots,m_n: m_1 a_1 + \dots + m_n a_n \in {\bf Z} (c'_1,\dots,c'_d)} b_{m_1} \dots b_{m_n} \gg n^{-O(n)}. \ \ \ \ \ (3)$

Next, we consider the average

$\displaystyle \mathop{\bf E}_t \varphi( t \xi ) \prod_{j=1}^n \eta( t v_j ) \ \ \ \ \ (4)$

where

$\displaystyle \xi := c'_1 w_1 + \dots + c'_d w_d. \ \ \ \ \ (5)$

and ${\varphi}$ is the Dirichlet series

$\displaystyle \varphi(x) := \sum_{m: |m| \leq n^{C_0 n/2}} e^{2\pi i mx}.$

By Fourier expansion and writing ${v_j = \phi(a_j)}$ , we may write (4) as

$\displaystyle \sum_{m,m_1,\dots,m_n: |m| \leq n^{C_0n/2}; m_1 \phi(a_1) + \dots + m_n \phi(a_n) = m \xi} b_{m_1} \dots b_{m_n}.$

The support of the ${b_{m_i}}$ implies that ${|m_i| \leq n^{C_0n/10}}$ . Because of the ${n^{C_0 n}}$ -properness of ${Q}$ , we see (for ${n}$ large enough) that the equation

$\displaystyle m_1 \phi(a_1) + \dots + m_n \phi(a_n) = m \xi \ \ \ \ \ (6)$

implies that

$\displaystyle m_1 a_1 + \dots + m_n a_n \in {\bf Z} (c'_1,\dots,c'_d) \ \ \ \ \ (7)$

and conversely that (7) implies that (6) holds for some ${m}$ with ${|m| \leq n^{C_0 n/2}}$ . From (3) we thus have

$\displaystyle \mathop{\bf E}_t \varphi( t \xi ) \prod_{j=1}^n \eta( t v_j ) \gg n^{-O(1)}.$

In particular, there exists a ${t}$ such that

$\displaystyle \varphi( t \xi ) \prod_{j=1}^n \eta( t v_j ) \gg n^{-O(1)}.$

Since ${\varphi}$ is bounded in magnitude by ${n^{C_0n/2}}$ , and ${\eta}$ is bounded by ${1}$ , we thus have

$\displaystyle \eta(t v_j) \gg n^{-C_0 n/2 - O(1)}$

for each ${1 \leq j \leq n}$ , which by the size properties of ${\eta}$ implies that ${\|tv_j\|_{{\bf R}/{\bf Z}} \geq \frac{1}{n+1}}$ for all ${1 \leq j \leq n}$ , giving the lonely runner conjecture for ${n}$ .

The standard branch of the matrix logarithm

3 May, 2015 in expository, math.GR, math.RA, math.SP | Tags: branch cuts, Lie groups, matrix logarithm, split orthogonal group | by Terence Tao | 24 comments

Because of Euler’s identity ${e^{\pi i} + 1 = 0}$ , the complex exponential is not injective: ${e^{z + 2\pi i k} = e^z}$ for any complex ${z}$ and integer ${k}$ . As such, the complex logarithm ${z \mapsto \log z}$ is not well-defined as a single-valued function from ${{\bf C} \backslash \{0\}}$ to ${{\bf C}}$ . However, after making a branch cut, one can create a branch of the logarithm which is single-valued. For instance, after removing the negative real axis ${(-\infty,0]}$ , one has the standard branch ${\hbox{Log}: {\bf C} \backslash (-\infty,0] \rightarrow \{ z \in {\bf C}: |\hbox{Im} z| < \pi \}}$ of the logarithm, with ${\hbox{Log}(z)}$ defined as the unique choice of the complex logarithm of ${z}$ whose imaginary part has magnitude strictly less than ${\pi}$ . This particular branch has a number of useful additional properties:

The standard branch ${\hbox{Log}}$ is holomorphic on its domain ${{\bf C} \backslash (-\infty,0]}$ .
One has ${\hbox{Log}( \overline{z} ) = \overline{ \hbox{Log}(z) }}$ for all ${z}$ in the domain ${{\bf C} \backslash (-\infty,0]}$ . In particular, if ${z \in {\bf C} \backslash (-\infty,0]}$ is real, then ${\hbox{Log} z}$ is real.
One has ${\hbox{Log}( z^{-1} ) = - \hbox{Log}(z)}$ for all ${z}$ in the domain ${{\bf C} \backslash (-\infty,0]}$ .

One can then also use the standard branch of the logarithm to create standard branches of other multi-valued functions, for instance creating a standard branch ${z \mapsto \exp( \frac{1}{2} \hbox{Log} z )}$ of the square root function. We caution however that the identity ${\hbox{Log}(zw) = \hbox{Log}(z) + \hbox{Log}(w)}$ can fail for the standard branch (or indeed for any branch of the logarithm).

One can extend this standard branch of the logarithm to ${n \times n}$ complex matrices, or (equivalently) to linear transformations ${T: V \rightarrow V}$ on an ${n}$ -dimensional complex vector space ${V}$ , provided that the spectrum of that matrix or transformation avoids the branch cut ${(-\infty,0]}$ . Indeed, from the spectral theorem one can decompose any such ${T: V \rightarrow V}$ as the direct sum of operators ${T_\lambda: V_\lambda \rightarrow V_\lambda}$ on the non-trivial generalised eigenspaces ${V_\lambda}$ of ${T}$ , where ${\lambda \in {\bf C} \backslash (-\infty,0]}$ ranges in the spectrum of ${T}$ . For each component ${T_\lambda}$ of ${T}$ , we define

$\displaystyle \hbox{Log}( T_\lambda ) = P_\lambda( T_\lambda )$

where ${P_\lambda}$ is the Taylor expansion of ${\hbox{Log}}$ at ${\lambda}$ ; as ${T_\lambda-\lambda}$ is nilpotent, only finitely many terms in this Taylor expansion are required. The logarithm ${\hbox{Log} T}$ is then defined as the direct sum of the ${\hbox{Log} T_\lambda}$ .

The matrix standard branch of the logarithm has many pleasant and easily verified properties (often inherited from their scalar counterparts), whenever ${T: V \rightarrow V}$ has no spectrum in ${(-\infty,0]}$ :

(i) We have ${\exp( \hbox{Log} T ) = T}$ .
(ii) If ${T_1: V_1 \rightarrow V_1}$ and ${T_2: V_2 \rightarrow V_2}$ have no spectrum in ${(-\infty,0]}$ , then ${\hbox{Log}( T_1 \oplus T_2 ) = \hbox{Log}(T_1) \oplus \hbox{Log}(T_2)}$ .
(iii) If ${T}$ has spectrum in a closed disk ${B(z,r)}$ in ${{\bf C} \backslash (-\infty,0]}$ , then ${\hbox{Log}(T) = P_z(T)}$ , where ${P_z}$ is the Taylor series of ${\hbox{Log}}$ around ${z}$ (which is absolutely convergent in ${B(z,r)}$ ).
(iv) ${\hbox{Log}(T)}$ depends holomorphically on ${T}$ . (Easily established from (ii), (iii), after covering the spectrum of ${T}$ by disjoint disks; alternatively, one can use the Cauchy integral representation ${\hbox{Log}(T) = \frac{1}{2\pi i} \int_\gamma \hbox{Log}(z)(z-T)^{-1}\ dz}$ for a contour ${\gamma}$ in the domain enclosing the spectrum of ${T}$ .) In particular, the standard branch of the matrix logarithm is smooth.
(v) If ${S: V \rightarrow W}$ is any invertible linear or antilinear map, then ${\hbox{Log}( STS^{-1} ) = S \hbox{Log}(T) S^{-1}}$ . In particular, the standard branch of the logarithm commutes with matrix conjugations; and if ${T}$ is real with respect to a complex conjugation operation on ${V}$ (that is to say, an antilinear involution), then ${\hbox{Log}(T)}$ is real also.
(vi) If ${T^*: V^* \rightarrow V^*}$ denotes the transpose of ${T}$ (with ${V^*}$ the complex dual of ${V}$ ), then ${\hbox{Log}(T^*) = \hbox{Log}(T)^*}$ . Similarly, if ${T^\dagger: V^\dagger \rightarrow V^\dagger}$ denotes the adjoint of ${T}$ (with ${V^\dagger}$ the complex conjugate of ${V^*}$ , i.e. ${V^*}$ with the conjugated multiplication map ${(c,z) \mapsto \overline{c} z}$ ), then ${\hbox{Log}(T^\dagger) = \hbox{Log}(T)^\dagger}$ .
(vii) One has ${\hbox{Log}(T^{-1}) = - \hbox{Log}( T )}$ .
(viii) If ${\sigma(T)}$ denotes the spectrum of ${T}$ , then ${\sigma(\hbox{Log} T) = \hbox{Log} \sigma(T)}$ .

As a quick application of the standard branch of the matrix logarithm, we have

Proposition 1 Let ${G}$ be one of the following matrix groups: ${GL_n({\bf C})}$ , ${GL_n({\bf R})}$ , ${U_n({\bf C})}$ , ${O(Q)}$ , ${Sp_{2n}({\bf C})}$ , or ${Sp_{2n}({\bf R})}$ , where ${Q: {\bf R}^n \rightarrow {\bf R}}$ is a non-degenerate real quadratic form (so ${O(Q)}$ is isomorphic to a (possibly indefinite) orthogonal group ${O(k,n-k)}$ for some ${0 \leq k \leq n}$ . Then any element ${T}$ of ${G}$ whose spectrum avoids ${(-\infty,0]}$ is exponential, that is to say ${T = \exp(X)}$ for some ${X}$ in the Lie algebra ${{\mathfrak g}}$ of ${G}$ .

Proof: We just prove this for ${G=O(Q)}$ , as the other cases are similar (or a bit simpler). If ${T \in O(Q)}$ , then (viewing ${T}$ as a complex-linear map on ${{\bf C}^n}$ , and using the complex bilinear form associated to ${Q}$ to identify ${{\bf C}^n}$ with its complex dual ${({\bf C}^n)^*}$ , then ${T}$ is real and ${T^{*-1} = T}$ . By the properties (v), (vi), (vii) of the standard branch of the matrix logarithm, we conclude that ${\hbox{Log} T}$ is real and ${- \hbox{Log}(T)^* = \hbox{Log}(T)}$ , and so ${\hbox{Log}(T)}$ lies in the Lie algebra ${{\mathfrak g} = {\mathfrak o}(Q)}$ , and the claim now follows from (i). $\Box$

Exercise 2 Show that ${\hbox{diag}(-\lambda, -1/\lambda)}$ is not exponential in ${GL_2({\bf R})}$ if ${-\lambda \in (-\infty,0) \backslash \{-1\}}$ . Thus we see that the branch cut in the above proposition is largely necessary. See this paper of Djokovic for a more complete description of the image of the exponential map in classical groups, as well as this previous blog post for some more discussion of the surjectivity (or lack thereof) of the exponential map in Lie groups.

For a slightly less quick application of the standard branch, we have the following result (recently worked out in the answers to this MathOverflow question):

Proposition 3 Let ${T}$ be an element of the split orthogonal group ${O(n,n)}$ which lies in the connected component of the identity. Then ${\hbox{det}(1+T) \geq 0}$ .

The requirement that ${T}$ lie in the identity component is necessary, as the counterexample ${T = \hbox{diag}(-\lambda, -1/\lambda )}$ for ${\lambda \in (-\infty,-1) \cup (-1,0)}$ shows.

Proof: We think of ${T}$ as a (real) linear transformation on ${{\bf C}^{2n}}$ , and write ${Q}$ for the quadratic form associated to ${O(n,n)}$ , so that ${O(n,n) \equiv O(Q)}$ . We can split ${{\bf C}^{2n} = V_1 \oplus V_2}$ , where ${V_1}$ is the sum of all the generalised eigenspaces corresponding to eigenvalues in ${(-\infty,0]}$ , and ${V_2}$ is the sum of all the remaining eigenspaces. Since ${T}$ and ${(-\infty,0]}$ are real, ${V_1,V_2}$ are real (i.e. complex-conjugation invariant) also. For ${i=1,2}$ , the restriction ${T_i: V_i \rightarrow V_i}$ of ${T}$ to ${V_i}$ then lies in ${O(Q_i)}$ , where ${Q_i}$ is the restriction of ${Q}$ to ${V_i}$ , and

$\displaystyle \hbox{det}(1+T) = \hbox{det}(1+T_1) \hbox{det}(1+T_2).$

The spectrum of ${T_2}$ consists of positive reals, as well as complex pairs ${\lambda, \overline{\lambda}}$ (with equal multiplicity), so ${\hbox{det}(1+T_2) > 0}$ . From the preceding proposition we have ${T_2 = \exp( X_2 )}$ for some ${X_2 \in {\mathfrak o}(Q_2)}$ ; this will be important later.

It remains to show that ${\hbox{det}(1+T_1) \geq 0}$ . If ${T_1}$ has spectrum at ${-1}$ then we are done, so we may assume that ${T_1}$ has spectrum only at ${(-\infty,-1) \cup (-1,0)}$ (being invertible, ${T}$ has no spectrum at ${0}$ ). We split ${V_1 = V_3 \oplus V_4}$ , where ${V_3,V_4}$ correspond to the portions of the spectrum in ${(-\infty,-1)}$ , ${(-1,0)}$ ; these are real, ${T}$ -invariant spaces. We observe that if ${V_\lambda, V_\mu}$ are generalised eigenspaces of ${T}$ with ${\lambda \mu \neq 1}$ , then ${V_\lambda, V_\mu}$ are orthogonal with respect to the (complex-bilinear) inner product ${\cdot}$ associated with ${Q}$ ; this is easiest to see first for the actual eigenspaces (since ${ \lambda \mu u \cdot v = Tu \cdot Tv = u \cdot v}$ for all ${u \in V_\lambda, v \in V_\mu}$ ), and the extension to generalised eigenvectors then follows from a routine induction. From this we see that ${V_1}$ is orthogonal to ${V_2}$ , and ${V_3}$ and ${V_4}$ are null spaces, which by the non-degeneracy of ${Q}$ (and hence of the restriction ${Q_1}$ of ${Q}$ to ${V_1}$ ) forces ${V_3}$ to have the same dimension as ${V_4}$ , indeed ${Q}$ now gives an identification of ${V_3^*}$ with ${V_4}$ . If we let ${T_3, T_4}$ be the restrictions of ${T}$ to ${V_3,V_4}$ , we thus identify ${T_4}$ with ${T_3^{*-1}}$ , since ${T}$ lies in ${O(Q)}$ ; in particular ${T_3}$ is invertible. Thus

$\displaystyle \hbox{det}(1+T_1) = \hbox{det}(1 + T_3) \hbox{det}( 1 + T_3^{*-1} ) = \hbox{det}(T_3)^{-1} \hbox{det}(1+T_3)^2$

and so it suffices to show that ${\hbox{det}(T_3) > 0}$ .

At this point we need to use the hypothesis that ${T}$ lies in the identity component of ${O(n,n)}$ . This implies (by a continuity argument) that the restriction of ${T}$ to any maximal-dimensional positive subspace has positive determinant (since such a restriction cannot be singular, as this would mean that ${T}$ positive norm vector would map to a non-positive norm vector). Now, as ${V_3,V_4}$ have equal dimension, ${Q_1}$ has a balanced signature, so ${Q_2}$ does also. Since ${T_2 = \exp(X_2)}$ , ${T_2}$ already lies in the identity component of ${O(Q_2)}$ , and so has positive determinant on any maximal-dimensional positive subspace of ${V_2}$ . We conclude that ${T_1}$ has positive determinant on any maximal-dimensional positive subspace of ${V_1}$ .

We choose a complex basis of ${V_3}$ , to identify ${V_3}$ with ${V_3^*}$ , which has already been identified with ${V_4}$ . (In coordinates, ${V_3,V_4}$ are now both of the form ${{\bf C}^m}$ , and ${Q( v \oplus w ) = v \cdot w}$ for ${v,w \in {\bf C}^m}$ .) Then ${\{ v \oplus v: v \in V_3 \}}$ becomes a maximal positive subspace of ${V_1}$ , and the restriction of ${T_1}$ to this subspace is conjugate to ${T_3 + T_3^{*-1}}$ , so that

$\displaystyle \hbox{det}( T_3 + T_3^{*-1} ) > 0.$

But since ${\hbox{det}( T_3 + T_3^{*-1} ) = \hbox{det}(T_3) \hbox{det}( 1 + T_3^{-1} T_3^{*-1} )}$ and ${ 1 + T_3^{-1} T_3^{*-1}}$ is positive definite, so ${\hbox{det}(T_3)>0}$ as required. $\Box$

	Jianfeng Li on An introduction to measure…
	Jianfeng Li on An introduction to measure…
	Jianfeng Li on An introduction to measure…
	Jianfeng Li on An introduction to measure…
	Anonymous on 275A, Notes 3: The weak and st…
	Anonymous on 275A, Notes 3: The weak and st…
	Anonymous on Dense sets of natural numbers…
	Anonymous on Dense sets of natural numbers…
	Anonymous on 275A, Notes 3: The weak and st…
	Anonymous on Dense sets of natural numbers…
	Fabrice Imparato on Dense sets of natural numbers…
	Anonymous on A computation-outsourced discu…
	Anonymous on Dense sets of natural numbers…
	Anonymous on 245A, Notes 3: Integration on…
	Samuel German on Analysis I

Monthly Archive

A differentiation identity

Cancellation for the multilinear Hilbert transform

Failure of the L^1 pointwise and maximal ergodic theorems for the free group

A remark on the lonely runner conjecture

The standard branch of the matrix logarithm

Recent Comments

Articles by others

Diversions

Mathematics

Selected articles

Software

The sciences

Top Posts

Archives

Categories

The Polymath Blog

For commenters