
Jon Bennett and I have just uploaded to the arXiv our paper “Adjoint Brascamp-Lieb inequalities“. In this paper, we observe that the family of multilinear inequalities known as the Brascamp-Lieb inequalities (or Hölder-Brascamp-Lieb inequalities) admits an adjoint formulation, and we explore the theory of these adjoint inequalities and some of their consequences.

To motivate matters let us review the classical theory of adjoints for linear operators. If one has a bounded linear operator {T: L^p(X) \rightarrow L^q(Y)} for some measure spaces {X,Y} and exponents {1 < p, q < \infty}, then one can define an adjoint linear operator {T^*: L^{q'}(Y) \rightarrow L^{p'}(X)} involving the dual exponents {\frac{1}{p}+\frac{1}{p'} = \frac{1}{q}+\frac{1}{q'} = 1}, obeying (formally at least) the duality relation

\displaystyle  \langle Tf, g \rangle = \langle f, T^* g \rangle \ \ \ \ \ (1)

for suitable test functions {f, g} on {X, Y} respectively. Using the dual characterization

\displaystyle  \|f\|_{L^{p'}(X)} = \sup_{g: \|g\|_{L^p(X)} \leq 1} |\langle f, g \rangle|

of {L^{p'}(X)} (and similarly for {L^{q'}(Y)}), one can show that {T^*} has the same operator norm as {T}.

There is a slightly different way to proceed using Hölder’s inequality. For sake of exposition let us make the simplifying assumption that {T} (and hence also {T^*}) maps non-negative functions to non-negative functions, and ignore issues of convergence or division by zero in the formal calculations below. Then for any reasonable function {g} on {Y}, we have

\displaystyle  \| T^* g \|_{L^{p'}(X)}^{p'} = \langle (T^* g)^{p'-1}, T^* g \rangle = \langle T (T^* g)^{p'-1}, g \rangle

\displaystyle  \leq \|T\|_{op} \|(T^* g)^{p'-1}\|_{L^p(X)} \|g\|_{L^{p'}(Y)}

\displaystyle  = \|T\|_{op} \|T^* g \|_{L^{p'}(X)}^{p'-1} \|g\|_{L^{p'}(Y)};

where we have used (1) and Hölder’s inequality. Dividing out by {\|T^* g \|_{L^{p'}(X)}^{p'-1}}, we obtain {\|T^*\|_{op} \leq \|T\|_{op}}, and a similar argument also recovers the reverse inequality.
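
To see this duality concretely, here is a minimal numerical sketch in the discrete setting (finite sets with counting measure, so that {T} is just a matrix with non-negative entries); the matrix, exponents, and trial count are arbitrary choices, and the crude Monte Carlo search only produces lower bounds on each operator norm, so the two estimates should merely be close rather than exactly equal:

```python
import numpy as np

rng = np.random.default_rng(0)

p, q = 3.0, 2.0                    # T maps L^p to L^q
pp, qq = p / (p - 1), q / (q - 1)  # dual exponents p', q'

T = rng.random((6, 6))             # random matrix with non-negative entries

def op_norm_lower_bound(M, r, s, trials=20000):
    # Crude Monte Carlo lower bound on the L^r -> L^s operator norm of M
    # (counting measure), searching over random non-negative inputs.
    best = 0.0
    for _ in range(trials):
        f = rng.random(M.shape[1])
        best = max(best, np.linalg.norm(M @ f, s) / np.linalg.norm(f, r))
    return best

# ||T||_{L^p -> L^q} and ||T^*||_{L^{q'} -> L^{p'}} coincide, so the two
# estimates below should be approximately equal.
print(op_norm_lower_bound(T, p, q))
print(op_norm_lower_bound(T.T, qq, pp))
```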

The first argument also extends to some extent to multilinear operators. For instance, if one has a bounded bilinear operator {B: L^p(X) \times L^q(Y) \rightarrow L^r(Z)} for {1 < p,q,r < \infty}, then one can define adjoint bilinear operators {B^{*1}: L^q(Y) \times L^{r'}(Z) \rightarrow L^{p'}(X)} and {B^{*2}: L^p(X) \times L^{r'}(Z) \rightarrow L^{q'}(Y)} obeying the relations

\displaystyle  \langle B(f, g),h \rangle = \langle B^{*1}(g,h), f \rangle = \langle B^{*2}(f,h), g \rangle

and with exactly the same operator norm as {B}. It is also possible, formally at least, to adapt the Hölder inequality argument to reach the same conclusion.

In this paper we observe that the Hölder inequality argument can be modified in the case of Brascamp-Lieb inequalities to obtain a different type of adjoint inequality. (Continuous) Brascamp-Lieb inequalities take the form

\displaystyle  \int_{{\bf R}^d} \prod_{i=1}^k (f_i \circ B_i)^{c_i} \leq \mathrm{BL}(\mathbf{B},\mathbf{c}) \prod_{i=1}^k (\int_{{\bf R}^{d_i}} f_i)^{c_i}

for various exponents {c_1,\dots,c_k} and surjective linear maps {B_i: {\bf R}^d \rightarrow {\bf R}^{d_i}}, where {f_i: {\bf R}^{d_i} \rightarrow {\bf R}} are arbitrary non-negative measurable functions and {\mathrm{BL}(\mathbf{B},\mathbf{c})} is the best constant for which this inequality holds for all such {f_i}. [There is also another inequality involving variances with respect to log-concave distributions that is also due to Brascamp and Lieb, but it is not related to the inequalities discussed here.] Well-known examples of such inequalities include Hölder’s inequality and the sharp Young convolution inequality; another is the Loomis-Whitney inequality, the first non-trivial example of which is

\displaystyle  \int_{{\bf R}^3} f(y,z)^{1/2} g(x,z)^{1/2} h(x,y)^{1/2}

\displaystyle  \leq (\int_{{\bf R}^2} f)^{1/2} (\int_{{\bf R}^2} g)^{1/2} (\int_{{\bf R}^2} h)^{1/2} \ \ \ \ \ (2)

for all non-negative measurable {f,g,h: {\bf R}^2 \rightarrow {\bf R}}. There are also discrete analogues of these inequalities, in which the Euclidean spaces {{\bf R}^d, {\bf R}^{d_i}} are replaced by discrete abelian groups, and the surjective linear maps {B_i} are replaced by discrete homomorphisms.
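
To make the discrete setting concrete, here is a quick numerical check of the discrete analogue of (2) (counting measure on a finite grid, with randomly chosen non-negative inputs; the grid size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20
f = rng.random((N, N))  # f(y, z)
g = rng.random((N, N))  # g(x, z)
h = rng.random((N, N))  # h(x, y)

# Left-hand side: sum over (x, y, z) of f(y,z)^{1/2} g(x,z)^{1/2} h(x,y)^{1/2}.
lhs = np.einsum('yz,xz,xy->', np.sqrt(f), np.sqrt(g), np.sqrt(h))

# Right-hand side: (sum f)^{1/2} (sum g)^{1/2} (sum h)^{1/2}.
rhs = np.sqrt(f.sum() * g.sum() * h.sum())

print(lhs <= rhs, lhs / rhs)  # inequality should hold, with ratio at most 1
```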

The operation {f \mapsto f \circ B_i} of pulling back a function on {{\bf R}^{d_i}} by a linear map {B_i: {\bf R}^d \rightarrow {\bf R}^{d_i}} to create a function on {{\bf R}^d} has an adjoint pushforward map {(B_i)_*}, which takes a function on {{\bf R}^d} and basically integrates it on the fibers of {B_i} to obtain a “marginal distribution” on {{\bf R}^{d_i}} (possibly multiplied by a normalizing determinant factor). The adjoint Brascamp-Lieb inequalities that we obtain take the form

\displaystyle  \|f\|_{L^p({\bf R}^d)} \leq \mathrm{ABL}( \mathbf{B}, \mathbf{c}, \theta, p) \prod_{i=1}^k \|(B_i)_* f \|_{L^{p_i}({\bf R}^{d_i})}^{\theta_i}

for non-negative {f: {\bf R}^d \rightarrow {\bf R}} and various exponents {p, p_i, \theta_i}, where {\mathrm{ABL}( \mathbf{B}, \mathbf{c}, \theta, p)} is the optimal constant for which the above inequality holds for all such {f}; informally, such inequalities control the {L^p} norm of a non-negative function in terms of its marginals. It turns out that every Brascamp-Lieb inequality generates a family of adjoint Brascamp-Lieb inequalities (with the exponent {p} being less than or equal to {1}). For instance, the adjoints of the Loomis-Whitney inequality (2) are the inequalities

\displaystyle  \| f \|_{L^p({\bf R}^3)} \leq \| (B_1)_* f \|_{L^{p_1}({\bf R}^2)}^{\theta_1} \| (B_2)_* f \|_{L^{p_2}({\bf R}^2)}^{\theta_2} \| (B_3)_* f \|_{L^{p_3}({\bf R}^2)}^{\theta_3}

for all non-negative measurable {f: {\bf R}^3 \rightarrow {\bf R}}, all {\theta_1, \theta_2, \theta_3>0} summing to {1}, and all {0 < p \leq 1}, where the {p_i} exponents are defined by the formula

\displaystyle  \frac{1}{2} (1-\frac{1}{p}) = \theta_i (1-\frac{1}{p_i})

and the {(B_i)_* f:{\bf R}^2 \rightarrow {\bf R}} are the marginals of {f}:

\displaystyle  (B_1)_* f(y,z) := \int_{\bf R} f(x,y,z)\ dx

\displaystyle  (B_2)_* f(x,z) := \int_{\bf R} f(x,y,z)\ dy

\displaystyle  (B_3)_* f(x,y) := \int_{\bf R} f(x,y,z)\ dz.
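
These adjoint inequalities can likewise be tested numerically in the discrete model (counting measure on a finite box in {{\bf Z}^3}); in the sketch below the choices {p = 0.7} and {\theta_1 = \theta_2 = \theta_3 = 1/3} are arbitrary, and the expectation that the ratio stays at most {1} reflects the discrete discussion below, so this is an illustration rather than a proof:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 15
f = rng.random((N, N, N))  # non-negative f on a finite grid, axes (x, y, z)

p = 0.7
theta = 1.0 / 3            # theta_1 = theta_2 = theta_3
# exponents p_i determined by (1/2)(1 - 1/p) = theta_i (1 - 1/p_i):
p_i = 1.0 / (1.0 - (1.0 - 1.0 / p) / (2.0 * theta))

def lp(a, r):
    # L^r quasi-norm (sum of a^r)^(1/r) of a non-negative array, 0 < r <= 1
    return (a ** r).sum() ** (1.0 / r)

marginals = [f.sum(axis=0), f.sum(axis=1), f.sum(axis=2)]  # the (B_i)_* f
lhs = lp(f, p)
rhs = np.prod([lp(m, p_i) ** theta for m in marginals])

print(lhs / rhs)  # expected to be at most 1 in this discrete model
```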

One can derive these adjoint Brascamp-Lieb inequalities from their forward counterparts by a version of the Hölder inequality argument mentioned previously, in conjunction with the observation that the pushforward maps {(B_i)_*} are mass-preserving (i.e., they preserve the {L^1} norm on non-negative functions). Conversely, it turns out that the adjoint Brascamp-Lieb inequalities are only available when the forward Brascamp-Lieb inequalities are. In the discrete case the forward and adjoint Brascamp-Lieb constants are essentially identical, but in the continuous case they can (and often do) differ by up to a constant. Furthermore, whereas in the forward case there is a famous theorem of Lieb that asserts that the Brascamp-Lieb constants can be computed by optimizing over gaussian inputs, the same statement is only true up to constants in the adjoint case, and in fact in most cases the gaussians will fail to optimize the adjoint inequality. The situation appears to be complicated; roughly speaking, the adjoint inequalities only use a portion of the range of possible inputs of the forward Brascamp-Lieb inequality, and this portion often misses the gaussian inputs that would otherwise optimize the inequality.

We have located a modest number of applications of the adjoint Brascamp-Lieb inequality (but hope that there will be more in the future):

  • The inequalities become equalities at {p=1}; taking a derivative at this value (in the spirit of the replica trick in physics) we recover the entropic Brascamp-Lieb inequalities of Carlen and Cordero-Erausquin. For instance, the derivative of the adjoint Loomis-Whitney inequalities at {p=1} yields Shearer’s inequality.
  • The adjoint Loomis-Whitney inequalities, together with a few more applications of Hölder’s inequality, imply the log-concavity of the Gowers uniformity norms on non-negative functions, which was previously observed by Shkredov and by Manners.
  • Averaging the adjoint Loomis-Whitney inequalities over coordinate systems gives reverse {L^p} inequalities for the X-ray transform and other tomographic transforms that appear to be new in the literature. In particular, we obtain some monotonicity of the {L^{p_k}} norms or entropies of the {k}-plane transform in {k} (if the exponents {p_k} are chosen in a dimensionally consistent fashion).

We also record a number of variants of the adjoint Brascamp-Lieb inequalities, including discrete variants, and a reverse inequality involving {L^p} norms with {p>1} rather than {p<1}.

Ben Green, Freddie Manners and I have just uploaded to the arXiv our preprint “Sumsets and entropy revisited“. This paper uses entropy methods to attack the Polynomial Freiman-Ruzsa (PFR) conjecture, which we study in the following two forms:

Conjecture 1 (Weak PFR over {{\bf Z}}) Let {A \subset {\bf Z}^D} be a finite non-empty set whose doubling constant {\sigma[A] := |A+A|/|A|} is at most {K}. Then there is a subset {A'} of {A} of density {\gg K^{-O(1)}} that has affine dimension {O(\log K)} (i.e., it is contained in an affine space of dimension {O(\log K)}).

Conjecture 2 (PFR over {{\bf F}_2}) Let {A \subset {\bf F}_2^D} be a non-empty set whose doubling constant {\sigma[A]} is at most {K}. Then {A} can be covered by {O(K^{O(1)})} cosets of a subspace of cardinality at most {|A|}.

Our main results are then as follows.

Theorem 3 If {A \subset {\bf Z}^D} with {\sigma[A] \leq K}, then
  • (i) There is a subset {A'} of {A} of density {\gg K^{-O(1)}} of “skew-dimension” (or “query complexity”) {O(\log K)}.
  • (ii) There is a subset {A'} of {A} of density {\gg \exp( -O(\log^{5/3+o(1)} K) )} of affine dimension {O(\log K)} (where {o(1)} goes to zero as {K \rightarrow \infty}).
  • (iii) If Conjecture 2 holds, then there is a subset {A'} of {A} of density {\gg K^{-O(1)}} of affine dimension {O(\log K)}. In other words, Conjecture 2 implies Conjecture 1.

The skew-dimension of a set is a recursively defined quantity that is smaller than the affine dimension; the precise definition is given in the paper, but suffice it to say that singleton sets have dimension {0}, and a set {A \subset {\bf Z}^{D_1} \times {\bf Z}^{D_2}} whose projection to {{\bf Z}^{D_1}} has skew-dimension at most {d_1}, and whose fibers in {\{x\} \times {\bf Z}^{D_2}} have skew-dimension at most {d_2} for any {x}, will have skew-dimension at most {d_1+d_2}. (In fact, the skew-dimension is basically the largest quantity which obeys all of these properties.)

Part (i) of this theorem was implicitly proven by Pálvölgyi and Zhelezov by a different method. Part (ii) with {5/3+o(1)} replaced by {2} was established by Manners. To our knowledge, part (iii) is completely new.

Our proof strategy is to establish these combinatorial additive combinatorics results by using entropic additive combinatorics, in which we replace sets {A} with random variables {X}, and cardinality with (the exponential of) Shannon entropy. This is in order to take advantage of some superior features of entropic additive combinatorics, most notably good behavior with respect to homomorphisms.

For instance, the analogue of the combinatorial doubling constant {\sigma[A] := |A+A|/|A|} of a finite non-empty subset {A} of an abelian group {G}, is the entropy doubling constant

\displaystyle  \sigma_{\mathrm{ent}}[X] := \exp( {\bf H}(X_1+X_2) - {\bf H}(X) )

of a finitely-valued random variable {X} in {G}, where {X_1,X_2} are independent copies of {X} and {{\bf H}} denotes Shannon entropy. There is also an analogue of the Ruzsa distance

\displaystyle  d(A,B) := \log \frac{|A-B|}{|A|^{1/2} |B|^{1/2}}

between two finite non-empty subsets {A,B} of {G}, namely the entropic Ruzsa distance

\displaystyle  d_{\mathrm{ent}}(X,Y) := {\bf H}(X'+Y') - \frac{1}{2} {\bf H}(X) - \frac{1}{2} {\bf H}(Y)

where {X',Y'} are independent copies of {X,Y} respectively. (Actually, one thing we show in our paper is that the independence hypothesis can be dropped, and this only affects the entropic Ruzsa distance by a factor of three at worst.) Many of the results about sumsets and Ruzsa distance have entropic analogues, but the entropic versions are slightly better behaved; for instance, we have a contraction property

\displaystyle  d_{\mathrm{ent}}(\phi(X),\phi(Y)) \leq d_{\mathrm{ent}}(X,Y)

whenever {\phi: G \rightarrow H} is a homomorphism. In fact we have a refinement of this inequality in which the gap between these two quantities can be used to control the entropic distance between “fibers” of {X,Y} (in which one conditions {\phi(X)} and {\phi(Y)} to be fixed). On the other hand, there are direct connections between the combinatorial and entropic sumset quantities. For instance, if {U_A} is a random variable drawn uniformly from {A}, then

\displaystyle  \sigma_{\mathrm{ent}}[U_A] \leq \sigma[A].

Thus if {A} has small doubling, then {U_A} has small entropic doubling. In the converse direction, if {X} has small entropic doubling, then {X} is close (in entropic Ruzsa distance) to a uniform random variable {U_S} drawn from a set {S} of small doubling; a version of this statement was proven in an old paper of mine, but here we establish a quantitatively efficient version, by rewriting the entropic Ruzsa distance in terms of certain Kullback-Leibler divergences.
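
Both of these facts are easy to explore numerically. The sketch below (with arbitrary choices of sets, and the homomorphism {\phi(x) = x \hbox{ mod } 10}) checks the inequality {\sigma_{\mathrm{ent}}[U_A] \leq \sigma[A]} for an arithmetic progression, and also the contraction property above, using the definitions as stated with natural logarithms:

```python
import math
from collections import Counter

def entropy(dist):
    # Shannon entropy (natural log) of a Counter mapping outcomes to weights
    total = sum(dist.values())
    return -sum((c / total) * math.log(c / total) for c in dist.values())

A = list(range(20))                                  # arithmetic progression
sigma = len({a + b for a in A for b in A}) / len(A)  # combinatorial doubling

# entropic doubling of U_A, via the distribution of X_1 + X_2
conv = Counter(a + b for a in A for b in A)
sigma_ent = math.exp(entropy(conv) - math.log(len(A)))
print(sigma_ent <= sigma, sigma_ent, sigma)

# entropic Ruzsa distance between uniform variables on two lists
def d_ent(Xs, Ys, add=lambda a, b: a + b):
    s = Counter(add(x, y) for x in Xs for y in Ys)
    return entropy(s) - 0.5 * entropy(Counter(Xs)) - 0.5 * entropy(Counter(Ys))

X, Y = list(range(20)), [3 * k for k in range(20)]
phi = lambda x: x % 10   # a homomorphism from Z to Z/10Z
contracted = d_ent([phi(x) for x in X], [phi(y) for y in Y],
                   add=lambda a, b: (a + b) % 10)
print(contracted <= d_ent(X, Y) + 1e-12)  # contraction under homomorphisms
```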

Our first main result is a “99% inverse theorem” for entropic Ruzsa distance: if {d_{\mathrm{ent}}(X,Y)} is sufficiently small, then there exists a finite subgroup {H} of {G} such that

\displaystyle  d_{\mathrm{ent}}(X,U_H), d_{\mathrm{ent}}(Y,U_H) \leq 12 d_{\mathrm{ent}}(X,Y). \ \ \ \ \ (1)

This result uses the results just mentioned to relate {X,Y} to a set {S} of small doubling, which can then be related to a subgroup {H} by standard inverse theorems; this gives a weak version of (1) (roughly speaking losing a square root in the bound), and some additional analysis is needed to bootstrap this initial estimate back to (1).

We now sketch how these tools are used to prove our main theorem. For (i), we reduce matters to establishing the following bilinear entropic analogue: given two non-empty finite subsets {A,B} of {{\bf Z}^D}, one can find subsets {A' \subset A}, {B' \subset B} with

\displaystyle  |A'| |B'| \geq e^{-C d_{\mathrm{ent}}(U_A, U_B)} |A| |B|

such that {A', B'} have skew-dimension at most {C d_{\mathrm{ent}}(U_A, U_B)}, for some absolute constant {C}. This can be shown by an induction on {|A||B|} (say). One applies a non-trivial coordinate projection {\pi: {\bf Z}^D \rightarrow {\bf Z}} to {A,B}. If {\pi(U_A)} and {\pi(U_B)} are very close in entropic Ruzsa distance, then the 99% inverse theorem shows that these random variables must each concentrate at a point (because {{\bf Z}} has no non-trivial finite subgroups), and one can pass to a fiber of these points and use the induction hypothesis. If instead {\pi(U_A)} and {\pi(U_B)} are far apart, then by the behavior of entropy under projections one can show that the fibers of {A} and {B} under {\pi} are on average closer to each other in entropic Ruzsa distance than {A} and {B} themselves are, and one can again proceed using the induction hypothesis.

For parts (ii) and (iii), we first use an entropic version of an observation of Manners that sets of small doubling in {{\bf Z}^D} must be irregularly distributed modulo {2}. A clean formulation of this in entropic language is the inequality

\displaystyle  d_{\mathrm{ent}}(X, 2Y) \leq 5 d_{\mathrm{ent}}(X,Y)

whenever {X,Y} take values in a torsion-free abelian group such as {{\bf Z}^D}; this turns out to follow from two applications of the entropy submodularity inequality. One corollary of this (and the behavior of entropy under projections) is that

\displaystyle  {\bf H}( X \hbox{ mod } 2 ), {\bf H}( Y \hbox{ mod } 2 ) \leq 10 d_{\mathrm{ent}}(X,Y).

This is the key link between the {{\bf Z}^D} and {{\bf F}_2^D} worlds that is used to prove (ii), (iii): while (iii) relies on the still unproven PFR conjecture over {{\bf F}_2}, (ii) uses the unconditional progress on PFR by Konyagin, as detailed in this survey of Sanders. The argument has a similar inductive structure to that used to establish (i) (and if one is willing to replace {5/3+o(1)} by {2} then the argument is in fact relatively straightforward and does not need any deep partial results on the PFR).

As one byproduct of our analysis we also obtain an appealing entropic reformulation of Conjecture 2, namely that if {X} is an {{\bf F}_2^D}-valued random variable then there exists a subspace {H} of {{\bf F}_2^D} such that

\displaystyle  d_{\mathrm{ent}}(X, U_H) \ll \sigma_{\mathrm{ent}}[X].

Right now the best result in this direction is

\displaystyle  d_{\mathrm{ent}}(X, U_H) \ll_\varepsilon \sigma_{\mathrm{ent}}[X] + \sigma_{\mathrm{ent}}^{3+\varepsilon}[X]

for any {\varepsilon > 0}, by using Konyagin’s partial result towards the PFR.

The Salem prize was established in 1968 and named in honor of Raphaël Salem (1898-1963), a mathematician famous notably for his deep study of the links between Fourier series and number theory and for pioneering applications of probabilistic methods to these fields. It was largely administered by Jean Bourgain until his untimely death in 2018; due to this and the COVID-19 pandemic, no prize was awarded for the years of 2019-2022. A list of past winners may be found here.

However, I am happy to report that the prize has been reactivated, and is now formally hosted by the Institute for Advanced Study, with Akshay Venkatesh overseeing the administration of the prize, and is accepting nominations for the 2023 Salem Prize until September 1st. Nominations should include a CV of the nominee and a nomination letter explaining the significance of the nominee’s work. Supplementary documentation, such as supporting letters of recommendation or key publications, can additionally be provided, but are not required.

Nominees may be individuals from any country or institution. Preference will be given to nominees who have received their PhD in the last ten years, although this rule may be relaxed if there are mitigating personal circumstances, or if there have been few Salem prize winners in recent years.  Self-nominations will not be considered, nor are past Prize winners or Scientific Committee members eligible.

The prize does not come with a direct monetary award, but winners will be invited to visit the IAS and to give a lecture associated with the award of the prize.

I will be chairing the Scientific Committee to evaluate the nominations and recommend a prize winner; Guy David and Mikhail Sodin have also agreed to serve. Special thanks also to Peter Sarnak for his tireless efforts to ensure the continuation of the prize after Jean’s passing.

Hariharan Narayanan, Scott Sheffield, and I have just uploaded to the arXiv our paper “Sums of GUE matrices and concentration of hives from correlation decay of eigengaps“. This is a personally satisfying paper for me, as it connects the work I did as a graduate student (with Allen Knutson and Chris Woodward) on sums of Hermitian matrices, with more recent work I did (with Van Vu) on random matrix theory, as well as several other results by other authors scattered across various mathematical subfields.

Suppose {A, B} are two {n \times n} Hermitian matrices with eigenvalues {\lambda = (\lambda_1,\dots,\lambda_n)} and {\mu = (\mu_1,\dots,\mu_n)} respectively (arranged in non-increasing order). What can one say about the eigenvalues {\nu = (\nu_1,\dots,\nu_n)} of the sum {A+B}? There are now many ways to answer this question precisely; one of them, introduced by Allen and myself many years ago, is that there exists a certain triangular array of numbers called a “hive” that has {\lambda, \mu, \nu} as its boundary values. On the other hand, by the pioneering work of Voiculescu in free probability, we know in the large {n} limit that if {\lambda, \mu} are asymptotically drawn from some limiting distribution, and {A} and {B} are drawn independently at random (using the unitarily invariant Haar measure) amongst all Hermitian matrices with the indicated eigenvalues, then (under mild hypotheses on the distribution, and under suitable normalization), {\nu} will almost surely have a limiting distribution that is the free convolution of the two original distributions.
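
As a quick numerical illustration of Voiculescu's phenomenon in the model relevant to this paper: the free convolution of two semicircular distributions is again semicircular with the variances added, so the spectrum of the sum of two independent GUE matrices (normalized here so that each spectrum fills roughly {[-2,2]}) should fill roughly {[-2\sqrt{2}, 2\sqrt{2}]}. A minimal sketch, with the dimension and normalization chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

def gue(n):
    # GUE matrix normalized so that the spectrum fills roughly [-2, 2]
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / np.sqrt(4 * n)

A, B = gue(n), gue(n)
eigs = np.linalg.eigvalsh(A + B)

# The free convolution of two semicircles is a semicircle with the variances
# added, so the support should be close to [-2*sqrt(2), 2*sqrt(2)] ~ [-2.83, 2.83].
print(eigs.min(), eigs.max())
```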

One of my favourite open problems is to come up with a theory of “free hives” that allows one to explain the latter fact from the former. This is still unresolved, but we are now beginning to make a bit of progress towards this goal. We know (for instance from the calculations of Coquereaux and Zuber) that if {A, B} are drawn independently at random with eigenvalues {\lambda, \mu}, then the eigenvalues {\nu} of {A+B} are distributed according to the boundary values of an “augmented hive” with two boundaries {\lambda,\mu}, drawn uniformly at random from the polytope of all such augmented hives. (This augmented hive is basically a regular hive with another type of pattern, namely a Gelfand-Tsetlin pattern, glued to one side of it.) So, if one could show some sort of concentration of measure for the entries of this augmented hive, and calculate what these entries concentrated to, one should presumably be able to recover Voiculescu’s result after some calculation.

In this paper, we are able to accomplish the first half of this goal, assuming that the spectra {\lambda, \mu} are not deterministic, but rather drawn from the spectra of rescaled GUE matrices (thus {A,B} are independent rescaled copies of the GUE ensemble). We have chosen to normalize matters so that the eigenvalues {\lambda,\mu} have size {O(n)}, so that the entries of the augmented hive have size {O(n^2)}. Our result is then that the entries of the augmented hive in fact have a standard deviation of {o(n^2)}, thus exhibiting a little bit of concentration. (Actually, from the Brunn-Minkowski inequality, the distribution of these entries is log-concave, so once one controls the standard deviation one also gets a bit of exponential decay beyond the standard deviation; Narayanan and Sheffield had also recently established the existence of a rate function for this sort of model.) Presumably one should get much better concentration, and one should be able to handle other models than the GUE ensemble, but this is the first advance that we were able to achieve.

Augmented hives seem tricky to work with directly, but by adapting the octahedron recurrence introduced for this problem by Knutson, Woodward, and myself some time ago (which is related to the associativity {(A+B)+C = A+(B+C)} of addition for Hermitian matrices), one can construct a piecewise linear volume-preserving map between the cone of augmented hives, and the product of two Gelfand-Tsetlin cones. The problem then reduces to establishing concentration of measure for certain piecewise linear maps on products of Gelfand-Tsetlin cones (endowed with a certain GUE-type measure). This is a promising formulation because Gelfand-Tsetlin cones are by now quite well understood.

On the other hand, the piecewise linear map, initially defined by iterating the octahedron relation {f = \max(a+c,b+d)-e}, looks somewhat daunting. Fortunately, there is an explicit formulation of this map due to Speyer, as the supremum of certain linear maps associated to perfect matchings of a certain “excavation graph”. For us it was convenient to work with the dual of this excavation graph, and associate these linear maps to certain “lozenge tilings” of a hexagon.

It would be more convenient to study the concentration of each linear map separately, rather than their supremum. By the Cheeger inequality, it turns out that one can relate the latter to the former provided that one has good control on the Cheeger constant of the underlying measure on the Gelfand-Tsetlin cones. Fortunately, the measure is log-concave, so one can use the very recent work of Klartag on the KLS conjecture to eliminate the supremum (up to a logarithmic loss which is only moderately annoying to deal with).

It remains to obtain concentration on the linear map associated to a given lozenge tiling. After stripping away some contributions coming from lozenges near the edge (using some eigenvalue rigidity results of Van Vu and myself), one is left with some bulk contributions which ultimately involve eigenvalue interlacing gaps such as

\displaystyle  \lambda_i - \lambda_{n-1,i}

where {\lambda_{n-1,i}} is the {i^{th}} eigenvalue of the top left {n-1 \times n-1} minor of {A}, and {i} is in the bulk region {\varepsilon n \leq i \leq (1-\varepsilon) n} for some fixed {\varepsilon > 0}. To get the desired result, one needs some non-trivial correlation decay in {i} for these statistics. If one were working with eigenvalue gaps {\lambda_i - \lambda_{i+1}} rather than interlacing gaps, then such correlation decay was conveniently obtained for us by recent work of Cipolloni, Erdős, and Schröder. So the last remaining challenge is to understand the relation between eigenvalue gaps and interlacing gaps.

For this we turned to the work of Metcalfe, who uncovered a determinantal process structure to this problem, with a kernel associated to Lagrange interpolation polynomials. It is possible to satisfactorily estimate various integrals of these kernels using the residue theorem and eigenvalue rigidity estimates, thus completing the required analysis.

Back in March, I was approached to contribute to a then-upcoming anthology project to evaluate an early access version of the GPT-4 large language model, and write a short essay about my experiences. Our prompt was to focus on two core questions:

  • How might this technology and its successors contribute to human flourishing?
  • How might we as a society best guide the technology to achieve maximal benefits for humanity?

The anthology is now in the process of being rolled out, with twelve of the twenty essays, including mine, public at this time of writing.

As an experiment, I also asked GPT-4 itself to contribute an essay to the anthology from the same prompts (playing the role of a research mathematician); I then gave it my own essay (which I wrote independently) and asked it both to rewrite its own essay in the style of my own, and to copyedit my essay into what it deemed to be a better form. I recorded the results of those experiments here; the output was reasonably well written and on topic, but not exceptional in content.

Dimitris Koukoulopoulos and I have just uploaded to the arXiv our paper “An upper bound on the mean value of the Erdős-Hooley delta function“. This paper concerns a (still somewhat poorly understood) basic arithmetic function in multiplicative number theory, namely the Erdős-Hooley delta function

\displaystyle  \Delta(n) := \sup_u \Delta(n;u)

where

\displaystyle  \Delta(n;u) := \# \{ d|n: e^u < d \leq e^{u+1} \}.

The function {\Delta} measures the extent to which the divisors of a natural number can be concentrated in a dyadic (or more precisely, {e}-dyadic) interval {(e^u, e^{u+1}]}. From the pigeonhole principle, we have the bounds

\displaystyle  \frac{\tau(n)}{\log n} \ll \Delta(n) \leq \tau(n),

where {\tau(n) := \# \{ d: d|n\}} is the usual divisor function. The statistical behavior of the divisor function is well understood; for instance, if {n} is drawn at random from {1} to {x}, then the mean value of {\tau(n)} is roughly {\log x}, the median is roughly {\log^{\log 2} x}, and (by the Erdős-Kac theorem) {\tau(n)} asymptotically has a log-normal distribution. In particular, a small proportion of highly divisible numbers skews the mean to be significantly higher than the median.
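
For concreteness, here is a short sketch computing {\Delta(n)} directly from the definition. It exploits the observation that the supremum over {u} is attained when the window {(e^u, e^{u+1}]} has a divisor at its right endpoint (sliding the window rightwards until this happens never decreases the count), so it suffices to slide a window of length {1} over the logarithms of the divisors:

```python
import bisect
import math

def divisors(n):
    small = [d for d in range(1, int(n ** 0.5) + 1) if n % d == 0]
    return sorted(small + [n // d for d in small if d * d != n])

def delta(n):
    # Delta(n) = sup over u of #{d | n : e^u < d <= e^{u+1}}
    logs = [math.log(d) for d in divisors(n)]
    best = 0
    for j, L in enumerate(logs):
        # count the divisors d with log(d) in the window (L - 1, L]
        best = max(best, j + 1 - bisect.bisect_right(logs, L - 1))
    return best

for n in [12, 720, 2 * 3 * 5 * 7 * 11 * 13]:
    print(n, delta(n), len(divisors(n)))  # tau(n)/log n << Delta(n) <= tau(n)
```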

On the other hand, the statistical behavior of the Erdős-Hooley delta function is significantly less well understood, even conjecturally. Again drawing {n} at random from {1} to {x} for large {x}, the median is known to be somewhere between {(\log\log x)^{0.3533\dots}} and {(\log\log x)^{0.6102\dots}} – a (difficult) recent result of Ford, Green, and Koukoulopoulos (for the lower bound) and La Bretèche and Tenenbaum (for the upper bound). And the mean {\frac{1}{x} \sum_{n \leq x} \Delta(n)} was even less well controlled; the best previous bounds were

\displaystyle  \log \log x \ll \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll \exp( c \sqrt{\log\log x} )

for any {c > \sqrt{2} \log 2}, with the lower bound due to Hall and Tenenbaum, and the upper bound a recent result of La Bretèche and Tenenbaum.

The main result of this paper is an improvement of the upper bound to

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll (\log \log x)^{11/4}.

It is still unclear to us exactly what to conjecture regarding the actual order of the mean value.

The reason we looked into this problem was that it was connected to forthcoming work of David Conlon, Jacob Fox, and Huy Pham on the following problem of Erdős: what is the size of the largest subset {A} of {\{1,\dots,N\}} with the property that no non-empty subset of {A} sums to a perfect square? Erdős observed that one can obtain sets of size {\gg N^{1/3}} (basically by considering certain homogeneous arithmetic progressions), and Nguyen and Vu showed an upper bound of {\ll N^{1/3} (\log N)^{O(1)}}. With our mean value bound as input, together with several new arguments, Conlon, Fox, and Pham have been able to improve the upper bound to {\ll N^{1/3} (\log\log N)^{O(1)}}.

Let me now discuss some of the ingredients of the proof. The first few steps are standard. Firstly we may restrict attention to square-free numbers without much difficulty (the point being that if a number {n} factors as {n = d^2 m} with {m} squarefree, then {\Delta(n) \leq \tau(d^2) \Delta(m)}). Next, because a square-free number {n>1} can be uniquely factored as {n = pm} where {p} is a prime and {m} lies in the finite set {{\mathcal S}_{<p}} of squarefree numbers whose prime factors are less than {p}, and {\Delta(n) \leq \tau(p) \Delta(m) = 2 \Delta(m)}, it is not difficult to establish the bound

\displaystyle  \frac{1}{x} \sum_{n \in {\mathcal S}_{<x}} \Delta(n) \ll \sup_{2 \leq y\leq x} \frac{1}{\log y} \sum_{n \in {\mathcal S}_{<y}} \frac{\Delta(n)}{n}.

The upshot of this is that one can replace an ordinary average with a logarithmic average, thus it suffices to show

\displaystyle  \frac{1}{\log x} \sum_{n \in {\mathcal S}_{<x}} \frac{\Delta(n)}{n} \ll (\log \log x)^{11/4}. \ \ \ \ \ (1)

We actually prove a slightly more refined distributional estimate: for any {A \geq 2}, we have a bound

\displaystyle  \Delta(n) \ll A \log^{3/4} A \ \ \ \ \ (2)

outside of an exceptional set {E} which is small in the sense that

\displaystyle  \frac{1}{\log x} \sum_{n \in {\mathcal S}_{<x}: n \in E} \frac{1}{n} \ll \frac{1}{A}. \ \ \ \ \ (3)

It is not difficult to get from this distributional estimate to the logarithmic average estimate (1) (worsening the exponent {3/4} to {3/4+2 = 11/4}).

To get some intuition on the size of {\Delta(n)}, we observe that if {y > 0} and {n_{<y}} is the factor of {n} coming from the prime factors less than {y}, then

\displaystyle  \Delta(n) \geq \Delta(n_{<y}) \gg \frac{\tau(n_{<y})}{\log n_{<y}}. \ \ \ \ \ (4)

On the other hand, standard estimates let one establish that

\displaystyle  \tau(n_{<y}) \ll A \log n_{<y} \ \ \ \ \ (5)

for all {y}, and all {n} outside of an exceptional set that is small in the sense (3); in fact it turns out that one can also get an additional gain in this estimate unless {\log y} is close to {A^{\log 4}}, which turns out to be useful when optimizing the bounds. So we would like to approximately reverse the inequalities in (4) and get from (5) to (2), possibly after throwing away further exceptional sets of size (3).

At this point we apply another standard technique, namely the moment method of controlling the supremum {\Delta(n) = \sup_u \Delta(n;u)} by the moments

\displaystyle  M_q(n) := \int_{{\bf R}} \Delta(n;u)^q\ du

for natural numbers {q}; it is not difficult to establish the bound

\displaystyle  \Delta(n) \ll M_q(n)^{1/q}

and one expects this bound to become essentially sharp once {q \sim \log\log x}. We will be able to show a moment bound

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_q(n) / \tau(n)}{n} \leq O(q)^q A^{q-2} \log^{3q/4} A

for any {q \geq 2} for some exceptional set {E_q} obeying the smallness condition (3) (actually, for technical reasons we need to improve the right-hand side slightly to close an induction on {q}); this will imply the distributional bound (2) from a standard Markov inequality argument (setting {q \sim \log\log x}).

The strategy is then to obtain a good recursive inequality for (averages of) {M_q(n)}. As in the reduction to (1), we factor {n=pm} where {p} is a prime and {m \in {\mathcal S}_{<p}}. One observes the identity

\displaystyle  \Delta(n;u) = \Delta(m;u) + \Delta(m;u-\log p)

for any {u}; taking moments, one obtains the identity

\displaystyle  M_q(n) = \sum_{a+b=q; 0 \leq b \leq q} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.
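
The identity for {\Delta(n;u)} reflects the fact that the divisors of {n = pm} split into the divisors of {m} and {p} times those divisors (the two classes being disjoint, as {p} does not divide {m}), and can be checked directly; here is a quick sketch with the illustrative choices {p = 7} and {m = 30} (so that {m \in {\mathcal S}_{<7}}):

```python
import math

def delta_at(n, u):
    # Delta(n; u) = #{d | n : e^u < d <= e^{u+1}}
    ds = [d for d in range(1, n + 1) if n % d == 0]
    return sum(1 for d in ds if u < math.log(d) <= u + 1)

p, m = 7, 30   # n = 210 is squarefree, and m = 30 lies in S_{<7}
n = p * m
for u in [-1.0, 0.0, 0.5, 1.5, 2.5, 4.0]:
    assert delta_at(n, u) == delta_at(m, u) + delta_at(m, u - math.log(p))
print("identity verified at the sampled values of u")
```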

As in previous literature, one can try to average in {p} here and apply Hölder’s inequality. But it is convenient to first use the symmetry of the summand in {a,b} to reduce to the case of relatively small values of {b}:

\displaystyle  M_q(n) \leq 2 \sum_{a+b=q; 0 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

One can extract out the {b=0} term as

\displaystyle  M_q(n) \leq 2 M_q(m)

\displaystyle + 2 \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

It is convenient to eliminate the factor of {2} by dividing out by the divisor function:

\displaystyle  \frac{M_q(n)}{\tau(n)} \leq \frac{M_q(m)}{\tau(m)}

\displaystyle + \frac{1}{\tau(m)} \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

This inequality is suitable for iterating and also averaging in {p} and {m}. After some standard manipulations (using the Brun–Titchmarsh and Hölder inequalities), one is able to estimate sums such as

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_q(n)/\tau(n)}{n} \ \ \ \ \ (6)

in terms of sums such as

\displaystyle  \int_2^{x^2} \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_a(n) M_b(n)}{\tau(n) n} \frac{dy}{\log^2 y}

(assuming a certain monotonicity property of the exceptional set {E_q} that turns out to hold in our application). By an induction hypothesis and a Markov inequality argument, one can get a reasonable pointwise upper bound on {M_b} (after removing another exceptional set), and the net result is that one can basically control the sum (6) in terms of expressions such as

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_a} \frac{M_a(n)/\tau(n)}{n}

for various {a < q}. This allows one to estimate these expressions efficiently by induction.

The “epsilon-delta” nature of analysis can be daunting and unintuitive to students, due to its heavy reliance on inequalities rather than equalities. But it occurred to me recently that one might be able to leverage the intuition one already has from “deals” – of the type one often sees advertised by corporations – to get at least some informal understanding of these concepts.

Take for instance the concept of an upper bound {X \leq A} or a lower bound {X \geq B} on some quantity {X}. From an economic perspective, one could think of the upper bound as an assertion that {X} can be “bought” for {A} units of currency, and the lower bound can similarly be viewed as an assertion that {X} can be “sold” for {B} units of currency. Thus for instance, a system of inequalities and equations like

\displaystyle  2 \leq Y \leq 5

\displaystyle  X+Y \leq 7

\displaystyle  X+Y+Z = 10

\displaystyle  Y+Z \leq 6

could be viewed as analogous to a currency rate exchange board, of the type one sees for instance in airports:

Currency  | We buy at | We sell at
{Y}       | {2}       | {5}
{X+Y}     | -         | {7}
{X+Y+Z}   | {10}      | {10}
{Y+Z}     | -         | {6}

Someone with an eye for spotting “deals” might now realize that one can actually buy {Y} for {3} units of currency rather than {5}, by purchasing one copy each of {X+Y} and {Y+Z} for {7+6=13} units of currency, then selling off {X+Y+Z} to recover {10} units of currency back. In more traditional mathematical language, one can improve the upper bound {Y \leq 5} to {Y \leq 3} by taking the appropriate linear combination of the inequalities {X+Y \leq 7}, {Y+Z \leq 6}, and {X+Y+Z=10}. More generally, this way of thinking is useful when faced with a linear programming situation (and of course linear programming is a key foundation for operations research), although this analogy begins to break down when one wants to use inequalities in a more non-linear fashion.
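
The “deal-spotting” in this example is precisely what a linear programming solver automates. Here is a minimal sketch using scipy (the ordering of the variables {(X, Y, Z)} is an arbitrary choice):

```python
from scipy.optimize import linprog

# Find the best upper bound on Y consistent with the "exchange board":
# maximize Y, i.e. minimize -Y, over the variables (X, Y, Z).
result = linprog(
    c=[0, -1, 0],
    A_ub=[[1, 1, 0],    # X + Y <= 7
          [0, 1, 1],    # Y + Z <= 6
          [0, 1, 0],    # Y <= 5
          [0, -1, 0]],  # -Y <= -2, i.e. Y >= 2
    b_ub=[7, 6, 5, -2],
    A_eq=[[1, 1, 1]],   # X + Y + Z = 10
    b_eq=[10],
    bounds=[(None, None)] * 3,  # no sign constraints on X, Y, Z
)
print(-result.fun)  # 3.0: the improved upper bound Y <= 3
```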

Asymptotic estimates such as {X = O(Y)} (also often written {X \lesssim Y} or {X \ll Y}) can be viewed as some sort of liquid market in which {Y} can be used to purchase {X}, though depending on market rates, one may need a large number of units of {Y} in order to buy a single unit of {X}. An asymptotic estimate like {X=o(Y)} represents an economic situation in which {Y} is so much more highly desired than {X} that, if one is a patient enough haggler, one can eventually convince someone to give up a unit of {X} for even just a tiny amount of {Y}.

When it comes to the basic analysis concepts of convergence and continuity, one can similarly view these concepts as various economic transactions involving the buying and selling of accuracy. One could for instance imagine the following hypothetical range of products, in which one needs to spend more money to obtain higher accuracy when measuring weight in grams:

Object                  | Accuracy           | Price
Low-end kitchen scale   | {\pm 1} gram       | {\$ 5}
High-end bathroom scale | {\pm 0.1} grams    | {\$ 15}
Low-end lab scale       | {\pm 0.01} grams   | {\$ 50}
High-end lab scale      | {\pm 0.001} grams  | {\$ 250}

The concept of convergence {x_n \rightarrow x} of a sequence {x_1,x_2,x_3,\dots} to a limit {x} could then be viewed as somewhat analogous to a rewards program, of the type offered for instance by airlines, in which various tiers of perks are offered when one hits a certain level of “currency” (e.g., frequent flyer miles). For instance, the convergence of the sequence {x_n := 2 + \frac{1}{\sqrt{n}}} to its limit {x := 2} offers the following accuracy “perks” depending on one’s level {n} in the sequence:

Status        | Accuracy benefit        | Eligibility
Basic status  | {|x_n - x| \leq 1}      | {n \geq 1}
Bronze status | {|x_n - x| \leq 0.1}    | {n \geq 10^2}
Silver status | {|x_n - x| \leq 0.01}   | {n \geq 10^4}
Gold status   | {|x_n - x| \leq 0.001}  | {n \geq 10^6}
{\dots}       | {\dots}                 | {\dots}

With this conceptual model, convergence means that any status level of accuracy can be unlocked if one’s number {n} of “points earned” is high enough.
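
In this rewards-program language, proving convergence amounts to exhibiting, for each accuracy tier {\varepsilon > 0}, a qualifying level {N(\varepsilon)}; for the sequence {x_n = 2 + \frac{1}{\sqrt{n}}} one can take {N(\varepsilon) = \lceil 1/\varepsilon^2 \rceil}. A trivial sketch that reproduces the eligibility column above:

```python
import math

def status_threshold(eps):
    # smallest n with |x_n - 2| = 1/sqrt(n) <= eps, namely n >= 1/eps^2
    return math.ceil(1 / eps ** 2)

for eps in [1, 0.1, 0.01, 0.001]:
    n = status_threshold(eps)
    assert 1 / math.sqrt(n) <= eps
    print(eps, n)  # 1 -> 1, 0.1 -> 100, 0.01 -> 10**4, 0.001 -> 10**6
```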

In a similar vein, continuity becomes analogous to a conversion program, in which accuracy benefits from one company can be traded in for new accuracy benefits in another company. For instance, the continuity of the function {f(x) = 2 + \sqrt{x}} at the point {x_0=0} can be viewed in terms of the following conversion chart:

Accuracy benefit of {x} to trade in | Accuracy benefit of {f(x)} obtained
{|x - x_0| \leq 1}                  | {|f(x) - f(x_0)| \leq 1}
{|x - x_0| \leq 0.01}               | {|f(x) - f(x_0)| \leq 0.1}
{|x - x_0| \leq 0.0001}             | {|f(x) - f(x_0)| \leq 0.01}
{\dots}                             | {\dots}

Again, the point is that one can purchase any desired level of accuracy of {f(x)} provided one trades in a suitably high level of accuracy of {x}.

At present, the above conversion chart is only available at the single location {x_0}. The concept of uniform continuity can then be viewed as advertising copy asserting that “offer prices are valid in all store locations”. In a similar vein, the concept of equicontinuity for a class {{\mathcal F}} of functions is a guarantee that the “offer applies to all functions {f} in the class {{\mathcal F}}”, without any price discrimination. The combined notion of uniform equicontinuity is then of course the claim that the offer is valid in all locations and for all functions.

In a similar vein, differentiability can be viewed as a deal in which one can trade in accuracy of the input for approximately linear behavior of the output; to oversimplify slightly, smoothness can similarly be viewed as a deal in which one trades in accuracy of the input for high-accuracy polynomial approximability of the output. Measurability of a set or function can be viewed as a deal in which one trades in a level of resolution for an accurate approximation of that set or function at the given resolution. And so forth.

Perhaps readers can propose some other examples of mathematical concepts being re-interpreted as some sort of economic transaction?

The National Academies of Sciences, Engineering, and Medicine are hosting a virtual workshop on the topic of “AI to Assist Mathematical Reasoning” from June 12-14. The tentative program can be found here. I am one of the members of the organizing committee for this workshop, together with Petros Koumoutsakos, Jordan Ellenberg, Melvin Greer, Brendan Hassett, Yann A. LeCun, Heather Macbeth, Talia Ringer, Kavitha Srinivas, and Michelle Schwalbe. There is some thematic overlap (and a few speakers in common) with the recent IPAM program on machine assisted proof, though with more of a focus on the current and projected technical capabilities of machine learning algorithms for mathematics. Registration for the event is currently open at the web page for the workshop.
