You are currently browsing Terence Tao’s articles.
I’ve just uploaded to the arXiv my paper “Dense sets of natural numbers with unusually large least common multiples”. This short paper answers (in the negative) a somewhat obscure question of Erdös and Graham:
Problem 1 Is it true that if $A$ is a set of natural numbers for which the quantity (1) goes to infinity as $x \rightarrow \infty$, then the quantity (2) also goes to infinity as $x \rightarrow \infty$?
At first glance, this problem may seem rather arbitrary, but it can be motivated as follows. The hypothesis that (1) goes to infinity is a largeness condition on $A$; in view of Mertens’ theorem, it can be viewed as an assertion that $A$ is denser than the set of primes. On the other hand, the conclusion that (2) grows is an assertion that $nm$ becomes significantly larger than $\mathrm{lcm}(n,m)$ on the average for large $x$; that is to say, that many pairs of numbers in $A$ share a common factor.
share a common factor. Intuitively, the problem is then asking whether sets that are significantly denser than the primes must start having lots of common factors on average.
For sake of comparison, it is easy to see that if (1) goes to infinity, then at least one pair of distinct elements in $A$ must have a non-trivial common factor. For if this were not the case, then the elements of $A$ would be pairwise coprime, so each prime $p$ would have at most one multiple in $A$, and so could contribute at most $1/p$ to the sum in (1); hence, by Mertens’ theorem, and the fact that every natural number greater than one is divisible by at least one prime $p$, the quantity (1) stays bounded, a contradiction.
It turns out, though, that the answer to the above problem is negative; one can find sets that are denser than the primes, but for which (2) stays bounded, so that the least common multiples in the set are unusually large. It was a bit surprising to me that this question had not been resolved long ago (in fact, I was not able to find any prior literature on the problem beyond the original reference of Erdös and Graham); in contrast, another problem of Erdös and Graham concerning sets with unusually small least common multiples was extensively studied (and essentially solved) about twenty years ago, while the study of sets with unusually large greatest common divisor for many pairs in the set has recently become somewhat popular, due to their role in the proof of the Duffin-Schaeffer conjecture by Koukoulopoulos and Maynard.
To search for counterexamples, it is natural to look for numbers with relatively few prime factors, in order to reduce their common factors and increase their least common multiple. A particularly simple example, whose verification is on the level of an exercise in a graduate analytic number theory course, is the set of semiprimes (products of two primes), for which one can readily verify that (1) grows but (2) stays bounded. With a bit more effort, I was able to optimize the construction and uncover the true threshold for boundedness of (2), which was a little unexpected:
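As a quick numerical illustration of the semiprime example (this is my own sketch, not code from the paper; it assumes, consistently with the Mertens comparison above, that the relevant quantity is essentially the reciprocal sum $\sum_{n \in A: n \leq x} 1/n$ measured against $\log\log x$):

```python
def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p in range(2, n + 1) if sieve[p]]

def reciprocal_sum_semiprimes(x):
    """Sum of 1/n over semiprimes n = p*q (primes p <= q) with n <= x."""
    ps = primes_up_to(x // 2)
    total = 0.0
    for i, p in enumerate(ps):
        if p * p > x:
            break
        for q in ps[i:]:
            if p * q > x:
                break
            total += 1.0 / (p * q)
    return total
```

Heuristically the sum over semiprimes grows like $\frac{1}{2}(\log\log x)^2$, one power of $\log\log x$ faster than the sum over primes, which is the sense in which the semiprimes are "denser than the primes".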
Theorem 2
The proofs are not particularly long or deep, but I thought I would record here some of the process towards finding them. My first step was to try to simplify the condition that (2) stays bounded. In order to use probabilistic intuition, I first expressed this condition in probabilistic terms as
It is then natural to try a random construction, in which one sieves out the natural numbers by permitting each natural number to survive with a probability resembling
, in order to get the predicted behavior for
. Performing some standard calculations, this construction could ensure (2) bounded with a density a little bit less than the one stated in the main theorem; after optimizing the parameters, I could only get something like
It then remained to improve the lower bound construction to eliminate the losses in the exponents. By deconstructing the proof of the upper bound, it became natural to consider something like the set of natural numbers
that had at most
prime factors. This construction actually worked for some scales
– namely those
for which
was a natural number – but there were some strange “discontinuities” in the analysis that prevented me from establishing the boundedness of (2) for arbitrary scales
. The basic problem was that increasing the number of permitted prime factors from one natural number threshold
to another
ended up increasing the density of the set by an unbounded factor (of the order of
, in practice), which heavily disrupted the task of trying to keep the ratio (2) bounded. Usually the resolution to these sorts of discontinuities is to use some sort of random “average” of two or more deterministic constructions – for instance, by taking some random union of some numbers with
prime factors and some numbers with
prime factors – but the numerology turned out to be somewhat unfavorable, allowing for some improvement in the lower bounds over my previous construction, but not enough to close the gap entirely. It was only after substantial trial and error that I was able to find a working deterministic construction, where at a given scale one collected either numbers with at most
prime factors, or numbers with
prime factors but with the largest prime factor in a specific range, in which I could finally get the numerator and denominator in (2) to be in balance for every
. But once the construction was written down, the verification of the required properties ended up being quite routine.
Many modern mathematical proofs are a combination of conceptual arguments and technical calculations. There is something of a tradeoff between the two: one can add more conceptual arguments to try to reduce the technical computations, or vice versa. (Among other things, this leads to a Berkson paradox-like phenomenon in which a negative correlation can be observed between the two aspects of a proof; see this recent Mastodon post of mine for more discussion.)
In a recent article, Heather Macbeth argues that the preferred balance between conceptual and computational arguments is quite different for a computer-assisted proof than it is for a purely human-readable proof. In the latter, there is a strong incentive to minimize the amount of calculation to the point where it can be checked by hand, even if this requires a certain amount of ad hoc rearrangement of cases, unmotivated parameter selection, or otherwise non-conceptual additions to the arguments in order to reduce the calculation. But in the former, once one is willing to outsource any tedious verification or optimization task to a computer, the incentives are reversed: freed from the need to arrange the argument to reduce the amount of calculation, one can now describe an argument by listing the main ingredients and then letting the computer figure out a suitable way to combine them to give the stated result. The two approaches can thus be viewed as complementary ways to describe a result, with neither necessarily being superior to the other.
In this post, I would like to illustrate this computation-outsourced approach with the topic of zero-density theorems for the Riemann zeta function, in which all computer verifiable calculations (as well as other routine but tedious arguments) are performed “off-stage”, with the intent of focusing only on the conceptual inputs to these theorems.
Zero-density theorems concern upper bounds for the quantity $N(\sigma,T)$ for a given $1/2 \leq \sigma \leq 1$ and large $T$, which is defined as the number of zeroes of the Riemann zeta function in the rectangle $\{ s: \sigma \leq \mathrm{Re}(s) \leq 1; 0 \leq \mathrm{Im}(s) \leq T \}$. (There is also an important generalization of this quantity to $L$-functions, but for simplicity we will focus on the classical zeta function case here.) Such quantities are important in analytic number theory for many reasons, one of which is through explicit formulae such as the Riemann-von Mangoldt explicit formula
Clearly $N(\sigma,T)$ is non-increasing in $\sigma$. The Riemann-von Mangoldt formula, together with the functional equation, gives us the asymptotic
Experience has shown that the most important quantity to control here is the exponent $A(\sigma)$, defined as the least constant for which one has an asymptotic
In between the papers of Huxley and Guth-Maynard are dozens of additional improvements on $A(\sigma)$, though it is only the Guth-Maynard paper that actually lowered the supremum norm $\|A\|_\infty$.
. A summary of most of the state of the art before Guth-Maynard may be found in Table 2 of this recent paper of Trudgian and Yang; it is complicated, but it is easy enough to get a computer to illustrate it with a plot:
(For an explanation of what is going on under the assumption of the Lindelöf hypothesis, see below the fold.) This plot represents the combined effort of nearly a dozen papers, each one of which claims one or more components of the depicted piecewise smooth curve, and is written in the “human-readable” style mentioned above, where the argument is arranged to reduce the amount of tedious computation to human-verifiable levels, even if this comes at the cost of obscuring the conceptual ideas. (For an animation of how this bound improved over time, see here.) Below the fold, I will try to describe (in sketch form) some of the standard ingredients that go into these papers, in particular the routine reduction of deriving zero density estimates from large value theorems for Dirichlet series. We will not attempt to rewrite the entire literature of zero-density estimates in this fashion, but focus on some illustrative special cases.
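To give a flavour of what such a plot encodes, here is a toy version of the computation (my own sketch, covering only two classical bounds rather than the full Trudgian–Yang table): the piecewise minimum of Ingham’s exponent $A(\sigma) \leq 3/(2-\sigma)$ and Huxley’s exponent $A(\sigma) \leq 3/(3\sigma-1)$, which cross at $\sigma = 3/4$ where both equal $12/5$; it is this supremum value $12/5$ that Guth–Maynard lowered to $30/13$.

```python
def ingham(sigma):
    # Ingham (1940): A(sigma) <= 3/(2 - sigma), strongest near sigma = 1/2
    return 3 / (2 - sigma)

def huxley(sigma):
    # Huxley (1972): A(sigma) <= 3/(3*sigma - 1), strongest near sigma = 1
    return 3 / (3 * sigma - 1)

def best_classical(sigma):
    """Piecewise minimum of the two classical zero-density exponents on
    1/2 <= sigma < 1: Ingham wins for sigma < 3/4, Huxley for sigma > 3/4."""
    return min(ingham(sigma), huxley(sigma))
```

Plotting `best_classical` over a grid of $\sigma$ reproduces the crudest two-piece version of the curve; each further paper in the table shaves the curve down on some subinterval.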
The Salem prize was established in 1968 and named in honor of Raphaël Salem (1898-1963), a mathematician famous notably for his deep study of the links between Fourier series and number theory and for pioneering applications of probabilistic methods to these fields. It was not awarded from 2019-2022, due to both the COVID pandemic and the death of Jean Bourgain, who had been almost single-handedly administering the prize, but is now active again, being administered by Akshay Venkatesh and the IAS. I chair the scientific committee for this prize, whose other members are Guy David and Mikhail Sodin. Last year, the prize was awarded to Sarah Peluse and Julian Sahasrabudhe.
Nominations for the 2024 Salem Prize are now open until September 1st. Nominations should include a CV of the nominee and a nomination letter explaining the significance of the nominee’s work. Supplementary documentation, such as supporting letters of recommendation or key publications, can additionally be provided, but are not required.
Nominees may be individuals from any country or institution. Preference will be given to nominees who have received their PhD in the last ten years, although this rule may be relaxed if there are mitigating personal circumstances, or if there have been few Salem prize winners in recent years. Self-nominations will not be considered, nor are past Prize winners or Scientific Committee members eligible.
The prize does not come with a direct monetary award, but winners will be invited to visit the IAS and to give a lecture associated with the award of the prize.
See also the previous year’s announcement of the Salem prize nomination period.
[This post is dedicated to Luca Trevisan, who recently passed away due to cancer. Though far from his most significant contribution to the field, I would like to mention that, as with most of my other blog posts on this site, this page was written with the assistance of Luca’s LaTeX to WordPress converter. Mathematically, his work and insight on pseudorandomness in particular have greatly informed how I myself think about the concept. – T.]
Recently, Timothy Gowers, Ben Green, Freddie Manners, and I were able to establish the following theorem:
Theorem 1 (Marton’s conjecture) Let $A \subset {\bf F}_2^n$ be non-empty with $|A+A| \leq K|A|$. Then there exists a subgroup $H$ of ${\bf F}_2^n$ with $|H| \leq |A|$ such that $A$ is covered by at most $2K^C$ translates of $H$, for some absolute constant $C$.
We established this result with $C = 12$, although it has since been improved to $C = 9$ by Jyun-Jie Liao.
Our proof was written in order to optimize the constant $C$ as much as possible; similarly for the more detailed blueprint of the proof that was prepared in order to formalize the result in Lean. I have been asked a few times whether it is possible to present a streamlined and more conceptual version of the proof in which one does not try to establish an explicit constant $C$, but just to show that the result holds for some constant $C$. This is what I will attempt to do in this post, though some of the more routine steps will be outsourced to the aforementioned blueprint.
The key concept here is that of the entropic Ruzsa distance $d[X;Y]$ between two random variables $X, Y$ taking values in ${\bf F}_2^n$, defined as
$$ d[X;Y] := {\bf H}[X' - Y''] - \frac{1}{2} {\bf H}[X] - \frac{1}{2} {\bf H}[Y],$$
where $X', Y''$ are independent copies of $X, Y$ respectively, and ${\bf H}[X]$ denotes the Shannon entropy of $X$. This distance is symmetric and non-negative, and obeys the triangle inequality
$$ d[X;Z] \leq d[X;Y] + d[Y;Z]$$
for any random variables $X, Y, Z$; see the blueprint for a proof. The above theorem then follows from an entropic analogue:
Theorem 2 (Entropic Marton’s conjecture) Let $X$ be a ${\bf F}_2^n$-valued random variable with $d[X;X] \leq \log K$. Then there exists a uniform random variable $U_H$ on a subgroup $H$ of ${\bf F}_2^n$ such that
$$ d[X; U_H] \leq C \log K$$
for some absolute constant $C$.
We were able to establish Theorem 2 with $C = 11$, which implies Theorem 1 with $C = 12$ by fairly standard additive combinatorics manipulations (such as the Ruzsa covering lemma); see the blueprint for details.
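As a toy numerical sanity check (my own sketch, not part of the paper; it uses the definition of entropic Ruzsa distance recalled above, together with the fact that subtraction equals addition in ${\bf F}_2^n$), one can verify that the distance vanishes when $X$ is uniform on a subgroup, in line with the “100% inverse theorem”:

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def add_convolve(p, q, n):
    """Distribution of X + Y for independent X ~ p, Y ~ q on F_2^n;
    group elements are bitmasks, and addition is XOR."""
    r = [0.0] * (1 << n)
    for a in range(1 << n):
        for b in range(1 << n):
            r[a ^ b] += p[a] * q[b]
    return r

def ruzsa_dist(p, q, n):
    """Entropic Ruzsa distance d[X;Y] = H[X' - Y''] - H[X]/2 - H[Y]/2,
    computed with independent copies; in characteristic 2, minus = plus."""
    return entropy(add_convolve(p, q, n)) - entropy(p) / 2 - entropy(q) / 2

# X uniform on the order-2 subgroup {00, 01} of F_2^2
uniform_H = [0.5, 0.5, 0.0, 0.0]
```

For instance `ruzsa_dist(uniform_H, uniform_H, 2)` vanishes, since the sum of two independent uniform samples from a subgroup is again uniform on that subgroup.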
The key proposition needed to establish Theorem 2 is the following distance decrement property:
Proposition 3 (Distance decrement) If $X, Y$ are ${\bf F}_2^n$-valued random variables, then one can find ${\bf F}_2^n$-valued random variables $X', Y'$ such that
$$ d[X'; Y'] \leq (1 - c) d[X;Y]$$
and
$$ d[X; X'], d[Y; Y'] \leq C d[X;Y]$$
for some absolute constants $c, C > 0$.
Indeed, suppose this proposition held. Starting with $X_0, Y_0$ both equal to $X$ and iterating, one can then find sequences of random variables $X_n, Y_n$ with
$$ d[X_n; Y_n] \leq (1-c)^n d[X;X]$$
and
$$ d[X_n; X_{n+1}], d[Y_n; Y_{n+1}] \leq C (1-c)^n d[X;X].$$
In particular, from the triangle inequality and geometric series we have $d[X; X_n], d[X; Y_n] = O(d[X;X])$ for all $n$. By weak compactness, some subsequence of the $X_n$, $Y_n$ converge to some limiting random variables $X_\infty, Y_\infty$, and by some simple continuity properties of entropic Ruzsa distance, we conclude that $d[X_\infty; Y_\infty] = 0$ and $d[X; X_\infty] = O(d[X;X])$. Theorem 2 then follows from the “100% inverse theorem” for entropic Ruzsa distance; see the blueprint for details.
To prove Proposition 3, we can reformulate it as follows:
Proposition 4 (Lack of distance decrement implies vanishing) If $X, Y$ are ${\bf F}_2^n$-valued random variables with the property that the distance decrement conclusion of Proposition 3 fails for all ${\bf F}_2^n$-valued random variables $X', Y'$ and some sufficiently small absolute constant $c$, then one can derive a contradiction.
Indeed, we may assume from the above proposition that
The entire game is now to use Shannon entropy inequalities and “entropic Ruzsa calculus” to deduce a contradiction from (1) for a sufficiently small constant. This we will do below the fold, but before doing so, let us first make some adjustments to (1) that will make it more useful for our purposes. Firstly, because conditional entropic Ruzsa distance (see blueprint for definitions) is an average of unconditional entropic Ruzsa distance, we can automatically upgrade (1) to the conditional version
— 1. Main argument —
Now we derive more and more consequences of (2) – at some point crucially using the hypothesis that we are in characteristic two – before we reach a contradiction.
Right now, our hypothesis (2) only supplies lower bounds on entropic distances. The crucial ingredient that allows us to proceed is what we call the fibring identity, which lets us convert these lower bounds into useful upper bounds as well, which in fact match up very nicely when $c$ is small. Informally, the fibring identity captures the intuitive fact that the doubling constant of a set $A$ should be at least as large as the doubling constant of the image $\pi(A)$ of that set under a homomorphism $\pi$, times the doubling constant of a typical fiber $A \cap \pi^{-1}(\{y\})$ of that homomorphism; and furthermore, one should only be close to equality if the fibers “line up” in some sense.
Here is the fibring identity:
Proposition 5 (Fibring identity) Let $\pi: G \rightarrow H$ be a homomorphism. Then for any independent $G$-valued random variables $X, Y$, one has
The proof is of course in the blueprint, but given that it is a central pillar of the argument, I reproduce it here.
Proof: Expanding out the definition of Ruzsa distance, and using the conditional entropy chain rule
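For reference, the chain rule for Shannon entropy invoked at this step is the standard identity, together with its conditional form:

```latex
{\bf H}[X, Y] = {\bf H}[X] + {\bf H}[Y | X],
\qquad
{\bf H}[X, Y | Z] = {\bf H}[X | Z] + {\bf H}[Y | X, Z].
```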
We will only care about the characteristic $2$ setting here, so we will now assume that all groups involved are $2$-torsion, so that we can replace all subtractions with additions. If we specialize the fibring identity to the case where $G = {\bf F}_2^n \times {\bf F}_2^n$, $H = {\bf F}_2^n$, $\pi$ is the addition map $\pi(x,y) = x + y$, and $X = (X_1, X_2)$, $Y = (Y_1, Y_2)$ are pairs of independent random variables in ${\bf F}_2^n$, we obtain the following corollary:
Corollary 6 Let $X_1, X_2, Y_1, Y_2$ be independent ${\bf F}_2^n$-valued random variables. Then we have the identity
This is a useful and flexible identity, especially when combined with (2). For instance, we can discard the conditional mutual information term as being non-negative, to obtain the inequality
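The discarding step rests on the non-negativity of conditional mutual information:

```latex
{\bf I}[A : B | C] := {\bf H}[A | C] + {\bf H}[B | C] - {\bf H}[A, B | C] \geq 0.
```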
Now we use the fibring identity again, relabeling as
and requiring
to be independent copies of
. We conclude that
Remark 7 A similar argument works in the $m$-torsion case for general $m$. Instead of decrementing the entropic Ruzsa distance, one instead decrements a “multidistance”
for independent
. By an iterated version of the fibring identity, one can first reduce again to the symmetric case where the random variables are all copies of the same variable
. If one then takes
,
to be an array of
copies of
, one can get to the point where the row sums
and the column sums
have small conditional mutual information with respect to the double sum
. If we then set
and
, the data processing inequality again shows that
and
are nearly independent given
. The
-torsion now crucially intervenes as before to ensure that
has the same form as
or
, leading to a contradiction as before. See this previous blog post for more discussion.
I’ve just uploaded to the arXiv my paper “On product representations of squares”. This short paper answers (in the negative) a (somewhat obscure) question of Erdös. Namely, for any $k$, consider the size of the largest subset of $\{1,\dots,N\}$ with the property that no $k$ distinct elements of this subset multiply to a square. In a paper by Erdös, Sárközy, and Sós, the asymptotic behaviour of this quantity was determined for each fixed $k$.
In the end, the argument turned out to be relatively simple; no advanced results from additive combinatorics, graph theory, or analytic number theory were required. I found it convenient to proceed via the probabilistic method (although the more combinatorial technique of double counting would also suffice here). The main idea is to generate a tuple of $k$ distinct random natural numbers in $\{1,\dots,N\}$ which multiply to a square, and which are reasonably uniformly distributed throughout $\{1,\dots,N\}$, in that each individual number is attained by one of the random variables with probability $O(1/N)$. If one can find such a distribution, then if the density of the given set is sufficiently close to $1$, it will happen with positive probability that each of the random numbers will lie in the set, giving the claim.
When $k = 3$, this strategy cannot work, as it contradicts the arguments of Erdös, Sárközy, and Sós. The reason can be explained as follows. The most natural way to generate a triple of random natural numbers in $\{1,\dots,N\}$ which multiply to a square is to set
However, the situation changes for larger $k$. For instance, for $k = 4$, we can try the same strategy with the ansatz
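One simple ansatz with this multiplicative structure (a hypothetical illustration of the strategy, not necessarily the exact construction in the paper, which also has to enforce distinctness and near-uniformity) takes four products $(ab, bc, cd, da)$, whose product is automatically the perfect square $(abcd)^2$:

```python
import math
import random

def square_tuple(N, rng):
    """Sample a 4-tuple (a*b, b*c, c*d, d*a) of numbers in [1, N];
    the product is (a*b*c*d)^2, hence always a perfect square.
    (Distinctness and uniform coverage of [1, N] need extra care.)"""
    m = math.isqrt(N)  # keep each factor <= sqrt(N), so each product <= N
    a, b, c, d = (rng.randrange(1, m + 1) for _ in range(4))
    return (a * b, b * c, c * d, d * a)

rng = random.Random(0)
sample = square_tuple(10**6, rng)
```

The real work in the argument is in arranging such a construction so that each individual number in $\{1,\dots,N\}$ is attained with probability $O(1/N)$.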
The purpose of this post is to report an erratum to the 2012 paper “An inverse theorem for the Gowers $U^{s+1}[N]$-norm” of Ben Green, myself, and Tamar Ziegler (previously discussed in this blog post). The main results of this paper have been superseded with stronger quantitative results, first in work of Manners (using somewhat different methods), and more recently in a remarkable paper of Leng, Sah, and Sawhney which combined the methods of our paper with several new innovations to obtain quite strong bounds (of quasipolynomial type); see also an alternate proof of our main results (again by quite different methods) by Candela and Szegedy. In the course of their work, they discovered some fixable but nontrivial errors in our paper. These (rather technical) issues were already implicitly corrected in this followup work which supersedes our own paper, but for the sake of completeness we are also providing a formal erratum for our original paper, which can be found here. We thank Leng, Sah, and Sawhney for bringing these issues to our attention.
Excluding some minor (mostly typographical) issues which we also have reported in this erratum, the main issues stemmed from a conflation of two notions of a degree filtration
In most cases, fixing this issue only required minor changes to the text, but there is one place (Section 8) where there was a non-trivial problem: we used the claim that the final group was a central group, which is true for filtrations, but not necessarily for prefiltrations. This fact (or more precisely, a multidegree variant of it) was used to claim a factorization for a certain product of nilcharacters, which is in fact not true as stated. In the erratum, a substitute factorization for a slightly different product of nilcharacters is provided, which is still sufficient to conclude the main result of this part of the paper (namely, a statistical linearization of a certain family of nilcharacters in the shift parameter
).
Again, we stress that these issues do not impact the paper of Leng, Sah, and Sawhney, as they adapted the methods in our paper in a fashion that avoids these errors.
A recent paper of Kra, Moreira, Richter, and Robertson established the following theorem, resolving a question of Erdös. Given a discrete amenable group $G$, and a subset $A$ of $G$, we define the Banach density of $A$ to be the quantity
Theorem 1 Let $G$ be a countably infinite abelian group with the index $[G : 2G]$ finite. Let $A$ be a positive Banach density subset of $G$. Then there exists an infinite set $B \subset G$ and $t \in G$ such that $b + b' + t \in A$ for all distinct $b, b' \in B$.
Strictly speaking, the main result of Kra et al. only claims this theorem for the case of the integers $G = {\bf Z}$, but as noted in the recent preprint of Charamaras and Mountakis, the argument in fact applies for all countable abelian $G$ in which the subgroup $2G = \{ 2g: g \in G \}$ has finite index. This condition is in fact necessary (as observed by forthcoming work of Ethan Acklesberg): if $2G$ has infinite index, then one can find a subgroup $H$ of $G$ of index $2^m$ for any $m$ that contains $2G$ (or equivalently, such that $G/H$ is $2$-torsion). If one lets $y_1, y_2, \dots$
be an enumeration of
, one can then check that the set
Theorem 1 resembles other theorems in density Ramsey theory, such as Szemerédi’s theorem, but with the notable difference that the pattern located in the dense set is infinite rather than merely arbitrarily large but finite. As such, it does not seem that this theorem can be proven by purely finitary means. However, one can view this result as the conjunction of an infinite number of statements, each of which is a finitary density Ramsey theory statement. To see this, we need some more notation. Observe from Tychonoff’s theorem that the collection $2^G$ of all subsets of $G$ is a compact topological space (with the topology of pointwise convergence) (it is also metrizable since $G$ is countable). Subsets of $2^G$ can be thought of as properties of subsets of $G$; for instance, the property of a subset of $G$ of being finite is of this form, as is the complementary property of being infinite. A property of subsets of $G$ can then be said to be closed or open if it corresponds to a closed or open subset of $2^G$.
Thus, a property is closed if and only if it is closed under pointwise limits, and a property is open if, whenever a set $B$ has this property, then any other set that shares a sufficiently large (but finite) initial segment with $B$ will also have this property. Since $2^G$ is compact and Hausdorff, a property is closed if and only if it is compact.
The properties of being finite or infinite are neither closed nor open. Define a smallness property to be a closed (or compact) property that is only satisfied by finite sets; the complement to this is a largeness property, which is an open property that is satisfied by all infinite sets. (One could also choose to impose other axioms on these properties, for instance requiring a largeness property to be an upper set, but we will not do so here.) Examples of largeness properties for a subset $B$ include:
- $B$ has at least $k$ elements, for some fixed $k$.
- $B$ is non-empty and has at least $b_1$ elements, where $b_1$ is the smallest element of $B$.
- $B$ is non-empty and has at least $b_{b_1}$ elements, where $b_{b_1}$ is the $b_1^{\mathrm{th}}$ element of $B$.
- $M$ halts when given $B$ as input, where $M$ is a given Turing machine that halts whenever given an infinite set as input. (Note that this encompasses the preceding three examples as special cases, by selecting $M$ appropriately.)
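To make the “open” nature of these properties concrete, here is a small sketch (my own; the helper name is hypothetical) of the property that $B$ is non-empty and has at least as many elements as its smallest element. Once some finite initial segment of $B$ witnesses the property, every set sharing that segment, and in particular every infinite set, has it too:

```python
def witnesses_largeness(prefix):
    """Given the increasing initial segment `prefix` of a set B of natural
    numbers, check whether it already witnesses the property
    'B is non-empty and has at least min(B) elements'.
    The check is monotone: extending a witnessing prefix keeps it
    witnessing, which is exactly the 'open' behaviour described above."""
    return bool(prefix) and len(prefix) >= prefix[0]
```

For example, the prefix `[3, 7]` does not yet witness the property, but `[3, 7, 9]` does, and so does every further extension.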
Theorem 1 is then equivalent to the following “almost finitary” version (cf. this previous discussion of almost finitary versions of the infinite pigeonhole principle):
Theorem 2 (Almost finitary form of main theorem) Let $G$ be a countably infinite abelian group with $[G : 2G]$ finite. Let $\Phi_n$ be a Følner sequence in $G$, let $\delta > 0$, and let $\mathcal{L}_n$ be a largeness property for each $n$. Then there exists $N$ such that if $A \subset G$ has density at least $\delta$ in $\Phi_n$ for all $n \leq N$, then there exists a shift $t \in G$ such that $A$ contains an $\mathcal{L}_N$-large set $B$ with $b + b' + t \in A$ for all distinct $b, b' \in B$.
Proof of Theorem 2 assuming Theorem 1. Let $G$, $\Phi_n$, $\delta$, $\mathcal{L}_n$ be as in Theorem 2. Suppose for contradiction that Theorem 2 failed; then for each $N$ we can find $A_N$ with density at least $\delta$ in $\Phi_n$ for all $n \leq N$, such that there is no shift $t$ and $\mathcal{L}_N$-large set $B$ with $b + b' + t \in A_N$ for all distinct $b, b' \in B$. By compactness, a subsequence of the $A_N$ converges pointwise to a set $A$, which then has Banach density at least $\delta$. By Theorem 1, there is an infinite set $B$ and a $t$ such that $b + b' + t \in A$ for all distinct $b, b' \in B$. By openness, we conclude that there exists a finite $\mathcal{L}_N$-large set $B'$ contained in $B$, thus $b + b' + t \in A$ for all distinct $b, b' \in B'$. This implies that the same containment holds with $A$ replaced by $A_N$ for infinitely many $N$, a contradiction.
Proof of Theorem 1 assuming Theorem 2. Let be as in Theorem 1. If the claim failed, then for each
, the property
of being a set
for which
would be a smallness property. By Theorem 2, we see that there is a
and a
obeying the complement of this property such that
, a contradiction.
Remark 3 Define a relationbetween
and
by declaring
if
and
. The key observation that makes the above equivalences work is that this relation is continuous in the sense that if
is an open subset of
, then the inverse image
is also open. Indeed, if
for some
, then
contains a finite set
such that
, and then any
that contains both
and
lies in
.
For each specific largeness property, such as the examples listed previously, Theorem 2 can be viewed as a finitary assertion (at least if the property is “computable” in some sense), but if one quantifies over all largeness properties, then the theorem becomes infinitary. In the spirit of the Paris-Harrington theorem, I would in fact expect some cases of Theorem 2 to be undecidable statements of Peano arithmetic, although I do not have a rigorous proof of this assertion.
Despite the complicated finitary interpretation of this theorem, I was still interested in trying to write the proof of Theorem 1 in some sort of “pseudo-finitary” manner, in which one can see analogies with finitary arguments in additive combinatorics. The proof of Theorem 1 that I give below the fold is my attempt to achieve this, although to avoid a complete explosion of “epsilon management” I will still use at one juncture an ergodic theory reduction from the original paper of Kra et al. that relies on such infinitary tools as the ergodic decomposition, the ergodic theorem, and the spectral theorem. Also some of the steps will be a little sketchy, and assume some familiarity with additive combinatorics tools (such as the arithmetic regularity lemma).
This post contains two unrelated announcements. Firstly, I would like to promote a useful list of resources for AI in Mathematics, that was initiated by Talia Ringer (with the crowdsourced assistance of many others) during the National Academies workshop on “AI in mathematical reasoning” last year. This list is now accepting new contributions, updates, or corrections; please feel free to submit them directly to the list (which I am helping Talia to edit). Incidentally, next week there will be a second followup webinar to the aforementioned workshop, building on the topics covered there. (The first webinar may be found here.)
Secondly, I would like to advertise the erdosproblems.com website, launched recently by Thomas Bloom. This is intended to be a living repository of the many mathematical problems proposed in various venues by Paul Erdős, who was particularly noted for his influential posing of such problems. For a tour of the site and an explanation of its purpose, I can recommend Thomas’s recent talk on this topic at a conference last week in honor of Timothy Gowers.
Thomas is currently issuing a call for help to develop the erdosproblems.com website in a number of ways (quoting directly from that page):
- You know Github and could set a suitable project up to allow people to contribute new problems (and corrections to old ones) to the database, and could help me maintain the Github project;
- You know things about web design and have suggestions for how this website could look or perform better;
- You know things about Python/Flask/HTML/SQL/whatever and want to help me code cool new features on the website;
- You know about accessibility and have an idea how I can make this website more accessible (to any group of people);
- You are a mathematician who has thought about some of the problems here and wants to write an expanded commentary for one of them, with lots of references, comparisons to other problems, and other miscellaneous insights (mathematician here is interpreted broadly, in that if you have thought about the problems on this site and are willing to write such a commentary you qualify);
- You knew Erdős and have any memories or personal correspondence concerning a particular problem;
- You have solved an Erdős problem and I’ll update the website accordingly (and apologies if you solved this problem some time ago);
- You have spotted a mistake, typo, or duplicate problem, or anything else that has confused you and I’ll correct things;
- You are a human being with an internet connection and want to volunteer a particular Erdős paper or problem list to go through and add new problems from (please let me know before you start, to avoid duplicate efforts);
- You have any other ideas or suggestions – there are probably lots of things I haven’t thought of, both in ways this site can be made better, and also what else could be done from this project. Please get in touch with any ideas!
I for instance contributed a problem to the site (#587) that Erdős himself gave to me personally (this was the topic of a somewhat well known photo of Paul and myself, and he communicated it again to me shortly afterwards on a postcard; links to both images can be found by following the above link). As it turns out, this particular problem was essentially solved in 2010 by Nguyen and Vu.
(Incidentally, I also spoke at the same conference that Thomas spoke at, on my recent work with Gowers, Green, and Manners; here is the video of my talk, and here are my slides.)
[This post is dedicated to Luca Trevisan, who recently passed away due to cancer. Though far from his most significant contribution to the field, I would like to mention that, as with most of my other blog posts on this site, this page was written with the assistance of Luca’s LaTeX to WordPress converter. Mathematically, his work and insight on pseudorandomness in particular have greatly informed how I myself think about the concept. – T.]
Recently, Timothy Gowers, Ben Green, Freddie Manners, and I were able to establish the following theorem:
Theorem 1 (Marton’s conjecture) Let
be non-empty with
. Then there exists a subgroup
of
with
such that
is covered by at most
translates of
, for some absolute constant
.
We established this result with , although it has since been improved to
by Jyun-Jie Liao.
Our proof was written in order to optimize the constant as much as possible; similarly for the more detailed blueprint of the proof that was prepared in order to formalize the result in Lean. I have been asked a few times whether it is possible to present a streamlined and more conceptual version of the proof in which one does not try to establish an explicit constant
, but just to show that the result holds for some constant
. This is what I will attempt to do in this post, though some of the more routine steps will be outsourced to the aforementioned blueprint.
The key concept here is that of the entropic Ruzsa distance between two random variables
taking values
, defined as
where are independent copies of
, and
denotes the Shannon entropy of
. This distance is symmetric and non-negative, and obeys the triangle inequality
for any random variables ; see the blueprint for a proof. The above theorem then follows from an entropic analogue:
Theorem 2 (Entropic Marton’s conjecture) Let
be a
-valued random variable with
. Then there exists a uniform random variable
on a subgroup
of
such that
for some absolute constant
.
We were able to establish Theorem 2 with $C = 11$, which implies Theorem 1 with $C = 12$ by fairly standard additive combinatorics manipulations; see the blueprint for details.
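As a small numerical illustration (my own sketch, not part of the original argument; all function names are my own choices), one can compute entropic Ruzsa distances directly for random variables on ${\bf F}_2^n$, encoding group elements as integers with XOR as the group operation. The example confirms the basic fact underlying Theorem 2: a uniform random variable on a subgroup, or on any coset of it, is at zero entropic Ruzsa distance from itself.

```python
# Toy computation of entropic Ruzsa distance over F_2^n (elements encoded
# as integers, group addition = XOR).  Illustration only; names are mine.
import math

def entropy(p):
    """Shannon entropy (in bits) of a probability mass function (dict)."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def xor_convolve(p, q):
    """Law of X' + Y' for independent copies X', Y' (addition = XOR)."""
    r = {}
    for x, px in p.items():
        for y, qy in q.items():
            r[x ^ y] = r.get(x ^ y, 0.0) + px * qy
    return r

def ruzsa_dist(p, q):
    """d[X;Y] = H[X' + Y'] - H[X]/2 - H[Y]/2 in a 2-torsion group."""
    return entropy(xor_convolve(p, q)) - 0.5 * entropy(p) - 0.5 * entropy(q)

# X uniform on the subgroup {0,1,2,3} of F_2^3; Y uniform on its coset 4+H.
X = {x: 0.25 for x in (0, 1, 2, 3)}
Y = {x ^ 4: 0.25 for x in (0, 1, 2, 3)}
print(ruzsa_dist(X, X))   # → 0.0 (subgroup-uniform is at distance 0 from itself)
print(ruzsa_dist(X, Y))   # → 0.0 (translating by a coset representative changes nothing)
```

One can also check numerically that the distance is non-negative and obeys the triangle inequality for other choices of distributions.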
The key proposition needed to establish Theorem 2 is the following distance decrement property:
Proposition 3 (Distance decrement) If $X, Y$ are ${\bf F}_2^n$-valued random variables, then one can find ${\bf F}_2^n$-valued random variables $X', Y'$ such that

$$ d[X';Y'] \leq (1-\eta) d[X;Y]$$

and

$$ d[X;X'], d[Y;Y'] \leq C d[X;Y]$$

for some absolute constants $\eta, C > 0$.
Indeed, suppose this proposition held. Starting with $X_0, Y_0$ both equal to $X$ and iterating, one can then find sequences of random variables $X_n, Y_n$ with

$$ d[X_n;Y_n] \leq (1-\eta)^n d[X;X],$$

and

$$ d[X_n;X_{n+1}], d[Y_n;Y_{n+1}] \leq C (1-\eta)^n d[X;X].$$

In particular, from the triangle inequality and geometric series

$$ d[X;X_n], d[X;Y_n] \leq \frac{C}{\eta} d[X;X].$$

By weak compactness, some subsequence of the $X_n$, $Y_n$ converge to some limiting random variables $X_\infty$, $Y_\infty$, and by some simple continuity properties of entropic Ruzsa distance, we conclude that

$$ d[X_\infty;Y_\infty] = 0$$

and

$$ d[X;X_\infty], d[X;Y_\infty] \leq \frac{C}{\eta} d[X;X].$$

Theorem 2 then follows from the “100% inverse theorem” for entropic Ruzsa distance; see the blueprint for details.
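The geometric-series step above can be checked with a toy computation (my own illustration; the values of eta, C, d0 below are arbitrary placeholders, not the actual constants of the proof): if each iteration shrinks the distance by a factor $(1-\eta)$ and moves the variables by at most $C$ times the current distance, the total movement is bounded by $\frac{C}{\eta}$ times the initial distance.

```python
# Toy check that the distance-decrement iteration keeps the limit within
# (C / eta) * d0 of the starting variable.  Placeholder constants, mine.
eta, C, d0 = 0.1, 5.0, 1.0
d, total_move = d0, 0.0
for _ in range(10_000):
    total_move += C * d      # worst-case movement allowed at this step
    d *= (1 - eta)           # distance decrement from Proposition 3
print(total_move)             # approaches C * d0 / eta = 50.0
```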
To prove Proposition 3, we can reformulate it as follows:
Proposition 4 (Lack of distance decrement implies vanishing) If $X, Y$ are ${\bf F}_2^n$-valued random variables, with the property that

$$ d[X';Y'] > (1-\eta) d[X;Y] - \eta ( d[X;X'] + d[Y;Y'] ) \ \ \ \ \ (1)$$

for all ${\bf F}_2^n$-valued random variables $X', Y'$ and some sufficiently small absolute constant $\eta > 0$, then one can derive a contradiction.

Indeed, we may assume from the above proposition that

$$ d[X';Y'] \leq (1-\eta) d[X;Y] - \eta ( d[X;X'] + d[Y;Y'] )$$

for some $X', Y'$, which will imply Proposition 3 with $C := 1/\eta$.
The entire game is now to use Shannon entropy inequalities and “entropic Ruzsa calculus” to deduce a contradiction from (1) for $\eta$ small enough. This we will do below the fold, but before doing so, let us first make some adjustments to (1) that will make it more useful for our purposes. Firstly, because conditional entropic Ruzsa distance (see blueprint for definitions) is an average of unconditional entropic Ruzsa distance, we can automatically upgrade (1) to the conditional version

$$ d[X'|Z; Y'|W] > (1-\eta) d[X;Y] - \eta ( d[X;X'|Z] + d[Y;Y'|W] )$$

for any random variables $Z, W$ that are possibly coupled with $X', Y'$ respectively. In particular, if we define a “relevant” random variable $X'$ (conditioned with respect to some auxiliary data $Z$) to be a random variable for which

$$ d[X; X'|Z] = O( d[X;Y] )$$

or equivalently (by the triangle inequality)

$$ d[Y; X'|Z] = O( d[X;Y] ),$$

then we have the useful lower bound

$$ d[X'|Z; Y'|W] \geq (1 - O(\eta)) d[X;Y] \ \ \ \ \ (2)$$

whenever $X'$ and $Y'$ are relevant conditioning on $Z$ and $W$ respectively. This is quite a useful bound, since the laws of “entropic Ruzsa calculus” will tell us, roughly speaking, that virtually any random variable that we can create from taking various sums of copies of $X$ and $Y$, and conditioning against other sums, will be relevant. (Informally: the space of relevant random variables is $(1-O(\eta)) d[X;Y]$-separated with respect to the entropic Ruzsa distance.)
— 1. Main argument —
Now we derive more and more consequences of (2) – at some point crucially using the hypothesis that we are in characteristic two – before we reach a contradiction.
Right now, our hypothesis (2) only supplies lower bounds on entropic distances. The crucial ingredient that allows us to proceed is what we call the fibring identity, which lets us convert these lower bounds into useful upper bounds as well, which in fact match up very nicely when $\eta$ is small. Informally, the fibring identity captures the intuitive fact that the doubling constant of a set $A$ should be at least as large as the doubling constant of the image $\pi(A)$ of that set under a homomorphism $\pi$, times the doubling constant of a typical fiber $A \cap \pi^{-1}(\{z\})$ of that homomorphism; and furthermore, one should only be close to equality if the fibers “line up” in some sense.
Here is the fibring identity:
Proposition 5 (Fibring identity) Let $\pi: G \rightarrow H$ be a homomorphism. Then for any independent $G$-valued random variables $X, Y$, one has

$$ d[X;Y] = d[\pi(X); \pi(Y)] + d[X|\pi(X); Y|\pi(Y)]$$
$$ + {\bf I}[ X - Y : (\pi(X), \pi(Y)) \ | \ \pi(X) - \pi(Y) ].$$
The proof is of course in the blueprint, but given that it is a central pillar of the argument, I reproduce it here.
Proof: Expanding out the definition of Ruzsa distance, and using the conditional entropy chain rule

$$ {\bf H}[X] = {\bf H}[\pi(X)] + {\bf H}[X|\pi(X)]$$

and

$$ {\bf H}[Y] = {\bf H}[\pi(Y)] + {\bf H}[Y|\pi(Y)],$$

it suffices to establish the identity

$$ {\bf H}[X - Y] = {\bf H}[\pi(X) - \pi(Y)] + {\bf H}[X-Y|\pi(X), \pi(Y)]$$
$$ + {\bf I}[ X - Y : (\pi(X), \pi(Y)) \ | \ \pi(X) - \pi(Y) ].$$

But from the chain rule again we have

$$ {\bf H}[X - Y] = {\bf H}[\pi(X) - \pi(Y)] + {\bf H}[X-Y|\pi(X) - \pi(Y)]$$

and from the definition of conditional mutual information (using the fact that $\pi(X) - \pi(Y)$ is determined both by $X - Y$ and by $(\pi(X), \pi(Y))$) one has

$$ {\bf H}[X-Y|\pi(X) - \pi(Y)] = {\bf H}[X-Y|\pi(X), \pi(Y)]$$
$$ + {\bf I}[ X - Y : (\pi(X), \pi(Y)) \ | \ \pi(X) - \pi(Y) ],$$

giving the claim. $\Box$
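As a sanity check (my own illustration, not from the post; all helper names are mine), the fibring identity can be verified numerically in the $2$-torsion setting, where subtraction is XOR. Below, $G = {\bf F}_2^2$, $H = {\bf F}_2$, and $\pi$ is projection to the first coordinate; the conditional distance is computed as an average of fibre distances, and the conditional mutual information from the joint law.

```python
# Numerical verification of the fibring identity over F_2^2 (XOR encoding).
import math
from itertools import product

def H(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def xor_conv(p, q):
    r = {}
    for x, px in p.items():
        for y, qy in q.items():
            r[x ^ y] = r.get(x ^ y, 0.0) + px * qy
    return r

def d(p, q):
    """Entropic Ruzsa distance in a 2-torsion group."""
    return H(xor_conv(p, q)) - 0.5 * H(p) - 0.5 * H(q)

pi = lambda x: x & 1          # projection F_2^2 -> F_2, a homomorphism

def push(p):                   # law of pi(X)
    r = {}
    for x, px in p.items():
        r[pi(x)] = r.get(pi(x), 0.0) + px
    return r

def fibre(p, s):               # law of X conditioned on pi(X) = s
    mass = sum(px for x, px in p.items() if pi(x) == s)
    return {x: px / mass for x, px in p.items() if pi(x) == s}

# Generic non-uniform laws on F_2^2 = {0,1,2,3}.
X = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
Y = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

lhs = d(X, Y)
term1 = d(push(X), push(Y))                       # d[pi(X); pi(Y)]
pX, pY = push(X), push(Y)
term2 = sum(pX[s] * pY[t] * d(fibre(X, s), fibre(Y, t))
            for s, t in product(pX, pY))          # d[X|pi(X); Y|pi(Y)]

# I[X+Y : (pi(X), pi(Y)) | pi(X)+pi(Y)] from the joint law of (A, B, C).
joint = {}
for (x, px), (y, qy) in product(X.items(), Y.items()):
    key = (x ^ y, (pi(x), pi(y)), pi(x) ^ pi(y))
    joint[key] = joint.get(key, 0.0) + px * qy

def marg(j, idx):
    r = {}
    for k, v in j.items():
        kk = tuple(k[i] for i in idx)
        r[kk] = r.get(kk, 0.0) + v
    return r

pAC, pBC, pC = marg(joint, (0, 2)), marg(joint, (1, 2)), marg(joint, (2,))
term3 = sum(v * math.log2(v * pC[(k[2],)] / (pAC[(k[0], k[2])] * pBC[(k[1], k[2])]))
            for k, v in joint.items() if v > 0)

print(lhs, term1 + term2 + term3)   # the two sides agree up to floating point
```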
We will only care about the characteristic $2$ setting here, so we will now assume that all groups involved are $2$-torsion, so that we can replace all subtractions with additions. If we specialize the fibring identity to the case where $G = {\bf F}_2^n \times {\bf F}_2^n$, $H = {\bf F}_2^n$, $\pi$ is the addition map $\pi(x,y) = x + y$, and $X = (X_1, X_2)$, $Y = (Y_1, Y_2)$ are pairs of independent random variables in ${\bf F}_2^n$, we obtain the following corollary:
Corollary 6 Let $X_1, X_2, Y_1, Y_2$ be independent ${\bf F}_2^n$-valued random variables. Then we have the identity

$$ d[X_1;Y_1] + d[X_2;Y_2] = d[X_1+X_2; Y_1+Y_2]$$
$$ + d[X_1|X_1+X_2; Y_1|Y_1+Y_2]$$
$$ + {\bf I}[ (X_1+Y_1, X_2+Y_2) : (X_1+X_2, Y_1+Y_2) \ | \ X_1+X_2+Y_1+Y_2 ].$$
This is a useful and flexible identity, especially when combined with (2). For instance, we can discard the conditional mutual information term as being non-negative, to obtain the inequality

$$ d[X_1;Y_1] + d[X_2;Y_2] \geq d[X_1+X_2; Y_1+Y_2].$$

If we let $X_1, X_2, Y_1, Y_2$ be independent copies of $X, Y, Y, X$ respectively (note the swap in the last two variables!) we obtain

$$ 2 d[X;Y] \geq d[X+Y; X+Y].$$

From entropic Ruzsa calculus, one can check that $X+Y$, $X_1+X_2$, and $Y_1+Y_2$ are all relevant random variables, so from (2) we now obtain both upper and lower bounds for $d[X+Y; X+Y]$:

$$ (1 - O(\eta)) d[X;Y] \leq d[X+Y; X+Y] \leq 2 d[X;Y].$$
A pleasant upshot of this is that we now get to work in the symmetric case $X = Y$ without loss of generality. Indeed, if we set $Z := X + Y$, we now have from (2) that

$$ d[Z'|W; Z''|W'] \geq (1 - O(\eta)) d[Z;Z] \ \ \ \ \ (3)$$

whenever $Z'|W$ and $Z''|W'$ are relevant, which by entropic Ruzsa calculus is equivalent to asking that

$$ d[Z; Z'|W], d[Z; Z''|W'] = O( d[Z;Z] ).$$
Now we use the fibring identity again, relabeling $X_1, X_2, Y_1, Y_2$ as $Z_1, Z_2, Z_3, Z_4$ and requiring $Z_1, Z_2, Z_3, Z_4$ to be independent copies of $Z$. We conclude that

$$ 2 d[Z;Z] = d[Z_1+Z_2; Z_3+Z_4]$$
$$ + d[Z_1|Z_1+Z_2; Z_3|Z_3+Z_4]$$
$$ + {\bf I}[ (Z_1+Z_3, Z_2+Z_4) : (Z_1+Z_2, Z_3+Z_4) \ | \ Z_1+Z_2+Z_3+Z_4 ].$$

As before, the random variables $Z_1+Z_2$, $Z_3+Z_4$, $Z_1|Z_1+Z_2$, $Z_3|Z_3+Z_4$ are all relevant, so from (3) we have

$$ d[Z_1+Z_2; Z_3+Z_4], d[Z_1|Z_1+Z_2; Z_3|Z_3+Z_4] \geq (1 - O(\eta)) d[Z;Z].$$
We could now also match these lower bounds with upper bounds, but the more important takeaway from this analysis is a really good bound on the conditional mutual information:

$$ {\bf I}[ (Z_1+Z_3, Z_2+Z_4) : (Z_1+Z_2, Z_3+Z_4) \ | \ Z_1+Z_2+Z_3+Z_4 ] = O(\eta) d[Z;Z].$$

By the data processing inequality, we can discard some of the randomness here, and conclude

$$ {\bf I}[ Z_1+Z_3 : Z_1+Z_2 \ | \ Z_1+Z_2+Z_3+Z_4 ] = O(\eta) d[Z;Z].$$

Let us introduce the random variables

$$ U := Z_1+Z_3, \quad V := Z_1 + Z_2, \quad S := Z_1+Z_2+Z_3+Z_4;$$

then we have

$$ {\bf I}[ U : V \ | \ S ] = O(\eta) d[Z;Z].$$
Intuitively, this means that $U$ and $V$ are very nearly independent given $S$. For sake of argument, let us assume that they are actually independent; one can achieve something resembling this by invoking the entropic Balog-Szemerédi-Gowers theorem, established in the blueprint, after conceding some losses of $O(\sqrt{\eta}) d[Z;Z]$ in the entropy, but we skip over the details for this blog post. The key point now is that because we are in characteristic $2$, $U+V$ has the same form as $U$ or $V$:

$$ U + V = Z_2 + Z_3.$$

In particular, by permutation symmetry, we have

$$ (U,S) \equiv (V,S) \equiv (U+V,S),$$

and so by the definition of conditional Ruzsa distance we have a massive distance decrement

$$ d[U|S; V|S] = 0,$$

contradicting (1) as desired. (In reality, we end up decreasing the distance not all the way to zero, but instead to $O(\sqrt{\eta}) d[Z;Z]$ due to losses in the Balog-Szemerédi-Gowers theorem, but this is still enough to reach a contradiction.)
Remark 7 A similar argument works in the $m$-torsion case for general $m$. Instead of decrementing the entropic Ruzsa distance, one instead decrements a “multidistance”

$$ D[X_1; \dots; X_m] := {\bf H}[X_1 + \dots + X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i]$$

for independent $X_1, \dots, X_m$. By an iterated version of the fibring identity, one can first reduce again to the symmetric case where the random variables are all copies of the same variable $X$. If one then takes $X_{i,j}$, $1 \leq i,j \leq m$, to be an array of $m^2$ copies of $X$, one can get to the point where the row sums $\sum_j X_{i,j}$ and the column sums $\sum_i X_{i,j}$ have small conditional mutual information with respect to the double sum $S := \sum_{i,j} X_{i,j}$. If we then set $U := \sum_{i,j} i X_{i,j}$ and $V := \sum_{i,j} j X_{i,j}$, the data processing inequality again shows that $U$ and $V$ are nearly independent given $S$. The $m$-torsion now crucially intervenes as before to ensure that $U + V = \sum_{i,j} (i+j) X_{i,j}$ has the same form as $U$ or $V$, leading to a contradiction as before. See this previous blog post for more discussion.
The first progress prize competition for the AI Mathematical Olympiad has now launched. (Disclosure: I am on the advisory committee for the prize.) This is a competition in which contestants submit an AI model which, after the submissions deadline on June 27, will be tested (on a fixed computational resource, without internet access) on a set of 50 “private” test math problems, each of which has an answer as an integer between 0 and 999. Prior to the close of submission, the models can be tested on 50 “public” test math problems (where the results of the model are public, but not the problems themselves), as well as 10 training problems that are available to all contestants. As of this time of writing, the leaderboard shows that the best-performing model has solved 4 out of 50 of the questions (a standard benchmark, Gemma 7B, had previously solved 3 out of 50). A total of $2^{20}$ dollars ($1.048 million) has been allocated for various prizes associated to this competition. More detailed rules can be found here.