Many modern mathematical proofs are a combination of conceptual arguments and technical calculations. There is something of a tradeoff between the two: one can add more conceptual arguments to try to reduce the technical computations, or vice versa. (Among other things, this leads to a Berkson paradox-like phenomenon in which a negative correlation can be observed between the two aspects of a proof; see this recent Mastodon post of mine for more discussion.)

In a recent article, Heather Macbeth argues that the preferred balance between conceptual and computational arguments is quite different for a computer-assisted proof than it is for a purely human-readable proof. In the latter, there is a strong incentive to minimize the amount of calculation to the point where it can be checked by hand, even if this requires a certain amount of ad hoc rearrangement of cases, unmotivated parameter selection, or otherwise non-conceptual additions to the arguments in order to reduce the calculation. But in the former, once one is willing to outsource any tedious verification or optimization task to a computer, the incentives are reversed: freed from the need to arrange the argument to reduce the amount of calculation, one can now describe an argument by listing the main ingredients and then letting the computer figure out a suitable way to combine them to give the stated result. The two approaches can thus be viewed as complementary ways to describe a result, with neither necessarily being superior to the other.

In this post, I would like to illustrate this computation-outsourced approach with the topic of zero-density theorems for the Riemann zeta function, in which all computer verifiable calculations (as well as other routine but tedious arguments) are performed “off-stage”, with the intent of focusing only on the conceptual inputs to these theorems.

Zero-density theorems concern upper bounds for the quantity {N(\sigma,T)} for a given {1/2 \leq \sigma \leq 1} and large {T}, which is defined as the number of zeroes of the Riemann zeta function in the rectangle {\{ \beta+i\gamma: \sigma \leq \beta \leq 1; 0 \leq \gamma \leq T \}}. (There is also an important generalization of this quantity to {L}-functions, but for simplicity we will focus on the classical zeta function case here). Such quantities are important in analytic number theory for many reasons, one of which is through explicit formulae such as the Riemann-von Mangoldt explicit formula

\displaystyle  \sum_{n \leq x}^{\prime} \Lambda(n) = x - \sum_{\rho:\zeta(\rho)=0} \frac{x^\rho}{\rho} - \log(2\pi) - \frac{1}{2} \log(1-x^{-2}) \ \ \ \ \ (1)

relating the prime numbers to the zeroes of the zeta function (the “music of the primes”). The better bounds one has on {N(\sigma,T)}, the more control one has on the complicated term {\sum_{\rho:\zeta(\rho)=0} \frac{x^\rho}{\rho}} on the right-hand side.

Clearly {N(\sigma,T)} is non-increasing in {\sigma}. The Riemann-von Mangoldt formula, together with the functional equation, gives us the asymptotic

\displaystyle  N(1/2,T) \asymp T \log T

in the {\sigma=1/2} case, while the prime number theorem tells us that

\displaystyle  N(1,T) = 0. \ \ \ \ \ (2)

The various zero free regions for the zeta function can be viewed as slight improvements to (2); for instance, the classical zero-free region is equivalent to the assertion that {N(\sigma,T)} vanishes if {\sigma > 1 - c/\log T} for some small absolute constant {c>0}, and the Riemann hypothesis is equivalent to the assertion that {N(\sigma,T)=0} for all {\sigma>1/2}.

Experience has shown that the most important quantity to control here is the exponent {A(\sigma)}, defined as the least constant for which one has a bound of the form

\displaystyle  N(\sigma,T) \ll T^{A(\sigma)(1-\sigma)+o(1)}

as {T \rightarrow \infty}. Thus, for instance,

\displaystyle  A(1/2) = 2, \ \ \ \ \ (3)

{A(1) = 0}, and {A(\sigma)(1-\sigma)} is a non-increasing function of {\sigma} (so that {A(\sigma)(1-\sigma) \leq A(1/2)(1-1/2) = 1}), so we obtain the trivial “von Mangoldt” zero density theorem

\displaystyle  A(\sigma) \leq \frac{1}{1-\sigma}.

Of particular interest is the supremal value {\|A\|_\infty := \sup_{1/2 \leq \sigma \leq 1} A(\sigma)} of {A}, which has to be at least {2} thanks to (3). The density hypothesis asserts that this supremum is in fact exactly {2}, or equivalently that

\displaystyle  A(\sigma) \leq 2, \ \ \ \ \ (4)

for all {1/2 \leq \sigma \leq 1}. This is of course implied by the Riemann hypothesis (which clearly implies that {A(\sigma)=0} for all {\sigma>1/2}), but is a more tractable hypothesis to work with; for instance, the hypothesis is already known to hold for {\sigma \geq 25/32 = 0.78125} by the work of Bourgain (building upon many previous authors). The quantity {\|A\|_\infty} directly impacts our understanding of the prime number theorem in short intervals; indeed, it is not difficult using (1) (as well as the Vinogradov-Korobov zero-free region) to establish a short interval prime number theorem

\displaystyle  \sum_{x \leq n \leq x + x^\theta} \Lambda(n) = (1+o(1)) x^\theta

for all {x \rightarrow \infty} if {1 - \frac{1}{\|A\|_\infty} < \theta < 1} is a fixed exponent, or for almost all {x \rightarrow \infty} if {1 - \frac{2}{\|A\|_\infty} < \theta < 1} is a fixed exponent. Until recently, the best upper bound for {\|A\|_\infty} was {12/5 = 2.4}, thanks to a 1972 result of Huxley; but this was lowered to {30/13=2.307\ldots} in a recent breakthrough work of Guth and Maynard.
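As a quick illustration of how these exponents translate into admissible short interval lengths, here is a small snippet in the spirit of the crude Python code mentioned at the end of this post (this snippet, and the others below, are illustrative sketches of my own rather than the actual code used to produce the graphs in this post); it simply evaluates the thresholds {1 - \frac{1}{\|A\|_\infty}} and {1 - \frac{2}{\|A\|_\infty}} for the Huxley and Guth-Maynard values of {\|A\|_\infty}:

```python
# Evaluate the short interval thresholds 1 - 1/||A||_infty ("all x") and
# 1 - 2/||A||_infty ("almost all x") for the two values of ||A||_infty above.
from fractions import Fraction

for name, A in [("Huxley", Fraction(12, 5)), ("Guth-Maynard", Fraction(30, 13))]:
    all_x, almost_all_x = 1 - 1 / A, 1 - 2 / A
    print(f"{name}: all x for theta > {all_x} = {float(all_x):.4f}; "
          f"almost all x for theta > {almost_all_x} = {float(almost_all_x):.4f}")
```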

In between the papers of Huxley and Guth-Maynard are dozens of additional improvements on {A(\sigma)}, though it is only the Guth-Maynard paper that actually lowered the supremum norm {\|A\|_\infty}. A summary of most of the state of the art before Guth-Maynard may be found in Table 2 of this recent paper of Trudgian and Yang; it is complicated, but it is easy enough to get a computer to illustrate it with a plot:

(For an explanation of what is going on under the assumption of the Lindelöf hypothesis, see below the fold.) This plot represents the combined effort of nearly a dozen papers, each one of which claims one or more components of the depicted piecewise smooth curve, and is written in the “human-readable” style mentioned above, where the argument is arranged to reduce the amount of tedious computation to human-verifiable levels, even if this comes at the cost of obscuring the conceptual ideas. (For an animation of how this bound improved over time, see here.) Below the fold, I will try to describe (in sketch form) some of the standard ingredients that go into these papers, in particular the routine reduction of deriving zero density estimates from large value theorems for Dirichlet series. We will not attempt to rewrite the entire literature of zero-density estimates in this fashion, but focus on some illustrative special cases.

— 1. Zero detecting polynomials —

As we are willing to lose factors of {T^{o(1)}} here, it is convenient to adopt the asymptotic notation {X \lessapprox Y} (or {Y \gtrapprox X}) for {X \leq T^{o(1)} Y}, and similarly {X \approx Y} for {X \lessapprox Y \lessapprox X}.

The Riemann-von Mangoldt formula implies that any unit square in the critical strip only contains {\lessapprox 1} zeroes, so for the purposes of counting {N(\sigma,T)} up to {T^{o(1)}} errors, one can restrict attention to counting sets of zeroes {\beta+i\gamma} whose imaginary parts {\gamma} are {1}-separated, and we will do so henceforth. By dyadic decomposition, we can also restrict attention to zeroes with imaginary part {\gamma} comparable to {T} (rather than lying between {0} and {T}.)

The Riemann-Siegel formula, roughly speaking, tells us that for a zero {\beta+i\gamma} as above, we have

\displaystyle  \zeta(\beta + i \gamma) = \sum_{n \leq T^{1/2}} \frac{1}{n^{\beta+i\gamma}} + \dots \ \ \ \ \ (5)

plus terms which are of lower order when {\beta > 1/2}. One can decompose the sum here dyadically into {\approx 1} pieces that look like

\displaystyle  N^{-\beta} \sum_{n \sim N} \frac{1}{n^{i\gamma}}

for {1 \leq N \ll T^{1/2}}. The {N=1} component of this sum is basically {1}; so if there is to be a zero at {\beta+i\gamma}, we expect one of the other terms to balance it out, and so we should have

\displaystyle  |\sum_{n \sim N} \frac{1}{n^{i\gamma}}| \gtrapprox N^\beta \geq N^\sigma \ \ \ \ \ (6)

for at least one value of {1 < N \ll T^{1/2}}. In the notation of this subject, the expressions {\sum_{n \sim N} \frac{1}{n^{it}}} are known as zero detecting (Dirichlet) polynomials; the large values of such polynomials provide a set of candidates where zeroes can occur, and so upper bounding the large values of such polynomials will lead to zero density estimates.
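As a quick numerical sanity check (an illustration of my own, not part of the standard arguments), one can evaluate such a polynomial at a few randomly chosen frequencies {t} of size {T} and compare it against the threshold {N^\sigma}; for typical {t} the sum has size roughly {N^{1/2}}, comfortably below the threshold, so the zero-detecting condition is only expected to hold on a sparse set of frequencies:

```python
# Compare |sum_{N <= n < 2N} n^{-it}| against the "square root" size N^(1/2) and
# the zero-detecting threshold N^sigma, for a few random frequencies t of size T.
import numpy as np

rng = np.random.default_rng(0)
N, T, sigma = 10**4, 10**8, 0.75
logn = np.log(np.arange(N, 2 * N, dtype=float))

for t in rng.uniform(T, 2 * T, size=5):
    poly = np.exp(-1j * t * logn).sum()    # the Dirichlet polynomial at t
    print(f"t = {t:.4e}: |sum| = {abs(poly):8.1f}   "
          f"(N^0.5 = {N**0.5:.0f}, N^sigma = {N**sigma:.0f})")
```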

Unfortunately, the particular choice of zero detecting polynomials described above, while simple, is not useful for applications, because the polynomials with very small values of {N}, say {N=2}, will basically obey the largeness condition (6) a positive fraction of the time, leading to no useful estimates. (Note that standard “square root” heuristics suggest that the left-hand side of (6) should typically be of size about {N^{1/2}}.) However, this can be fixed by the standard device of introducing a “mollifier” to eliminate the role of small primes. There is some flexibility in what mollifier to introduce here, but a simple choice is to multiply (5) by {\sum_{n \leq T^\varepsilon} \frac{\mu(n)}{n^{\beta+i\gamma}}} for a small {\varepsilon}, which morally speaking has the effect of eliminating the contribution of those terms {n} with {1 < n \leq T^\varepsilon}, at the cost of extending the range of {N} slightly from {T^{1/2}} to {T^{1/2+\varepsilon}}, and also introducing some error terms at scales between {T^\varepsilon} and {T^{2\varepsilon}}. The upshot is that one then gets a slightly different set of zero-detecting polynomials: one family (often called “Type I”) is basically of the form

\displaystyle  \sum_{n \sim N} \frac{1}{n^{i\gamma}}

for {T^\varepsilon \ll N \ll T^{1/2+\varepsilon}}, and another family (“Type II”) is of the form

\displaystyle  \sum_{n \sim N} \frac{a_n}{n^{i\gamma}}

for {T^\varepsilon \ll N \ll T^{2\varepsilon}} and some coefficients {a_n} of size {\lessapprox 1}; see Section 10.2 of Iwaniec-Kowalski or these lecture notes of mine, or Appendix 3 of this recent paper of Maynard and Pratt for more details. It is also possible to reverse these implications and efficiently derive large values estimates from zero density theorems; see this recent paper of Matomäki and Teräväinen.

One can sometimes squeeze a small amount of mileage out of optimizing the {\varepsilon} parameter, but for the purpose of this blog post we shall just send {\varepsilon} to zero. One can then reformulate the above observations as follows. For given parameters {\sigma \geq 1/2} and {\alpha > 0}, let {C(\sigma,\alpha)} denote the best non-negative exponent for which the following large values estimate holds: given any sequence {a_n} of size {\lessapprox 1}, and any {1}-separated set of frequencies {t \sim T} for which

\displaystyle  |\sum_{n \sim N} \frac{a_n}{n^{it}}| \gtrapprox N^\sigma

for some {N \approx T^\alpha}, the number of such frequencies {t} does not exceed {T^{C(\sigma,\alpha)+o(1)}}. We define {C_1(\sigma,\alpha)} similarly, but where the coefficients {a_n} are also assumed to be identically {1}. Then clearly

\displaystyle  C_1(\sigma,\alpha) \leq C(\sigma,\alpha), \ \ \ \ \ (7)

and the above zero-detecting formalism is (morally, at least) asserting an inequality of the form

\displaystyle  A(\sigma)(1-\sigma) \leq \max( \sup_{0 < \alpha \leq 1/2} C_1(\sigma,\alpha), \limsup_{\alpha \rightarrow 0} C(\sigma,\alpha) ). \ \ \ \ \ (8)

The converse results of Matomäki and Teräväinen morally assert that this inequality is essentially an equality (there are some asterisks to this assertion which I will gloss over here). Thus, for instance, verifying the density hypothesis (4) for a given {\sigma} is now basically reduced to establishing the “Type I” bound

\displaystyle  C_1(\sigma,\alpha) \leq 2 (1-\sigma) \ \ \ \ \ (9)

for all {0 < \alpha \leq 1/2}, as well as the “Type II” variant

\displaystyle  C(\sigma,\alpha) \leq 2 (1-\sigma) + o(1) \ \ \ \ \ (10)

as {\alpha \rightarrow 0^+}.

As we shall see, the Type II task of controlling {C(\sigma,\alpha)} for small {\alpha} is relatively well understood (in particular, (10) is already known to hold for all {1/2 \leq \sigma \leq 1}, so in some sense the “Type II” half of the density hypothesis is already established); the main difficulty is with the Type I task, where the parameter {\alpha} (representing the length of the Dirichlet series) is often in an unfavorable location.

Remark 1 The approximate functional equation for the Riemann zeta function morally tells us that {C_1(\sigma,\alpha) = C_1(1/2 + \frac{\alpha}{1-\alpha}(\sigma-1/2),1-\alpha)}, but we will not have much use for this symmetry since we have in some sense already incorporated it (via the Riemann-Siegel formula) into the condition {\alpha \leq 1/2}.

The standard {L^2} mean value theorem for Dirichlet series tells us that a Dirichlet polynomial {\sum_{n \sim N} \frac{a_n}{n^{it}}} with {a_n \lessapprox 1} has an {L^2} mean value of {\lessapprox N^{1/2}} on any interval of length {N}, and similarly if we discretize {t} to a {1}-separated subset of that interval; this is easily established by using the approximate orthogonality properties of the functions {t \mapsto \frac{1}{n^{it}}}, {n \sim N}, on such an interval. Since an interval of length {T} can be subdivided into {O( (N+T)/N )} intervals of length {N}, we see from the Chebyshev inequality that such a polynomial can only be of size {\gtrapprox N^\sigma} on a {1}-separated subset of a length {T} interval of size {\lessapprox (N+T)/N \times N \times N^{1-2\sigma}}, which we can formalize in terms of the {C(\sigma,\alpha)} notation as

\displaystyle  C(\sigma,\alpha) \leq \max((2-2\sigma)\alpha, 1 + (1-2\sigma)\alpha). \ \ \ \ \ (11)
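For later use, it is convenient to record (11) as a small helper function (again an illustrative sketch; the name is my own):

```python
# The mean value theorem bound (11) on C(sigma, alpha), with exponents as floats.
def C_mvt(sigma: float, alpha: float) -> float:
    return max((2 - 2 * sigma) * alpha, 1 + (1 - 2 * sigma) * alpha)
```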

For instance, this (and (7)) already give the density hypothesis-strength bound (9) – but only at {\alpha = 1}. This initially looks useless, since we are restricting {\alpha} to the range {0 < \alpha \leq 1/2}; but there is a simple trick that allows one to greatly amplify this bound (as well as many other large values bounds). Namely, if one raises a Dirichlet series {\sum_{n \sim N} \frac{a_n}{n^{it}}} with {a_n \lessapprox 1} to some natural number power {k}, then one obtains another Dirichlet series {\sum_{n \sim N^k} \frac{b_n}{n^{it}}} with {b_n \lessapprox 1}, but now at length {N^k} instead of {N}. This can be encoded in terms of the {C(\sigma,\alpha)} notation as the inequality

\displaystyle  C(\sigma,\alpha) \leq C(\sigma,k\alpha) \ \ \ \ \ (12)

for any natural number {k \geq 1}. It would be very convenient if we could remove the restriction that {k} be a natural number here; there is a conjecture of Montgomery in this regard, but it is out of reach of current methods (it was observed in this paper of Bourgain that it would imply the Kakeya conjecture!). Nevertheless, the relation (12) is already quite useful. Firstly, it can easily be used to imply the Type II case (10) of the density hypothesis, and also implies the Type I case (9) as long as {\alpha} is of the special form {\alpha = 1/k} for some natural number {k}. Rather than give a human-readable proof of this routine implication, let me illustrate it instead with a graph of what the best bound one can obtain for {C(\sigma,\alpha)} becomes for {\sigma=3/4}, just using (11) and (12):
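(For readers who wish to reproduce the computation behind this graph, here is a minimal numerical sketch, building on the helper C_mvt from the previous snippet; the mesh sizes and cutoffs here are ad hoc choices of mine, adequate for drawing pictures but not for rigorous proofs.)

```python
# Combine the mean value bound (11) with the power trick (12): bound C(sigma, alpha)
# by the best estimate available at the powered length k*alpha, then take a supremum
# over a mesh of 0 < alpha <= 1/2 as in (8) to bound A(sigma)(1 - sigma).
import numpy as np

ALPHAS = np.linspace(0.01, 0.5, 500)    # crude mesh for the range 0 < alpha <= 1/2

def C_bound(sigma, alpha, estimates=(C_mvt,)):
    """Best bound on C(sigma, alpha) from the given estimates and the power trick (12)."""
    kmax = int(np.ceil(6 / alpha))      # powered lengths k*alpha up to ~6 are ample here
    return min(est(sigma, k * alpha) for k in range(1, kmax + 1) for est in estimates)

def A_bound(sigma, estimates=(C_mvt,)):
    """Crude version of (8): supremum over the alpha mesh, divided by 1 - sigma.
    (The Type II term, a limit as alpha -> 0, is approximated by the small-alpha
    mesh points.)"""
    return max(C_bound(sigma, a, estimates) for a in ALPHAS) / (1 - sigma)

print(A_bound(0.75))     # ~ 2.4, i.e. Ingham's bound A(3/4) <= 12/5
print(3 / (2 - 0.75))    # Ingham's exponent 3/(2 - sigma), for comparison
```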

Here we see that the bound for {C(\sigma,\alpha)} oscillates between the density hypothesis prediction of {2(1-\sigma)=1/2} (which is attained when {\alpha=1/k}), and a weaker upper bound of {\frac{12}{5}(1-\sigma) = 0.6}, which thanks to (7), (8) gives the upper bound {A(3/4) \leq \frac{12}{5}} that was first established in 1937 by Ingham (in the style of a human-readable proof without computer assistance, of course). The same argument applies for all {1/2 \leq \sigma \leq 1}, and gives rise to the bound {A(\sigma) \leq \frac{3}{2-\sigma}} in this interval, beating the trivial von Mangoldt bound of {A(\sigma) \leq \frac{1}{1-\sigma}}:
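(Sweeping {\sigma} in the snippet above reproduces Ingham's exponent to within the mesh accuracy:)

```python
# Compare the numerically optimised exponent with Ingham's 3/(2 - sigma).
for sigma in (0.55, 0.65, 0.75, 0.85, 0.95):
    print(f"sigma = {sigma}: computed {A_bound(sigma):.3f} vs Ingham {3 / (2 - sigma):.3f}")
```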

The method is flexible, and one can insert further bounds or hypotheses to improve the situation. For instance, the Lindelöf hypothesis asserts that {\zeta(1/2+it) \lessapprox 1} for all {0 \leq t \leq T}, which on dyadic decomposition can be shown to give the bound

\displaystyle  \sum_{n \sim N} \frac{1}{n^{it}} \lessapprox N^{1/2} \ \ \ \ \ (13)

for all {N \approx T^\alpha} and all fixed {0 < \alpha < 1} (in fact this hypothesis is basically equivalent to this estimate). In particular, one has

\displaystyle  C_1(\sigma,\alpha)=0 \ \ \ \ \ (14)

for any {\sigma > 1/2} and {\alpha > 0}. In particular, the Type I estimate (9) now holds for all {\sigma>1/2}, and so the Lindelöf hypothesis implies the density hypothesis.

In fact, as observed by Halász and Turán in 1969, the Lindelöf hypothesis also gives good Type II control in the regime {\sigma > 3/4}. The key point here is that the bound (13) basically asserts that the functions {n \mapsto \frac{1}{n^{it}}} behave like orthogonal functions on the range {n \sim N}, and this together with a standard duality argument (related to the Bessel inequality, the large sieve, or the {TT^*} method in harmonic analysis) lets one control the large values of Dirichlet series, with the upshot here being that

\displaystyle  C(\sigma,\alpha)=0

for all {\sigma > 3/4} and {\alpha > 0}. This lets one go beyond the density hypothesis for {\sigma>3/4} and in fact obtain {A(\sigma)=0} in this case.

While we are not close to proving the full strength of (13), the theory of exponential sums gives us some relatively good control on the left-hand side in some cases. For instance, by using van der Corput estimates on the left-hand side of (13), Montgomery in 1969 was able to obtain an unconditional estimate which in our notation would be

\displaystyle  C(\sigma,\alpha) \leq (2 - 2 \sigma) \alpha \ \ \ \ \ (15)

whenever {\sigma > \frac{1}{2} + \frac{1}{4\alpha}}. This is already enough to give some improvements to Ingham’s bound for very large {\sigma}. But one can do better by a simple subdivision observation of Huxley (which was already implicitly used to prove (11)): a large values estimate on an interval of size {T} automatically implies a large values estimate on a longer interval of size {T'}, simply by covering the latter interval by {O(T'/T)} intervals. This observation can be formalized as a general inequality

\displaystyle  C(\sigma,\alpha') \leq 1 - \frac{\alpha'}{\alpha} + \frac{\alpha'}{\alpha} C(\sigma,\alpha) \ \ \ \ \ (16)

whenever {1/2 \leq \sigma \leq 1} and {0 < \alpha' \leq \alpha \leq 1}; that is to say, the quantity {(1-C(\sigma,\alpha))/\alpha} is non-increasing in {\alpha}. This leads to the Huxley large values inequality, which in our notation asserts that

\displaystyle  C(\sigma,\alpha) \leq \max((2-2\sigma)\alpha, 1 + (4-6\sigma)\alpha) \ \ \ \ \ (17)

for all {1/2 \leq \sigma \leq 1} and {\alpha>0}, which is superior to (11) when {\sigma > 3/4}. If one simply adds either Montgomery’s inequality (15), or Huxley’s extension inequality (17), into the previous pool and asks the computer to optimize the bounds on {A(\sigma)} as a consequence, one obtains the following graph:
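(In the numerical sketch from before, this amounts to adding two more estimates to the pool; as always, the function names are my own and the printed values are finite-precision approximations.)

```python
# Huxley's large values inequality (17), and Montgomery's estimate (15) restricted to
# its stated range of validity; both are added to the pool used by A_bound above.
def C_huxley(sigma, alpha):
    return max((2 - 2 * sigma) * alpha, 1 + (4 - 6 * sigma) * alpha)

def C_montgomery(sigma, alpha):
    # (15) is only asserted for sigma > 1/2 + 1/(4*alpha); return a vacuous bound otherwise
    return (2 - 2 * sigma) * alpha if sigma > 0.5 + 1 / (4 * alpha) else float("inf")

POOL = (C_mvt, C_huxley, C_montgomery)
print(A_bound(0.9, POOL))    # ~ 2.0: the density hypothesis at sigma = 0.9
print(A_bound(0.8, POOL))    # ~ 2.14, i.e. Huxley's exponent 3/(3*sigma - 1) at sigma = 0.8
```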

In particular, the density hypothesis is now established for all {\sigma > 5/6 = 0.833\dots}. But one can do better. Consider for instance the case of {\sigma=0.9}. Let us inspect the best bounds on {C_1(\sigma,\alpha)} available from the current tools:

Here we immediately see that it is only the {\alpha=0.5} case that is preventing us from improving the bound on {A(0.9)} to below the density hypothesis prediction of {2(1-\sigma) = 0.2}. However, it is possible to exclude this case through exponential sum estimates. In particular, the van der Corput inequality can be used to establish the bound {\zeta(1/2+it) \lessapprox T^{1/6}} for {t \lessapprox T}, or equivalently that

\displaystyle  \sum_{n \sim N} \frac{1}{n^{it}} \lessapprox N^{1/2} T^{1/6}

for {N \lessapprox T}; this already shows that {C_1(\sigma,\alpha)} vanishes unless

\displaystyle  \alpha \leq \frac{1}{6(\sigma-1/2)}, \ \ \ \ \ (18)

which improves upon the existing restriction {\alpha \leq 1/2} when {\sigma > 5/6}. If one inserts this new constraint into the pool, we recover the full strength of the Huxley bound

\displaystyle  A(\sigma) \leq \frac{3}{3\sigma-1}, \ \ \ \ \ (19)

valid for all {1/2 \leq \sigma \leq 1}, and which improves upon the Ingham bound for {3/4 \leq \sigma \leq 1}:
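(In the numerical sketch, the constraint (18) is simply a cap on the range of {\alpha} over which the Type I supremum in (8) needs to be taken; note that the cap applies only to the original length exponent {\alpha}, and not to the powered lengths {k\alpha}, since the coefficients of the powered polynomial are no longer identically one.)

```python
# Incorporate the van der Corput constraint (18): C_1(sigma, alpha) vanishes once
# alpha > 1/(6(sigma - 1/2)), so the supremum in (8) only runs up to that cap.
def A_bound_capped(sigma, estimates=POOL):
    cap = 0.5 if sigma <= 0.5 else min(0.5, 1 / (6 * (sigma - 0.5)))
    return max(C_bound(sigma, a, estimates) for a in ALPHAS if a <= cap) / (1 - sigma)

print(A_bound_capped(0.9))   # ~ 1.76, recovering Huxley's 3/(3*sigma - 1) at sigma = 0.9
```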

One can continue importing additional large values estimates into this framework to obtain new zero density theorems. For instance, one could insert the twelfth moment estimate of Heath-Brown, which in our language asserts that {C_1(\sigma,\alpha) \leq 2 + (6-12\sigma) \alpha}; one could also insert variants of the van der Corput estimate, such as bounds coming from other exponent pairs, the Vinogradov mean value theorem, or (more recently) the resolution of the Vinogradov main conjecture by Bourgain-Demeter-Guth using decoupling methods, or by Wooley using efficient congruencing methods. We close with an example from the Guth-Maynard paper. Their main technical step is to establish a new large values theorem (Proposition 3.1 from their paper), which in our notation asserts that

\displaystyle  C(\sigma,\alpha) \leq 1 + (\frac{12}{5}-4\sigma)\alpha \ \ \ \ \ (20)

whenever {0.7 \leq \sigma \leq 0.8} and {\alpha = \frac{5}{6}}. By subdivision (16), one automatically obtains the same bound for {0 < \alpha \leq \frac{5}{6}} as well. If one drops this estimate into the mix, one obtains the Guth-Maynard addition

\displaystyle  A(\sigma) \leq \frac{15}{3+5\sigma} \ \ \ \ \ (21)

to the Ingham and Huxley bounds (this new bound is in fact valid for all {1/2 \leq \sigma \leq 1}, but is only novel in the interval {0.7 \leq \sigma \leq 0.8}):
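(In the numerical sketch, (20) enters the pool as one more estimate, with its stated range of validity. This crude implementation does not encode every routine manipulation used in the full optimization, so on its own it only recovers part of the improvement at {\sigma = 3/4}; the complete pool behind the graphs gives the full bound (21).)

```python
# The Guth-Maynard large values estimate (20), valid for 0.7 <= sigma <= 0.8 and
# (after subdivision (16)) for 0 < alpha <= 5/6; vacuous outside that range.
def C_guth_maynard(sigma, alpha):
    if 0.7 <= sigma <= 0.8 and alpha <= 5 / 6:
        return 1 + (12 / 5 - 4 * sigma) * alpha
    return float("inf")

print(A_bound_capped(0.75, POOL))                      # ~ 2.40 without (20)
print(A_bound_capped(0.75, POOL + (C_guth_maynard,)))  # ~ 2.33 with (20) in this crude sketch
print(15 / (3 + 5 * 0.75))                             # the Guth-Maynard exponent (21), ~ 2.22
```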

This is not the most difficult (or novel) part of the Guth-Maynard paper – the proof of (20) occupies about 34 of the 48 pages of the paper – but it hopefully illustrates how some of the more routine portions of this type of work can be outsourced to a computer, at least if one is willing to be convinced purely by numerically produced graphs. Also, it is possible to transfer even more of the Guth-Maynard paper to this format, if one introduces an additional quantity {C^*(\sigma,\alpha)} that tracks not the number of large values of a Dirichlet series, but rather its energy, and interprets several of the key sub-propositions of that paper as providing inequalities relating {C(\sigma,\alpha)} and {C^*(\sigma,\alpha)} (this builds upon an earlier paper of Heath-Brown that was the first to introduce non-trivial inequalities of this type).

The above graphs were produced by myself using some quite crude Python code (with a small amount of AI assistance, for instance via Github Copilot); the code does not actually “prove” estimates such as (19) or (21) to infinite accuracy, but rather to any specified finite accuracy, although one can at least make the bounds completely rigorous by discretizing using a mesh of rational numbers (which can be manipulated to infinite precision) and using the monotonicity properties of the various functions involved to control errors. In principle, it should be possible to create software that would work “symbolically” rather than “numerically”, and output (human-readable) proof certificates of bounds such as (21) from prior estimates such as (20) to infinite accuracy, in some formal proof verification language (e.g., Lean). Such a tool could potentially shorten the primary component of papers of this type, which would then focus on the main inputs to a standard inequality-chasing framework, rather than the routine execution of that framework, which could then be deferred to an appendix or some computer-produced file. It seems that such a tool is now feasible (particularly with the potential of deploying AI tools to locate proof certificates in some tricky cases), and would be useful for many analysis arguments involving explicit exponents beyond the zero-density example presented here (e.g., a version of this could have been useful to optimize constants in the recent resolution of the PFR conjecture), though perhaps the more practical workflow for now is to use the finite-precision numerics approach to locate the correct conclusions and intermediate inequalities, and then prove those claims rigorously by hand.