
I’ve just uploaded to the arXiv my paper “Dense sets of natural numbers with unusually large least common multiples“. This short paper answers (in the negative) a somewhat obscure question of Erdős and Graham:

Problem 1 Is it true that if {A} is a set of natural numbers for which

\displaystyle  \frac{1}{\log\log x} \sum_{n \in A: n \leq x} \frac{1}{n} \ \ \ \ \ (1)

goes to infinity as {x \rightarrow \infty}, then the quantity

\displaystyle  \frac{1}{(\sum_{n \in A: n \leq x} \frac{1}{n})^2} \sum_{n,m \in A: n < m \leq x} \frac{1}{\mathrm{lcm}(n,m)} \ \ \ \ \ (2)

also goes to infinity as {x \rightarrow \infty}?

At first glance, this problem may seem rather arbitrary, but it can be motivated as follows. The hypothesis that (1) goes to infinity is a largeness condition on {A}; in view of Mertens’ theorem, it can be viewed as an assertion that {A} is denser than the set of primes. On the other hand, the conclusion that (2) grows is an assertion that {\frac{1}{\mathrm{lcm}(n,m)}} becomes significantly larger than {\frac{1}{nm}} on the average for large {n,m \in A}; that is to say, that many pairs of numbers in {A} share a common factor. Intuitively, the problem is then asking whether sets that are significantly denser than the primes must start having lots of common factors on average.

For sake of comparison, it is easy to see that if (1) goes to infinity, then at least one pair {(n,m)} of distinct elements in {A} must have a non-trivial common factor. For if this were not the case, then the elements of {A} would be pairwise coprime, so each prime {p} would have at most one multiple in {A}, and so could contribute at most {1/p} to the sum in (1); hence by Mertens’ theorem, and the fact that every natural number greater than one is divisible by at least one prime {p}, the quantity (1) would stay bounded, a contradiction.

It turns out, though, that the answer to the above problem is negative; one can find sets {A} that are denser than the primes, but for which (2) stays bounded, so that the least common multiples in the set are unusually large. It was a bit surprising to me that this question had not been resolved long ago (in fact, I was not able to find any prior literature on the problem beyond the original reference of Erdős and Graham); in contrast, another problem of Erdős and Graham concerning sets with unusually small least common multiples was extensively studied (and essentially solved) about twenty years ago, while the study of sets with unusually large greatest common divisor for many pairs in the set has recently become somewhat popular, due to their role in the proof of the Duffin-Schaeffer conjecture by Koukoulopoulos and Maynard.

To search for counterexamples, it is natural to look for numbers with relatively few prime factors, in order to reduce their common factors and increase their least common multiple. A particularly simple example, whose verification is on the level of an exercise in a graduate analytic number theory course, is the set of semiprimes (products of two primes), for which one can readily verify that (1) grows like {\log\log x} but (2) stays bounded. With a bit more effort, I was able to optimize the construction and uncover the true threshold for boundedness of (2), which was a little unexpected:

Theorem 2
  • (i) For any {C>0}, there exists a set of natural numbers {A} with

    \displaystyle  \sum_{n \in A: n \leq x} \frac{1}{n} = \exp( (C+o(1)) (\log\log x)^{1/2} \log\log\log x )

    for all large {x}, for which (2) stays bounded.
  • (ii) Conversely, if (2) stays bounded, then

    \displaystyle  \sum_{n \in A: n \leq x} \frac{1}{n} \ll \exp( O( (\log\log x)^{1/2} \log\log\log x ) )

    for all large {x}.
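
As a concrete illustration of the semiprime example mentioned above, here is a quick numerical sanity check (my own sketch, not taken from the paper; the cutoffs are ad hoc, and the pair loop is quadratic, so the larger case takes a few seconds). It computes the normalised sum (1) and the ratio (2) directly for the set of semiprimes, using the identity {\frac{1}{\mathrm{lcm}(n,m)} = \frac{\mathrm{gcd}(n,m)}{nm}}; one should see (1) creep upwards while (2) stays essentially flat.

```python
# Sanity check (not from the paper): for A = semiprimes, the quantity (1)
# grows slowly while the quantity (2) stays bounded.
from math import gcd, log

def big_omega_sieve(x):
    """Omega(n), the number of prime factors of n counted with multiplicity."""
    omega = [0] * (x + 1)
    for p in range(2, x + 1):
        if omega[p] == 0:  # untouched by any smaller prime power, so p is prime
            pk = p
            while pk <= x:
                for m in range(pk, x + 1, pk):
                    omega[m] += 1
                pk *= p
    return omega

for x in [10**3, 10**4]:
    omega = big_omega_sieve(x)
    A = [n for n in range(4, x + 1) if omega[n] == 2]  # the semiprimes up to x
    S = sum(1.0 / n for n in A)
    T = sum(gcd(A[i], A[j]) / (A[i] * A[j])            # 1/lcm(n,m) = gcd/(nm)
            for i in range(len(A)) for j in range(i + 1, len(A)))
    print(f"x={x}: (1) = {S / log(log(x)):.3f}, (2) = {T / S**2:.3f}")
```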

The proofs are not particularly long or deep, but I thought I would record here some of the process towards finding them. My first step was to try to simplify the condition that (2) stays bounded. In order to use probabilistic intuition, I first expressed this condition in probabilistic terms as

\displaystyle  \mathbb{E} \frac{\mathbf{n} \mathbf{m}}{\mathrm{lcm}(\mathbf{n}, \mathbf{m})} \ll 1

for large {x}, where {\mathbf{n}, \mathbf{m}} are independent random variables drawn from {\{ n \in A: n \leq x \}} with probability density function

\displaystyle  \mathbb{P} (\mathbf{n} = n) = \frac{1}{\sum_{m \in A: m \leq x} \frac{1}{m}} \frac{1}{n}.

The presence of the least common multiple in the denominator is annoying, but one can easily flip the expression to the greatest common divisor:

\displaystyle  \mathbb{E} \mathrm{gcd}(\mathbf{n}, \mathbf{m}) \ll 1.

If the expression {\mathrm{gcd}(\mathbf{n}, \mathbf{m})} were a product of a function of {\mathbf{n}} and a function of {\mathbf{m}}, then by independence this expectation would decouple into simpler averages involving just one random variable instead of two. Of course, the greatest common divisor is not of this form, but there is a standard trick in analytic number theory to decouple the greatest common divisor, namely to use the classic Gauss identity {n = \sum_{d|n} \varphi(d)}, with {\varphi} the Euler totient function, to write

\displaystyle  \mathrm{gcd}(\mathbf{n}, \mathbf{m}) = \sum_{d | \mathbf{n}, \mathbf{m}} \varphi(d).

Inserting this formula and interchanging the sum and expectation, we can now express the condition as bounding a sum of squares:

\displaystyle  \sum_d \varphi(d) \mathbb{P}(d|\mathbf{n})^2 \ll 1.
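
As a sanity check on this decoupling step, here is a small numerical verification (my own toy code; the cutoff {x} is arbitrary) that {\mathbb{E} \mathrm{gcd}(\mathbf{n}, \mathbf{m})} indeed equals {\sum_d \varphi(d) \mathbb{P}(d|\mathbf{n})^2} for the logarithmic probability distribution introduced above, taking {A} to be all of {\{1,\dots,x\}} for simplicity:

```python
# Toy verification of E[gcd(n,m)] = sum_d phi(d) * P(d|n)^2 for n, m i.i.d.
# with P(n) proportional to 1/n on A = {1,...,x}.
from math import gcd

x = 300
Z = sum(1 / n for n in range(1, x + 1))
P = {n: (1 / n) / Z for n in range(1, x + 1)}    # the density displayed above

lhs = sum(P[n] * P[m] * gcd(n, m)
          for n in range(1, x + 1) for m in range(1, x + 1))

phi = list(range(x + 1))                          # Euler totient by sieve
for p in range(2, x + 1):
    if phi[p] == p:                               # p is prime
        for m in range(p, x + 1, p):
            phi[m] -= phi[m] // p

Pdiv = {d: sum(P[n] for n in range(d, x + 1, d)) for d in range(1, x + 1)}
rhs = sum(phi[d] * Pdiv[d] ** 2 for d in range(1, x + 1))
print(lhs, rhs)                                   # the two values agree
```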

Thus, the condition (2) is really an assertion to the effect that typical elements of {A} do not have many divisors. From experience in sieve theory, the probabilities {\mathbb{P}(d|\mathbf{n})} tend to behave multiplicatively in {d}, so the expression here heuristically behaves like an Euler product that looks something like

\displaystyle  \prod_p (1 + \varphi(p) \mathbb{P}(p|\mathbf{n})^2)

and so the condition (2) is morally something like

\displaystyle  \sum_p p \mathbb{P}(p|\mathbf{n})^2 \ll 1. \ \ \ \ \ (3)

Comparing this with Mertens’ theorems, this leads to the heuristic prediction that {\mathbb{P}(p|\mathbf{n})} (for a typical prime {p} much smaller than {x}) should decay somewhat like {\frac{1}{p (\log\log p)^{1/2}}} (ignoring for now factors of {\log\log\log p}). This can be compared to the example of the set of primes or semiprimes on one hand, where the probability is like {\frac{1}{p \log\log p}}, and the set of all natural numbers on the other hand, where the probability is like {\frac{1}{p}}. So the critical behavior should come from sets that are in some sense “halfway” between the primes and the natural numbers.

It is then natural to try a random construction, in which one sieves out the natural numbers by permitting each natural number {n} to survive with a probability resembling {\prod_{p|n} \frac{1}{(\log\log p)^{1/2}}}, in order to get the predicted behavior for {\mathbb{P}(p|\mathbf{n})}. Performing some standard calculations, this construction could ensure (2) bounded with a density a little bit less than the one stated in the main theorem; after optimizing the parameters, I could only get something like

\displaystyle  \sum_{n \in A: n \leq x} \frac{1}{n} = \exp( (\log\log x)^{1/2} (\log\log\log x)^{-1/2-o(1)} ).

I was stuck on optimising the construction further, so I turned my attention to a positive result in the spirit of (ii) of the main theorem. On playing around with (3), I observed that one could use Cauchy-Schwarz and Mertens’ theorem to obtain the bound

\displaystyle  \sum_{p \leq x} \mathbb{P}(p|\mathbf{n}) \ll (\log\log x)^{1/2}

which was in line with the previous heuristic that {\mathbb{P}(p|\mathbf{n})} should behave like {\frac{1}{p (\log\log p)^{1/2}}}. The left-hand side had a simple interpretation: by linearity of expectation, it was the expected number {\mathbb{E} \omega(\mathbf{n})} of prime factors of {\mathbf{n}}. So the boundedness of (2) implied that a typical element of {A} only had about {(\log\log x)^{1/2}} prime factors, in contrast to the {\log\log x} predicted by the Hardy-Ramanujan law. Standard methods from the anatomy of integers can then be used to see how dense a set with that many prime factors could be, and this soon led to a short proof of part (ii) of the main theorem (I eventually found for instance that Jensen’s inequality could be used to create a particularly slick argument).

It then remained to improve the lower bound construction to eliminate the {\log\log\log x} losses in the exponents. By deconstructing the proof of the upper bound, it became natural to consider something like the set of natural numbers {n} that had at most {(\log\log n)^{1/2}} prime factors. This construction actually worked for some scales {x} – namely those {x} for which {(\log\log x)^{1/2}} was a natural number – but there were some strange “discontinuities” in the analysis that prevented me from establishing the boundedness of (2) for arbitrary scales {x}. The basic problem was that increasing the number of permitted prime factors from one natural number threshold {k} to another {k+1} ended up increasing the density of the set by an unbounded factor (of the order of {k}, in practice), which heavily disrupted the task of trying to keep the ratio (2) bounded. Usually the resolution to these sorts of discontinuities is to use some sort of random “average” of two or more deterministic constructions – for instance, by taking some random union of some numbers with {k} prime factors and some numbers with {k+1} prime factors – but the numerology turned out to be somewhat unfavorable, allowing for some improvement in the lower bounds over my previous construction, but not enough to close the gap entirely. It was only after substantial trial and error that I was able to find a working deterministic construction, where at a given scale one collected either numbers with at most {k} prime factors, or numbers with {k+1} prime factors but with the largest prime factor in a specific range, in which I could finally get the numerator and denominator in (2) to be in balance for every {x}. But once the construction was written down, the verification of the required properties ended up being quite routine.

Many modern mathematical proofs are a combination of conceptual arguments and technical calculations. There is something of a tradeoff between the two: one can add more conceptual arguments to try to reduce the technical computations, or vice versa. (Among other things, this leads to a Berkson paradox-like phenomenon in which a negative correlation can be observed between the two aspects of a proof; see this recent Mastodon post of mine for more discussion.)

In a recent article, Heather Macbeth argues that the preferred balance between conceptual and computational arguments is quite different for a computer-assisted proof than it is for a purely human-readable proof. In the latter, there is a strong incentive to minimize the amount of calculation to the point where it can be checked by hand, even if this requires a certain amount of ad hoc rearrangement of cases, unmotivated parameter selection, or otherwise non-conceptual additions to the arguments in order to reduce the calculation. But in the former, once one is willing to outsource any tedious verification or optimization task to a computer, the incentives are reversed: freed from the need to arrange the argument to reduce the amount of calculation, one can now describe an argument by listing the main ingredients and then letting the computer figure out a suitable way to combine them to give the stated result. The two approaches can thus be viewed as complementary ways to describe a result, with neither necessarily being superior to the other.

In this post, I would like to illustrate this computation-outsourced approach with the topic of zero-density theorems for the Riemann zeta function, in which all computer verifiable calculations (as well as other routine but tedious arguments) are performed “off-stage”, with the intent of focusing only on the conceptual inputs to these theorems.

Zero-density theorems concern upper bounds for the quantity {N(\sigma,T)} for a given {1/2 \leq \sigma \leq 1} and large {T}, which is defined as the number of zeroes of the Riemann zeta function in the rectangle {\{ \beta+i\gamma: \sigma \leq \beta \leq 1; 0 \leq \gamma \leq T \}}. (There is also an important generalization of this quantity to {L}-functions, but for simplicity we will focus on the classical zeta function case here). Such quantities are important in analytic number theory for many reasons, one of which is through explicit formulae such as the Riemann-von Mangoldt explicit formula

\displaystyle  \sum_{n \leq x}^{\prime} \Lambda(n) = x - \sum_{\rho:\zeta(\rho)=0} \frac{x^\rho}{\rho} - \log(2\pi) - \frac{1}{2} \log(1-x^{-2}) \ \ \ \ \ (1)

relating the prime numbers to the zeroes of the zeta function (the “music of the primes”). The better bounds one has on {N(\sigma,T)}, the more control one has on the complicated term {\sum_{\rho:\zeta(\rho)=0} \frac{x^\rho}{\rho}} on the right-hand side.

Clearly {N(\sigma,T)} is non-increasing in {\sigma}. The Riemann-von Mangoldt formula, together with the functional equation, gives us the asymptotic

\displaystyle  N(1/2,T) \asymp T \log T

in the {\sigma=1/2} case, while the prime number theorem tells us that

\displaystyle  N(1,T) = 0. \ \ \ \ \ (2)

The various zero free regions for the zeta function can be viewed as slight improvements to (2); for instance, the classical zero-free region is equivalent to the assertion that {N(\sigma,T)} vanishes if {\sigma > 1 - c/\log T} for some small absolute constant {c>0}, and the Riemann hypothesis is equivalent to the assertion that {N(\sigma,T)=0} for all {\sigma>1/2}.

Experience has shown that the most important quantity to control here is the exponent {A(\sigma)}, defined as the least constant for which one has a bound of the form

\displaystyle  N(\sigma,T) \ll T^{A(\sigma)(1-\sigma)+o(1)}

as {T \rightarrow \infty}. Thus, for instance,

\displaystyle  A(1/2) = 2, \ \ \ \ \ (3)

{A(1) = 0}, and {A(\sigma)(1-\sigma)} is a non-increasing function of {\sigma}, so we obtain the trivial “von Mangoldt” zero density theorem

\displaystyle  A(\sigma) \leq \frac{1}{1-\sigma}.

Of particular interest is the supremal value {\|A\|_\infty := \sup_{1/2 \leq \sigma \leq 1} A(\sigma)} of {A}, which has to be at least {2} thanks to (3). The density hypothesis asserts that the maximum is in fact exactly {2}, or equivalently that

\displaystyle  A(\sigma) \leq 2, \ \ \ \ \ (4)

for all {1/2 \leq \sigma \leq 1}. This is of course implied by the Riemann hypothesis (which clearly implies that {A(\sigma)=0} for all {\sigma>1/2}), but is a more tractable hypothesis to work with; for instance, the hypothesis is already known to hold for {\sigma \geq 25/32 = 0.78125} by the work of Bourgain (building upon many previous authors). The quantity {\|A\|_\infty} directly impacts our understanding of the prime number theorem in short intervals; indeed, it is not difficult using (1) (as well as the Vinogradov-Korobov zero-free region) to establish a short interval prime number theorem

\displaystyle  \sum_{x \leq n \leq x + x^\theta} \Lambda(n) = (1+o(1)) x^\theta

for all {x \rightarrow \infty} if {1 - \frac{1}{\|A\|_\infty} < \theta < 1} is a fixed exponent, or for almost all {x \rightarrow \infty} if {1 - \frac{2}{\|A\|_\infty} < \theta < 1} is a fixed exponent. Until recently, the best upper bound for {\|A\|_\infty} was {12/5 = 2.4}, thanks to a 1972 result of Huxley; but this was recently lowered to {30/13=2.307\ldots} in a breakthrough work of Guth and Maynard.
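
For orientation, the two sup-norm bounds just quoted translate into short-interval exponents via the formulas above; this is a trivial computation, but a computer keeps the fractions exact (the script is mine):

```python
# theta thresholds implied by ||A||_inf <= 12/5 (Huxley) and <= 30/13
# (Guth-Maynard): theta > 1 - 1/||A|| for all x, and theta > 1 - 2/||A||
# for almost all x.
from fractions import Fraction

for name, Asup in [("Huxley", Fraction(12, 5)), ("Guth-Maynard", Fraction(30, 13))]:
    print(f"{name}: all x for theta > {1 - 1 / Asup}, "
          f"almost all x for theta > {1 - 2 / Asup}")
```

This outputs the thresholds {7/12} and {1/6} for Huxley, improved to {17/30} and {2/15} by Guth-Maynard.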

In between the papers of Huxley and Guth-Maynard are dozens of additional improvements on {A(\sigma)}, though it is only the Guth-Maynard paper that actually lowered the supremum norm {\|A\|_\infty}. A summary of most of the state of the art before Guth-Maynard may be found in Table 2 of this recent paper of Trudgian and Yang; it is complicated, but it is easy enough to get a computer to illustrate it with a plot:
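
(The plot itself is not reproduced here. As a rough stand-in, the following sketch of mine draws only the handful of bounds quoted in this post – the trivial bound, the density hypothesis, Bourgain’s {\sigma \geq 25/32} range, and the two sup-norm bounds – and not the actual piecewise curve from the Trudgian-Yang table.)

```python
# Illustrative plot of the few bounds on A(sigma) quoted in this post; this
# is *not* the state-of-the-art piecewise curve from Trudgian-Yang Table 2.
import numpy as np
import matplotlib.pyplot as plt

sigma = np.linspace(0.5, 0.99, 500)
plt.plot(sigma, 1 / (1 - sigma), label=r"trivial bound $1/(1-\sigma)$")
plt.axhline(2, color="k", label=r"density hypothesis $A(\sigma)\leq 2$")
plt.axhline(12 / 5, linestyle="--", label=r"Huxley $\|A\|_\infty\leq 12/5$")
plt.axhline(30 / 13, linestyle="-.", label=r"Guth-Maynard $\|A\|_\infty\leq 30/13$")
plt.axvline(25 / 32, color="gray", linestyle=":",
            label=r"Bourgain: density hyp. for $\sigma\geq 25/32$")
plt.xlabel(r"$\sigma$")
plt.ylabel(r"bound on $A(\sigma)$")
plt.ylim(0, 5)
plt.legend()
plt.show()
```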

(For an explanation of what is going on under the assumption of the Lindelöf hypothesis, see below the fold.) This plot represents the combined effort of nearly a dozen papers, each one of which claims one or more components of the depicted piecewise smooth curve, and is written in the “human-readable” style mentioned above, where the argument is arranged to reduce the amount of tedious computation to human-verifiable levels, even if this comes at the cost of obscuring the conceptual ideas. (For an animation of how this bound improved over time, see here.) Below the fold, I will try to describe (in sketch form) some of the standard ingredients that go into these papers, in particular the routine reduction of deriving zero density estimates from large value theorems for Dirichlet series. We will not attempt to rewrite the entire literature of zero-density estimates in this fashion, but focus on some illustrative special cases.


I’ve just uploaded to the arXiv my paper “On product representations of squares“. This short paper answers (in the negative) a (somewhat obscure) question of Erdős. Namely, for any {k \geq 1}, let {F_k(N)} be the size of the largest subset {A} of {\{1,\dots,N\}} with the property that no {k} distinct elements of {A} multiply to a square. In a paper by Erdős, Sárközy, and Sós, the following asymptotics were shown for fixed {k}:

  • {F_1(N) = (1+o(1)) N}.
  • {F_2(N) = (\frac{6}{\pi^2} + o(1)) N}.
  • {F_3(N) = (1+o(1)) N}.
  • {F_{4k}(N) = (1+o(1)) \frac{N}{\log N}} for {k \geq 1}.
  • {F_{4k+2}(N) = (\frac{3}{2}+o(1)) \frac{N}{\log N}} for {k \geq 1}.
  • {(\log 2 + o(1)) N \leq F_{2k+1}(N) \leq N} for {k \geq 2}.
Thus the asymptotics for {F_k(N)} for odd {k \geq 5} were not completely settled. Erdős asked if one had {F_k(N) = (1-o(1)) N} for odd {k \geq 5}. The main result of this paper is that this is not the case; that is to say, there exists {c_k>0} such that any subset {A} of {\{1,\dots,N\}} of cardinality at least {(1-c_k) N} will contain {k} distinct elements that multiply to a square, if {N} is large enough. In fact, the argument works for all {k \geq 4}, although it is not new in the even case. I will also note that there are now quite sharp upper and lower bounds on {F_k} for even {k \geq 4}, using methods from graph theory: see this recent paper of Pach and Vizer for the latest results in this direction. Thanks to the results of Granville and Soundararajan, we know that the constant {c_k} cannot exceed the Hall-Montgomery constant

\displaystyle  1 - \log(1+\sqrt{e}) + 2 \int_1^{\sqrt{e}} \frac{\log t}{t+1}\ dt = 0.171500\dots

and I (very tentatively) conjecture that this is in fact the optimal value for this constant. This looks somewhat difficult, but a more feasible conjecture would be that the {c_k} asymptotically approach the Hall-Montgomery constant as {k \rightarrow \infty}, since the aforementioned result of Granville and Soundararajan morally corresponds to the {k=\infty} case.

In the end, the argument turned out to be relatively simple; no advanced results from additive combinatorics, graph theory, or analytic number theory were required. I found it convenient to proceed via the probabilistic method (although the more combinatorial technique of double counting would also suffice here). The main idea is to generate a tuple {(\mathbf{n}_1,\dots,\mathbf{n}_k)} of distinct random natural numbers in {\{1,\dots,N\}} which multiply to a square, and which are reasonably uniformly distributed throughout {\{1,\dots,N\}}, in that each individual number {1 \leq n \leq N} is attained by one of the random variables {\mathbf{n}_i} with a probability of {O(1/N)}. If one can find such a distribution, then if the density of {A} is sufficiently close to {1}, it will happen with positive probability that each of the {\mathbf{n}_i} will lie in {A}, giving the claim.

When {k=3}, this strategy cannot work, as it contradicts the arguments of Erdős, Sárközy, and Sós. The reason can be explained as follows. The most natural way to generate a triple {(\mathbf{n}_1,\mathbf{n}_2,\mathbf{n}_3)} of random natural numbers in {\{1,\dots,N\}} which multiply to a square is to set

\displaystyle  \mathbf{n}_1 := \mathbf{d}_{12} \mathbf{d}_{13}, \mathbf{n}_2 := \mathbf{d}_{12} \mathbf{d}_{23}, \mathbf{n}_3 := \mathbf{d}_{13} \mathbf{d}_{23}

for some random natural numbers {\mathbf{d}_{12}, \mathbf{d}_{13}, \mathbf{d}_{23}}. But if one wants all these numbers to have magnitude {\asymp N}, one sees on taking logarithms that one would need

\displaystyle  \log \mathbf{d}_{12} + \log \mathbf{d}_{13}, \log \mathbf{d}_{12} + \log \mathbf{d}_{23}, \log \mathbf{d}_{13} + \log \mathbf{d}_{23} = \log N + O(1)

which by elementary linear algebra forces

\displaystyle  \log \mathbf{d}_{12}, \log \mathbf{d}_{13}, \log \mathbf{d}_{23} = \frac{1}{2} \log N + O(1),

so in particular each of the {\mathbf{n}_i} would have a factor comparable to {\sqrt{N}}. However, it follows from known results on the “multiplication table problem” (how many distinct integers are there in the {n \times n} multiplication table?) that most numbers up to {N} do not have a factor comparable to {\sqrt{N}}. (Quick proof: by the Hardy–Ramanujan law, a typical number of size {N} or of size {\sqrt{N}} has {(1+o(1)) \log\log N} prime factors, so a product of two typical factors of size {\sqrt{N}} would have about {2\log\log N} prime factors, twice the typical count; hence typically a number of size {N} will not factor into two factors of size {\sqrt{N}}.) So the above strategy cannot work for {k=3}.
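
(This multiplication-table obstruction is cheap to observe numerically. The following quick script of mine, with arbitrary cutoffs and “comparable to {\sqrt{N}}” interpreted as “within a factor of two of {\sqrt{N}}”, estimates the proportion of {n \leq N} having such a factor; it comes out to a minority, slowly shrinking as {N} grows.)

```python
# Estimate the proportion of n <= N with a divisor d satisfying
# sqrt(N)/2 <= d <= 2*sqrt(N); "comparable to sqrt(N)" is my own reading.
from math import isqrt

for N in [10**4, 10**5, 10**6]:
    r = isqrt(N)
    has = bytearray(N + 1)                 # has[n] = 1 if n has such a divisor
    for d in range(r // 2, 2 * r + 1):
        has[d::d] = b"\x01" * (N // d)     # mark all multiples of d up to N
    print(N, sum(has) / N)
```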

However, the situation changes for larger {k}. For instance, for {k=4}, we can try the same strategy with the ansatz

\displaystyle \mathbf{n}_1 = \mathbf{d}_{12} \mathbf{d}_{13} \mathbf{d}_{14}; \quad \mathbf{n}_2 = \mathbf{d}_{12} \mathbf{d}_{23} \mathbf{d}_{24}; \quad \mathbf{n}_3 = \mathbf{d}_{13} \mathbf{d}_{23} \mathbf{d}_{34}; \quad \mathbf{n}_4 = \mathbf{d}_{14} \mathbf{d}_{24} \mathbf{d}_{34}.

Whereas before there were three (approximate) equations constraining three unknowns, now we would have four equations and six unknowns, and so we no longer have strong constraints on any of the {\mathbf{d}_{ij}}. So in principle we now have a chance to find a suitable random choice of the {\mathbf{d}_{ij}}. The most significant remaining obstacle is the Hardy–Ramanujan law: since the {\mathbf{n}_i} typically have {(1+o(1))\log\log N} prime factors, it is natural in this {k=4} case to choose each {\mathbf{d}_{ij}} to have {(\frac{1}{3}+o(1)) \log\log N} prime factors. As it turns out, if one does this (basically by requiring each prime {p \leq N^{\varepsilon^2}} to divide {\mathbf{d}_{ij}} with an independent probability of about {\frac{1}{3p}}, for some small {\varepsilon>0}, and then also adding in one large prime to bring the magnitude of the {\mathbf{n}_i} to be comparable to {N}), the calculations all work out, and one obtains the claimed result.
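
To see the shape of this construction concretely, here is a toy simulation (my own, not the paper’s actual verification; it omits the paper’s final step of appending one large prime to bring each {\mathbf{n}_i} up to size comparable to {N}, and it does not enforce distinctness):

```python
# Toy k=4 ansatz: each prime p <= B joins each d_ij independently with
# probability ~ 1/(3p); then n_1 n_2 n_3 n_4 = (product of all d_ij)^2 is
# automatically a perfect square.
import random
from itertools import combinations
from math import isqrt, prod

def primes_up_to(b):
    s = bytearray([1]) * (b + 1)
    for p in range(2, isqrt(b) + 1):
        if s[p]:
            s[p * p :: p] = bytearray(len(range(p * p, b + 1, p)))
    return [p for p in range(2, b + 1) if s[p]]

random.seed(0)
pairs = list(combinations(range(4), 2))        # the six index pairs {i,j}
d = {ij: 1 for ij in pairs}
for p in primes_up_to(1000):                   # B = 1000, an arbitrary cutoff
    for ij in pairs:
        if random.random() < 1 / (3 * p):
            d[ij] *= p
n = [prod(d[ij] for ij in pairs if i in ij) for i in range(4)]
N = prod(n)
print(n, isqrt(N) ** 2 == N)                   # True: the product is a square
```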

Earlier this year, I gave a series of lectures at the Joint Mathematics Meetings in San Francisco. I am uploading here the slides for these talks:

I also have written a text version of the first talk, which has been submitted to the Notices of the American Mathematical Society.

I have just uploaded to the arXiv my paper “Monotone non-decreasing sequences of the Euler totient function“. This paper concerns the quantity {M(x)}, defined as the length of the longest subsequence of the numbers from {1} to {x} for which the Euler totient function {\varphi} is non-decreasing. The first few values of {M} are

\displaystyle  1, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 11, 12, 12, \dots

(OEIS A365339). For instance, {M(6)=5} because the totient function is non-decreasing on the set {\{1,2,3,4,5\}} or {\{1,2,3,4,6\}}, but not on the set {\{1,2,3,4,5,6\}}.
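
(The values of {M(x)} above are cheap to recompute: {M(x)} is just the length of the longest non-decreasing subsequence of {\varphi(1),\dots,\varphi(x)}, which patience sorting finds in {O(x \log x)} time. Here is a quick script of mine doing this; it reproduces the list above.)

```python
# Recompute M(x) as the longest non-decreasing subsequence of phi(1..x).
from bisect import bisect_right

def totients_up_to(x):
    phi = list(range(x + 1))
    for p in range(2, x + 1):
        if phi[p] == p:                    # p is prime
            for m in range(p, x + 1, p):
                phi[m] -= phi[m] // p
    return phi

def M(x, phi):
    tails = []   # tails[k]: least possible last value of a run of length k+1
    for n in range(1, x + 1):
        i = bisect_right(tails, phi[n])    # bisect_right permits equal values
        if i == len(tails):
            tails.append(phi[n])
        else:
            tails[i] = phi[n]
    return len(tails)

phi = totients_up_to(18)
print([M(x, phi) for x in range(1, 19)])
# [1, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 11, 12, 12]
```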

Since {\varphi(p)=p-1} for any prime {p}, we have {M(x) \geq \pi(x)}, where {\pi(x)} is the prime counting function. Empirically, the primes come quite close to achieving the maximum length {M(x)}; indeed it was conjectured by Pollack, Pomerance, and Treviño, based on numerical evidence, that one had

\displaystyle  M(x) = \pi(x)+64 \ \ \ \ \ (1)

for all {x \geq 31957}; this conjecture is verified up to {x=10^7}. The previous best known upper bound was basically of the form

\displaystyle  M(x) \leq \exp( (C+o(1)) (\log\log\log x)^2 ) \frac{x}{\log x} \ \ \ \ \ (2)

as {x \rightarrow \infty} for an explicit constant {C = 0.81781\dots}, from combining results from the above paper with that of Ford or of Maier-Pomerance. In this paper we obtain the asymptotic

\displaystyle  M(x) = \left( 1 + O \left(\frac{(\log\log x)^5}{\log x}\right) \right) \frac{x}{\log x}

so in particular {M(x) = (1+o(1))\pi(x)}. This answers a question of Erdős, as well as a closely related question of Pollack, Pomerance, and Treviño.

The methods of proof turn out to be mostly elementary (the most advanced result from analytic number theory we need is the prime number theorem with classical error term). The basic idea is to isolate one key prime factor {p} of a given number {1 \leq n \leq x} which has a sizeable influence on the totient function {\varphi(n)}. For instance, for “typical” numbers {n}, one has a factorization

\displaystyle  n = d p_2 p_1

where {p_2} is a medium sized prime, {p_1} is a significantly larger prime, and {d} is a number with all prime factors less than {p_2}. This leads to an approximation

\displaystyle  \varphi(n) \approx \frac{\varphi(d)}{d} (1-\frac{1}{p_2}) n.

As a consequence, if we temporarily hold {d} fixed, and also localize {n} to a relatively short interval, then {\varphi} can only be non-decreasing in {n} if {p_2} is also non-decreasing at the same time. This turns out to significantly cut down on the possible length of a non-decreasing sequence in this regime, particularly if {p_2} is large; this can be formalized by partitioning the range of {p_2} into various subintervals and inspecting how this (and the monotonicity hypothesis on {\varphi}) constrains the values of {n} associated to each subinterval. When {p_2} is small, we instead use a factorization

\displaystyle  n = d p \ \ \ \ \ (3)

where {d} is very smooth (i.e., has no large prime factors), and {p} is a large prime. Now we have the approximation

\displaystyle  \varphi(n) \approx \frac{\varphi(d)}{d} n \ \ \ \ \ (4)

and we can conclude that {\frac{\varphi(d)}{d}} will have to basically be piecewise constant in order for {\varphi} to be non-decreasing. Pursuing this analysis more carefully (in particular controlling the size of various exceptional sets in which the above analysis breaks down), we end up achieving the main theorem so long as we can prove the preliminary inequality

\displaystyle  \sum_{\frac{\varphi(d)}{d}=q} \frac{1}{d} \leq 1 \ \ \ \ \ (5)

for all positive rational numbers {q}. This is in fact also a necessary condition; any failure of this inequality can be easily converted to a counterexample to the bound (2), by considering numbers of the form (3) with {\frac{\varphi(d)}{d}} equal to a fixed constant {q} (and omitting a few rare values of {n} where the approximation (4) is bad enough that {\varphi} is temporarily decreasing). Fortunately, there is a minor miracle, relating to the fact that the largest prime factor of the denominator of {\frac{\varphi(d)}{d}} in lowest terms necessarily equals the largest prime factor of {d}, that allows one to evaluate the left-hand side of (5) almost exactly (this expression either vanishes, or is the product of {\frac{1}{p-1}} for some primes {p} ranging up to the largest prime factor of {q}), and hence to easily establish (5). If one were to try to prove an analogue of our main result for the sum-of-divisors function {\sigma(n)}, one would need the analogue

\displaystyle  \sum_{\frac{\sigma(d)}{d}=q} \frac{1}{d} \leq 1 \ \ \ \ \ (6)

of (5), which looks within reach of current methods (and was even claimed without proof by Erdős), but does not have a full proof in the literature at present.

In the final section of the paper we discuss some near counterexamples to the strong conjecture (1) that indicate that it is likely going to be difficult to get close to proving this conjecture without assuming some rather strong hypotheses. Firstly, we show that failure of Legendre’s conjecture on the existence of a prime between any two consecutive squares can lead to a counterexample to (1). Secondly, we show that failure of the Dickson-Hardy-Littlewood conjecture can lead to a separate (and more dramatic) failure of (1), in which the primes are no longer the dominant sequence on which the totient function is non-decreasing, but rather the numbers which are a power of two times a prime become the dominant sequence. This suggests that any significant improvement to (2) would require assuming something comparable to the prime tuples conjecture, and perhaps also some unproven hypotheses on prime gaps.

Kevin Ford, Dimitris Koukoulopoulos and I have just uploaded to the arXiv our paper “A lower bound on the mean value of the Erdős-Hooley delta function“. This paper complements the recent paper of Dimitris and myself obtaining the upper bound

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll (\log\log x)^{11/4}

on the mean value of the Erdős-Hooley delta function

\displaystyle  \Delta(n) := \sup_u \# \{ d|n: e^u < d \leq e^{u+1} \}

In this paper we obtain a lower bound

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \gg (\log\log x)^{1+\eta-o(1)}

where {\eta = 0.3533227\dots} is an exponent that arose in previous work of Ford, Green, and Koukoulopoulos, who showed that

\displaystyle  \Delta(n) \gg (\log\log n)^{\eta-o(1)} \ \ \ \ \ (1)

for all {n} outside of a set of density zero. The previous best known lower bound for the mean value was

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \gg \log\log x,

due to Hall and Tenenbaum.

The point is that the main contributions to the mean value of {\Delta(n)} are driven not by “typical” numbers {n} of some size {x}, but rather by numbers that have a splitting

\displaystyle  n = n' n''

where {n''} is the product of primes between some intermediate threshold {1 \leq y \leq x} and {x} and behaves “typically” (so in particular, it has about {\log\log x - \log\log y + O(\sqrt{\log\log x})} prime factors, as per the Hardy-Ramanujan law and the Erdős-Kac law), but {n'} is the product of primes up to {y} and has double the number of typical prime factors – {2 \log\log y + O(\sqrt{\log\log x})}, rather than {\log\log y + O(\sqrt{\log\log x})} – thus {n'} is the type of number that would make a significant contribution to the mean value of the divisor function {\tau(n')}. Here {y} is such that {\log\log y} is an integer in the range

\displaystyle  \varepsilon\log\log x \leq \log \log y \leq (1-\varepsilon) \log\log x

for some small constant {\varepsilon>0}; there are then basically {\log\log x} different values of {y}, which give essentially disjoint contributions. From the easy inequalities

\displaystyle  \Delta(n) \gg \Delta(n') \Delta(n'') \geq \frac{\tau(n')}{\log n'} \Delta(n'') \ \ \ \ \ (2)

(the latter coming from the pigeonhole principle) and the fact that {\frac{\tau(n')}{\log n'}} has mean about one, one would expect to get the above result provided that one could get a lower bound of the form

\displaystyle  \Delta(n'') \gg (\log \log n'')^{\eta-o(1)} \ \ \ \ \ (3)

for most typical {n''} with prime factors between {y} and {x}. Unfortunately, due to the lack of small prime factors in {n''}, the arguments of Ford, Green, Koukoulopoulos that give (1) for typical {n} do not quite work for the rougher numbers {n''}. However, it turns out that one can get around this problem by replacing (2) by the more efficient inequality

\displaystyle  \Delta(n) \gg \frac{\tau(n')}{\log n'} \Delta^{(\log n')}(n'')

where

\displaystyle  \Delta^{(v)}(n) := \sup_u \# \{ d|n: e^u < d \leq e^{u+v} \}

is an enlarged version of {\Delta(n)} when {v \geq 1}. This inequality is easily proven by applying the pigeonhole principle to the factors of {n} of the form {d' d''}, where {d'} is one of the {\tau(n')} factors of {n'}, and {d''} is one of the {\Delta^{(\log n')}(n'')} factors of {n''} in the optimal interval {[e^u, e^{u+\log n'}]}. The extra room provided by the enlargement of the range {[e^u, e^{u+1}]} to {[e^u, e^{u+\log n'}]} turns out to be sufficient to adapt the Ford-Green-Koukoulopoulos argument to the rough setting. In fact we are able to use the main technical estimate from that paper as a “black box”, namely that if one considers a random subset {A} of {[D^c, D]} for some small {c>0} and sufficiently large {D} with each {n \in [D^c, D]} lying in {A} with an independent probability {1/n}, then with high probability there should be {\gg c^{-1/\eta+o(1)}} subset sums of {A} that attain the same value. (Initially, what “high probability” means is just “close to {1}“, but one can reduce the failure probability significantly as {c \rightarrow 0} by a “tensor power trick” taking advantage of Bennett’s inequality.)
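
To get a feel for this black-box estimate, here is a toy Monte Carlo illustration (mine, with arbitrary small parameters; the theorem itself concerns the limit {c \rightarrow 0} with {D} large, well beyond what this sketch probes):

```python
# Sample A in [D^c, D] with P(n in A) = 1/n and record the largest number of
# subset sums attaining one common value; the quoted estimate predicts this
# collision count to grow like c^{-1/eta+o(1)} with high probability as c -> 0.
import random
from collections import Counter
from itertools import combinations

def max_collisions(D, c, rng):
    A = [n for n in range(int(D**c), D + 1) if rng.random() < 1 / n]
    counts = Counter()
    for r in range(len(A) + 1):            # enumerate all subsets of A
        for subset in combinations(A, r):
            counts[sum(subset)] += 1
    return len(A), max(counts.values())

rng = random.Random(0)
for c in [0.5, 0.3, 0.2]:
    print(c, [max_collisions(2000, c, rng) for _ in range(3)])
```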

I have just uploaded to the arXiv my paper “The convergence of an alternating series of Erdős, assuming the Hardy–Littlewood prime tuples conjecture“. This paper concerns an old problem of Erdős concerning whether the alternating series {\sum_{n=1}^\infty \frac{(-1)^n n}{p_n}} converges, where {p_n} denotes the {n^{th}} prime. The main result of this paper is that the answer to this question is affirmative assuming a sufficiently strong version of the Hardy–Littlewood prime tuples conjecture.

The alternating series test does not apply here because the ratios {\frac{n}{p_n}} are not monotonically decreasing. The deviations from monotonicity arise from fluctuations in the prime gaps {p_{n+1}-p_n}, so the enemy arises from biases in the prime gaps for odd and even {n}. By changing variables from {n} to {p_n} (or more precisely, to integers in the range between {p_n} and {p_{n+1}}), this is basically equivalent to biases in the parity {(-1)^{\pi(n)}} of the prime counting function. Indeed, it is an unpublished observation of Said that the convergence of {\sum_{n=1}^\infty \frac{(-1)^n n}{p_n}} is equivalent to the convergence of {\sum_{n=10}^\infty \frac{(-1)^{\pi(n)}}{n \log n}}. So this question is really about trying to get a sufficiently strong amount of equidistribution for the parity of {\pi(n)}.

The prime tuples conjecture does not directly say much about the value of {\pi(n)}; however, it can be used to control differences {\pi(n+\lambda \log x) - \pi(n)} for {n \sim x} and {\lambda>0} not too large. Indeed, it is a famous calculation of Gallagher that for fixed {\lambda}, and {n} chosen randomly from {1} to {x}, the quantity {\pi(n+\lambda \log x) - \pi(n)} is distributed according to the Poisson distribution of mean {\lambda} asymptotically if the prime tuples conjecture holds. In particular, the parity {(-1)^{\pi(n+\lambda \log x)-\pi(n)}} of this quantity should have mean asymptotic to {e^{-2\lambda}}. An application of the van der Corput {A}-process then gives some decay on the mean of {(-1)^{\pi(n)}} as well. Unfortunately, this decay is a bit too weak for this problem; even if one uses the most quantitative version of Gallagher’s calculation, worked out in a recent paper of (Vivian) Kuperberg, the best bound on the mean {|\frac{1}{x} \sum_{n \leq x} (-1)^{\pi(n)}|} is something like {(\log\log x)^{-1/4+o(1)}}, which is not quite strong enough to overcome the doubly logarithmic divergence of {\sum_{n=1}^\infty \frac{1}{n \log n}}.

To get around this obstacle, we take advantage of the random sifted model {{\mathcal S}_z} of the primes that was introduced in a paper of Banks, Ford, and myself. To model the primes in an interval such as {[n, n+\lambda \log x]} with {n} drawn randomly from say {[x,2x]}, we remove one random residue class {a_p \hbox{ mod } p} from this interval for all primes {p} up to Pólya’s “magic cutoff” {z \approx x^{1/e^\gamma}}. The prime tuples conjecture can then be interpreted as the assertion that the random set {{\mathcal S}_z} produced by this sieving process is statistically a good model for the primes in {[n, n+\lambda \log x]}. After some standard manipulations (using a version of the Bonferroni inequalities, as well as some upper bounds of Kuperberg), the problem then boils down to getting sufficiently strong estimates for the expected parity {{\bf E} (-1)^{|{\mathcal S}_z|}} of the random sifted set {{\mathcal S}_z}.

For this problem, the main advantage of working with the random sifted model, rather than with the primes or the singular series arising from the prime tuples conjecture, is that the sifted model can be studied iteratively from the partially sifted sets {{\mathcal S}_w} arising from sifting primes {p} up to some intermediate threshold {w<z}, and that the expected parity of the {{\mathcal S}_w} experiences some decay in {w}. Indeed, once {w} exceeds the length {\lambda \log x} of the interval {[n,n+\lambda \log x]}, sifting {{\mathcal S}_w} by an additional prime {p} will cause {{\mathcal S}_w} to lose one element with probability {|{\mathcal S}_w|/p}, and remain unchanged with probability {1 - |{\mathcal S}_w|/p}. If {|{\mathcal S}_w|} concentrates around some value {\overline{S}_w}, this suggests that the expected parity {{\bf E} (-1)^{|{\mathcal S}_w|}} will decay by a factor of about {|1 - 2 \overline{S}_w/p|} as one increases {w} to {p}, and iterating this should give good bounds on the final expected parity {{\bf E} (-1)^{|{\mathcal S}_z|}}. It turns out that existing second moment calculations of Montgomery and Soundararajan suffice to obtain enough concentration to make this strategy work.
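
To illustrate the sifted model concretely, here is a small Monte Carlo sketch (my own; the parameters are arbitrary and far from the asymptotic regime of the paper) that sifts an interval by one random residue class per prime and estimates the expected parity for a few thresholds {z}; the magnitude of the estimate shrinks as {z} increases, eventually disappearing into the Monte Carlo noise.

```python
# Sift {0,...,L-1} by a uniformly random residue class mod p for each prime
# p <= z, and estimate the expected parity E[(-1)^{|S_z|}].
import random

def primes_up_to(b):
    s = bytearray([1]) * (b + 1)
    ps = []
    for p in range(2, b + 1):
        if s[p]:
            ps.append(p)
            s[p * p :: p] = bytearray(len(range(p * p, b + 1, p)))
    return ps

def parity_estimate(L, z, trials, rng):
    ps = primes_up_to(z)
    total = 0
    for _ in range(trials):
        S = set(range(L))
        for p in ps:
            S.difference_update(range(rng.randrange(p), L, p))
        total += (-1) ** len(S)
    return total / trials

rng = random.Random(1)
for z in [10, 50, 250]:
    print(z, parity_estimate(L=40, z=z, trials=4000, rng=rng))
```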

Dimitris Koukoulopoulos and I have just uploaded to the arXiv our paper “An upper bound on the mean value of the Erdős-Hooley delta function“. This paper concerns a (still somewhat poorly understood) basic arithmetic function in multiplicative number theory, namely the Erdős-Hooley delta function

\displaystyle  \Delta(n) := \sup_u \Delta(n;u)

where

\displaystyle  \Delta(n;u) := \# \{ d|n: e^u < d \leq e^{u+1} \}.

The function {\Delta} measures the extent to which the divisors of a natural number can be concentrated in a dyadic (or more precisely, {e}-dyadic) interval {(e^u, e^{u+1}]}. From the pigeonhole principle, we have the bounds

\displaystyle  \frac{\tau(n)}{\log n} \ll \Delta(n) \leq \tau(n),

where {\tau(n) := \# \{ d: d|n\}} is the usual divisor function. The statistical behavior of the divisor function is well understood; for instance, if {n} is drawn at random from {1} to {x}, then the mean value of {\tau(n)} is roughly {\log x}, the median is roughly {\log^{\log 2} x}, and (by the Erdős-Kac theorem) {\tau(n)} asymptotically has a log-normal distribution. In particular, there are a small proportion of highly divisible numbers that skew the mean to be significantly higher than the median.

On the other hand, the statistical behavior of the Erdős-Hooley delta function is significantly less well understood, even conjecturally. Again drawing {n} at random from {1} to {x} for large {x}, the median is known to be somewhere between {(\log\log x)^{0.3533\dots}} and {(\log\log x)^{0.6102\dots}} for large {x} – a (difficult) recent result of Ford, Green, and Koukoulopoulos (for the lower bound) and La Bretèche and Tenenbaum (for the upper bound). And the mean {\frac{1}{x} \sum_{n \leq x} \Delta(n)} was even less well controlled; the best previous bounds were

\displaystyle  \log \log x \ll \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll \exp( c \sqrt{\log\log x} )

for any {c > \sqrt{2} \log 2}, with the lower bound due to Hall and Tenenbaum, and the upper bound a recent result of La Bretèche and Tenenbaum.

The main result of this paper is an improvement of the upper bound to

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll (\log \log x)^{11/4}.

It is still unclear to us exactly what to conjecture regarding the actual order of the mean value.

The reason we looked into this problem was that it was connected to forthcoming work of David Conlon, Jacob Fox, and Huy Pham on the following problem of Erdős: what is the size of the largest subset {A} of {\{1,\dots,N\}} with the property that no non-empty subset of {A} sums to a perfect square? Erdős observed that one can obtain sets of size {\gg N^{1/3}} (basically by considering certain homogeneous arithmetic progressions), and Nguyen and Vu showed an upper bound of {\ll N^{1/3} (\log N)^{O(1)}}. With our mean value bound as input, together with several new arguments, Conlon, Fox, and Pham have been able to improve the upper bound to {\ll N^{1/3} (\log\log N)^{O(1)}}.

Let me now discuss some of the ingredients of the proof. The first few steps are standard. Firstly we may restrict attention to square-free numbers without much difficulty (the point being that if a number {n} factors as {n = d^2 m} with {m} squarefree, then {\Delta(n) \leq \tau(d^2) \Delta(m)}). Next, because a square-free number {n>1} can be uniquely factored as {n = pm} where {p} is a prime and {m} lies in the finite set {{\mathcal S}_{<p}} of squarefree numbers whose prime factors are less than {p}, and {\Delta(n) \leq \tau(p) \Delta(m) = 2 \Delta(m)}, it is not difficult to establish the bound

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll \sup_{2 \leq y\leq x} \frac{1}{\log y} \sum_{n \in {\mathcal S}_{<y}} \frac{\Delta(n)}{n}.

The upshot of this is that one can replace an ordinary average with a logarithmic average; thus it suffices to show

\displaystyle  \frac{1}{\log x} \sum_{n \in {\mathcal S}_{<x}} \frac{\Delta(n)}{n} \ll (\log \log x)^{11/4}. \ \ \ \ \ (1)

We actually prove a slightly more refined distributional estimate: for any {A \geq 2}, we have a bound

\displaystyle  \Delta(n) \ll A \log^{3/4} A \ \ \ \ \ (2)

outside of an exceptional set {E} which is small in the sense that

\displaystyle  \frac{1}{\log x} \sum_{n \in {\mathcal S}_{<x}: n \in E} \frac{1}{n} \ll \frac{1}{A}. \ \ \ \ \ (3)

It is not difficult to get from this distributional estimate to the logarithmic average estimate (1) (worsening the exponent {3/4} to {3/4+2 = 11/4}).

To get some intuition on the size of {\Delta(n)}, we observe that if {y > 0} and {n_{<y}} is the factor of {n} coming from the prime factors less than {y}, then

\displaystyle  \Delta(n) \geq \Delta(n_{<y}) \gg \frac{\tau(n_{<y})}{\log n_{<y}}. \ \ \ \ \ (4)

On the other hand, standard estimates let one establish that

\displaystyle  \tau(n_{<y}) \ll A \log n_{<y} \ \ \ \ \ (5)

for all {y}, and all {n} outside of an exceptional set that is small in the sense (3); in fact it turns out that one can also get an additional gain in this estimate unless {\log y} is close to {A^{\log 4}}, which turns out to be useful when optimizing the bounds. So we would like to approximately reverse the inequalities in (4) and get from (5) to (2), possibly after throwing away further exceptional sets of size (3).

At this point we perform another standard technique, namely the moment method of controlling the supremum {\Delta(n) = \sup_u \Delta(n;u)} by the moments

\displaystyle  M_q(n) := \int_{{\bf R}} \Delta(n;u)^q\ du

for natural numbers {q}; it is not difficult to establish the bound

\displaystyle  \Delta(n) \ll M_q(n)^{1/q}

and one expects this bound to become essentially sharp once {q \sim \log\log x}. We will be able to show a moment bound

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_q(n) / \tau(n)}{n} \leq O(q)^q A^{q-2} \log^{3q/4} A

for any {q \geq 2} for some exceptional set {E_q} obeying the smallness condition (3) (actually, for technical reasons we need to improve the right-hand side slightly to close an induction on {q}); this will imply the distributional bound (2) from a standard Markov inequality argument (setting {q \sim \log\log x}).

The strategy is then to obtain a good recursive inequality for (averages of) {M_q(n)}. As in the reduction to (1), we factor {n=pm} where {p} is a prime and {m \in {\mathcal S}_{<p}}. One observes the identity

\displaystyle  \Delta(n;u) = \Delta(m;u) + \Delta(m;u-\log p)

for any {u}; taking moments, one obtains the identity

\displaystyle  M_q(n) = \sum_{a+b=q; 0 \leq b \leq q} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

As in previous literature, one can try to average in {p} here and apply Hölder’s inequality. But it is convenient to first use the symmetry of the summand in {a,b} to reduce to the case of relatively small values of {b}:

\displaystyle  M_q(n) \leq 2 \sum_{a+b=q; 0 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

One can extract out the {b=0} term as

\displaystyle  M_q(n) \leq 2 M_q(m)

\displaystyle + 2 \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

It is convenient to eliminate the factor of {2} by dividing out by the divisor function:

\displaystyle  \frac{M_q(n)}{\tau(n)} \leq \frac{M_q(m)}{\tau(m)}

\displaystyle + \frac{1}{\tau(m)} \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

This inequality is suitable for iterating and also averaging in {p} and {m}. After some standard manipulations (using the Brun–Titchmarsh and Hölder inequalities), one is able to estimate sums such as

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_q(n)/\tau(n)}{n} \ \ \ \ \ (6)

in terms of sums such as

\displaystyle  \int_2^{x^2} \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_a(n) M_b(n)}{\tau(n) n} \frac{dy}{\log^2 y}

(assuming a certain monotonicity property of the exceptional set {E_q} that turns out to hold in our application). By an induction hypothesis and a Markov inequality argument, one can get a reasonable pointwise upper bound on {M_b} (after removing another exceptional set), and the net result is that one can basically control the sum (6) in terms of expressions such as

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_a} \frac{M_a(n)/\tau(n)}{n}

for various {a < q}. This allows one to estimate these expressions efficiently by induction.

Tamar Ziegler and I have just uploaded to the arXiv our paper “Infinite partial sumsets in the primes“. This is a short paper inspired by a recent result of Kra, Moreira, Richter, and Robertson (discussed for instance in this Quanta article from last December) showing that for any set {A} of natural numbers of positive upper density, there exists a sequence {b_1 < b_2 < b_3 < \dots} of natural numbers and a shift {t} such that {b_i + b_j + t \in A} for all {i<j} (this answers a question of Erdős). In view of the “transference principle“, it is then plausible to ask whether the same result holds if {A} is replaced by the primes. We can show the following results:

Theorem 1
  • (i) If the Hardy-Littlewood prime tuples conjecture (or the weaker conjecture of Dickson) is true, then there exists an increasing sequence {b_1 < b_2 < b_3 < \dots} of primes such that {b_i + b_j + 1} is prime for all {i < j}.
  • (ii) Unconditionally, there exist increasing sequences {a_1 < a_2 < \dots} and {b_1 < b_2 < \dots} of natural numbers such that {a_i + b_j} is prime for all {i<j}.
  • (iii) These conclusions fail if “prime” is replaced by “positive (relative) density subset of the primes” (even if the density is equal to 1).

We remark that it was shown by Balog that there (unconditionally) exist arbitrarily long but finite sequences {b_1 < \dots < b_k} of primes such that {b_i + b_j + 1} is prime for all {i < j \leq k}. (This result can also be recovered from the later results of Ben Green, myself, and Tamar Ziegler.) Also, it had previously been shown by Granville that on the Hardy-Littlewood prime tuples conjecture, there existed increasing sequences {a_1 < a_2 < \dots} and {b_1 < b_2 < \dots} of natural numbers such that {a_i+b_j} is prime for all {i,j}.

The conclusion of (i) is stronger than that of (ii) (which is of course consistent with the former being conditional and the latter unconditional). The conclusion (ii) also implies the well-known theorem of Maynard that for any given {k}, there exist infinitely many {k}-tuples of primes of bounded diameter, and indeed our proof of (ii) uses the same “Maynard sieve” that powers the proof of that theorem (though we use a formulation of that sieve closer to that in this blog post of mine). Indeed, the failure of (iii) basically arises from the failure of Maynard’s theorem for dense subsets of primes, simply by removing those clusters of primes that are unusually closely spaced.

Our proof of (i) was initially inspired by the topological dynamics methods used by Kra, Moreira, Richter, and Robertson, but we managed to condense it to a purely elementary argument (taking up only half a page) that makes no reference to topological dynamics and builds up the sequence {b_1 < b_2 < \dots} recursively by repeated application of the prime tuples conjecture.

The proof of (ii) takes up the majority of the paper. It is easiest to phrase the argument in terms of “prime-producing tuples” – tuples {(h_1,\dots,h_k)} for which there are infinitely many {n} with {n+h_1,\dots,n+h_k} all prime. Maynard’s theorem is equivalent to the existence of arbitrarily long prime-producing tuples; our theorem is equivalent to the stronger assertion that there exists an infinite sequence {h_1 < h_2 < \dots} such that every initial segment {(h_1,\dots,h_k)} is prime-producing. The main new tool for achieving this is the following cute measure-theoretic lemma of Bergelson:

Lemma 2 (Bergelson intersectivity lemma) Let {E_1,E_2,\dots} be subsets of a probability space {(X,\mu)} of measure uniformly bounded away from zero, thus {\inf_i \mu(E_i) > 0}. Then there exists a subsequence {E_{i_1}, E_{i_2}, \dots} such that

\displaystyle  \mu(E_{i_1} \cap \dots \cap E_{i_k} ) > 0

for all {k}.

This lemma has a short proof, though not an entirely obvious one. Firstly, by deleting a null set from {X}, one can assume that all finite intersections {E_{i_1} \cap \dots \cap E_{i_k}} are either positive measure or empty. Secondly, a routine application of Fatou’s lemma shows that the maximal function {\limsup_N \frac{1}{N} \sum_{i=1}^N 1_{E_i}} has a positive integral, hence must be positive at some point {x_0}. Thus there is a subsequence {E_{i_1}, E_{i_2}, \dots} whose finite intersections all contain {x_0}, thus have positive measure as desired by the previous reduction.

It turns out that one cannot quite combine the standard Maynard sieve with the intersectivity lemma because the events {E_i} that show up (which roughly correspond to the event that {n + h_i} is prime for some random number {n} (with a well-chosen probability distribution) and some shift {h_i}) have their probability going to zero, rather than being uniformly bounded from below. To get around this, we borrow an idea from a paper of Banks, Freiberg, and Maynard, and group the shifts {h_i} into various clusters {h_{i,1},\dots,h_{i,J_i}}, chosen in such a way that the probability that at least one of {n+h_{i,1},\dots,n+h_{i,J_i}} is prime is bounded uniformly from below. One then applies the Bergelson intersectivity lemma to those events and uses many applications of the pigeonhole principle to conclude.

Kaisa Matomäki, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Higher uniformity of arithmetic functions in short intervals I. All intervals“. This paper investigates the higher order (Gowers) uniformity of standard arithmetic functions in analytic number theory (and specifically, the Möbius function {\mu}, the von Mangoldt function {\Lambda}, and the generalised divisor functions {d_k}) in short intervals {(X,X+H]}, where {X} is large and {H} lies in the range {X^{\theta+\varepsilon} \leq H \leq X^{1-\varepsilon}} for a fixed constant {0 < \theta < 1} (that one would like to be as small as possible). If we let {f} denote one of the functions {\mu, \Lambda, d_k}, then there is extensive literature on the estimation of short sums

\displaystyle  \sum_{X < n \leq X+H} f(n)

and some literature also on the estimation of exponential sums such as

\displaystyle  \sum_{X < n \leq X+H} f(n) e(-\alpha n)

for a real frequency {\alpha}, where {e(\theta) := e^{2\pi i \theta}}. For applications in the additive combinatorics of such functions {f}, it is also necessary to consider more general correlations, such as polynomial correlations

\displaystyle  \sum_{X < n \leq X+H} f(n) e(-P(n))

where {P: {\bf Z} \rightarrow {\bf R}} is a polynomial of some fixed degree, or more generally

\displaystyle  \sum_{X < n \leq X+H} f(n) \overline{F}(g(n) \Gamma)

where {G/\Gamma} is a nilmanifold of fixed degree and dimension (and with some control on structure constants), {g: {\bf Z} \rightarrow G} is a polynomial map, and {F: G/\Gamma \rightarrow {\bf C}} is a Lipschitz function (with some bound on the Lipschitz constant). Indeed, thanks to the inverse theorem for the Gowers uniformity norm, such correlations let one control the Gowers uniformity norm of {f} (possibly after subtracting off some renormalising factor) on such short intervals {(X,X+H]}, which can in turn be used to control other multilinear correlations involving such functions.

Traditionally, asymptotics for such sums are expressed in terms of a “main term” of some arithmetic nature, plus an error term that is estimated in magnitude. For instance, a sum such as {\sum_{X < n \leq X+H} \Lambda(n) e(-\alpha n)} would be approximated in terms of a main term that vanished (or is negligible) if {\alpha} is “minor arc”, but would be expressible in terms of something like a Ramanujan sum if {\alpha} was “major arc”, together with an error term. We found it convenient to cancel off such main terms by subtracting an approximant {f^\sharp} from each of the arithmetic functions {f} and then getting upper bounds on remainder correlations such as

\displaystyle  |\sum_{X < n \leq X+H} (f(n)-f^\sharp(n)) \overline{F}(g(n) \Gamma)| \ \ \ \ \ (1)

(actually for technical reasons we also allow the {n} variable to be restricted further to a subprogression of {(X,X+H]}, but let us ignore this minor extension for this discussion). There is some flexibility in how to choose these approximants, but we eventually found it convenient to use the following choices.

  • For the Möbius function {\mu}, we simply set {\mu^\sharp = 0}, as per the Möbius pseudorandomness conjecture. (One could choose a more sophisticated approximant in the presence of a Siegel zero, as I did with Joni in this recent paper, but we do not do so here.)
  • For the von Mangoldt function {\Lambda}, we eventually went with the Cramér-Granville approximant {\Lambda^\sharp(n) = \frac{W}{\phi(W)} 1_{(n,W)=1}}, where {W = \prod_{p < R} p} and {R = \exp(\log^{1/10} X)}.
  • For the divisor functions {d_k}, we used a somewhat complicated-looking approximant {d_k^\sharp(n) = \sum_{m \leq X^{\frac{k-1}{5k}}} P_m(\log n)} for some explicit polynomials {P_m}, chosen so that {d_k^\sharp} and {d_k} have almost exactly the same sums along arithmetic progressions (see the paper for details).

The objective is then to obtain bounds on sums such as (1) that improve upon the “trivial bound” that one can get with the triangle inequality and standard number theory bounds such as the Brun-Titchmarsh inequality. For {\mu} and {\Lambda}, the Siegel-Walfisz theorem suggests that it is reasonable to expect error terms that have “strongly logarithmic savings” in the sense that they gain a factor of {O_A(\log^{-A} X)} over the trivial bound for any {A>0}; for {d_k}, the Dirichlet hyperbola method suggests instead that one has “power savings” in that one should gain a factor of {X^{-c_k}} over the trivial bound for some {c_k>0}. In the case of the Möbius function {\mu}, there is an additional trick (introduced by Matomäki and Teräväinen) that allows one to lower the exponent {\theta} somewhat at the cost of only obtaining “weakly logarithmic savings” of shape {\log^{-c} X} for some small {c>0}.

Our main estimates on sums of the form (1) work in the following ranges:

  • For {\theta=5/8}, one can obtain strongly logarithmic savings on (1) for {f=\mu,\Lambda}, and power savings for {f=d_k}.
  • For {\theta=3/5}, one can obtain weakly logarithmic savings for {f = \mu, d_k}.
  • For {\theta=5/9}, one can obtain power savings for {f=d_3}.
  • For {\theta=1/3}, one can obtain power savings for {f=d_2}.

Conjecturally, one should be able to obtain power savings in all cases, and lower {\theta} down to zero, but the ranges of exponents and savings given here seem to be the limit of current methods unless one assumes additional hypotheses, such as GRH. The {\theta=5/8} result for correlation against Fourier phases {e(\alpha n)} was established previously by Zhan, and the {\theta=3/5} result for such phases and {f=\mu} was established previously by Matomäki and Teräväinen.

By combining these results with tools from additive combinatorics, one can obtain a number of applications:

  • Direct insertion of our bounds in the recent work of Kanigowski, Lemanczyk, and Radziwill on the prime number theorem on dynamical systems that are analytic skew products gives some improvements in the exponents there.
  • We can obtain a “short interval” version of a multiple ergodic theorem along primes established by Frantzikinakis-Host-Kra and Wooley-Ziegler, in which we average over intervals of the form {(X,X+H]} rather than {[1,X]}.
  • We can obtain a “short interval” version of the “linear equations in primes” asymptotics obtained by Ben Green, Tamar Ziegler, and myself in this sequence of papers, where the variables in these equations lie in short intervals {(X,X+H]} rather than long intervals such as {[1,X]}.

We now briefly discuss some of the ingredients of proof of our main results. The first step is standard, using combinatorial decompositions (based on the Heath-Brown identity and (for the {\theta=3/5} result) the Ramaré identity) to decompose {\mu(n), \Lambda(n), d_k(n)} into more tractable sums of the following types:

  • Type {I} sums, which are basically of the form {\sum_{m \leq A:m|n} \alpha(m)} for some weights {\alpha(m)} of controlled size and some cutoff {A} that is not too large;
  • Type {II} sums, which are basically of the form {\sum_{A_- \leq m \leq A_+:m|n} \alpha(m)\beta(n/m)} for some weights {\alpha(m)}, {\beta(n)} of controlled size and some cutoffs {A_-, A_+} that are not too close to {1} or to {X};
  • Type {I_2} sums, which are basically of the form {\sum_{m \leq A:m|n} \alpha(m) d_2(n/m)} for some weights {\alpha(m)} of controlled size and some cutoff {A} that is not too large.

The precise ranges of the cutoffs {A, A_-, A_+} depend on the choice of {\theta}; our methods fail once these cutoffs pass a certain threshold, and this is the reason for the exponents {\theta} being what they are in our main results.

The Type {I} sums involving nilsequences can be treated by methods similar to those in this previous paper of Ben Green and myself; the main innovations are in the treatment of the Type {II} and Type {I_2} sums.

For the Type {II} sums, one can split into the “abelian” case in which (after some Fourier decomposition) the nilsequence {F(g(n)\Gamma)} is basically of the form {e(P(n))}, and the “non-abelian” case in which {G} is non-abelian and {F} exhibits non-trivial oscillation in a central direction. In the abelian case we can adapt arguments of Matomäki and Shao, which use Cauchy-Schwarz and the equidistribution properties of polynomials to obtain good bounds unless {e(P(n))} is “major arc” in the sense that it resembles (or “pretends to be”) {\chi(n) n^{it}} for some Dirichlet character {\chi} and some frequency {t}, but in this case one can use classical multiplicative methods to control the correlation. It turns out that the non-abelian case can be treated similarly. After applying Cauchy-Schwarz, one ends up analyzing the equidistribution of the four-variable polynomial sequence

\displaystyle  (n,m,n',m') \mapsto (g(nm)\Gamma, g(n'm)\Gamma, g(nm') \Gamma, g(n'm')\Gamma)

as {n,m,n',m'} range in various dyadic intervals. Using the known multidimensional equidistribution theory of polynomial maps in nilmanifolds, one can eventually show in the non-abelian case that this sequence either has enough equidistribution to give cancellation, or else the nilsequence involved can be replaced with one from a lower dimensional nilmanifold, in which case one can apply an induction hypothesis.

For the type {I_2} sum, a model sum to study is

\displaystyle  \sum_{X < n \leq X+H} d_2(n) e(\alpha n)

which one can expand as

\displaystyle  \sum_{n,m: X < nm \leq X+H} e(\alpha nm).

We experimented with a number of ways to treat this type of sum (including automorphic form methods, or methods based on the Voronoi formula or van der Corput’s inequality), but somewhat to our surprise, the most efficient approach was an elementary one, in which one uses the Dirichlet approximation theorem to decompose the hyperbolic region {\{ (n,m) \in {\bf N}^2: X < nm \leq X+H \}} into a number of arithmetic progressions, and then uses equidistribution theory to establish cancellation of sequences such as {e(\alpha nm)} on the majority of these progressions. As it turns out, this strategy works well in the regime {H > X^{1/3+\varepsilon}} unless the nilsequence involved is “major arc”, but the latter case is treatable by existing methods as discussed previously; this is why the {\theta} exponent for our {d_2} result can be as low as {1/3}.
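
To see this contrast in a toy setting, here is a direct numerical experiment (my own, with ad hoc parameters) evaluating the model sum over the hyperbolic region for one “minor arc” and one “major arc” frequency; the former exhibits substantial cancellation while the latter does not:

```python
# Evaluate sum of e(alpha*n*m) over the region X < nm <= X+H directly.
import cmath, math

def hyperbolic_sum(alpha, X, H):
    total, count = 0j, 0
    for n in range(1, X + H + 1):
        for m in range(X // n + 1, (X + H) // n + 1):   # X < n*m <= X+H
            total += cmath.exp(2j * math.pi * alpha * n * m)
            count += 1
    return total, count

X, H = 10**6, 1000
for alpha in [math.sqrt(2) - 1, 1 / 3]:   # "minor arc" vs "major arc"
    S, cnt = hyperbolic_sum(alpha, X, H)
    print(f"alpha = {alpha:.6f}: |sum| = {abs(S):10.1f} over {cnt} terms")
```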

In a sequel to this paper (currently in preparation), we will obtain analogous results for almost all intervals {(x,x+H]} with {x} in the range {[X,2X]}, in which we will be able to lower {\theta} all the way to {0}.
