[This post is dedicated to Luca Trevisan, who recently passed away due to cancer. Though far from his most significant contribution to the field, I would like to mention that, as with most of my other blog posts on this site, this page was written with the assistance of Luca’s LaTeX to WordPress converter. Mathematically, his work and insight on pseudorandomness in particular have greatly informed how I myself think about the concept. – T.]

Recently, Timothy Gowers, Ben Green, Freddie Manners, and I were able to establish the following theorem:

Theorem 1 (Marton’s conjecture) Let {A \subset {\bf F}_2^n} be non-empty with {|A+A| \leq K|A|}. Then there exists a subgroup {H} of {{\bf F}_2^n} with {|H| \leq |A|} such that {A} is covered by at most {2K^C} translates of {H}, for some absolute constant {C}.

We established this result with {C=12}, although it has since been improved to {C=9} by Jyun-Jie Liao.
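
Before turning to the proof, it may help to see the objects in Theorem 1 in a concrete toy case. The following snippet is purely illustrative (it is not part of the paper or its Lean formalization, and the set {A}, the subgroup {H}, and the helper functions are ad hoc choices of mine): it computes the doubling constant {K = |A+A|/|A|} of a small set {A \subset {\bf F}_2^4} and exhibits a covering of {A} by two translates of a subgroup.

```python
# Illustration only: elements of F_2^n are encoded as integers, with the
# group operation given by bitwise XOR.  The set A is an arbitrary toy example.

def sumset(A, B):
    """A+B = {a+b : a in A, b in B} in F_2^n (addition = XOR)."""
    return {a ^ b for a in A for b in B}

def span(S):
    """The subgroup of F_2^n generated by S (closure under XOR)."""
    H = {0}
    for s in S:
        H |= {h ^ s for h in H}
    return H

H = span({0b0001, 0b0010})      # a 4-element subgroup of F_2^4
A = H | {0b1000}                # adjoin one element outside H

AA = sumset(A, A)
K = len(AA) / len(A)
print(f"|A| = {len(A)}, |A+A| = {len(AA)}, K = {K:.2f}")

# A is covered by two translates of H, in the spirit of Theorem 1:
cover = set().union(*({h ^ t for h in H} for t in (0b0000, 0b1000)))
print("A covered by 2 translates of H:", A <= cover)
```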

Our proof was written in order to optimize the constant {C} as much as possible; similarly for the more detailed blueprint of the proof that was prepared in order to formalize the result in Lean. I have been asked a few times whether it is possible to present a streamlined and more conceptual version of the proof in which one does not try to establish an explicit constant {C}, but just to show that the result holds for some constant {C}. This is what I will attempt to do in this post, though some of the more routine steps will be outsourced to the aforementioned blueprint.

The key concept here is that of the entropic Ruzsa distance {d[X;Y]} between two random variables {X,Y} taking values in {{\bf F}_2^n}, defined as

\displaystyle  d[X;Y] := {\mathbf H}[X'+Y'] - \frac{1}{2} {\mathbf H}[X] - \frac{1}{2} {\mathbf H}[Y]

where {X',Y'} are independent copies of {X,Y}, and {{\mathbf H}[X]} denotes the Shannon entropy of {X}. This distance is symmetric and non-negative, and obeys the triangle inequality

\displaystyle  d[X;Z] \leq d[X;Y] + d[Y;Z]

for any random variables {X,Y,Z}; see the blueprint for a proof. The above theorem then follows from an entropic analogue:

Theorem 2 (Entropic Marton’s conjecture) Let {X} be a {{\bf F}_2^n}-valued random variable with {d[X;X] \leq \log K}. Then there exists a uniform random variable {U_H} on a subgroup {H} of {{\bf F}_2^n} such that {d[X; U_H] \leq C \log K} for some absolute constant {C}.

We were able to establish Theorem 2 with {C=11}, which implies Theorem 1 with {C=12} by fairly standard additive combinatorics manipulations (such as the Ruzsa covering lemma); see the blueprint for details.
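
As an aside, the entropic Ruzsa distance is easy to experiment with numerically. Here is a minimal sketch (again my own ad hoc code, not taken from the blueprint) that represents the law of an {{\bf F}_2^n}-valued random variable as an array of length {2^n} indexed by group elements, computes {d[X;Y]} directly from the definition, and spot-checks the symmetry, non-negativity, and triangle inequality mentioned above on random distributions.

```python
import numpy as np

def H(p):
    """Shannon entropy (in bits) of a probability vector p."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def xor_convolve(p, q):
    """Law of X'+Y' for independent X' ~ p, Y' ~ q on F_2^n (XOR on indices)."""
    r = np.zeros(len(p))
    for x in range(len(p)):
        for y in range(len(q)):
            r[x ^ y] += p[x] * q[y]
    return r

def ruzsa_dist(p, q):
    """Entropic Ruzsa distance d[X;Y] = H[X'+Y'] - H[X]/2 - H[Y]/2."""
    return H(xor_convolve(p, q)) - 0.5 * H(p) - 0.5 * H(q)

rng = np.random.default_rng(0)
N = 8                                    # the group F_2^3 has 2^3 = 8 elements
p, q, r = (rng.random(N) for _ in range(3))
p, q, r = p / p.sum(), q / q.sum(), r / r.sum()

print("symmetric:    ", np.isclose(ruzsa_dist(p, q), ruzsa_dist(q, p)))
print("non-negative: ", ruzsa_dist(p, q) >= -1e-12)
print("triangle ineq:", ruzsa_dist(p, r) <= ruzsa_dist(p, q) + ruzsa_dist(q, r) + 1e-12)
```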

The key proposition needed to establish Theorem 2 is the following distance decrement property:

Proposition 3 (Distance decrement) If {X,Y} are {{\bf F}_2^n}-valued random variables, then one can find {{\bf F}_2^n}-valued random variables {X',Y'} such that

\displaystyle  d[X';Y'] \leq (1-\eta) d[X;Y]

and

\displaystyle  d[X;X'], d[Y;Y'] \leq C d[X;Y]

for some absolute constants {C, \eta > 0}.

Indeed, suppose this proposition held. Starting with {X_0, Y_0} both equal to {X} and iterating, one can then find sequences of random variables {X_n, Y_n} with {X_0=Y_0=X},

\displaystyle  d[X_n;Y_n] \leq (1-\eta)^n d[X;X],

and

\displaystyle  d[X_{n+1};X_n], d[Y_{n+1};Y_n] \leq C (1-\eta)^n d[X;X].

In particular, from the triangle inequality and summing the geometric series, we have

\displaystyle  d[X_n;X], d[Y_n;X] \leq \frac{C}{\eta} d[X;X].

By weak compactness, some subsequence of the {X_n}, {Y_n} converges to limiting random variables {X_\infty, Y_\infty}, and by some simple continuity properties of entropic Ruzsa distance, we conclude that

\displaystyle  d[X_\infty;Y_\infty] = 0

and

\displaystyle  d[X_\infty;X], d[Y_\infty;X] \leq \frac{C}{\eta} d[X;X].

Theorem 2 then follows from the “100% inverse theorem” for entropic Ruzsa distance; see the blueprint for details.

To prove Proposition 3, we can reformulate it as follows:

Proposition 4 (Lack of distance decrement implies vanishing) If {X,Y} are {{\bf F}_2^n}-valued random variables, with the property that

\displaystyle  d[X';Y'] > d[X;Y] - \eta ( d[X;Y] + d[X';X] + d[Y';Y] ) \ \ \ \ \ (1)

for all {{\bf F}_2^n}-valued random variables {X',Y'} and some sufficiently small absolute constant {\eta > 0}, then one can derive a contradiction.

Indeed, if the above proposition holds, then (1) must fail for some {X',Y'}, so that

\displaystyle  d[X';Y'] \leq d[X;Y] - \eta ( d[X;Y] + d[X';X] + d[Y';Y] )

for this choice of {X',Y'}. Rearranging, and using the non-negativity of entropic Ruzsa distance, this gives both {d[X';Y'] \leq (1-\eta) d[X;Y]} and {d[X';X], d[Y';Y] \leq \frac{1}{\eta} d[X;Y]}, which is Proposition 3 with {C = 1/\eta}.

The entire game is now to use Shannon entropy inequalities and “entropic Ruzsa calculus” to deduce a contradiction from (1) for {\eta} small enough. This we will do below the fold, but before doing so, let us first make some adjustments to (1) that will make it more useful for our purposes. Firstly, because conditional entropic Ruzsa distance (see blueprint for definitions) is an average of unconditional entropic Ruzsa distance, we can automatically upgrade (1) to the conditional version

\displaystyle  d[X'|Z;Y'|W] \geq d[X;Y] - \eta ( d[X;Y] + d[X'|Z;X] + d[Y'|W;Y] )

for any random variables {Z,W} that are possibly coupled with {X',Y'} respectively. In particular, if we define a “relevant” random variable {X'} (conditioned with respect to some auxiliary data {Z}) to be a random variable for which

\displaystyle  d[X'|Z;X] = O( d[X;Y] )

or equivalently (by the triangle inequality)

\displaystyle  d[X'|Z;Y] = O( d[X;Y] )

then we have the useful lower bound

\displaystyle  d[X'|Z;Y'|W] \geq (1-O(\eta)) d[X;Y] \ \ \ \ \ (2)

whenever {X'} and {Y'} are relevant conditioning on {Z, W} respectively. This is quite a useful bound, since the laws of “entropic Ruzsa calculus” will tell us, roughly speaking, that virtually any random variable that we can create from taking various sums of copies of {X,Y} and conditioning against other sums, will be relevant. (Informally: the space of relevant random variables is {(1-O(\eta))d[X;Y]}-separated with respect to the entropic Ruzsa distance.)
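
To make the averaging step used above concrete, here is a rough sketch (illustrative code of my own; the data structures are ad hoc, and the official definitions are in the blueprint) of conditional entropic Ruzsa distance {d[X'|Z; Y'|W]} as a weighted average of unconditional distances between the conditioned laws; averaging (1) over the conditioning values is exactly how one obtains its conditional upgrade.

```python
# Sketch only: a joint law of (X', Z) is given as a dict mapping (value,
# condition) pairs to probabilities; the conditioning values for Z and W are
# weighted independently of each other.
import numpy as np

def H(p):
    """Shannon entropy (bits) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def ruzsa_dist(p, q):
    """d[X;Y] = H[X'+Y'] - H[X]/2 - H[Y]/2, with addition given by XOR."""
    r = np.zeros(len(p))
    for x in range(len(p)):
        for y in range(len(q)):
            r[x ^ y] += p[x] * q[y]
    return H(r) - 0.5 * H(p) - 0.5 * H(q)

def cond_ruzsa_dist(joint_xz, joint_yw, N):
    """d[X'|Z; Y'|W] = sum_{z,w} P(Z=z) P(W=w) d[(X'|Z=z); (Y'|W=w)]."""
    def fibres(joint):
        by_cond = {}
        for (x, z), prob in joint.items():
            by_cond.setdefault(z, np.zeros(N))[x] += prob
        return [(p.sum(), p / p.sum()) for p in by_cond.values()]
    return sum(pz * pw * ruzsa_dist(px, py)
               for pz, px in fibres(joint_xz)
               for pw, py in fibres(joint_yw))

# Example: X' uniform on F_2^2 with Z its lowest bit; Y' uniform with trivial W.
N = 4
joint_xz = {(x, x & 1): 0.25 for x in range(N)}
joint_yw = {(y, 0): 0.25 for y in range(N)}
print(cond_ruzsa_dist(joint_xz, joint_yw, N))
```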

— 1. Main argument —

Now we derive more and more consequences of (2) – at some point crucially using the hypothesis that we are in characteristic two – before we reach a contradiction.

Right now, our hypothesis (2) only supplies lower bounds on entropic distances. The crucial ingredient that allows us to proceed is what we call the fibring identity, which lets us convert these lower bounds into useful upper bounds as well, which in fact match up very nicely when {\eta} is small. Informally, the fibring identity captures the intuitive fact that the doubling constant of a set {A} should be at least as large as the doubling constant of the image {\pi(A)} of that set under a homomorphism, times the doubling constant of a typical fiber {A \cap \pi^{-1}(\{z\})} of that homomorphism; and furthermore, one should only be close to equality if the fibers “line up” in some sense.

Here is the fibring identity:

Proposition 5 (Fibring identity) Let {\pi: G \rightarrow H} be a homomorphism. Then for any independent {G}-valued random variables {X, Y}, one has

\displaystyle  d[X;Y] = d[\pi(X); \pi(Y)] + d[X|\pi(X); Y|\pi(Y)]

\displaystyle  + I[X-Y : \pi(X),\pi(Y) | \pi(X)-\pi(Y) ].

The proof is of course in the blueprint, but given that it is a central pillar of the argument, I reproduce it here.

Proof: Expanding out the definition of Ruzsa distance, and using the conditional entropy chain rule

\displaystyle  {\mathbf H}[X] = {\mathbf H}[\pi(X)] + {\mathbf H}[X|\pi(X)]

and

\displaystyle  {\mathbf H}[Y] = {\mathbf H}[\pi(Y)] + {\mathbf H}[Y|\pi(Y)],

it suffices to establish the identity

\displaystyle  {\mathbf H}[X-Y] = {\mathbf H}[\pi(X)-\pi(Y)] + {\mathbf H}[X - Y|\pi(X), \pi(Y)]

\displaystyle  + I[X-Y : \pi(X),\pi(Y) | \pi(X)-\pi(Y) ].

But from the chain rule again we have

\displaystyle  {\mathbf H}[X-Y] = {\mathbf H}[\pi(X)-\pi(Y)] + {\mathbf H}[X - Y|\pi(X)-\pi(Y)]

and from the definition of conditional mutual information (using the fact that {\pi(X)-\pi(Y)} is determined both by {X-Y} and by {(\pi(X),\pi(Y))}) one has

\displaystyle  {\mathbf H}[X - Y|\pi(X)-\pi(Y)] = {\mathbf H}[X - Y|\pi(X), \pi(Y)]

\displaystyle  + I[X-Y : \pi(X),\pi(Y) | \pi(X)-\pi(Y) ]

giving the claim. \Box
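
Since the fibring identity is the workhorse of the argument, a numerical sanity check may be reassuring. The sketch below is illustrative only (the choice of group, homomorphism, and encoding are mine): it takes {G = {\bf F}_2^2} and {H = {\bf F}_2}, encoded as integers under XOR, with {\pi} the homomorphism that keeps the lowest bit, draws random laws for independent {X, Y}, and checks that the three terms on the right-hand side of Proposition 5 sum to {d[X;Y]}. (In characteristic two all the subtractions are additions, i.e. XORs.)

```python
import numpy as np
from collections import defaultdict

def H_vec(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def H_dict(joint):
    """Shannon entropy (bits) of a dict mapping outcomes to probabilities."""
    return float(-sum(prob * np.log2(prob) for prob in joint.values() if prob > 0))

def ruzsa_dist(p, q):
    """d[X;Y] = H[X'+Y'] - H[X]/2 - H[Y]/2, with addition given by XOR."""
    r = np.zeros(len(p))
    for x in range(len(p)):
        for y in range(len(q)):
            r[x ^ y] += p[x] * q[y]
    return H_vec(r) - 0.5 * H_vec(p) - 0.5 * H_vec(q)

def cond_mutual_info(triples):
    """I[A:B|C] = H[A,C] + H[B,C] - H[A,B,C] - H[C] from a joint law on (a,b,c)."""
    def marginal(idx):
        m = defaultdict(float)
        for key, prob in triples.items():
            m[tuple(key[i] for i in idx)] += prob
        return m
    return (H_dict(marginal((0, 2))) + H_dict(marginal((1, 2)))
            - H_dict(triples) - H_dict(marginal((2,))))

N = 4                                     # G = F_2^2, encoded as {0,1,2,3}
pi = lambda x: x & 1                      # a homomorphism G -> F_2 (lowest bit)
rng = np.random.default_rng(1)
pX, pY = rng.random(N), rng.random(N)
pX, pY = pX / pX.sum(), pY / pY.sum()

lhs = ruzsa_dist(pX, pY)                  # d[X;Y]

# d[pi(X); pi(Y)]
pPiX = np.array([pX[0] + pX[2], pX[1] + pX[3]])
pPiY = np.array([pY[0] + pY[2], pY[1] + pY[3]])
term1 = ruzsa_dist(pPiX, pPiY)

# d[X|pi(X); Y|pi(Y)]: weighted average of distances between fibre laws
def fibre(p, s):
    q = np.array([p[x] if pi(x) == s else 0.0 for x in range(N)])
    return q / q.sum()
term2 = sum(pPiX[s] * pPiY[t] * ruzsa_dist(fibre(pX, s), fibre(pY, t))
            for s in range(2) for t in range(2))

# I[X-Y : (pi(X),pi(Y)) | pi(X)-pi(Y)]
triples = defaultdict(float)
for x in range(N):
    for y in range(N):
        triples[(x ^ y, (pi(x), pi(y)), pi(x) ^ pi(y))] += pX[x] * pY[y]
term3 = cond_mutual_info(triples)

print(np.isclose(lhs, term1 + term2 + term3))   # the fibring identity holds
```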

We will only care about the characteristic {2} setting here, so we will now assume that all groups involved are {2}-torsion, so that we can replace all subtractions with additions. If we specialize the fibring identity to the case where {G = {\bf F}_2^n \times {\bf F}_2^n}, {H = {\bf F}_2^n}, {\pi: G \rightarrow H} is the addition map {\pi(x,y) = x+y}, and {X = (X_1, X_2)}, {Y = (Y_1, Y_2)} are pairs of independent random variables in {{\bf F}_2^n}, we obtain the following corollary:

Corollary 6 Let {X_1,X_2,Y_1,Y_2} be independent {{\bf F}_2^n}-valued random variables. Then we have the identity

\displaystyle  d[X_1;Y_1] + d[X_2;Y_2] = d[X_1+X_2;Y_1+Y_2]

\displaystyle  + d[X_1|X_1+X_2;Y_1|Y_1+Y_2]

\displaystyle  + I[(X_1+Y_1, X_2+Y_2) : (X_1+X_2,Y_1+Y_2) | X_1+X_2+Y_1+Y_2 ].

This is a useful and flexible identity, especially when combined with (2). For instance, we can discard the conditional mutual information term as being non-negative, to obtain the inequality

\displaystyle  d[X_1;Y_1] + d[X_2;Y_2] \geq d[X_1+X_2;Y_1+Y_2]

\displaystyle  + d[X_1|X_1+X_2;Y_1|Y_1+Y_2].

If we let {X_1, Y_1, X_2, Y_2} be independent copies of {X, Y, Y, X} respectively (note the swap in the last two variables!) we obtain

\displaystyle  2 d[X;Y] \geq d[X+Y;X+Y] + d[X_1|X_1+X_2;Y_1|Y_1+Y_2].

From entropic Ruzsa calculus, one can check that {X+Y}, {X_1|X_1+X_2}, and {Y_1|Y_1+Y_2} are all relevant random variables, so from (2) we now obtain both upper and lower bounds for {d[X+Y;X+Y]}:

\displaystyle  d[X+Y; X+Y] = (1 + O(\eta)) d[X;Y].

A pleasant upshot of this is that we now get to work in the symmetric case {X=Y} without loss of generality. Indeed, if we set {X^* := X+Y}, we now have from (2) that

\displaystyle  d[X'|Z; Y'|W] \geq (1-O(\eta)) d[X^*;X^*] \ \ \ \ \ (3)

whenever {X'|Z, Y'|W} are relevant, which by entropic Ruzsa calculus is equivalent to asking that

\displaystyle  d[X'|Z; X^*], d[Y'|W; X^*] = O(d[X^*;X^*]).

Now we use the fibring identity again, relabeling {Y_1,Y_2} as {X_3,X_4} and requiring {X_1,X_2,X_3,X_4} to be independent copies of {X^*}. We conclude that

\displaystyle  2d[X^*; X^*] = d[X_1+X_2;X_3+X_4] + d[X_1|X_1+X_2;X_3|X_3+X_4]

\displaystyle  + I[(X_1+X_3, X_2+X_4) : (X_1+X_2,X_3+X_4) | X_1+X_2+X_3+X_4 ].

As before, the random variables {X_1+X_2}, {X_3+X_4}, {X_1|X_1+X_2}, {X_3|X_3+X_4} are all relevant, so from (3) we have

\displaystyle  d[X_1+X_2;X_3+X_4], d[X_1|X_1+X_2;X_3|X_3+X_4]

\displaystyle  \geq (1-O(\eta)) d[X^*;X^*].

We could now also match these lower bounds with upper bounds, but the more important takeaway from this analysis is a really good bound on the conditional mutual information: since the two distance terms already account for {(2-O(\eta)) d[X^*;X^*]} of the left-hand side {2d[X^*;X^*]}, we must have

\displaystyle  I[(X_1+X_3, X_2+X_4) : (X_1+X_2,X_3+X_4) | X_1+X_2+X_3+X_4 ]

\displaystyle = O(\eta) d[X^*;X^*].

By the data processing inequality, we can discard some of the randomness here, and conclude

\displaystyle  I[X_1+X_3 : X_1+X_2 | X_1+X_2+X_3+X_4 ] = O(\eta) d[X^*;X^*].

Let us introduce the random variables

\displaystyle  S := X_1+X_2+X_3+X_4; U := X_1+X_2; V := X_1 + X_3

then we have

\displaystyle  I[U : V | S] = O(\eta) d[X^*;X^*].

Intuitively, this means that {U} and {V} are very nearly independent given {S}. For sake of argument, let us assume that they are actually independent; one can achieve something resembling this by invoking the entropic Balog-Szemerédi-Gowers theorem, established in the blueprint, after conceding some losses of {O(\eta) d[X^*;X^*]} in the entropy, but we skip over the details for this blog post. The key point now is that because we are in characteristic {2}, {U+V} has the same form as {U} or {V}:

\displaystyle  U + V = X_2 + X_3.

In particular, by permutation symmetry, we have

\displaystyle  {\mathbf H}[U+V|S] ={\mathbf H}[U|S] ={\mathbf H}[V|S],

and so by the definition of conditional Ruzsa distance we have a massive distance decrement

\displaystyle  {\bf E}_s d[U|S=s; V|S=s] = 0,

(where {s} is drawn from the distribution of {S}), contradicting (1) as desired. (In reality, we end up decreasing the distance not all the way to zero, but instead to {O(\eta d[X^*;X^*])} due to losses in the Balog-Szemerédi-Gowers theorem, but this is still enough to reach a contradiction. The quantity {{\bf E}_s d[U|S=s; V|S=s]} is very similar to {d[U|S; V|S]}, but is slightly different; the latter quantity is {{\bf E}_{s,s'}d[U|S=s; V|S=s']}.)
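
For readers who want to see these quantities concretely, the following sketch (illustrative code of my own; it does not attempt to reproduce the Balog-Szemerédi-Gowers step) takes four independent copies of an arbitrary law on {{\bf F}_2^2}, forms {S, U, V} as above, and computes {I[U:V|S]} together with the conditional entropies. For a generic law the mutual information need not be small (smallness is what the hypothesis (1) supplies in the argument above), but the characteristic-two symmetry {{\mathbf H}[U+V|S] = {\mathbf H}[U|S] = {\mathbf H}[V|S]} holds identically.

```python
import numpy as np
from collections import defaultdict
from itertools import product

def H(joint):
    """Shannon entropy (bits) of a dict mapping outcomes to probabilities."""
    return float(-sum(p * np.log2(p) for p in joint.values() if p > 0))

N = 4                                    # the group F_2^2
rng = np.random.default_rng(2)
p = rng.random(N)
p /= p.sum()                             # an arbitrary law for X*

# Joint law of (U, V, S) for independent X1,...,X4 with law p.
joint_uvs = defaultdict(float)
for x1, x2, x3, x4 in product(range(N), repeat=4):
    prob = p[x1] * p[x2] * p[x3] * p[x4]
    joint_uvs[(x1 ^ x2, x1 ^ x3, x1 ^ x2 ^ x3 ^ x4)] += prob

def marginal(idx):
    m = defaultdict(float)
    for key, prob in joint_uvs.items():
        m[tuple(key[i] for i in idx)] += prob
    return m

H_S = H(marginal((2,)))
H_US, H_VS, H_UVS = H(marginal((0, 2))), H(marginal((1, 2))), H(joint_uvs)
print("I[U:V|S] =", H_US + H_VS - H_UVS - H_S)

# The characteristic-2 symmetry: U+V = X2+X3, so H[U+V|S] = H[U|S] = H[V|S].
joint_sum_s = defaultdict(float)
for (u, v, s), prob in joint_uvs.items():
    joint_sum_s[(u ^ v, s)] += prob
print("H[U|S]   =", H_US - H_S)
print("H[V|S]   =", H_VS - H_S)
print("H[U+V|S] =", H(joint_sum_s) - H_S)
```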

Remark 7 A similar argument works in the {m}-torsion case for general {m}. Instead of decrementing the entropic Ruzsa distance, one instead decrements a “multidistance”

\displaystyle  {\mathbf H}[X_1 + \dots + X_m] - \frac{1}{m} ({\mathbf H}[X_1] + \dots + {\mathbf H}[X_m])

for independent {X_1,\dots,X_m}. By an iterated version of the fibring identity, one can first reduce again to the symmetric case where the random variables are all copies of the same variable {X^*}. If one then takes {X_{i,j}}, {i,j=1,\dots,m} to be an array of {m^2} copies of {X^*}, one can get to the point where the row sums {\sum_i X_{i,j}} and the column sums {\sum_j X_{i,j}} have small conditional mutual information with respect to the double sum {S := \sum_i \sum_j X_{i,j}}. If we then set {U := \sum_i \sum_j j X_{i,j}} and {V := \sum_i \sum_j i X_{i,j}}, the data processing inequality again shows that {U} and {V} are nearly independent given {S}. The {m}-torsion now crucially intervenes as before to ensure that {U+V = \sum_i \sum_j (i+j) X_{i,j}} has the same form as {U} or {V}, leading to a contradiction as before. See this previous blog post for more discussion.