ITCS 2021 Registration

Head to http://itcs-conf.org/ to register for ITCS 2021, being held virtually at the Simons Institute from January 6-8, 2021. One can find the list of accepted papers at the same site, and the schedule of talks will be posted shortly. Live talks will occur from 8:30AM-3:30PM PST, and longer versions of the talks will be available to stream before the conference. Note that registration is free!

If you would like to have the ability to be seen/heard during the talk sessions, you will need to check the corresponding box on the registration form. Join us for the first virtual TCS conference of 2021!

James R. Lee, ITCS 2021 PC Chair

ITCS 2021 Call For Papers (deadline extension)

NOTE: The deadline has been extended to Tuesday, September 8th, to account for the Labor Day holiday in the US.

The 12th Innovations in Theoretical Computer Science (ITCS) conference will be held online from January 6-8, 2021. The submission deadline is now September 8, 2020.

The program committee encourages you to send your papers our way! See the call for papers for information about submitting to the conference.

ITCS seeks to promote research that carries a strong conceptual message (e.g., introducing a new concept, model or understanding, opening a new line of inquiry within traditional or interdisciplinary areas, introducing new mathematical techniques and methodologies, or new applications of known techniques). ITCS welcomes both conceptual and technical contributions whose contents will advance and inspire the greater theory community.

Important dates

  • Submission deadline: September 8, 2020 (05:59PM PDT)
  • Notification to authors: November 1, 2020
  • Conference dates: January 6-8, 2021

ITCS 2021 Call for Papers

The 12th Innovations in Theoretical Computer Science (ITCS) conference will be held online from January 6-8, 2021. The submission deadline is September 7, 2020.

The program committee encourages you to send your papers our way! See the call for papers for information about submitting to the conference.

ITCS seeks to promote research that carries a strong conceptual message (e.g., introducing a new concept, model or understanding, opening a new line of inquiry within traditional or interdisciplinary areas, introducing new mathematical techniques and methodologies, or new applications of known techniques). ITCS welcomes both conceptual and technical contributions whose contents will advance and inspire the greater theory community.

Important dates

  • Submission deadline: September 7, 2020 (05:59PM PDT)
  • Notification to authors: November 1, 2020
  • Conference dates: January 6-8, 2021

Competitive analysis, a new config

After not generating any new posts for a long time, I am happy to have finally switched platforms. The new site is hosted on GitHub Pages at tcsmath.github.io. While the domain name tcsmath.org should point there, I am trying to work around the well-known HTTPS certificate issues that cause Chrome to have a nervous breakdown.

The motivation for switching platforms is a course I’ve just started co-teaching with Seb Bubeck on Competitive analysis via convex optimization. The notes for the first lecture are live: Regret minimization and competitive analysis.

Föllmer’s drift, Itô’s lemma, and the log-Sobolev inequality

1. Construction of Föllmer’s drift

In a previous post, we saw how an entropy-optimal drift process could be used to prove the Brascamp-Lieb inequalities. Our main tool was a result of Föllmer that we now recall and justify. Afterward, we will use it to prove the Gaussian log-Sobolev inequality.

Consider {f : \mathbb R^n \rightarrow \mathbb R_+} with {\int f \,d\gamma_n = 1} , where {\gamma_n} is the standard Gaussian measure on {\mathbb R^n} . Let {\{B_t\}} denote an {n} -dimensional Brownian motion with {B_0=0} . We consider all processes of the form

\displaystyle  W_t = B_t + \int_0^t v_s\,ds\,, \ \ \ \ \ (1)

where {\{v_s\}} is a progressively measurable drift such that {W_1} has law {f\,d\gamma_n} .

Theorem 1 (Föllmer) It holds that

\displaystyle  D(f d\gamma_n \,\|\, d\gamma_n) = \min D(W_{[0,1]} \,\|\, B_{[0,1]}) = \min \frac12 \int_0^1 \mathop{\mathbb E}\,\|v_t\|^2\,dt\,,

where the minima are over all processes of the form (1).

Proof: In the preceding post (Lemma 2), we have already seen that for any drift of the form (1), it holds that

\displaystyle  D(f d\gamma_n \,\|\,d\gamma_n) \leq \frac12 \int_0^1 \mathop{\mathbb E}\,\|v_t\|^2\,dt = D(W_{[0,1]} \,\|\, B_{[0,1]})\,,

thus we need only exhibit a drift {\{v_t\}} achieving equality.

We define

\displaystyle  v_t = \nabla \log P_{1-t} f(W_t) = \frac{\nabla P_{1-t} f(W_t)}{P_{1-t} f(W_t)}\,,

where {\{P_t\}} is the Brownian semigroup defined by

\displaystyle  P_t f(x) = \mathop{\mathbb E}[f(x + B_t)]\,.

As we saw in the previous post (Lemma 2), the chain rule yields

\displaystyle  D(W_{[0,1]} \,\|\, B_{[0,1]}) = \frac12 \int_0^1 \mathop{\mathbb E}\,\|v_t\|^2\,dt\,. \ \ \ \ \ (2)

We are left to show that {W_1} has law {f \,d\gamma_n} and {D(W_{[0,1]} \,\|\, B_{[0,1]}) = D(f d\gamma_n \,\|\,d\gamma_n)} .

We will prove the first fact using Girsanov’s theorem to argue about the change of measure between {\{W_t\}} and {\{B_t\}} . As in the previous post, we will argue somewhat informally, using the heuristic that {dB_t} is a Gaussian random variable in {\mathbb R^n} with covariance {dt \cdot I} . Itô’s formula confirms that this heuristic is justified (see our use of the formula below).

The following lemma says that, given any sample path {\{W_s : s \in [0,t]\}} of our process up to time {t} , the probability that Brownian motion (without drift) would have “done the same thing” is {\frac{1}{M_t}} .

Remark 1 I chose to present various steps in the next proof at varying levels of formality. The arguments have the same structure as corresponding formal proofs, but I thought (perhaps naïvely) that this would be instructive.

Lemma 2 Let {\mu_t} denote the law of {\{W_s : s \in [0,t]\}} . If we define

\displaystyle  M_t = \exp\left(-\int_0^t \langle v_s,dB_s\rangle - \frac12 \int_0^t \|v_s\|^2\,ds\right)\,,

then under the measure {\nu_t} given by

\displaystyle  d\nu_t = M_t \,d\mu_t\,,

the process {\{W_s : s \in [0,t]\}} has the same law as {\{B_s : s \in [0,t]\}} .

Proof: We argue by analogy with the discrete proof. First, let us define the infinitesimal “transition kernel” of Brownian motion using our heuristic that {dB_t} has covariance {dt \cdot I} :

\displaystyle  p(x,y) = \frac{e^{-\|x-y\|^2/2dt}}{(2\pi dt)^{n/2}}\,.

We can also compute the (time-inhomogeneous) transition kernel {q_t} of {\{W_t\}} :

\displaystyle  q_t(x,y) = \frac{e^{-\|v_t dt + x - y\|^2/2dt}}{(2\pi dt)^{n/2}} = p(x,y) e^{-\frac12 \|v_t\|^2 dt} e^{-\langle v_t, x-y\rangle}\,.

Here we are using that {dW_t = dB_t + v_t\,dt} and {v_t} is deterministic conditioned on the past, thus the law of {dW_t} is a normal with mean {v_t\,dt} and covariance {dt \cdot I} .

To avoid confusion of derivatives, let’s use {\alpha_t} for the density of {\mu_t} and {\beta_t} for the density of Brownian motion (recall that these are densities on paths). Now let us relate the density {\alpha_{t+dt}} to the density {\alpha_{t}} . We use here the notations {\{\hat W_t, \hat v_t, \hat B_t\}} to denote a (non-random) sample path of {\{W_t\}} :

\displaystyle  \begin{array}{lll}  \alpha_{t+dt}(\hat W_{[0,t+dt]}) &= \alpha_t(\hat W_{[0,t]}) q_t(\hat W_t, \hat W_{t+dt}) \\ &= \alpha_t(\hat W_{[0,t]}) p(\hat W_t, \hat W_{t+dt}) e^{-\frac12 \|\hat v_t\|^2\,dt-\langle \hat v_t,\hat W_t-\hat W_{t+dt}\rangle} \\ &= \alpha_t(\hat W_{[0,t]}) p(\hat W_t, \hat W_{t+dt}) e^{-\frac12 \|\hat v_t\|^2\,dt+\langle \hat v_t,d \hat W_t\rangle} \\ &= \alpha_t(\hat W_{[0,t]}) p(\hat W_t, \hat W_{t+dt}) e^{\frac12 \|\hat v_t\|^2\,dt+\langle \hat v_t, d \hat B_t\rangle}\,, \end{array}

where the last line uses {d\hat W_t = d\hat B_t + \hat v_t\,dt} .

Now by “heuristic” induction, we can assume {\alpha_t(\hat W_{[0,t]})=\frac{1}{M_t} \beta_t(\hat W_{[0,t]})} , yielding

\displaystyle  \begin{array}{lll}  \alpha_{t+dt}(\hat W_{[0,t+dt]}) &= \frac{1}{M_t} \beta_t(\hat W_{[0,t]}) p(\hat W_t, \hat W_{t+dt}) e^{\frac12 \|\hat v_t\|^2\,dt+\langle \hat v_t, d \hat B_t\rangle} \\ &= \frac{1}{M_{t+dt}} \beta_t(\hat W_{[0,t]}) p(\hat W_t, \hat W_{t+dt}) \\ &= \frac{1}{M_{t+dt}} \beta_{t+dt}(\hat W_{[0,t+dt]})\,. \end{array}

In the last line, we used the fact that {p} is the infinitesimal transition kernel for Brownian motion. \Box

Now we will show that

\displaystyle  P_{1-t} f(W_t) = \exp\left(\frac12 \int_0^t \|v_s\|^2\,ds + \int_0^t \langle v_s, dB_s\rangle\right) = \frac{1}{M_t}\,. \ \ \ \ \ (3)

From Lemma 2, it will follow that {W_t} has the law {(P_{1-t} f)\cdot d\sigma_t} , where {\sigma_t} denotes the law of {B_t} (a measure on {\mathbb R^n} , not to be confused with the path measure {\nu_t} of Lemma 2). In particular, {W_1} has the law {f\,d\sigma_1 = f\,d\gamma_n} , which was our first goal.

Given our preceding less formal arguments, let us use a proper stochastic calculus argument to establish (3). To do that we need a way to calculate

\displaystyle  d \log P_{1-t} f(W_t) \quad \textrm{``}= \log P_{1-t-dt} f(W_{t+dt}) - \log P_{1-t} f(W_t)\textrm{''} \ \ \ \ \ (4)

Notice that this involves both time and space derivatives.

Itô’s lemma. Suppose we have a function {F : \mathbb R \times [0,1] \rightarrow \mathbb R} , twice continuously differentiable in space and once in time, that we write as {F(x,t)} where {x} is a space variable and {t} is a time variable. We can expand {d F} via its Taylor series:

\displaystyle  d F = \partial_t F \,dt + \partial_x F\,dx + \frac12 \partial_x^2 F\,dx^2 + \frac12 \partial_x \partial_t F\,dx\,dt + \cdots\,.

Normally we could eliminate the terms {dx^2, dx\, dt} , etc. since they are of higher order as {dx,dt \rightarrow 0} . But recall that for Brownian motion we have the heuristic {\mathop{\mathbb E}[dB_t^2]=dt} . Thus we cannot eliminate the second-order space derivative if we plan to plug in {x=B_t} (or {x=W_t} , a process driven by Brownian motion). Itô’s lemma says that this consideration alone gives us the correct result:

\displaystyle  d F(W_t,t) = \partial_t F(W_t,t)\,dt + \partial_x F(W_t,t)\,dW_t + \frac12 \partial_x^2 F(W_t,t)\,dt\,.

This generalizes in a straightforward way to the higher dimensional setting {F : \mathbb R^n \times [0,1] \rightarrow \mathbb R} .
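
The correction term can be checked numerically. Here is a minimal sketch (my own toy check, assuming numpy) for {F(x,t) = x^2} : Itô’s lemma gives {d(B_t^2) = 2B_t\,dB_t + dt} , so {\mathop{\mathbb E}[B_t^2] = t} , and the drift comes entirely from the second-order term.

```python
import numpy as np

# Toy check of Ito's correction term for F(x, t) = x^2.
# Ito's lemma: d(B_t^2) = 2 B_t dB_t + dt, hence E[B_T^2] = T.
rng = np.random.default_rng(0)
n_paths, n_steps, T = 20_000, 500, 1.0
dt = T / n_steps

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B_T = dB.sum(axis=1)

print("E[B_T^2] ~", (B_T ** 2).mean())                   # ~ 1.0 = T
# The heuristic E[dB_t^2] = dt, summed along each path:
print("E[sum dB_t^2] ~", (dB ** 2).sum(axis=1).mean())   # ~ 1.0 = T
```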

With Itô’s lemma in hand, let us continue to calculate the derivative

\displaystyle  \begin{array}{lll}  d P_{1-t} f(W_t) &= - \Delta P_{1-t} f(W_t)\,dt + \langle \nabla P_{1-t} f(W_t), dW_t\rangle + \Delta P_{1-t} f(W_t) \,dt \\ &= \langle \nabla P_{1-t} f(W_t), dW_t\rangle \\ &= P_{1-t} f(W_t) \,\langle v_t, dW_t\rangle\,. \end{array}

For the time derivative (the first term), we have employed the heat equation

\displaystyle  \partial_t P_{1-t} f = - \Delta P_{1-t} f\,,

where {\Delta = \frac12 \sum_{i=1}^n \partial_{x_i}^2} is the Laplacian on {\mathbb R^n} .

Note that the heat equation was already contained in our “infinitesimal density” {p} in the proof of Lemma 2, or in the representation {P_t = e^{t \Delta}} , and Itô’s lemma was also contained in our heuristic that {dB_t} has covariance {dt \cdot I} .

Using Itô’s formula again yields

\displaystyle  d \log P_{1-t} f(W_t) = \langle v_t, dW_t\rangle - \frac12 \|v_t\|^2\,dt = \frac12 \|v_t\|^2\,dt + \langle v_t,dB_t\rangle\,,

giving our desired conclusion (3).

Our final task is to establish optimality: {D\left(W_{[0,1]} \,\|\, B_{[0,1]}\right) = D(W_1\,\|\,B_1)} . We apply the formula (3):

\displaystyle  D(W_1\,\|\,B_1) = \mathop{\mathbb E}[\log f(W_1)] = \mathop{\mathbb E}\left[\frac12 \int_0^1 \|v_t\|^2\,dt\right],

where we used {\mathop{\mathbb E}[\langle v_t,dB_t\rangle]=0} . Combined with (2), this completes the proof of the theorem. \Box
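
To make the construction concrete, here is a small numerical sketch (my own example, assuming numpy). Take {n=1} and {f(x) = \frac12 e^{ax - a^2/2} + \frac12 e^{-ax - a^2/2}} , so that {f\,d\gamma_1} is the Gaussian mixture {\frac12 N(a,1) + \frac12 N(-a,1)} . For this {f} one computes {\nabla \log P_{1-t} f(x) = a \tanh(ax)} , independent of {t} , and an Euler discretization of {dW_t = dB_t + v_t\,dt} indeed produces samples from the mixture.

```python
import numpy as np

# Sketch: Follmer drift for f * dgamma_1 = 0.5 N(a,1) + 0.5 N(-a,1).
# For this f, grad log P_{1-t} f(x) = a * tanh(a x) for every t.
rng = np.random.default_rng(1)
a, n_paths, n_steps = 2.0, 100_000, 1_000
dt = 1.0 / n_steps

W = np.zeros(n_paths)
for _ in range(n_steps):
    dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    W += dB + a * np.tanh(a * W) * dt        # dW_t = dB_t + v_t dt

# The mixture has mean 0 and second moment 1 + a^2.
print("E[W_1] ~", W.mean(), " E[W_1^2] ~", (W ** 2).mean(),
      "(target:", 1 + a ** 2, ")")
```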

2. The Gaussian log-Sobolev inequality

Consider again a measurable {f : \mathbb R^n \rightarrow \mathbb R_+} with {\int f\,d\gamma_n=1} . Let us define {\mathrm{Ent}_{\gamma_n}(f) = D(f\,d\gamma_n \,\|\,d\gamma_n)} . Then the classical log-Sobolev inequality in Gaussian space asserts that

\displaystyle  \mathrm{Ent}_{\gamma_n}(f) \leq \frac12 \int \frac{\|\nabla f\|^2}{f}\,d\gamma_n\,. \ \ \ \ \ (5)

First, we discuss the correct way to interpret this. Define the Ornstein-Uhlenbeck semi-group {\{U_t\}} by its action

\displaystyle  U_t f(x) = \mathop{\mathbb E}[f(e^{-t} x + \sqrt{1-e^{-2t}} B_1)]\,.
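
Equivalently, {\{U_t\}} is the transition semigroup of the Ornstein-Uhlenbeck process {dX_t = -X_t\,dt + \sqrt{2}\,dB_t} : started from {X_0 = x} , one has {X_t = e^{-t} x + \sqrt{1-e^{-2t}}\,Z} in law, where {Z} has law {\gamma_n} .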

This is the natural stationary diffusion process on Gaussian space. For every measurable {f} , we have

\displaystyle  U_t f \rightarrow \int f d\gamma_n \quad \textrm{ as } t \to \infty

or equivalently

\displaystyle  \mathrm{Ent}_{\gamma_n}(U_t f) \rightarrow 0 \quad \textrm{ as } t \to \infty

The log-Sobolev inequality yields quantitative convergence in the relative entropy distance as follows: Define the Fisher information

\displaystyle  I(f) = \int \frac{\|\nabla f\|^2}{f} \,d\gamma_n\,.

One can check that

\displaystyle  \frac{d}{dt} \mathrm{Ent}_{\gamma_n} (U_t f)\Big|_{t=0} = - I(f)\,,

thus the Fisher information describes the instantaneous decay of the relative entropy of {f} under diffusion.
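
For completeness, here is the standard computation behind this fact. Write {L = \sum_{i=1}^n \partial_{x_i}^2 - \langle x,\nabla\rangle} for the Ornstein-Uhlenbeck generator, so that {\partial_t U_t f = L U_t f} , and recall the Gaussian integration by parts formula {\int (Lg)\,h \,d\gamma_n = -\int \langle \nabla g, \nabla h\rangle \,d\gamma_n} . Then

\displaystyle  \frac{d}{dt} \mathrm{Ent}_{\gamma_n}(U_t f)\Big|_{t=0} = \int (Lf)(1+\log f)\,d\gamma_n = -\int \Big\langle \nabla f, \frac{\nabla f}{f}\Big\rangle\,d\gamma_n = -I(f)\,,

where the middle step uses {\int Lf\,d\gamma_n = 0} and {\nabla \log f = \nabla f/f} .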

So we can rewrite the log-Sobolev inequality as:

\displaystyle  - \frac{d}{dt} \mathrm{Ent}_{\gamma_n}(U_t f)\Big|_{t=0} \geq 2 \mathrm{Ent}_{\gamma_n}(f)\,.

This expresses the intuitive fact that when the relative entropy is large, its rate of decay toward equilibrium is faster.
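
It is instructive to record the standard equality case of (5): take {f(x) = e^{\langle a,x\rangle - \|a\|^2/2}} for some {a \in \mathbb R^n} . Then {\int f\,d\gamma_n = 1} , and under {f\,d\gamma_n} the point {x} has law {N(a,\mathrm{Id})} , so

\displaystyle  \mathrm{Ent}_{\gamma_n}(f) = \int f \log f\,d\gamma_n = \|a\|^2 - \frac{\|a\|^2}{2} = \frac{\|a\|^2}{2}\,,

while {\nabla f = a f} gives {I(f) = \|a\|^2} . Thus (5) is tight for exponential tilts of the Gaussian measure.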

Martingale property of the optimal drift. Now for the proof of (5). Let {dW_t = dB_t + v_t\,dt} be the entropy-optimal process with {W_1 \sim f \,d\gamma_n} . We need one more fact about {\{v_t\}} : The optimal drift is a martingale, i.e. {\mathop{\mathbb E}[v_t \mid v_s] = v_s} for {s < t} .

Let’s give two arguments to support this.

Argument one: Brownian bridges. First, note that by the chain rule for relative entropy, we have:

\displaystyle  D(W_{[0,1]} \,\|\, B_{[0,1]}) = D(W_1 \,\|\, B_1) + \int D(W_{[0,1]} \,\|\, B_{[0,1]} \mid W_1=B_1=x) f(x) d\gamma_n(x)\,.

But from optimality, we know that the latter expectation is zero. Therefore {f \,d\gamma_n} -almost surely, we have

\displaystyle  D(W_{[0,1]} \,\| B_{[0,1]} \mid W_1=B_1=x) = 0\,.

This implies that if we condition on the endpoint {x} , then {W_{[0,1]}} is a Brownian bridge (i.e., a Brownian motion conditioned to start at {0} and end at {x} ).

This implies that {\mathop{\mathbb E}[v_t \mid v_s, W_1=x] = v_s} , as one can check that a Brownian bridge {\{\hat B_t\}} with endpoint {x} is described by the drift process {d\hat B_t = dB_t + \frac{x-\hat B_t}{1-t}\,dt} , and

\displaystyle  \mathop{\mathbb E}\left[\frac{x-\hat B_t}{1-t} \,\Big|\, B_{[0,s]}\right] = \frac{x-\hat B_s}{1-s}\,.

That seemed complicated. There is a simpler way to see this: Given {\hat B_s} and any bridge {\gamma} from {\hat B_s} to {x} , every “permutation” of the infinitesimal steps in {\gamma} has the same law (by commutativity, they all land at {x} ). Thus the marginal law of {dB_t + v_t\,dt} at every point {t \geq s} should be the same. In particular,

\displaystyle  \mathop{\mathbb E}[v_t\,dt \mid v_s] = \mathop{\mathbb E}[dB_t + v_t\,dt \mid v_s] = \mathop{\mathbb E}[dB_s + v_s \,ds \mid v_s] = v_s\,ds\,.
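
As a sanity check on the bridge SDE above, here is a quick simulation sketch (my own, assuming numpy): the drift {\frac{x - \hat B_t}{1-t}} pins the endpoint at {x} .

```python
import numpy as np

# Sketch: simulate d(B_hat) = dB + (x - B_hat)/(1 - t) dt and check
# that paths end (nearly) at the prescribed endpoint x.
rng = np.random.default_rng(2)
x, n_paths, n_steps = 1.5, 50_000, 2_000
dt = 1.0 / n_steps

t, Bh = 0.0, np.zeros(n_paths)
for _ in range(n_steps - 1):       # stop one step early: the drift blows up at t = 1
    dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    Bh += dB + (x - Bh) / (1.0 - t) * dt
    t += dt

print("mean endpoint ~", Bh.mean(), "(target:", x, ")")
print("endpoint std  ~", Bh.std())  # ~ sqrt(dt): essentially pinned
```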

Argument two: Change of measure. There is a more succinct (though perhaps more opaque) way to see that {\{v_t\}} is a martingale. Note that the process {\nabla P_{1-t} f(B_t) = P_{1-t} \nabla f(B_t)} is a Doob martingale. But we have {v_t = \frac{\nabla P_{1-t} f(W_t)}{P_{1-t} f(W_t)}} and we also know that {\frac{1}{P_{1-t} f(W_t)} = \frac{1}{M_t}} is precisely the change of measure that makes {\{W_t\}} into Brownian motion.

Proof of the log-Sobolev inequality. In any case, now we are ready for the proof of (5). It also comes straight from Lehec’s paper. Since {\{v_t\}} is a martingale, the function {t \mapsto \mathop{\mathbb E}\,\|v_t\|^2} is non-decreasing (by Jensen’s inequality), so {\mathop{\mathbb E}\,\|v_t\|^2 \leq \mathop{\mathbb E}\,\|v_1\|^2} . So by Theorem 1:

\displaystyle  \mathrm{Ent}_{\gamma_n}(f) = \frac12 \int_0^1 \mathop{\mathbb E}\,\|v_t\|^2\,dt \leq \frac12 \mathop{\mathbb E}\,\|v_1\|^2 = \frac12 \mathop{\mathbb E}\, \frac{\|\nabla f(W_1)\|^2}{f(W_1)^2} = \frac12 \mathop{\mathbb E}\, \frac{\|\nabla f(B_1)\|^2}{f(B_1)}\,.

The latter quantity is {\frac12 I(f)} . In the last equality, we used the fact that {\frac{1}{f(W_1)}} is precisely the change of measure that turns {\{W_t\}} into Brownian motion.
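
To see the inequality in action, here is one last numerical sketch (again my own example, assuming numpy), using the mixture {f(x) = \frac12 e^{ax-a^2/2} + \frac12 e^{-ax-a^2/2}} from before, for which {\nabla \log f(x) = a\tanh(ax)} :

```python
import numpy as np

# Monte Carlo check of Ent(f) <= (1/2) I(f) for the two-point mixture
# f * dgamma_1 = 0.5 N(a,1) + 0.5 N(-a,1).
rng = np.random.default_rng(3)
a, n = 1.0, 1_000_000

s = rng.choice([-1.0, 1.0], size=n)                # sample X ~ f dgamma_1
X = s * a + rng.normal(size=n)

log_f = np.logaddexp(a * X, -a * X) - np.log(2.0) - a ** 2 / 2
ent = log_f.mean()                                 # Ent(f) = E_{f dgamma}[log f]
half_I = 0.5 * ((a * np.tanh(a * X)) ** 2).mean()  # (1/2) E_{f dgamma}[|grad log f|^2]
print("Ent(f) ~", ent, "  (1/2) I(f) ~", half_I)   # Ent <= I/2
```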

A geometric proof of Okamura-Seymour

After using it in the last post on non-positively curved surfaces, I thought it might be nice to give a simple proof of the Okamura-Seymour theorem in the dual setting. This argument arose out of conversations in 2007 with Amit Chakrabarti and former UW undergrad Justin Vincent. I was later informed by Yuri Rabinovich that he and Ilan Newman discovered a similar proof.

Note that establishing a node-capacitated version of the Okamura-Seymour theorem was an open question of Chekuri and Kawarabayashi. Resolving it positively is somewhat more difficult.

Theorem 1 Given a weighted planar graph {G = (V,E)} and a face {F} of {G} , there is a non-expansive embedding {\varphi:V\rightarrow L_1} such that {\varphi|_F} is an isometry.

By rational approximation and subdivision of edges, we may assume that {G} is unweighted. The following proof is constructive, and provides an explicit sequence of cuts on {G} whose characteristic functions form the coordinates of the embedding {\varphi} . Each such cut is obtained by applying the following lemma. Note that for a subset {S} of vertices, we use the notation {G/\partial S} for the graph obtained from {G} by contracting the edges across the cut {(S,\bar S)} .
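
Before stating the lemma, a toy example of my own to fix ideas: let {G} be the {4} -cycle on vertices {1,2,3,4} (in cyclic order) and let {F} be the whole cycle. The cuts {S_1 = \{1,2\}} and {S_2 = \{2,3\}} have disjoint boundaries, and the map {\varphi(v) = (\mathbf{1}_{S_1}(v), \mathbf{1}_{S_2}(v))} sends {1 \mapsto (1,0)} , {2 \mapsto (1,1)} , {3 \mapsto (0,1)} , {4 \mapsto (0,0)} ; one checks directly that {\|\varphi(u)-\varphi(v)\|_1 = d_G(u,v)} for all {u,v} , so {\varphi} is an isometric embedding of the cycle into {\ell_1} .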

Lemma 2 (Face-preserving cut lemma) Let {G = (V,E)} be a {2} -connected bipartite planar graph and {F} be the boundary of a face of {G} . There exists a cut {S\subseteq V} such that

\displaystyle \forall\, u,v\in F,~ d_G(u,v) - d_{G/\partial S}(u,v) ~=~ |\mathbf{1}_S(u) - \mathbf{1}_S(v)| \, .

Proof: Fix a plane embedding of {G} that makes {F} the boundary of the outer face. Since {G} is {2} -connected, {F} is a cycle, and since {G} is bipartite and has no parallel edges, {\mathrm{len}(F)\geq 4} .

Consider an arbitrary pair {(a,b)} of distinct vertices on {F} . There is a unique path from {a} to {b} that runs along {F} counterclockwise; call this path {\lambda_{ab}} . Let {\mathcal P_{ab}} denote the set of paths from {a} to {b} in {G} . Consider a path {\gamma\in\mathcal P_{ab}} and a vertex {w\in V} . We say that {w} lies below {\gamma} if, in the plane embedding of {G} , {w} lies in the closed subset of the plane bounded by {\gamma} and {\lambda_{ab}} .

(Note that the direction of {\gamma} is significant in the definition of “lying below,” i.e., belowness with respect to a path in {\mathcal P_{ab}} is not the same as belowness with respect to the reverse of the same path in {\mathcal P_{ba}} .)

We say that {w} lies strictly below {\gamma} if {w} lies below {\gamma} and {w\notin\gamma} . We use this notion of “lying below” to define a partial order on the paths in {\mathcal P_{ab}} : for {\gamma,\gamma'\in\mathcal P_{ab}} we say that {\gamma'} is lower than {\gamma} if every vertex in {\gamma'} lies below {\gamma} .

We now fix the pair {(a,b)} and a path {\pi\in\mathcal P_{ab}} so that the following properties hold:

  1. {\mathrm{len}(\pi) = d_G(a,b) < \mathrm{len}(\lambda_{ab})} .
  2. If {a',b'\in\lambda_{ab}} are distinct vertices with {a'} preceding {b'} and {d_G(a',b') < \mathrm{len}(\lambda_{a'b'})} , then {a' = a} and {b' = b} .
  3. If {\pi'\in\mathcal P_{ab}} is lower than {\pi} and {\mathrm{len}(\pi') = \mathrm{len}(\pi)} , then {\pi' = \pi} .

[Figure: OKS cut]

Note that a suitable pair {(a,b)} exists because {\mathrm{len}(F)\ge4} . Finally, we define the cut {S\subseteq V} as follows: {S = \{v\in V:\, v} does not lie strictly below {\pi\}} .

Claim 1 The cut {S} satisfies the conditions of the lemma.

For the rest of this section, we fix the pair {(a,b)} , the path {\pi} and the cut {S} as defined in the above proof.

Lemma 3 Let {x} and {y} be vertices on {\pi} and let {\beta\in\mathcal P_{xy}} be such that {\emptyset\ne\mathrm{int}(\beta)\subseteq\bar S} , where {\mathrm{int}(\beta)} denotes the set of internal vertices of {\beta} . Then {\mathrm{len}(\beta) - 2 \ge d_G(x,y)} .

Proof: If {x = y} the lemma holds trivially, so assume {x \ne y} . Also assume without loss of generality that {x} precedes {y} in the path {\pi} . The conditions on {\beta} imply that all vertices in {\mathrm{int}(\beta)} lie strictly below {\pi} . Therefore, the path {\pi' := \pi[a:x]\circ\beta\circ\pi[y:b]} , where {\pi[a:x]} denotes the subpath of {\pi} from {a} to {x} , is lower than {\pi} and distinct from {\pi} .

Since {\pi} is a shortest path, {\mathrm{len}(\pi') \ge \mathrm{len}(\pi)} , and by property (3), equality would force {\pi' = \pi} ; hence {\mathrm{len}(\pi') > \mathrm{len}(\pi)} , which implies {\mathrm{len}(\beta) > \mathrm{len}(\pi[x:y])} . Since {G} is bipartite, the cycle formed by {\beta} and {\pi[x:y]} must have even length; therefore, {\mathrm{len}(\beta) \ge \mathrm{len}(\pi[x:y]) + 2 = d_G(x,y) + 2} , where the last equality holds because subpaths of shortest paths are shortest. \Box

Lemma 4 Let {x} and {y} be vertices on {\pi} and let {\gamma\in\mathcal P_{xy}} . Then {\mathrm{len}(\gamma) - |\gamma\cap\partial S| \ge d_G(x,y)} , where {\gamma\cap\partial S} denotes the set of edges of {\gamma} crossing the cut {S} .

Proof: If {\gamma\cap\partial S = \emptyset} there is nothing to prove. If not, we can write

\displaystyle \gamma ~=~ \alpha_1 \circ \beta_1 \circ \alpha_2 \circ \beta_2 \circ \cdots \circ \alpha_k \circ \beta_k \circ \alpha_{k+1}

for some {k\ge1} where, for all {i} , {\alpha_i\subseteq S} and {\emptyset\ne\mathrm{int}(\beta_i)\subseteq \bar S} . Let {x_i} and {y_i} denote the endpoints of {\beta_i} with {x_i} preceding {y_i} in the path {\gamma} . Further, define {y_0 := x} and {x_{k+1} := y} .

By Lemma 3, we have {d_G(x_i,y_i) \le \mathrm{len}(\beta_i) - 2} for {1\le i\le k} . Since each {\alpha_i} is a path from {y_{i-1}} to {x_i} , we have {d_G(y_{i-1},x_i) \le \mathrm{len}(\alpha_i)} for {1\le i\le k+1} . Therefore

\displaystyle d_G(x,y) ~\le~ \sum_{i=1}^k d_G(x_i,y_i) + \sum_{i=1}^{k+1} d_G(y_{i-1},x_i) ~\le~ \sum_{i=1}^k (\mathrm{len}(\beta_i) - 2) + \sum_{i=1}^{k+1} \mathrm{len}(\alpha_i)\,.

The latter quantity is precisely {\mathrm{len}(\gamma) - 2k} , which completes the proof since {|\gamma\cap\partial S| = 2k} . \Box

Proof of Claim 1: Let {u,v\in F} be arbitrary and distinct. It is clear that {d_{G/\partial S}(u,v) \le d_G(u,v) - |\mathbf{1}_S(u) - \mathbf{1}_S(v)|} , so it suffices to prove the opposite inequality. We begin by observing that

\displaystyle d_{G/\partial S}(u,v) ~=~ \min\left\{ \mathrm{len}(\gamma) - |\gamma\cap\partial S|:\, \gamma\in\mathcal P_{uv}\right\} \, .

Let {\gamma} be a path in {\mathcal P_{uv}} that achieves the minimum in the above expression. First, suppose {\gamma\cap\partial S = \emptyset} . Then we must have {\mathbf{1}_S(u) = \mathbf{1}_S(v)} . Now, {d_{G/\partial S}(u,v) = \mathrm{len}(\gamma) \ge d_G(u,v)} , which implies {d_{G/\partial S}(u,v) = d_G(u,v)} and we are done.

Next, suppose {\gamma\cap\partial S \ne \emptyset} . Then, there exists at least one vertex in {\gamma} that lies on {\pi} . Let {x} be the first such vertex and {y} the last (according to the ordering in {\gamma} ) and assume that {x} precedes {y} in the path {\pi} . Let {\hat\gamma := \gamma[x:y]} . Note that {\hat\gamma} may be trivial, because we may have {x = y} . Now, {\gamma\cap\partial S = (\gamma[u:x]\cap\partial S) \cup (\hat\gamma\cap\partial S) \cup (\gamma[y:v]\cap\partial S)} , whence

\displaystyle |\gamma\cap\partial S| ~=~ (1 - \mathbf{1}_S(u)) + |\hat\gamma\cap\partial S| + (1 - \mathbf{1}_S(v)) \, . \ \ \ \ \ (1)

 

This gives

{\displaystyle \begin{array}{rcl} d_{G/\partial S}(u,v) & = & \mathrm{len}(\gamma) - (1 - \mathbf{1}_S(u)) - |\hat\gamma\cap\partial S| - (1 - \mathbf{1}_S(v)) \\ &\geq & \mathrm{len}(\gamma[u:x]) + \mathrm{len}(\hat\gamma) + \mathrm{len}(\gamma[y:v]) + (\mathbf{1}_S(u) + \mathbf{1}_S(v) - 2) - |\hat\gamma\cap\partial S| \\ &\geq & \mathrm{len}(\gamma[u:x]) + d_G(x,y) + \mathrm{len}(\gamma[y:v]) + (\mathbf{1}_S(u) + \mathbf{1}_S(v) - 2) \, \ \ \ \ \ (**) \\ &\geq & d_G(u,v) + (\mathbf{1}_S(u) + \mathbf{1}_S(v) - 2) \end{array}}

where the first line follows from Eq. (1) and the definition of {\gamma} , the third line is obtained by applying Lemma 4 to the path {\hat\gamma} , and the last line follows from the triangle inequality. If at least one of {u,v} lies in {S} , then {\mathbf{1}_S(u) + \mathbf{1}_S(v) - 2 = -|\mathbf{1}_S(u) - \mathbf{1}_S(v)|} and we are done.

Therefore, suppose {u,v\in\bar S} ; then both {u} and {v} lie strictly below {\pi} , and hence are internal vertices of {\lambda_{ab}} . Let {\beta = \lambda_{ab}} . For a path {\alpha} and vertices {w,z} on {\alpha} , let us use {|wz|_\alpha} as shorthand for {\mathrm{len}(\alpha[w:z])} . By property (1), we have {|ab|_\beta > |ab|_\pi} and since {G} is bipartite, this means {|ab|_\beta - |ab|_\pi \ge 2} . By property (2), we have {d_G(u,b) = |ub|_\beta} and {d_G(a,v) = |av|_\beta} . Using these facts, we now derive

\displaystyle \begin{array}{rcl} \mathrm{len}(\gamma[u:x]) + d_G(x,y) + \mathrm{len}(\gamma[y:v]) & = & |ux|_\gamma + |xy|_\pi + |yv|_\gamma \\ & = & |ux|_\gamma + |xb|_\pi + |ay|_\pi + |yv|_\gamma - |ab|_\pi \\ &\ge& d_G(u,b) + d_G(a,v) - |ab|_\pi \\ & = & |ub|_\beta + |av|_\beta - |ab|_\pi \\ & = & |uv|_\beta + |ab|_\beta - |ab|_\pi \\ &\ge& |uv|_\beta + 2 \, . \end{array}

Using this in (**) above and noting that {\mathbf{1}_S(u) = \mathbf{1}_S(v) = 0} , we get {d_{G/\partial S}(u,v) \ge |uv|_\beta + 2 - 2 = |uv|_\beta = d_G(u,v)} , where the final equality is again property (2). This completes the proof. \Box
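
As an aside, the quantity minimized in the identity for {d_{G/\partial S}} at the start of this proof is easy to compute directly: distances in {G/\partial S} are shortest-path distances in {G} after giving the edges of {\partial S} weight {0} . A minimal sketch, assuming the networkx library:

```python
import networkx as nx

def contracted_distance(G, S, u, v):
    """Distance in G/(boundary of S): edges across the cut S get weight 0."""
    H = G.copy()
    for a, b in H.edges():
        # an edge lies in the boundary of S iff exactly one endpoint is in S
        H[a][b]["weight"] = 0 if ((a in S) != (b in S)) else 1
    return nx.dijkstra_path_length(H, u, v, weight="weight")

# Example: the 4-cycle 1-2-3-4 with S = {1, 2} contracts edges {2,3} and {4,1}.
G = nx.cycle_graph([1, 2, 3, 4])
print(contracted_distance(G, {1, 2}, 1, 3))   # 1 = d_G(1,3) - |1_S(1) - 1_S(3)|
```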

Proof of Theorem 1: Assume that {G} is {2} -connected. We may also assume that {G} is bipartite. To see why, note that subdividing every edge of {G} (introducing one new vertex per edge) yields a bipartite graph and simply doubles the metric {d_G} .

We shall now prove the stronger statement that for every face {f} of {G} there exists a sequence of cuts {S_1,S_2,\ldots,S_k} of {G} such that for all {u,v} on {f} , we have {d_G(u,v) = \sum_{i=1}^k |\mathbf{1}_{S_i}(u) - \mathbf{1}_{S_i}(v)|} and that for {i\ne j} , {\partial S_i\cap\partial S_j = \emptyset} . We prove this by induction on {|V|} .

The result is trivial in the degenerate case when {G} is a single edge. For any larger {G} and any cut {S\subseteq V} , the graph {G/\partial S} is either a single edge or is {2} -connected. Furthermore, contracting a cut preserves the parities of the lengths of all closed walks; therefore {G/\partial S} is also bipartite.

Apply the face-preserving cut lemma (Lemma 2) to obtain a cut {S_1\subseteq V} . By the above observations, we can apply the induction hypothesis to {G/\partial S_1} to obtain cuts {\hat S_2,\ldots,\hat S_k} of {G/\partial S_1} corresponding to the image of {f} in {G/\partial S_1} . Each cut {\hat S_i} induces a cut {S_i} of {G} . Clearly {\partial S_1\cap\partial S_i = \emptyset} for any {i > 1} . Finally, for any {u,v\in f} , we have

{\displaystyle d_G(u,v) = d_{G/\partial S_1}(u,v) + |\mathbf{1}_{S_1}(u) - \mathbf{1}_{S_1}(v)| = \sum_{i=2}^k |\mathbf{1}_{S_i}(u) - \mathbf{1}_{S_i}(v)| + |\mathbf{1}_{S_1}(u) - \mathbf{1}_{S_1}(v)|}

where the first equality follows from the property of {S_1} and the second follows from the induction hypothesis. This proves the theorem. \Box