From model theory to algebraic geometry and back

Quantifier Elimination

Quantifier elimination is a staple of model theory. Suppose that {\phi(X,Y)} is a quantifier-free formula in some language {L}. Then one says a quantifier-free formula {\psi(Y)} is equivalent modulo a {L}-theory {T} to the (quantified ) formula {(\exists X) \phi(X,Y)}, if in every model {\mathcal{M}} of {T}, {\mathcal{M} \models (\forall Y) ((\exists X) \phi(X;Y) \Leftrightarrow \psi(Y))}.

Geometrically speaking, eliminating existential quantifiers corresponds to taking image under coordinate projection. The statement “{\mathcal{M} \models (\forall Y) ((\exists X) \phi(X;Y) \Leftrightarrow \psi(Y))}” is equivalent to “{\pi_Y(\phi(\mathcal{M})) = \psi(\mathcal{M})}”, where {\pi_Y: \mathcal{M}^{|X|} \times \mathcal{M}^{|Y|} \rightarrow \mathcal{M}^{|Y|}} is the projection map to the “{Y}-coordinates”, and we have used the standard model-thoretic notation to denote {\phi(\mathcal{M}) = { (x,y) \in \mathcal{M}^{|X|} \times \mathcal{M}^{|Y|} \mid \mathcal{M} \models \phi(x,y)}}, and {\psi(\mathcal{M}) = { y \in \mathcal{M}^{|Y|} \mid \mathcal{M} \models \psi(y)}}.


All the above is clear from the logical perspective. The trouble arises when one works at the boundary of model theory and algebraic geometry, and would like to interpret the above definitions in the language of algebraic geometry.

Schematic image, constructible sets and Chevalley’s theorem

Suppose {k} is a field. Denote by {\mathbb{A}^m_k = \mathrm{Spec}(k[X_1,\ldots,X_m])}, the {m}-dimensional affine space over {k}, and let {\pi: \mathbb{A}^m_k \times \mathbb{A}^n_k \rightarrow \mathbb{A}^n_k} the scheme-theoretic morphism, corresponding to the inclusion

\displaystyle k[Y_1,\ldots,Y_n] \hookrightarrow k[X_1,\ldots,X_m,Y_1,\ldots,Y_n] \cong k[X_1,\ldots,X_m] \otimes_k k[Y_1,\ldots,Y_n].


One calls a subset of the underlying topological space (Zariski topology) of the scheme {\mathbb{A}^n_k} to be constructible, if it is a finite union of subsets of the form {X - Y} where {X,Y} are the underlying topological spaces of closed subschemes of {\mathbb{A}^n_k}.


Now if {V \subset \mathbb{A}^m_k \times \mathbb{A}^n_k} is a subscheme, then its schematic image, {\pi(V)}, is a well defined subset of the underlying topological space of {\mathbb{A}^n_k}. What does one mean by the schematic image ? Suppose {\mathfrak{p} \in \mathrm{Spec}(k[Y_1,\ldots,Y_n]) = \mathbb{A}^n_k} is a point of {\mathbb{A}^n_k}.


Let {\pi|_{V}:V \rightarrow \mathbb{A}^n_k } denote the restriction to {V} of the morphism {\pi}, i.e. {\pi|_{V}} corresponds to the composition

\displaystyle f:k[Y_1,\ldots,Y_n] \hookrightarrow k[X_1,\ldots,X_m,Y_1,\ldots,Y_n] \twoheadrightarrow k[V].

Then, {\mathfrak{p}} belongs to the schematic image {\pi(V)} if and only if {\mathfrak{p} = f^{-1}(\mathfrak{q})} for some {\mathfrak{q} \in \mathrm{Spec}(k[V]}. (The schematic image of {V} only depends on the underlying topological space of {V}, and thus we can replace {V} by the corresponding reduced scheme {V_{\mathrm{red}}} ({V_{\mathrm{red}}} is the smallest closed subscheme of {\mathbb{A}^m_k \times \mathbb{A}^n_k} whose underlying space is equal to that of {V}) in the above description without changing {\pi(V)}.)


If {V,Z} are closed subschemes of {\mathbb{A}^m_k \times \mathbb{A}^n_k}, then so is {V \cap Z}. We define the schematic image, {\pi(V - Z)} of the constructible set {V - Z} to be {\pi(V) - \pi(V \cap Z)}, and more generally we define the schematic image of {\pi( \bigcup_{i=1}^k (V_i - Z_i))} by {\bigcup_{i=1}^k \pi(V_i - Z_i)}.

Example 1


The following standard example is helpful. Let {V} be the affine subscheme of {\mathbb{A}^2} defined by the single polynomial {XY - 1} i.e. {V = \mathrm{Spec} \; k[X,Y]/(XY -1)}. Let {\pi: \mathbb{A}^2 \rightarrow \mathbb{A}^1} be the projection (see picture).


Then each closed point {\mathfrak{p} \in \mathbb{A}^1_k = \mathrm{Spec}\; k[Y]}, other than the point {(Y)} is contained in the schematic image of {\pi|_V}. For example, {(Y-1) \in \mathrm{Spec} \ k[Y]} is in the image since {\overline{(Y-1)} \in \mathrm{Spec}\ k[X,Y]/(XY -1)}. However, there is no prime ideal {\mathfrak{q}} of {k[X,Y]/(XY -1)} whose inverse image under the composition {k[Y] \hookrightarrow k[X,Y] \twoheadrightarrow k[X,Y](XY -1)} is equal to {(Y)}. Note that the ideal generated by {\overline{Y}} in {k[X,Y]/(XY -1)} contains {\overline{1}} (i.e. corresponds to the generic element of {V}), and its inverse image in {k[Y]} is {k[Y]} itself, which corresponds to the generic element of {\mathbb{A}^1_k} (confirming the intuition that the generic element of {V} maps to the generic element of {\mathbb{A}^1}). But the schematic map {\pi|_V} is not surjective, since the closed point corresponding to the ideal {(Y)} is not in the image.

Chevalley’s theorem

Chevalley’s theorem states that the schematic image of a constructible set (as defined previously) is again constructible.

What’s the connection with model theory ?

There is the obvious temptation to identify constructible subsets of {\mathbb{A}^n_k} with definable subsets of {k^n}, for {k} a model of some theory {T} which extends the theory of fields. If {T} is the theory of algebraically closed fields, then {k} is algebraically closed, and each definable subset of {k^n} can be identified with the set of closed points of some constructible subset of {\mathbb{A}_k^n}. In this case the closed points also correspond to the {k}-rational points — and so the each definable subset of {k^n} is of the form, {\bigcup_{i=1}^{k} (V_i(k) - Z_i(k))}, for closed subschemes {V_i,Z_i} of {\mathbb{A}_k^n}. Chevalley’s theorem then immediately implies in particular that the theory of algebraically closed fields admits quantifier elimination.

1. What about non-algebraically closed fields ?

What if one considers some theory (still in the language of fields) of fields (for example, the theory of real closed fields, or the theory of {\mathbb{Q}_p}) which are not algebraically closed. Chevalley’s theorem still holds. But what breaks down now is that the definable subsets of {k^n} are no longer just the sets of the form {\bigcup_{i=1}^{k} (V_i(k) - Z_i(k))}, for closed subschemes {V_i,Z_i}. The class of the former can be strictly bigger. Moreover, even if one restricts to definable subsets of {k^m \times k^n} of the form {\bigcup_{i=1}^{k} (V_i(k) - Z_i(k))}, for closed subschemes {V_i,Z_i}, the {k}-points of the schematic image {\bigcup_{i=1}^{k} \pi(V_i - Z_i)} can be strictly larger than the image under the set-theoretic projection of the definable subset under consideration.

Example 2

Take for example the definable subset of {\mathbb{R} \times \mathbb{R}} defined by the one equation {X^2 + Y^2 -1 =0}. The image of the set theoretic projection to the {Y}-axis of the {\mathbb{R}}-points of the affine subscheme {V \subset \mathbb{A}^2} defined by {X^2+Y^2 -1} is obviously the real interval {[-1,1]} (which note is not a definable subset of {{\mathbb R}}). But if one takes the schematic image {\pi(V)}, and then considers the {\mathbb{R}}-points of {\mathbb{A}^1} contained in {\pi(V)}, then we get all of {\mathbb{R}}.


Thus, taking the {k}-points “upstairs” and then the set theoretic image, can produce a strictly smaller set, than first taking the schematic image and then the {k}-points of the image “downstairs”.


The above example shows that the theory of real closed fields does not admit quantifier elimination. The image under projection of a definable subset need not be definable. One can recover quantifier elimination by expanding the language (and thus also the definable subsets). For example, clearly one must include the subset {[-1,1] \subset \mathbb{R}^1} which occurs in the example above. One way is to expand the language and include the predicate {\mathbf{P}_2(X) := (\exists Y) (X = Y^2)} (often, abbreviated to {(X \geq 0)} (!)). The resulting theory in the expanded language does have quantifier elimination. This is the Tarski-Seidenberg theorem — while being still elementary — is harder to prove than Chevalley’s theorem. In the case of the theory of {\mathbb{Q}_p}, in order to get quantifier elimination one has to go further and include all the predicates {\mathbf{P}_n(X) := (\exists Y) (X = Y^n), n \geq 2} (Macintyre predicates).


The property of having quantifier-elimination in theories of non-alegebraically closed fields always involve some special property of the models of the theory in question (for example, the fact that real closed fields admit a unique ordering in the case of the reals).

Coming attraction

This post is a prelude to a longer post that I have been thinking about making on the connections between quantifier elimination, cohomology and complexity theory. A glimpse of this is available here:


Connectivity of joins, cohomological quantifier elimination, and an algebraic Toda’s theorem, Saugata Basu, Deepam Patel, Selecta Mathematica, volume 26, (2020).

References

All background material can be found in the two books shown below.

Standard disclaimer

It goes without saying that the views expressed are mine and may not reflect those of my co-authors/co-conspirators. The LaTeX to WordPress conversion was done using the latex2wp package written by Luca Trevisan.

Standard

Vapnik-Chervonenkis density: from uniform convergence to Berkovich analytification and stably dominated types

This post is about the notion of Vapnik-Chervonenkis dimension — a concept that arose more-or-less simultaneously in probability theory, in model theory (the part of mathematical logic dealing with abstract structures), and in combinatorics. It is fascinating how this concept which just has to do with set systems (with no additional structure) ends up having connections with so many different areas of mathematics — including very recent developments in model theory due to Hrushovski and Loeser.

The goal of this post is to give readers (with diverse backgrounds) a glimpse of these connections/applications. I have skimped on many details obviously — the idea being to show just enough to whet the appetite of the reader to learn more about the topic.

Since the post is somewhat long I have organized it into sections and the reader might jump directly to the section most interesting to her. The sections are independent of each other (modulo the initial definitions and notation provided in Section 1). They are enumerated as follows.

  1. In Section 1 I give the basic definitions and discuss the famous “Dichotomy” lemma.
  2. In Section 2 I discuss applications to probability theory.
  3. In Section 3 I discuss applications to discrete and computational geometry.
  4. In Section 4 I discuss connections with model theory (assuming no background in logic).
  5. In Section 5 I relate VC density with the more geometric notion of {0/1}-patterns.
  6. In Section 6 I discuss the “cohomological method” for bounding the VC codensity of (or equivalently the number of {0/1}-patterns realized by) a family of subsets of a given set.
  7. In Section 7 I review briefly some history of the quantitative problem of obtaining tight upper bounds on the VC codensity of definable families in different structures.
  8. Finally, in Section 8 I discuss recent work with Deepam Patel on obtaining tight upper bounds on the VC density of definable families over a non-archimedean valued field.

Standard disclaimer: It goes without saying that the views expressed are mine and may not reflect those of my co-authors/co-conspirators. The LaTeX to WordPress conversion was done using the latex2wp package written by Luca Trevisan. Thanks Luca!

1. Origin and Definition

The Vapnik-Chervonenkis (henceforth VC) dimension (as well as density) of a set {\mathcal{C}} of subsets of a set {X} is a combinatorial notion which is motivated independently by several different areas of mathematics. Its origin can be traced to probability theory (Vapnik and Chervonenkis), model theory (Shelah), and combinatorics (Sauer) — all around the same time.

For any subset {D} of {X}, we denote

\displaystyle S(D;\mathcal{C}) = \{D \cap C \mid C \in \mathcal{C}\}.

We say that a (finite) subset {D} of {X}, is shattered by {\mathcal{C}}, if

\displaystyle S(D;\mathcal{C}) = 2^D.

The VC dimension, {\mathrm{vcdim}(\mathcal{C})}, is the largest {n}, such that there exists a subset {D \subset X} with {\mathrm{card}(D)=n} which is shattered by {\mathcal{C}}. If no such {n} exists, we set {\mathrm{vcdim}(\mathcal{C}) = \infty}.

The definition of VC definition is best motivated by the following lemma due (independently) to Sauer and Shelah.

Sauer, N. (1972), “On the density of families of sets”, Journal of Combinatorial Theory, Series A, 13: 145-147, doi:10.1016/0097-3165(72)90019-2.

Shelah, Saharon (1972), “A combinatorial problem; stability and order for models and theories in infinitary languages”, Pacific Journal of Mathematics, 41: 247-261, doi:10.2140/pjm.1972.41.247.

In the statement of the lemma and later we will use the convenient notation

\displaystyle D \subset_n X

to denote

\displaystyle D \subset X \mbox{ and } \mathrm{card}(D) = n.

Lemma 1 (Dichotomy) Suppose that {\mathrm{vcdim}(\mathcal{C}) = d < \infty}. Then for all {D \subset_n X}, with {n > d},

\displaystyle \mathrm{card}(S(D,\mathcal{C})) \leq \sum_{0 \leq i \leq d} \binom{n}{i}.

In particular, there exists {c > 0}, such that for all {n>0},

\displaystyle \mathrm{card}(S(D,\mathcal{C})) \leq c \cdot n^d.

 

The name “Dichotomy” is justified by the following observation.

We denote

\displaystyle \pi_{\mathcal{C}} (n) = \sup_{D \subset_n X} \mathrm{card}(S(D;\mathcal{C})). \ \ \ \ \ (1)

Lemma 1 implies that exactly one of the following two alternatives is true.

  1. {\pi_{\mathcal{C}}(n) = 2^n \mbox{ for all } n};
  2. there exists {d > 0, c > 0} such that {\pi_{\mathcal{C}}(n) \leq c \cdot n^d}.

The VC density, denoted {\mathrm{vcdensity}(\mathcal{C})}, of {\mathcal{C}} is defined by

\displaystyle \mathrm{vcdensity}(\mathcal{C}) = \limsup_{n} \frac{ \log(\pi_\mathcal{C}(n))}{\log(n)}.

It is an immediate consequence of Lemma 1 that:

\displaystyle \mathrm{vcdensity}(\mathcal{C}) \leq \mathrm{vcdim}(\mathcal{C}).

The above inequality can be strict. However, it follows from Lemma 1 that that

\displaystyle \mathrm{vcdim}(\mathcal{C}) < \infty \Leftrightarrow \mathrm{vcdensity}(\mathcal{C}) < \infty.

Later on in this post we will focus mostly on VC density rather than on VC dimension, since VC density can be related to more geometric notions (such as {0/1}-patterns to be defined later).

2. Application in probability theory

The original motivation that led Vapnik and Chevonenkis to define VC dimension came from probability theory — namely, to provide necessary and sufficient condition for uniform law of large numbers to hold in certain situation. We describe this briefly below.

Let {X} be a sample space on which a probability measure is defined, and {\mathcal{C}} be a collection of measurable subsets of {A}. If {x_1,\ldots,x_n} are samples drawn independently from {X} with the given probability measure, then for every fixed {C \in \mathcal{C}}, the random variable

\displaystyle \left|\frac{\mathrm{card}(\{x_1,\ldots,x_n\} \cap C)}{n} - \mathbb{P}(C)\right|

goes to {0} in probability as {n \rightarrow \infty}. This is just the law of large numbers.

However, if we want the convergence to zero to be uniform i.e. we want the random variable

\displaystyle \sup_{C \in \mathcal{C}} \left(\left|\frac{\mathrm{card}(\{x_1,\ldots,x_n\} \cap C)}{n} - \mathbb{P}(C)\right|\right)

to go to {0} in probability as {n \rightarrow \infty}, then we need to impose further conditions (in the way we have stated this it is not clear that that the function on the left hand side is measurable, but let us assume that it is. It will be measurable if we should assume that {\mathcal{C}} is countable for instance).

Vapnik and Chervonenkis proved in their paper:

Vapnik, V. N; Chervonenkis, A. Ya. On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities, Theory of Probability and its Applications; Philadelphia Vol. 16, Iss. 2, (1971): 17. DOI:10.1137/1116025

that a suffcient condition for the uniform convergence is

\displaystyle \mathrm{vcdim}(\mathcal{C}) < \infty.

More precisely they proved (Theorem 2, loc. cit.) that for every {\varepsilon > 0}\displaystyle \mathbb{P}\left( \sup_{C \in \mathcal{C}} (\left\vert\frac{\mathrm{card}(\{x_1,\ldots,x_n\} \cap C)}{n} - \mathbb{P}(C)\right\vert\right) > \varepsilon) < 4 \cdot \pi_{\mathcal{C}}(2 n) \cdot \exp(-\varepsilon^2 n/8).

In particular, if {\mathrm{vcdim}(\mathcal{C})} is finite, and hence {\pi_{\mathcal{C}}(n)} (cf. (1)) is polynomially bounded in {n} by Lemma 1, then we have uniform convergence to {0} in probability.

3. Application in discrete and computational geometry

VC dimension has played an extremely important role in discrete and computational geometry — in particular, in the algorithmic problem often referred to as “range searching”.

Reusing terminology from before let {X} be a set and {\mathcal{C} \subset 2^X}. In the context of computational geometry, {X} is often a finite dimensional real vector space, and elements of {\mathcal{C}} are often a class of semi-algebraic sets having “constant description complexity” (for example, {X = \mathbb{R}^2}, and {\mathcal{C}} the set of all unit disks, or half spaces, or triangles etc.). Elements of {\mathcal{C}} are referred to as “range” sets. The range searching problem of takes as input given a finite {S} set of {n} points in {X}. The goal is to “preprocess” them, so that given a query “range” {C \in \mathcal{C}}, one can report quickly properties of the set {S\cap C}: for example, the cardinality of {S \cap C}, or whether {S \cap C} is empty or not. One can find details in the survey article:

Agarwal, Pankaj K. “Simplex range searching and its variants: a review” A journey through discrete mathematics, 1–30, Springer, Cham, 2017.

A key tool in efficient solutions proposed for the range searching problem is the notion of of {\varepsilon}-nets (and the related notion of {\varepsilon}-approximations). The idea is to represent the given set {S} with a very small sized “representative” subset in {X} which is guaranteed to have good properties with respect to the range sets in {\mathcal{C}}.

Definition 2 Given a pair {(X,\mathcal{C})}, and {\varepsilon, 0 \leq \varepsilon \leq 1}, and {S} a finite subset of {X}. A subset {Y \subset X} is called an {\varepsilon}-net (resp. {\varepsilon}-approximation) for {S} if {Y \cap C \neq \emptyset} for all {C \in \mathcal{C}} with {\mathrm{card}(S \cap C) \geq \varepsilon \mathrm{card}(S)} (resp. if {\left| \frac{\mathrm{card}(Y \cap C)}{\mathrm{card}(Y)} - \frac{\mathrm{card}(S \cap C)}{\mathrm{card}(S)} \right| < \varepsilon} for all {C \in \mathcal{C}}).

The important result is (which we state in terms of finite sets or equivalently for atomic measures than for more general measure spaces) is the following. The VC dimension plays of the range {\mathcal{C}} of course plays a critical role in it.

Theorem 3 If {\mathrm{vcdim}(\mathcal{C}) = d < \infty}, then there exists {c > 0}, such that for every {\varepsilon \in (0,1)}, and finite subset {S} of {X}, there exists an {\varepsilon}-net (resp. an {\varepsilon}-approximation) for {S} of cardinality at most {c \cdot \frac{d}{\varepsilon}\log \frac{1}{\varepsilon}} (resp. {c \cdot \frac{d}{\varepsilon^2}\log \frac{1}{\varepsilon}}).

Haussler, David; Welzl, Emo {\varepsilon}-nets and simplex range queries. Discrete Comput. Geom. 2 (1987), no. 2, 127-151.

Komlos, Janos; Pach, Janos; Woeginger, Gerhard Almost tight bounds for {\varepsilon}-nets. Discrete Comput. Geom. 7 (1992), no. 2, 163-173.

The key point of course is the fact that the cardinality of the {\varepsilon}-net (resp. {\varepsilon}-approximations) can be bounded independent of the cardinality of {S}. Thus, with respect to the “range space” {(X,\mathcal{C})}, any finite set {S} can be replaced by an “approximation” (i.e. an {\varepsilon}-net or an {\varepsilon}-approximation) of size that depends on {\varepsilon} and the pair {(X,\mathcal{C})} but independent of the cardinality of {S}. This is what makes the above theorem key to many results in computational geometry. Theorem 3 also has an effective (algorithmic) variant. In turns out that there is a highly efficient randomized algorithm that can compute an {\varepsilon}-net with high probability — and together these lead to efficient solutions to the range searching problem.

4. Application in model theory

In model theory the notion of VC-density is connected with the so called “independence property” of formulas and models, which in turn is related to the property of of stability which is of fundamental importance in model theory.

Mathematical logic and model theory has its own set of notation and definitions — some of which may not be completely familiar to non-logicians. In the following paragraphs I condense in half-a-page what usually takes a couple of lectures and several pages worth of writing to write “properly” — hopefully with no loss of content. I assume familiarity with the definition of a “first-order formula” in some language but not much more.

Given a language {L} (i.e. a tuple of symbols representing a set, constants, functions, and relations) an {L}-structure {\mathcal{M}} is a tuple consisting of a set, {M} along with a set of constants, functions and relations on {M} interpreting the corresponding symbols of {L}. So if {L = (0,1,+,\cdot)} (also referred to as the language of rings with a unit element), then the tuple {(\mathbb{Z}, 0,1,+,\cdot)} with the usual interpretations of the symbols “{0}”, “{1}”, “{+}”, “{\cdot}” is an {L}-structure.

If {L = (0,1,+,\cdot)}, then is

\displaystyle \forall y \exists z (y\cdot y + z\cdot z = x)

is an example of an {L}-formula having {\{x\}} as its set of free variables. An {L}-formula with an empty set of free variables is an {L}-sentence. (The reader can now come up with the proper definitions of {L}-formulas and sentences.)

A theory {T} is a set of {L}-sentences. An {L}-structure {\mathcal{M}} is a model of {T} (denoted {\mathcal{M} \models T}) if every sentence in {T} is “true” in the structure {\mathcal{M}} (the reader can supply what it means to be be “true” in {\mathcal{M}}). A theory is called consistent if it admits a model. A theory {T} is complete if every for every {L}-sentence {\phi}, either {\mathcal{M} \models \phi} for every model {\mathcal{M}} of {T}, or {\mathcal{M} \models \neg \phi} for every model {\mathcal{M}} of {T}.

Quantifier elimination

Notice that a formula can have quantifiers. If every formula is equivalent {\mod T} to a formula without quantifiers then we say that the theory {T} admits quantifier elimination. (Two formulas {\phi(x),\psi(x)} are equivalent {\mod T} iff {T \vdash \phi \leftrightarrow \psi} or equivalently iff {\mathcal{M} \models \forall x \phi(x) \leftrightarrow \psi(x)} for every model {\mathcal{M}} of {T}.) It is a classical result in algebraic geometry (often referred to in non-model theory context as the Lefschetz principle) that the theory of algebraically closed fields admits quantifier elimination. The theory of real closed ordered fields admits quantifier elimination (in the language of {(0,1,+,\cdot, <)} of ordered fields) — this was proved by Tarski. Th property of quantifier elimination depends on the language. So the theory of real closed fields (without the order relation) does not admit quantifier elimination. Why ?

Types

One notion that we will not discuss much, but which plays a key role in the application that we describe in Section 8, is that of types. Given an {L}-structure {\mathcal{M} = (M,\cdots)}, a subset {A \subset M}, and {n>0}, an {n}-type {p} over {A} is a consistent set of {n}-ary {L}-formulas with parameters in {A} (this is the same as saying that there exists an elementary extension {\mathcal{N}} of {\mathcal{M}} and {a \in \mathcal{N}} such that {\mathcal{N} \models \phi(a)} for every {\phi \in p}.) An {n}-type over {A} is called complete if for every {n}-ary formula {\phi} with parameters in {A}, either {\phi \in p} or {\neg \phi \in p}. The set of {n}-types over {A} is denoted {S_n(A)}. The type space {S_n(A)} carry a natural topology referred to as the Stone topology.

Types in model theory are a device for producing a topological space out of (a model of) a theory {T}, analogous to how algebraic geometers produce the topological space {\mathrm{Spec}(A)} (equipped with the Zariski topology) out of a commutative ring {A}. In fact it is more than an abstract analogy. If {K} is a model of the theory of algebraically closed fields, then the space of complete {n}-types, {S_n(K)}, is indeed in bijection with {\mathrm{Spec}(K[X_1,\ldots,X_n])} (it is a nice do-able exercise to come up with this bijection just from what has been said above). Moreover, the natural bijective map {S_n(K) \rightarrow \mathrm{Spec}(K[X_1,\ldots,X_n])} is continuous (with the Stone topology on the left and the Zariski topology on the right). Its inverse though is not continuous (the Stone topology is much finer than the Zariski topology).

Let us now fix a language {L}, and a {L}-theory {T}, and let {\mathcal{M}= (M,\cdots) } be a model of {T}. A subset {X} of {M^n} is said to be definable if there exists a {L}-formula {\phi(x)} with {|x| = n}, such that {X = \{a \in M^n \mid \mathcal{M}\models \phi(a)\}} (in model theory lingo it is usual to write in this case {X = \phi(\mathcal{M})}). We also say {X} is definable with parameters if there exists a formula {\phi(x,y)}, such that there exists {b \in M^{|b|}} such that {X = \phi(\mathcal{M},b)}. The set of subsets {\{\phi(\mathcal{M},b) \mid b \in M^{|y|} \}} is a definable family of subsets of {M^{|x|}}.

(One final piece of warning about notational abuse. Model theory textbooks often drop the exponents in formulas like {a \in M^{|a|}} and just write {a \in \mathcal{M}} even when {|a| > 1}. This is harmless and does not cause any ambiguity.)

O-minimal structures

One class of structures that is playing an increasing important role in applications of model theory to other areas of mathematics (Diophantine geometry, Hodge theory amongst others), and which also plays an important role in the “VC story” to which this post is devoted, is the class of o-minimal structures.

Let {L=(<,\cdots) } be a language having a binary relation symbol {<}. Then, an {L}-structure {\mathcal{M} = (M,<,\cdots)} is said to be o-minimal if the reduct {(M,<)} is a model of the theory of dense linear order, and the set of definable subsets (with parameters) of {M^1=M} is the smallest possible — namely, finite unions of points and intervals (such sets are clearly definable with parameters in the ordered structure {(M,<)}). This explains the name “order-minimal” or o-minimal. The most interesting examples of o-minimal structures are those that expand the o-minimal structure {(\mathrm{R},<,+,\cdot)} where {\mathrm{R}} is a real closed field. Gabrielov proved that there exist strict expansions of {(\mathbb{R},<,+,\cdot)} (as his undergraduate thesis work in Moscow State University long before the name “o-minimal” was invented!), and many other structures have been shown to be o-minimal since then. The terminology of “o-minimality” was introduced by Pillay and Steinhorn.

Gabrielov, A. M. Projections of semianalytic sets. (Russian) Funkcional. Anal. i Priložen. 2 1968 no. 4, 18-30.

Pillay, Anand; Steinhorn, Charles Definable sets in ordered structures. Bull. Amer. Math. Soc. (N.S.) 11 (1984), no. 1, 159-162.

The definable sets in an o-minimal structure share many of the tameness properties of real semi-algebraic sets (uniform finiteness in families of topological invariants such as the number of connected components or the total Betti number, local conical structure etc.). As such they provide a natural setting for studying geometric problems without having to worry about wild oscillatory functions, for example {\sin(1/x)} over {x > 0}.

A standard source book for o-minimal structures is:

van den Dries, Lou. Tame topology and o-minimal structures. London Mathematical Society Lecture Note Series, 248. Cambridge University Press, Cambridge, 1998. x+180 pp. ISBN: 0-521-59838-9

Stability and independence

We now discuss the (related) notions of stability and independence of theories. We start with the former since stability is the more important of the two notions (but we will soon see that it is independence that has to do with VC theory).

Let {\phi(x,y)} be a {L}-formula (here {x,y} are tuples of free variables). The formula {\phi} is said to be stable if there does NOT exist sequences {(a_i)_{i \in \omega}, (b_i)_{i \in \omega}} in some model of {\mathcal{M}} of {T}, satisfying {\mathcal{M} \models \phi(a_i,b_j) } if and only if {i < j}. The theory {T} is said to be stable if every {\mathcal{L}}-formula {\phi(x,y)} is stable. In other words, {T} is stable if it is not possible to order infinite sequences using a formula (modulo {T}) in the above sense. The theory of real closed fields is not stable — take {\phi(x,y) := \exists z \neg(z =0 ) \wedge (y = x + z^2)} (which expresses {x < y}), and let {a_i = b_i = i \in \mathbb{R}}. On the other hand the theory of algebraically closed fields is stable. The stability property of theories has several equivalent definitions. For example, stability can be defined in terms of cardinalities of type spaces (“stable theories have few types”). These different definitions seem to have nothing to do with each other on the surface (the hallmark of most important notions in mathematics).

We next define the notion of independence. We say that a formula {\phi(x,y)} has the independence property if there exists a sequence {(a_i)_{i \in \omega}} in some model of {\mathcal{M}} of {T}, such that for all {\tau \subset \omega}, the set of formulas {\{\phi(x,a_i) \mid i \in \tau\} \cup \{\neg \phi(x,a_i) \mid i \not\in \tau\}} is consistent. A theory is said to “not have the independence property” (NIP for short) if no {\mathcal{L}}-formula {\phi(x,y)} has the independence property. If a theory has the independence property then the theory cannot be stable (Exercise !). However, the converse is not true — NIP theories need not be stable. For example, the theory of real closed fields is NIP but not stable. The following theorem characterizes the gap between being NIP and stable. We say that a formula {\phi(x)} has the strict order property if there exists a sequence {(a_i)_{i \in \omega}} in some model {\mathcal{M}} of {T}, such that {\mathcal{M} \models \phi(x,a_i) \rightarrow \phi(x,a_j)} if and only if {i < j}.

Theorem 4 (Shelah) A theory is not stable if and only if there exists a formula {\phi(x,y)} having the independence property or the strict order property.

A good source for NIP theories is the book:

Simon, Pierre A guide to NIP theories. Lecture Notes in Logic, 44. Association for Symbolic Logic, Chicago, IL; Cambridge Scientific Publishers, Cambridge, 2015. vii+156 pp. ISBN: 978-1-107-05775-3

NIP and finiteness of VC density

Finally, coming back to VC-density — it is an easy exercise to see that if {\mathcal{M}} is a model of {T}, and {\phi(x,y)} a {L}-formula — then {\phi(x,y)} not having the independence property is equivalent to the statement that the definable family {\{\phi(a,\mathcal{M}) \mid a \in \mathcal{M}\}} of subsets of {M^{|b|}} has finite VC-density. In particular, any definable family of sets in an NIP structure has finite VC dimension/density.

5. VC-density and {0/1}-patterns

For any set {V} and a finite set of subsets {\mathcal{C} = \{V_1,\ldots,V_n\}} of {V}, the set of {0/1} -patterns realized by {\mathcal{C}} is the subset {\Sigma(V,\mathcal{C})} of {\{0,1\}^{[1,n]}} defined by

\displaystyle \Sigma(V,\mathcal{C}) = \{\sigma \in \{0,1\}^{[1,n]} \mid \exists v \in V \mbox{ such that } \sigma(i) = \chi_{V_i}(v) \},

where {\chi_{V_i}(\cdot)} is the ordinary characteristic function of the set {V_i}.

Let {V,W} be sets and {X \subset V \times W}, and let {\pi_V:X \rightarrow V, \pi_W:X \rightarrow W} be the two projections. We will abuse notation slightly and denote for {v \in V} (resp. {w \in W}) {X_v = \pi_V^{-1}(v)} (resp. {X_w = \pi_W^{-1}(w)}) by {X_v} (resp. {X_w}). Let {\mathcal{C} = \{ \pi_V(X_w) \mid w \in W\} \subset 2^V} and {\mathcal{D} = \{\pi_W(X_v) \mid v \in V\} \subset 2^W}.

It is an easy exercise now to check that for {w_1,\ldots,w_n \in W},

\displaystyle \mathrm{card}(\Sigma(V,\{X_{w_1},\ldots,X_{w_n}\})) = \mathrm{card}(S(\{w_1,\ldots,w_n\},\mathcal{D})),

Denoting

\displaystyle F(n) = \max_{(w_1,\ldots,w_n) \in W^n} \mathrm{card}(\Sigma(V,\{X_{w_1},\ldots,X_{w_n}\})),

we have

\displaystyle \mathrm{vcdensity}(\mathcal{D}) = \limsup_{n} \frac{\log F(n)}{n}.

In particular, upper bounds on the cardinalities of the set of {0/1}-patterns realized by finite subsets of {\mathcal{C}}, gives upper bound on the VC density of the family {\mathcal{D}}. The VC density of the definable family {\mathcal{D}} is then called the VC codensity of the definable family {\mathcal{C}}.

Example 1 Let {V = W = \mathbb{R}^2}, and {X = \{((x,y),(a,b)) \in V \times W \mid (x-a)^2 + (y-b)^2 \leq 1\}}. Then, {\mathcal{C}} is the set of all closed unit disks in the {V}, and {\mathcal{D}} the set of all closed unit disks in {W}.

Proving tight upper bounds on the VC codensities of (or equivalently {0/1}-patterns realized by) definable families in various structures has attracted a lot of attention of both combinatorialists and model theorists. In many combinatorial applications, upper bounds on the number of {0/1}-patterns or sign patterns if the underlying field is real give upper bounds on the number of interesting objects — classical applications of such methods include proving upper bounds on the number of realizable oriented matroids, combinatorial types of polytopes etc.

The very first application of this type that I am aware of is:

Goodman, Jacob E.; Pollack, Richard. There are asymptotically far fewer polytopes than we thought. Bull. Amer. Math. Soc. (N.S.) 14 (1986), no. 1, 127-129.

More recently, results in incidence combinatorics (such as the Szemeredi-Trotter theorem and its various generalizations) which have been traditionally studied over the real and complex fields, have being generalized to more general model-theoretic structures. Upper bounds on VC density/codensity often play a key role in such questions.

For the rest of this post we will now switch to the more geometric language of {0/1}-patterns and explain how techniques from topology and algebraic geometry help to prove tight upper bounds on the number of {0/1}-patterns.

6. Bounding the number of {0/1}-patterns: what has cohomology to do with it ?

It is instructive to consider the following continuation of Example 1.

To illustrate the main idea consider the problem of bounding the number of {0/1}-patterns realized by {n} closed unit disks in {\mathbb{R}^2} (cf. Example 1), assuming that the boundaries of the disks intersect transversally.

It is not difficult to see that the number of {0/1}-patterns in this case is bounded by the number of connected components of the complement of the union of the boundaries of these disks.

So it suffices to bound this number. We make another simplification for exposition. It makes no difference topologically since all the boundaries are compact to assume that the arrangement of {n} circles is on the sphere {\mathbb{S}^2} rather than on {\mathbb{R}^2} (by taking a one-point compactification of the plane if you wish).

Let the {n} unit circles be denoted {S_1,\ldots,S_n}. Using Alexander duality,

\displaystyle b_0(\mathbb{S}^2 - \bigcup_{i} S_i) = b_1(\bigcup_i S_i) +1,

where I am denoting by {b_i(\cdot)} the dimension of the {i}-th cohomology group with rational coefficients. To bound {b_i(\bigcup_i S_i)} we make use of a spectral sequence argument. We consider the Mayer-Vietoris spectral sequence (more precisely the spectral sequence of the row-wise filtration of the Čech complex) of the covering {\{S_i\}_{i \in [1,n]}} of the union {\bigcup_{i \in [1,n]} S_i}:

\displaystyle E_r^{p,q} \Rightarrow \mathrm{H}^{p+q}(\bigcup_{i \in [1,n]} S_i).

The {E_1}-page of this spectral sequence is given by

\displaystyle E_1^{p,q} = \bigoplus_{I \subset_{p+1} [1,n]} \mathrm{H}^q(S_I),

where for {I \subset [1,n]},

\displaystyle S_I = \bigcap_{i \in I} S_i.

Observe that since in any spectral sequence,

\displaystyle \dim E_{r}^{p,q} \geq \dim E_{r'}^{p,q}, r \leq r',

we have for every {k \geq 0}

\displaystyle \dim \mathrm{H}^k(\bigcup_{i \in [1,n]} S_i) = \sum_{p+q=k} \dim E_{\infty}^{p,q} \leq \sum_{p+q=k} \dim E_{1}^{p,q}.

In partiular, taking {k=1},

\displaystyle b_1(\bigcup_i S_i) = \dim E_\infty^{0,1} + \dim E_\infty^{1,0} \leq \dim E_1^{0,1} + \dim E_1^{1,0},

it suffices to bound the dimensions of the groups {E_1^{0,1}} and {E_1^{1,0}} separately. Now,

\displaystyle  E_1^{0,1} = \bigoplus_{i \in [1,n]} \mathrm{H}^1(S_i).

In our case \dim \mathrm{H}^1(S_i) = 1 for each i , and so

\displaystyle \dim E_1^{0,1} = \binom{n}{1} \cdot 1 = n. ,

Also,

\displaystyle  E_1^{1,0} = \bigoplus_{1 \leq i < j \leq n} \mathrm{H}^0(S_i \cap S_j) , and in our case \dim \mathrm{H}^0(S_i \cap S_j) = 0 \mbox{ or } 2 for each pair i < j,   and so

\displaystyle \dim E_1^{1,0} \leq \binom{n}{2} \cdot 2.

Putting everything together we obtain

\displaystyle b_1(\bigcup_i S_i) \leq n + \binom{n}{2}\cdot 2 = n^2.

This implies that the number of {0/1}-patterns realized by {n} closed unit disks (whose boundaries intersect transversally) is bounded by {n^2+1}.

The main things to observe about the above argument are the following.

  1. The bound on the {E_1}-terms of the spectral sequence is a product of two quantities, namely —
  1. A binomial coeficient counting the number of summands in the direct sum of case. (This number grows with {n} and has a polynomial dependence. The maximum degree of this polynomial amongst the various terms ({E_1^{1,0}} in our case) provides an upper bound on the VC codensity (of the family {\mathcal{C}}).)
  2. An upper bound on a certain Betti number of a set obtained as the intersection of a bounded number (the bound does not grow with {n}) of sets obtained in some way from the original sets (taking boundaries in this instance). (There is an absolute upper bound on this part which is independent of {n}.)
  • Secondly, what we actually ended up bounding is the sum of zero-th Betti number (i.e. the number of connected components) of the realizations of all the {0/1}-patterns realized by the set {\mathcal{D} = \{D_1,\ldots,D_n\}}. If we denote for {\sigma \in \{0,1\}^{[1,n]}}

    \displaystyle \mathcal{R}(\sigma) = \{x \in \mathbb{R}^2 \mid \sigma(i) = \chi_{D_i}(x), 1 \leq i \leq n\},

    we have proved:

    \displaystyle \sum_{\sigma \in \{0,1\}^{[1,n]}} b_0(\mathcal{R}(\sigma)) \leq n^2+1. \ \ \ \ \ (2)

     

    On other hand, since {\mathcal{R}(\sigma) \neq \emptyset \Rightarrow b_0(\mathcal{R}(\sigma) > 0}, we have the inequality,

    \displaystyle \mathrm{card}(\Sigma(\mathbb{R}^2,\mathcal{D})) \leq \sum_{\sigma \in \{0,1\}^{[1,n]}} b_0(\mathcal{R}(\sigma)). \ \ \ \ \ (3)

     

    Together, (2) and (3) imply that

    \displaystyle \mathrm{card}(\Sigma(\mathbb{R}^2,\mathcal{D})) \leq n^2 +1.

     

  • Finally, a key role (which is hidden by our use of Alexander duality) is played by the fact that the cohomology groups of semi-algebraic subsets of {\mathbb{R}^2} vanishes in dimensions {2} and higher.The “spectral sequence argument” sketched above with certain extensions give a powerful method for obtaining tight bounds on VC codensities. In our example, we made some simplifications to make the exposition simpler. For example, we assumed that the boundaries of the disks intersected transversally. If this was not the case, and the circles were allowed to touch each other, then the argument would fail (the realizations of the {0/1}-patterns will not be bounded from above by the number of connected components of the complement of the union of circles. In this case one needs to replace the union of the boundaries by a more complicated system of sets (involving infinitesimal tubes of various widths around the sets of the given family). Furthermore, if the given family was not in {\mathbb{R}^2}, but in some other semi-algebraic set not necessarily non-singular, then we would not be able to use Alexander duality. So there are many technical issues that I skipped but which needs to be dealt with in order to make the argument general. These can be found in: 

    Basu, Saugata; Pollack, Richard; Roy, Marie-Francoise On the Betti numbers of sign conditions. Proc. Amer. Math. Soc. 133 (2005), no. 4, 965-974.

    7. Some History

    As mentioned earlier, the problem of obtaining tight bounds on the VC-codensity (or equivalently, a bound on the number of {0/1}-patterns) for definable families in an NIP structure has attracted a lot of attention over the years.

    For definable families of algebraic hypersurfaces in {\mathbb{F}^k} of fixed degree over a field {\mathbb{F}}, Babai, Ronyai, and Ganapathy in

    Ronyai, Lajos; Babai, Laszlo; Ganapathy, Murali K. On the number of zero-patterns of a sequence of polynomials. J. Amer. Math. Soc. 14 (2001), no. 3, 717-735.

    used a “linear algebra argument” to prove that VC-codensity of such families is bounded by {k}. This bound is easily seen to be optimal.

    For families of hypersurfaces (defined by polynomials of bounded degrees) on any real variety of {V}, it is shown in

    Basu, Saugata; Pollack, Richard; Roy, Marie-Francoise. On the Betti numbers of sign conditions. Proc. Amer. Math. Soc. 133 (2005), no. 4, 965-974.

    using the topological method outlined in Section 6 that the VC-codensity is bounded by {\dim_{\mathbb{R}} V}.

    Finally, it was shown in

    Basu, Saugata. Combinatorial complexity in o-minimal geometry. Proc. Lond. Math. Soc. (3) 100 (2010), no. 2, 405-428.

    that the VC codensity of any definable family of subsets of a definable set {V} in an arbitary o-minimal expansion of a real closed field is bounded by {\dim V}.

    (Note that some of the papers listed above prove more refined and general upper bounds on the Betti numbers of the realizations of {0/1}– patterns or sign patterns — where there is also explicit dependence on the family (such as on the degrees of the polynomials defining the hypersurfaces). The bound on the VC codensity is extracted from the exponent of “{n}” in the bound on the zero-th Betti number as in Section 6.)

    8. VC codensity bounds over valued fields

    A class of structures that is important in model theory (and indeed other areas of mathematics, such as number theory) are valued fields. We are mostly interested in fields {K}, with a non-archimedean valuation {|\cdot|: K - \{0\} \rightarrow \Gamma}, where {\Gamma} is an ordered group. Tha valuation {|\cdot|} is assumed to be multiplicative and satisfy the ultrametric inequality — namely, {|x + y| \leq \max(|x|,|y|)}.

    (Note that we are following model theory convention and writing the valuation multiplicatively instead of additively which is more standard).

    The most familiar example of a valued field (with a non-archimedean valuation) is the field {\mathbb{Q}} with the {p}-adic valuation, {|\cdot|_p} for some prime number {p}. Writing a non-zero rational number {\frac{a}{b}}

    \displaystyle \frac{a}{b} = p^\nu \cdot \frac{c}{d}, \mbox{ where } (p,c) = (p,d) =1,

    the {p}-adic valuation is given by

    \displaystyle \left|\frac{a}{b} \right|_p = p^{-\nu}.

    Given a valued field {(K,|\cdot|)}, its valuation ring {A} is defined as {A = \{ x \in K | |x| \geq 1\}}. The valuation ring {A} has a unique maxmal ideal {\mathfrak{m} = \{x \in A \mid |x| > 1\}}, and the quotient field {k= A/\mathfrak{m}} is called the residue field of {(K,|\cdot|)}. The residue field of {\mathbb{Q}_p} is isomorphic to {\mathbb{Z}/p \mathbb{Z})}. The pair {(\mathrm{char}(K),\mathrm{char}(k))} is called the characteristic pair of {(K, |\cdot|)}. The characteristic pair of a valued field can be {(0,0), (0,p)}, or {(p,p)} where {p} is a prime. The characteristic pair of {(\mathbb{Q},|\cdot|_p)} is clearly {(0,p)}. The fields of Puiseux series {\bigcup_{n > 0} F(T^{1/n})} with coefficients in a field {F}, furnishes examples of valued fields with characteristic pairs {(0,0)} or {(p,p)} depending on the characteristic of {F}.

    From the model theory perspective, the structure of a valued field is an example of a two-sorted structure (as opposed to the one-sorted structures defined in Section 4). This means that there are two sorts of variables — of the field sort and the value group sort. The language of valued fields is correspondingly two sorted and is the tuple {(0_K,1_K,+_K,\cdot_K, \cdot_\Gamma,<_\Gamma)} where the subscripts {K,\Gamma} refer to the two sorts — field and value group respectively. It is a non-trivial theorem that the two-sorted theory of algebraically closed valued fields admit quantifier elimination (in the corresponding two-sorted language). Thus if {(K,\Gamma)} is a model of the theory algebraically closed valued fields, then every definable subset of {K^m} can be described by a quantifier-free formula, whose atoms are of the form {|F(x)| \leq \lambda \cdot |G(x)|}, where {F,G \in K[x]} and {\lambda \in \Gamma}.

    A good source for the model theory of algebraically closed valued fields (and also for what comes next) is:

    Haskell, Deirdre; Hrushovski, Ehud; Macpherson, Dugald. Stable domination and independence in algebraically closed valued fields. Lecture Notes in Logic, 30. Association for Symbolic Logic, Chicago, IL; Cambridge University Press, Cambridge, 2008. xii+182 pp. ISBN: 978-0-521-88981-0

    If {K} is an algebraically closed valued field (for example {K = \overline{\mathbb{Q}_p}, \mathbb{C}((T)), \overline{\mathbb{F}}_p((T))}), then the problem of obtaining tight bounds on the VC-codensity for definable families in {K^p} was considered in:

    Aschenbrenner, Matthias; Dolich, Alf; Haskell, Deirdre; Macpherson, Dugald; Starchenko, Sergei Vapnik-Chervonenkis density in some theories without the independence property, I. Trans. Amer. Math. Soc. 368 (2016), no. 8, 5889-5949.

    In particular, they prove a bound of {2 p} for the VC-codensity in the special case when the characteristic pair of {K} is {(0,0)}.

    If one contrasts this result with those in Section 7 above, one sees that the corresponding bound for the theories of algebraically closed and real closed fields, and also for any o-minimal theory, is {p} rather than {2p}. In fact, the bound in those cases equals the “cohomological dimension” (appropriately defined) of the ambient definable set for the definable family being considered.

    The method used in the paper of Aschenbrenner et. al. is algebraic and model theoretic. It is natural to consider a more topological approach. The first obstacle is that definable subsets of a valued field might not carry a nice enough topology on which the cohomological method described in Section 6 can be applied.  For various reasons (non-vanishing in dimensions bigger than the dimension of the ambient variety being one of them) ordinary sheaf cohomology with respect to the Zariski or etale site for schemes are not suitable.

    In order to get around this difficulty we first need to take a detour.

    Given an affine variety {V} defined over a complete valued field, we will embed {V} and all definable subsets of {V}, in a larger topological space having better topological properties than {V} itself. Embedding into a larger space can only increase the number of realizable {0/1}-patterns — and thus an upper bound on the number of {0/1}-patterns realized in the larger space is still an upper bound on the corresponding number of the original family. The technical tool to achieve this is also known as Berkovich analytification introduced by Berkovich.

    Berkovich analytification

    Suppose that {K} is a complete valued field with a non-trivial non-archimedean valuation.

    Let {R} be an integral domain. A multiplicative, non-archimedean seminorm on {R} is a map {|\cdot|: R \rightarrow \mathbb{R}_{\geq 0}} satisfying, {|0| = 0, |1| =1}, and {|a b| = |a||b|}, and the ultrametric inequality, {|a + b| \leq \max(|a|,|b|)}.

    Now, let {V = \mathrm{Spec}(R)} be an affine {K}-variety. Berkovich associates to {V} a space {V^{\mathrm{an}}} in the following way. The points of {V^{\mathrm{an}}} are multiplicative semi-norms {|\cdot|: R \rightarrow \mathbb{R}_{\geq 0}}, whose restriction to {K} equals the given valuation on {K} (semi-norms on fields are valuations as defined previously). For each {x \in V^{\mathrm{an}}}, let {|\cdot|_x} denote the corresponding semi-norm. The topology on {V^{\mathrm{an}}} is the weakest topology that makes all the evaluation maps,

    \displaystyle V^{\mathrm{an}} \rightarrow \mathbb{R}_{\geq 0}, x \mapsto |\phi|_x, \phi \in R

    continuous. Moreover, (this is key for our application), for each closed point {x \in V}, the function

    \displaystyle |\cdot|_x: R \rightarrow\mathbb{R}_{\geq 0}, \phi \mapsto |\phi(x)|

    (noticing that {\phi(x) \in K}) is a semi-norm, and this gives an embedding of the closed points of {V} into {V^{\mathrm{an}}}.

    The Berkovich analytic space {V^{\mathrm{an}}} has nice topological properties (it is locally compact for example), but it is not so easy to visualize. The Berkovich analytification of the projective line {\mathbb{P}^1_K} for example is homeomorphic to an infinite rooted tree. Moreover, for application to bounding VC codensity one has to deal with not just varieties, but definable subsets of these varieties as well.

    Berkovich, Vladimir G. Spectral theory and analytic geometry over non-Archimedean fields. Mathematical Surveys and Monographs, 33. American Mathematical Society, Providence, RI, 1990. x+169 pp. ISBN: 0-8218-1534-2

    Hrushovski-Loeser

    Fortunately, the recent break-through results of Hrushovski and Loeser gives us to get around the problems mentioned above. Hrushovski and Loeser give an alternative definition of {V^{\mathrm{an}}} as spaces of certain kind types. (As has been mentioned before, the model theorists’ method of producing spaces from rings is to define them as spaces of types — cf. discussion relating the {\mathrm{Spec}} of a polynomial ring in {n} variables with the space of complete {n}-types in Section 4 above). Moreover, associated to any quantifier-free formula with atoms of the form {|F| \leq \lambda \cdot |G|}, one has a corresponding semi-algebraic subset of {V^{\mathrm{an}}} (thought of now as a space of types). Astoundingly, Hrushovksi and Loeser prove certain key topological tameness properties of these “semi-algebraic sets” which are entirely analogous to those known in the case of o-minimal structures. Moreover, they prove that these “semi-algebraic” subsets of {V^{\mathrm{an}}} are homotopy equivalent to finite simplicial complexes of dimension at most {\dim(V)}. Therefore, their cohomological dimension is at most {\dim(V)}.

    Hrushovski, Ehud; Loeser, Francois. Non-archimedean tame topology and stably dominated types. Annals of Mathematics Studies, 192. Princeton University Press, Princeton, NJ, 2016. vii+216 pp. ISBN: 978-0-691-16169-3; 978-0-691-16168-6

    Using the results of Hrushovski and Loeser, in particular the vanishing of cohomology in dimensions greater than {\dim V}, Deepam Patel and I, were able to “import” the o-minimal techniques for proving upper bounds on Betti numbers of realizations of {0/1}-patterns to the valued field case. Available here.

    We prove the following theorem (stated a bit informally to avoid introducing more notation).

    Theorem 5 Let {K} be a algebraically closed complete non-archimedean valued field, {V} be an affine {K}-variety and {\mathcal{D}} a fixed definable family of semi-algebraic subsets of {V^{\mathrm{an}}}. Then, there exists {c > 0}, such that for each {i, 0 \leq i \leq \dim V}, and for every finite subset {\{D_1,\ldots,D_n\}} of elements of the family {\mathcal{D}}

    \displaystyle \sum_{\sigma \in \{0,1\}^{[1,n]}} b_i(\mathcal{R}(\sigma)) \leq c \cdot n^{\dim V -i},

    where

    \displaystyle \mathcal{R}(\sigma) = \{x \in V^{\mathrm{an}} \mid \sigma(i) = \chi_{D_i}(x), 1 \leq i \leq n\}.

     

    As an corollary (taking as usual {i=0}) in the above bound we obtain (using the same notation as in Theorem 5(:

    Corollary 6 The VC codensity of any definable family of definable subsets of {V} is bounded by {\dim V}.

    Corollary 6 removes the multiplicative factor of {2} from the upper bounds proved by Aschenbrenner et. al. mentioned before. Also, there is no restriction on the characteristic pair of {K} (thanks to the generality of the results by Hrushovski and Loeser).

    Archimedean vs non-archimedean

    It is a natural question to ask about what happens if we consider subsets defined by inequalities {|F(x)| \leq \lambda \cdot |G(x)|} when the {|\cdot|} is the archimedean norm — for example, when {K=\mathbb{C}}, and {|\cdot|} denotes the modulus. The answer is that for a definable family in {\mathbb{C}^p} defined by a formula having atoms of the form {|F| \leq \lambda \cdot |G|}, the VC codensity can be as large as {2 p}. An easy example is the following. Consider the definable family, {\mathcal{C}}, of unit disks in {\mathbb{C}} defined by {|z - w| \leq 1}, as {w} varies over {\mathbb{C}}. Given {n} unit disks in the plane, it is easy to check that the number of {0/1}-patterns can grow quadratically — so the VC codensity of {\mathcal{C}} is {2}. What happens to the family defined by the same formula in the non-archimedean case ? In this case, because of the ultra-metric inequality, two disks are either disjoint or equal. Hence the number of {0/1}-patterns is bounded by {n+1}, and the VC codensity is {1}.

    Epilogue

    I hope I have been able to convey the breadth of mathematical topics that the notion of VC dimensions has connections to. It remains an extremely active topic. For example, here are two very recent combinatorial papers from the arXiv in which VC dimension plays a critical role.

    Bounded VC-dimension implies the Schur-Erdos conjecture, Jacob Fox, Janos Pach, Andrew Suk

    An Efficient Algorithm for Generalized Polynomial Partitioning and Its Applications. Pankaj K. Agarwal, Boris Aronov, Esther Ezra, Joshua Zahl

    There are many open questions — some directly related and some unrelated to the the topics discussed above.

    One immediate open problem would be to extend the “cohomological method” to the {p}-adics. The problem here is that the theory of {\mathbb{Q}_p} admits quantifier elimination only in an extended language (using the so called Macintyre’s predicates for example)– and thus not all definable subsets of {\mathbb{Q}_p^m} can be defined by quantifier-free formulas with atoms of the form {|F| \leq \lambda\cdot|G|}.

    A more open ended problem (to which I will devote another post) is to “integrate” cohomology more directly inside the model theoretic framework. Some idea of what I have in mind can be found here (joint work with Deepam Patel).

 

Standard

Categorical Complexity 1

Introduction and motivation

Computational complexity is a mathematical topic that is studied by theoretical computer scientists with mathematical rigor. A central problem of the area is the famous \textbf{P} vs \textbf{NP} problem. However, (from personal experience) a typical run-of-the-mill “working mathematician” (say working in algebraic geometry or pde’s) often has a somewhat hazy picture of what the field is about. Sure, sometimes a statement such as “such or such problem admits an efficient algorithm or is NP-hard” occurs as a “remark” in a couple of their papers — but often such “remarks”  are carefully separated from the “proper mathematics” in the paper.  There are some good reasons for this state of affairs — and I allude to them later.

The motivation behind this post is two-fold:

  1. To convince “the working mathematician” or perhaps “the working category theorist” that “complexity” (removed from the standard paradigm of  Turing machines and “membership problems in languages”) can be an interesting notion worth their attention.
  2. To argue that it is possible to define complexity of mathematical objects in a natural (categorical) way  (think what is the “complexity”  of your favorite variety or scheme or manifold or even a morphism between objects of these categories ?) — and  the manner in which  such “complexity” transforms under your favorite functors bear some relationship (by analogy) with the classical questions such as \textbf{P}   vs \textbf{NP} .  Such functor complexity can also be interesting in its own right — and indeed has been studied extensively in particular cases, (even in my own little corner of math, namely real algebraic geometry) though not using the categorical language. 

Standard disclaimers:

In what follows I assume no knowledge of complexity theory, but assume basic familiarity with some standard notions from category theory.

It goes without saying that the views expressed are mine and may not reflect those of my co-authors/co-conspirators. 

(Classical) Complexity theory in a nutshell : P, NP, PH and all that

Computational complexity studies complexity of languages.  In its original avatar, a language L is a set of 0/1 strings — that is L is a subset of  \bigcup_{n \geq 0} \{0,1\}^n .  (This emphasis on  0/1 or “bits” has to do obviously with digital computers — and we will escape from this tyranny of bits and machines at the first opportunity.)  But staying within the “bit framework” for the moment —  one says that the complexity of a language L is bounded by a function f(n) if there exists a Turing machine M , such that for every n >0 the number of steps M takes to decide if an element w \in \{0,1\}^n belongs to L is bounded by f(n) . The language L belongs to the class \textbf{P} if and only if  the function f can be taken to be a polynomial.

The definition of the class \textbf{P}   is quite robust. For instance, the precise details of the definition of  “Turing machine”  is not very important. One can replace the notion of a Turing machine by any reasonable programming language without changing the definition of the class \textbf{P} . There is also a “non-uniform” version of the class \textbf{P} (which is (provably) not equal to the class \textbf{P} described before) using circuits (rather than Turing machines) — the non-uniformity referring to the fact that the circuits for different n are independent of each other.  (Warning. The word “circuit” here does not refer to the minimally dependent sets in a matroid, but rather to ones found in digital computers — with “gates” encoding logical connectives and “wires” feeding into and out of such gates).

Let us denote the subset L \cap \{0,1\}^n by L_n . One says that a language  L belongs to the class \textbf{NP} , if and only if there exists a language L' belonging to the class \textbf{P} , and a polynomial m(n) , such that L_n = \pi_n(L'_{m(n)+n}) , where \pi_n denotes the projection on the last n factors. Thus, each language in  \textbf{NP} is an image under projection (strictly speaking a sequence of projections)  of languages in  \textbf{P} . Equivalently, they  can be described using one block (of size typically growing with n ) of existential quantifiers  applied to a language in \textbf{P} — eliminating a block of existential quantifiers is expressing in the language of logic,  the geometric act of taking image under projection along certain variables. The “\textbf{N} ”  in \textbf{NP} has to do with “non-deterministic” Turing machines —  \exists \textbf{P} is more intuitive (but not standard !). Replacing \exists by the \forall one obtains the class traditionally called \textbf{co-NP} (one should perhaps call it \forall \textbf{P} ). Indeed by alternating the existential and the universal quantifiers, one obtains an hierarchy of complexity classes (in the same spirit as the arithmetic hierarchy of Kleene in logic) called the polynomial hierarchy (denoted \textbf{PH} — no quibble with nomenclature here). It is unknown whether this is a proper hierarchy or it  collapses down to \textbf{P} (or to some other finite level).

But I promised that we will escape the tyranny of  “bits” and “machines”  — which we do now (one morsel at a time) starting with bits first.

Escaping the tyranny of “bits” (Blum-Shub-Smale, Poizat)

In order to “model” continuous computations (for example, those arising in numerical analysis) Blum, Shub and Smale (1989)  defined rigorously a notion of a computing machine that took as input real numbers (also complex numbers). In fact, what they defined is a machine capable of accepting inputs which are tuples of finite lengths with entries from a real closed field \mathrm{R} or an algebraically closed field \mathrm{C} . Letting  k denoting either \mathrm{R} or \mathrm{C} , in analogy with the classical case the language accepted by a Blum-Shub-Smale (henceforth B-S-S) machine over k , is a subset of \bigcup_{n >0} k^n .

The, definition of when such a language L is in the class \textbf{P}_k (subscript k denoting the field) is now the same as in the classical case. If L belongs to the class \textbf{P}_k , then there is a B-S-S machine accepting L with polynomially bounded steps (and so in particular such a machine terminates in finitely many steps on every possible input). This last observation implies that (the n -th part), L_n , of L is a semi-algebraic subset of k^n if k = \mathrm{R} , or a constructible subset of k^n if k = \mathrm{C} .

Blum-Shub-Smale went further. Once we have the class \textbf{P}_k , the definitions of \textbf{NP}_k,  \textbf{co-NP}_k, \textbf{PH}_k , are now automatic (using existential/universal quantifiers as in the classical case). Blum-Shub-Smale proved the existence of  \textbf{NP}_k\textbf{-complete} problems etc.  In a similar spirit, but from a more model-theoretic point of view, Bruno Poizat generalized Blum-Shub-Smale results to more general structures (B. Poizat, Les petits cailloux.  Une approche modèle-théorique de l’algorithmie.  1995).

The Blum-Shub-Smale version of the \textbf{P}_k vs \textbf{NP}_k for k = \mathbb{R},\mathbb{C} begins to resemble problems more familiar to run-of-the-mill mathematicians — because they boil down to the question whether the images under projections of  “easy” semi-algebraic/constructible subsets remain “easy” — or are provably “hard”. Here “easy/hard” refers to belonging/not belonging  to the class \textbf{P}_k . Over the reals, one would prove \textbf{P}_{\mathbb{R}} \neq \textbf{NP}_{\mathbb{R}} , if one could show that the sequence of real quartic polynomials in n variables  (i.e the tuples of their coefficient vectors) admitting a real solution is not in  \textbf{P}_{\mathbb{R}} . Not that it has helped till date, but certainly thinking about discriminant varieties over real closed or algebraically closed fields — these varieties appear naturally when considering systems of polynomial equations having common zeros — are easier than thinking of images of projections of subsets of the Boolean hypercube.

Escaping the tyranny of “machines” (diagram computations)

But a certain annoyance remains. Firstly, as we are taught in (abstract) linear algebra, one should not really think of polynomials (for example the quartic polynomials of the last paragraph) as a tuple of coefficients — because this amounts to choosing arbitrarily a basis for the vector space of polynomials. Rather, a “polynomial in n variables”,  is an element of \textbf{Sym} \; V^* where V is an n -dimensional vector space. The discriminant varieties that arise have geometric descriptions — which do not require fixing any bases. Wouldn’t it be nice to have a basis-free definition of “complexity” as well ?

And then there is still the role played by “machines/circuits” in the definition of the all important class \textbf{P}_k . All these models fit into the mould of “input/output framework”  — which seem to force choice of a basis.

Categorical complexity

We  (Umut Isik and I) propose an alternative in this paper. We delink complexity from membership questions, and take a categorical viewpoint — and aim at defining a notion of “complexity” — not just of objects, or arrows, but of arbitrary finite diagrams in any given category. Here is the basic idea. To see the details one needs to look at the paper. Fix a category C (for example, the category of k -vector spaces), and a set of morphisms \mathcal{A} of C which we call the set of basic morphisms (in the case of k -vector spaces, we take \mathcal{A} to be the set of homotheties k \xrightarrow{c} k,c \in k , the zero morphisms \mathbf{0} \rightarrow k, k \rightarrow \mathbf{0} , along with the morphism k \times k \xrightarrow{+} k ). Each morphism in \mathcal{A} is given unit cost. We now define a way of assigning cost to arbitrary (finite) diagrams.

A natural way in any category to build more complicated objects and morphisms from simpler ones in categories is to take limits or colimits of diagrams. This motivates our definition of what we call diagram computation  in our paper. At each step of such a “computation” one is allowed to take a limit or colimit of a sub-diagram of the existing diagram (if it exists in the category C )  and thus obtain a larger diagram. The categorical complexity then of an arbitrary diagram is the minimum number of such steps required + the size of the initial diagram, to produce a diagram in which a copy of the given diagram occurs as a subdiagram. While the choice of the initial set of basic morphism might be seen as arbitrary,  there are no other choices involved — and limits/colimits being only defined up to isomorphisms — isomorphic diagrams/objects/morphisms automatically have identical complexity. Restricting diagram computations to those which only use limits (resp. colimits), we have a notion of (categorical) limit (resp. colimit) complexity as well.

Just to stay grounded, here is an example (which appears in our paper) of a diagram computation in the category k -vector space using only limits which computes the morphism k^3 \rightarrow k^2, (x,y,z) \mapsto (2x + 2y + 3z, y + z) . The cone of the final limit is displayed using a red curve.  This last limit is isomorphic to k^3 and one of the arrows out of the limit is the morphism we wanted to compute.

kvectcomputationwithlimits

This way of defining “complexity” might seem outlandish at first glance — and might seem to be very far removed from the usual machine-driven definitions of complexity — but we show that in certain standard categories — for example, category of affine k -schemes, the categorical complexity of say an affine sub-variety V \subset \mathbb{A}^n_k is closely related to the “circuit complexity” of the polynomials cutting out V . So the notion is not as outlandish as it might seem at first glance.

Complexity of functors

Once we have a well-defined notion of complexity for (diagrams in) categories, we can define complexity of functors. Given a functor F: C \rightarrow D , between two categories (each with its own set of basic morphisms), we define the complexity of the functor F to be the function, C_F(n) to the supremum over all diagrams \Delta  in C  of complexity bounded by n , the complexity in the category D of F(\Delta) .

We will see below how functor complexity impinges on several aspects of categorical theory — for instance categorical analogs of \textbf{P} vs \textbf{N} . However,  complexity of functors arise (informally) in may different guises — often when one proves “quantitative bounds”.

For example, in applications of real algebraic geometry an important role played is played by results bounding the Betti numbers of real semi-algebraic sets in terms of the size of the formula defining the set (often measured by the sum of the degrees of the polynomials appearing in the formula). The first results in this direction was proved by  Oleĭnik and  Petrovskiĭ (On the topology of real algebraic surfaces. (Russian)  Doklady Akad. Nauk SSSR (N.S.) 67, (1949) 31–32), and later generalized in various ways (for example, see here). These bounds are not in the framework of categorical complexity.

However, if one fixes a notion of categorical complexity for the category \mathrm{SA}  of semi-algebraic sets, and also for \mathbb{Z} -modules — then one can formally ask for the complexity of the homology functor \mathrm{H}_*: \mathrm{SA} \rightarrow \mathbb{Z}\mbox{-mod} . The known results on upper bound on the Betti numbers of semi-algebraic sets in terms of the number and degrees of the defining polynomials do not immediately produce an upper bound on the functor complexity of \mathrm{H}_*: \mathrm{SA} \rightarrow \mathbb{Z}\mbox{-mod} . Thus, to obtain a tight upper bound on this functor complexity is an open problem.

Role of adjoint functors

Category theory will not be so interesting without the notion of adjoint functors — so it is not surprising that they should play a role in categorical complexity as well. For one thing, one knows that right adjoints preserve limits and left ones preserve colimits. As a result, right adjoint functors will preserve (upper bounds on) limit and left adjoints preserve (upper bounds on) colimit complexity (assuming that the functor takes basic morphisms to basic morphisms). But the role of adjoints is even visible at the level of classical complexity. Recall that classes \textbf{NP}, \textbf{co-NP} were related to the class \textbf{P} through a projection map \pi (we are being loose here and forgetting about the fact that we are talking about sequences of sets and not just one set). Let X, Y be sets and \pi: X \rightarrow Y (you can think of k^{m+n}, Y = k^n , \pi the projection to the last n factors. Now the map \pi induces three maps

  • \pi_\exists: 2^X \rightarrow 2^Y
  • \pi^*: 2^Y \rightarrow 2^X
  • \pi_\forall: 2^X \rightarrow 2^Y

The first and the second one are the direct image and the inverse image maps induced by \pi , while the third map \pi_\forall is defined by \pi_\forall(A) =\{y \in Y \mid A \subset \pi^{-1}(y)\} . Thinking of 2^X and 2^Y as poset categories (partial order corresponding to inclusion), the maps \pi_\exists, \pi_*,\pi_\forall correspond to functors, and \pi_\exists is left adjoint to \pi^* which is left adjoint to \pi_\forall (Exercise!).

The connection with complexity is this. It is easy to prove that unlike taking direct image, the pull-back under a projection of an “easy”  language is again “easy”. Thus, the complexity class \textbf{P} is preserved under pull-backs of projections. In other words given an “easy”  set B \in 2^Y , \pi^*(B) is again “easy”. But the question of whether given an “easy”  set A \in 2^X , whether \pi_\exists(A),\pi_\forall(A) is again easy, is related to the \textbf{P} = \textbf{NP} and \textbf{P} = \textbf{co-NP} respectively.

In category theory, functors often come in adjoint pairs (and often one of the pair is more interesting than the other). The same thing seems to happen in the world of complexity of functors — often in an adjoint pair one functor has polynomially bounded complexity, while the other does not (conjecturally). More here.

Complexity of the image functor

A particular case of functor complexity (where adjointness also plays a role) is of particular interest. Let C be a category and let C^{\bullet \rightarrow \bullet} denote the diagram category corresponding to the diagram \bullet \rightarrow \bullet . Its objects are morphisms in the category C . Let \mathrm{Mon}(C) denote the subcategory of C^{\bullet \rightarrow \bullet} which are monomorphism, and let i_C: \mathrm{Mon}(C) \rightarrow C^{\bullet \rightarrow \bullet} denote the inclusion functor. One says that the category C has images if the the functor i_C has an adjoint (which if it exists we denote by \mathrm{im}_C ). For example, the category of sets, groups, or the category \mathrm{SA} of semi-algebraic sets and maps,  have images. This is because for every morphism f: A \rightarrow B in these categories,  there exists a monomorphism \mathrm{im}(f) \rightarrowtail B (for the category \mathrm{SA} this fact is a consequence of the Tarski-Seidenberg theorem which guarantees that the image of a semi-algebraic map is semi-algebraic).

If one has a category having images, then starting from the definition of complexity in the category C one obtains naturally a definition of complexity in the categories C^{\bullet \rightarrow \bullet} and \mathrm{Mon}(C) .  This in turn determines the complexity of the functor \mathrm{im}_C . The functor complexity of \mathrm{im}_C is a function — and it is meaningful to ask if this function is polynomially bounded. Remember  our discussion about \textbf{P} and \textbf{NP} in various settings.  We saw that question whether \textbf{P} =  \textbf{NP} is fundamentally about the extent to which complexity of an object increases under taking the images of the object under certain maps.  This blow-up in complexity in the world of categorical complexity is  measured by the complexity of the image functor (if it exists). It is thus natural to posit that the question of whether the complexity of the image functor in a category C is polynomially bounded or not is the categorical analog of \textbf{P} =  \textbf{NP} .

Postscript and Quo Vadis

Our motivation in developing categorical complexity has to do more in developing a proper framework to study “complexity” of  mathematical objects and functors — rather than using it as a tool for proving actual lower bounds interesting to computer scientists (even though I will not be unhappy if categorical insights contribute to that goal). Also there is some parallel work. For instance, Noson Yanofsky develops a somewhat similar notion of computation using Kan extensions (of which limits and colimits are special cases). But his goals seem to be different from ours — as we want to stay closer to everyday mathematics and functor complexity does not seem to as central in his work as in ours. There are many questions that we list on the paper. We hope to answer some of them in the future, but I will be happier if it attracts the attention of category theorists to complexity theory seen from a categorical point of view.

Finally, one anonymous referee asked : “Can lower bound techniques like diagonalization, relativization, and natural proofs be carried over to categorical complexity ? ” I have nothing to say about this at this point — but food for future thought.

Standard