-
Universal Perfect Samplers for Incremental Streams
Authors:
Seth Pettie,
Dingyu Wang
Abstract:
If $G : \mathbb{R}_+ \to \mathbb{R}_+$, the $G$-moment of a vector $\mathbf{x}\in\mathbb{R}_+^n$ is $G(\mathbf{x}) = \sum_{v\in[n]} G(\mathbf{x}(v))$ and the $G$-sampling problem is to select an index $v_*\in [n]$ according to its contribution to the $G$-moment, i.e., such that $\Pr(v_*=v) = G(\mathbf{x}(v))/G(\mathbf{x})$. Approximate $G$-samplers may introduce multiplicative and/or additive erro…
▽ More
If $G : \mathbb{R}_+ \to \mathbb{R}_+$, the $G$-moment of a vector $\mathbf{x}\in\mathbb{R}_+^n$ is $G(\mathbf{x}) = \sum_{v\in[n]} G(\mathbf{x}(v))$ and the $G$-sampling problem is to select an index $v_*\in [n]$ according to its contribution to the $G$-moment, i.e., such that $\Pr(v_*=v) = G(\mathbf{x}(v))/G(\mathbf{x})$. Approximate $G$-samplers may introduce multiplicative and/or additive errors to this probability, and some have a non-trivial probability of failure.
In this paper we focus on the exact $G$-sampling problem, where $G$ is selected from the class $\mathcal{G}$ of Laplace exponents of non-negative, one-dimensional Lévy processes, which includes several well studied classes such as $p$th moments $G(z)=z^p$, $p\in[0,1]$, logarithms $G(z)=\log(1+z)$, Cohen and Geri's soft concave sublinear functions, which are used to approximate concave sublinear functions, including cap statistics. We develop $G$-samplers for a vector $\mathbf{x} \in \mathbb{R}_+^n$ that is presented as an incremental stream of positive updates. In particular:
* For any $G\in\mathcal{G}$, we give a very simple $G$-sampler that uses 2 words of memory and stores at all times a $v_*\in [n]$, such that $\Pr(v_*=v)$ is exactly $G(\mathbf{x}(v))/G(\mathbf{x})$.
* We give a ``universal'' $\mathcal{G}$-sampler that uses $O(\log n)$ words of memory w.h.p., and given any $G\in \mathcal{G}$ at query time, produces an exact $G$-sample.
With an overhead of a factor of $k$, both samplers can be used to $G$-sample a sequence of $k$ indices with or without replacement. Our sampling framework is simple and versatile, and can easily be generalized to sampling from more complex objects like graphs and hypergraphs.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
A Refutation of the Pach-Tardos Conjecture for 0-1 Matrices
Authors:
Seth Pettie,
Gábor Tardos
Abstract:
The theory of forbidden 0-1 matrices generalizes Turan-style (bipartite) subgraph avoidance, Davenport-Schinzel theory, and Zarankiewicz-type problems, and has been influential in many areas, such as discrete and computational geometry, the analysis of self-adjusting data structures, and the development of the graph parameter twin width.
The foremost open problems in this area is to resolve the…
▽ More
The theory of forbidden 0-1 matrices generalizes Turan-style (bipartite) subgraph avoidance, Davenport-Schinzel theory, and Zarankiewicz-type problems, and has been influential in many areas, such as discrete and computational geometry, the analysis of self-adjusting data structures, and the development of the graph parameter twin width.
The foremost open problems in this area is to resolve the Pach-Tardos conjecture from 2005, which states that if a forbidden pattern $P\in\{0,1\}^{k\times l}$ is the bipartite incidence matrix of an acyclic graph (forest), then $\mathrm{Ex}(P,n) = O(n\log^{C_P} n)$, where $C_P$ is a constant depending only on $P$. This conjecture has been confirmed on many small patterns, specifically all $P$ with weight at most 5, and all but two with weight 6.
The main result of this paper is a clean refutation of the Pach-Tardos conjecture. Specifically, we prove that $\mathrm{Ex}(S_0,n),\mathrm{Ex}(S_1,n) \geq n2^{Ω(\sqrt{\log n})}$, where $S_0,S_1$ are the outstanding weight-6 patterns. We also prove sharp bounds on the entire class of alternating patterns $(P_t)$, specifically that for every $t\geq 2$, $\mathrm{Ex}(P_t,n)=Θ(n(\log n/\log\log n)^t)$. This is the first proof of an asymptotically sharp bound that is $ω(n\log n)$.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
Sorting Pattern-Avoiding Permutations via 0-1 Matrices Forbidding Product Patterns
Authors:
Parinya Chalermsook,
Seth Pettie,
Sorrachai Yingchareonthawornchai
Abstract:
We consider the problem of comparison-sorting an $n$-permutation $S$ that avoids some $k$-permutation $��$. Chalermsook, Goswami, Kozma, Mehlhorn, and Saranurak prove that when $S$ is sorted by inserting the elements into the GreedyFuture binary search tree, the running time is linear in the extremal function $\mathrm{Ex}(P_π\otimes \text{hat},n)$. This is the maximum number of 1s in an…
▽ More
We consider the problem of comparison-sorting an $n$-permutation $S$ that avoids some $k$-permutation $π$. Chalermsook, Goswami, Kozma, Mehlhorn, and Saranurak prove that when $S$ is sorted by inserting the elements into the GreedyFuture binary search tree, the running time is linear in the extremal function $\mathrm{Ex}(P_π\otimes \text{hat},n)$. This is the maximum number of 1s in an $n\times n$ 0-1 matrix avoiding $P_π\otimes \text{hat}$, where $P_π$ is the $k\times k$ permutation matrix of $π$, $\otimes$ the Kronecker product, and $\text{hat} = \left(\begin{array}{ccc}&\bullet&\\\bullet&&\bullet\end{array}\right)$. The same time bound can be achieved by sorting $S$ with Kozma and Saranurak's SmoothHeap.
In this paper we give nearly tight upper and lower bounds on the density of $P_π\otimes\text{hat}$-free matrices in terms of the inverse-Ackermann function $α(n)$. \[ \mathrm{Ex}(P_π\otimes \text{hat},n) = \left\{\begin{array}{ll} Ω(n\cdot 2^{α(n)}), & \mbox{for most $π$,}\\ O(n\cdot 2^{O(k^2)+(1+o(1))α(n)}), & \mbox{for all $π$.} \end{array}\right. \] As a consequence, sorting $π$-free sequences can be performed in $O(n2^{(1+o(1))α(n)})$ time. For many corollaries of the dynamic optimality conjecture, the best analysis uses forbidden 0-1 matrix theory. Our analysis may be useful in analyzing other classes of access sequences on binary search trees.
△ Less
Submitted 8 July, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
On the Extremal Functions of Acyclic Forbidden 0-1 Matrices
Authors:
Seth Pettie,
Gábor Tardos
Abstract:
The extremal theory of forbidden 0-1 matrices studies the asymptotic growth of the function $\mathrm{Ex}(P,n)$, which is the maximum weight of a matrix $A\in\{0,1\}^{n\times n}$ whose submatrices avoid a fixed pattern $P\in\{0,1\}^{k\times l}$. This theory has been wildly successful at resolving problems in combinatorics, discrete and computational geometry, structural graph theory, and the analys…
▽ More
The extremal theory of forbidden 0-1 matrices studies the asymptotic growth of the function $\mathrm{Ex}(P,n)$, which is the maximum weight of a matrix $A\in\{0,1\}^{n\times n}$ whose submatrices avoid a fixed pattern $P\in\{0,1\}^{k\times l}$. This theory has been wildly successful at resolving problems in combinatorics, discrete and computational geometry, structural graph theory, and the analysis of data structures, particularly corollaries of the dynamic optimality conjecture.
All these applications use acyclic patterns, meaning that when $P$ is regarded as the adjacency matrix of a bipartite graph, the graph is acyclic. The biggest open problem in this area is to bound $\mathrm{Ex}(P,n)$ for acyclic $P$. Prior results have only ruled out the strict $O(n\log n)$ bound conjectured by Furedi and Hajnal. It is consistent with prior results that $\forall P. \mathrm{Ex}(P,n)\leq n\log^{1+o(1)} n$, and also consistent that $\forall ε>0.\exists P. \mathrm{Ex}(P,n) \geq n^{2-ε}$.
In this paper we establish a stronger lower bound on the extremal functions of acyclic $P$. Specifically, we give a new construction of relatively dense 0-1 matrices with $Θ(n(\log n/\log\log n)^t)$ 1s that avoid an acyclic $X_t$. Pach and Tardos have conjectured that this type of result is the best possible, i.e., no acyclic $P$ exists for which $\mathrm{Ex}(P,n)\geq n(\log n)^{ω(1)}$.
△ Less
Submitted 3 July, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Byzantine Agreement with Optimal Resilience via Statistical Fraud Detection
Authors:
Shang-En Huang,
Seth Pettie,
Leqi Zhu
Abstract:
Since the mid-1980s it has been known that Byzantine Agreement can be solved with probability 1 asynchronously, even against an omniscient, computationally unbounded adversary that can adaptively \emph{corrupt} up to $f<n/3$ parties. Moreover, the problem is insoluble with $f\geq n/3$ corruptions. However, Bracha's 1984 protocol achieved $f<n/3$ resilience at the cost of exponential expected laten…
▽ More
Since the mid-1980s it has been known that Byzantine Agreement can be solved with probability 1 asynchronously, even against an omniscient, computationally unbounded adversary that can adaptively \emph{corrupt} up to $f<n/3$ parties. Moreover, the problem is insoluble with $f\geq n/3$ corruptions. However, Bracha's 1984 protocol achieved $f<n/3$ resilience at the cost of exponential expected latency $2^{Θ(n)}$, a bound that has never been improved in this model with $f=\lfloor (n-1)/3 \rfloor$ corruptions.
In this paper we prove that Byzantine Agreement in the asynchronous, full information model can be solved with probability 1 against an adaptive adversary that can corrupt $f<n/3$ parties, while incurring only polynomial latency with high probability. Our protocol follows earlier polynomial latency protocols of King and Saia and Huang, Pettie, and Zhu, which had suboptimal resilience, namely $f \approx n/10^9$ and $f<n/4$, respectively.
Resilience $f=(n-1)/3$ is uniquely difficult as this is the point at which the influence of the Byzantine and honest players are of roughly equal strength. The core technical problem we solve is to design a collective coin-flipping protocol that eventually lets us flip a coin with an unambiguous outcome. In the beginning the influence of the Byzantine players is too powerful to overcome and they can essentially fix the coin's behavior at will. We guarantee that after just a polynomial number of executions of the coin-flipping protocol, either (a) the Byzantine players fail to fix the behavior of the coin (thereby ending the game) or (b) we can ``blacklist'' players such that the blacklisting rate for Byzantine players is at least as large as the blacklisting rate for good players. The blacklisting criterion is based on a simple statistical test of fraud detection.
△ Less
Submitted 30 June, 2022;
originally announced June 2022.
-
Optimal Vertex Connectivity Oracles
Authors:
Seth Pettie,
Thatchaphol Saranurak,
Longhui Yin
Abstract:
A $k$-vertex connectivity oracle for undirected $G$ is a data structure that, given $u,v\in V(G)$, reports $\min\{k,κ(u,v)\}$, where $κ(u,v)$ is the pairwise vertex connectivity between $u,v$. There are three main measures of efficiency: construction time, query time, and space. Prior work of Izsak and Nutov shows that a data structure of total size $\tilde{O}(kn)$ can even be encoded as a…
▽ More
A $k$-vertex connectivity oracle for undirected $G$ is a data structure that, given $u,v\in V(G)$, reports $\min\{k,κ(u,v)\}$, where $κ(u,v)$ is the pairwise vertex connectivity between $u,v$. There are three main measures of efficiency: construction time, query time, and space. Prior work of Izsak and Nutov shows that a data structure of total size $\tilde{O}(kn)$ can even be encoded as a $\tilde{O}(k)$-bit labeling scheme so that vertex-connectivity queries can be answered in $\tilde{O}(k)$ time. The construction time is polynomial, but unspecified.
In this paper we address the top three complexity measures: Space, Query Time, and Construction Time. We give an $Ω(kn)$-bit lower bound on any vertex connectivity oracle. We construct an optimal-space connectivity oracle in max-flow time that answers queries in $O(\log n)$ time, independent of $k$.
△ Less
Submitted 2 January, 2022;
originally announced January 2022.
-
The Structure of Minimum Vertex Cuts
Authors:
Seth Pettie,
Longhui Yin
Abstract:
In this paper we continue a long line of work on representing the cut structure of graphs. We classify the types minimum vertex cuts, and the possible relationships between multiple minimum vertex cuts.
As a consequence of these investigations, we exhibit a simple $O(κn)$-space data structure that can quickly answer pairwise $(κ+1)$-connectivity queries in a $κ$-connected graph. We also show how…
▽ More
In this paper we continue a long line of work on representing the cut structure of graphs. We classify the types minimum vertex cuts, and the possible relationships between multiple minimum vertex cuts.
As a consequence of these investigations, we exhibit a simple $O(κn)$-space data structure that can quickly answer pairwise $(κ+1)$-connectivity queries in a $κ$-connected graph. We also show how to compute the "closest" $κ$-cut to every vertex in near linear $\tilde{O}(m+poly(κ)n)$ time.
△ Less
Submitted 12 February, 2021;
originally announced February 2021.
-
Improved bounds for multipass pairing heaps and path-balanced binary search trees
Authors:
Dani Dorfman,
Haim Kaplan,
László Kozma,
Seth Pettie,
Uri Zwick
Abstract:
We revisit multipass pairing heaps and path-balanced binary search trees (BSTs), two classical algorithms for data structure maintenance. The pairing heap is a simple and efficient "self-adjusting" heap, introduced in 1986 by Fredman, Sedgewick, Sleator, and Tarjan. In the multipass variant (one of the original pairing heap variants described by Fredman et al.) the minimum item is extracted via re…
▽ More
We revisit multipass pairing heaps and path-balanced binary search trees (BSTs), two classical algorithms for data structure maintenance. The pairing heap is a simple and efficient "self-adjusting" heap, introduced in 1986 by Fredman, Sedgewick, Sleator, and Tarjan. In the multipass variant (one of the original pairing heap variants described by Fredman et al.) the minimum item is extracted via repeated pairing rounds in which neighboring siblings are linked.
Path-balanced BSTs, proposed by Sleator (Subramanian, 1996), are a natural alternative to Splay trees (Sleator and Tarjan, 1983). In a path-balanced BST, whenever an item is accessed, the search path leading to that item is re-arranged into a balanced tree.
Despite their simplicity, both algorithms turned out to be difficult to analyse. Fredman et al. showed that operations in multipass pairing heaps take amortized $O(\log{n} \cdot \log\log{n} / \log\log\log{n})$ time. For searching in path-balanced BSTs, Balasubramanian and Raman showed in 1995 the same amortized time bound of $O(\log{n} \cdot \log\log{n} / \log\log\log{n})$, using a different argument.
In this paper we show an explicit connection between the two algorithms and improve the two bounds to $O\left(\log{n} \cdot 2^{\log^{\ast}{n}} \cdot \log^{\ast}{n}\right)$, respectively $O\left(\log{n} \cdot 2^{\log^{\ast}{n}} \cdot (\log^{\ast}{n})^2 \right)$, where $\log^{\ast}(\cdot)$ denotes the very slowly growing iterated logarithm function. These are the first improvements in more than three, resp. two decades, approaching in both cases the information-theoretic lower bound of $Ω(\log{n})$.
△ Less
Submitted 22 June, 2018;
originally announced June 2018.
-
Lower Bounds on Davenport-Schinzel Sequences via Rectangular Zarankiewicz Matrices
Authors:
Julian Wellman,
Seth Pettie
Abstract:
An order-$s$ Davenport-Schinzel sequence over an $n$-letter alphabet is one avoiding immediate repetitions and alternating subsequences with length $s+2$. The main problem is to determine the maximum length of such a sequence, as a function of $n$ and $s$. When $s$ is fixed this problem has been settled but when $s$ is a function of $n$, very little is known about the extremal function $λ(s,n)$ of…
▽ More
An order-$s$ Davenport-Schinzel sequence over an $n$-letter alphabet is one avoiding immediate repetitions and alternating subsequences with length $s+2$. The main problem is to determine the maximum length of such a sequence, as a function of $n$ and $s$. When $s$ is fixed this problem has been settled but when $s$ is a function of $n$, very little is known about the extremal function $λ(s,n)$ of such sequences.
In this paper we give a new recursive construction of Davenport-Schinzel sequences that is based on dense 0-1 matrices avoiding large all-1 submatrices (aka Zarankiewicz's Problem.) In particular, we give a simple construction of $n^{2/t} \times n$ matrices containing $n^{1+1/t}$ 1s that avoid $t\times 2$ all-1 submatrices.
Our lower bounds on $λ(s,n)$ exhibit three qualitatively different behaviors depending on the size of $s$ relative to $n$. When $s \le \log\log n$ we show that $λ(s,n)/n \ge 2^s$ grows exponentially with $s$. When $s = n^{o(1)}$ we show $λ(s,n)/n \ge (\frac{s}{2\log\log_s n})^{\log\log_s n}$ grows faster than any polynomial in $s$. Finally, when $s=Ω(n^{1/t}(t-1)!)$, $λ(s,n) = Ω(n^2 s/(t-1)!)$ matches the trivial upper bound $O(n^2s)$ asymptotically, whenever $t$ is constant.
△ Less
Submitted 30 October, 2016;
originally announced October 2016.
-
A Hierarchy of Lower Bounds for Sublinear Additive Spanners
Authors:
Amir Abboud,
Greg Bodwin,
Seth Pettie
Abstract:
Spanners, emulators, and approximate distance oracles can be viewed as lossy compression schemes that represent an unweighted graph metric in small space, say $\tilde{O}(n^{1+δ})$ bits. There is an inherent tradeoff between the sparsity parameter $δ$ and the stretch function $f$ of the compression scheme, but the qualitative nature of this tradeoff has remained a persistent open problem.
In this…
▽ More
Spanners, emulators, and approximate distance oracles can be viewed as lossy compression schemes that represent an unweighted graph metric in small space, say $\tilde{O}(n^{1+δ})$ bits. There is an inherent tradeoff between the sparsity parameter $δ$ and the stretch function $f$ of the compression scheme, but the qualitative nature of this tradeoff has remained a persistent open problem.
In this paper we show that the recent additive spanner lower bound of Abboud and Bodwin is just the first step in a hierarchy of lower bounds that fully characterize the asymptotic behavior of the optimal stretch function $f$ as a function of $δ\in (0,1/3)$. Specifically, for any integer $k\ge 2$, any compression scheme with size $O(n^{1+\frac{1}{2^k-1} - ε})$ has a sublinear additive stretch function $f$: $$f(d) = d + Ω(d^{1-\frac{1}{k}}).$$ This lower bound matches Thorup and Zwick's (2006) construction of sublinear additive emulators. It also shows that Elkin and Peleg's $(1+ε,β)$-spanners have an essentially optimal tradeoff between $δ,ε,$ and $β$, and that the sublinear additive spanners of Pettie (2009) and Chechik (2013) are not too far from optimal.
To complement these lower bounds we present a new construction of $(1+ε, O(k/ε)^{k-1})$-spanners with size $O((k/ε)^{h_k} kn^{1+\frac{1}{2^{k+1}-1}})$, where $h_k < 3/4$. This size bound improves on the spanners of Elkin and Peleg (2004), Thorup and Zwick (2006), and Pettie (2009). According to our lower bounds neither the size nor stretch function can be substantially improved.
△ Less
Submitted 10 January, 2017; v1 submitted 25 July, 2016;
originally announced July 2016.
-
Connectivity Oracles for Graphs Subject to Vertex Failures
Authors:
Ran Duan,
Seth Pettie
Abstract:
We introduce new data structures for answering connectivity queries in graphs subject to batched vertex failures. A deterministic structure processes a batch of $d\leq d_{\star}$ failed vertices in $\tilde{O}(d^3)$ time and thereafter answers connectivity queries in $O(d)$ time. It occupies space $O(d_{\star} m\log n)$. We develop a randomized Monte Carlo version of our data structure with update…
▽ More
We introduce new data structures for answering connectivity queries in graphs subject to batched vertex failures. A deterministic structure processes a batch of $d\leq d_{\star}$ failed vertices in $\tilde{O}(d^3)$ time and thereafter answers connectivity queries in $O(d)$ time. It occupies space $O(d_{\star} m\log n)$. We develop a randomized Monte Carlo version of our data structure with update time $\tilde{O}(d^2)$, query time $O(d)$, and space $\tilde{O}(m)$ for any failure bound $d\le n$. This is the first connectivity oracle for general graphs that can efficiently deal with an unbounded number of vertex failures.
We also develop a more efficient Monte Carlo edge-failure connectivity oracle. Using space $O(n\log^2 n)$, $d$ edge failures are processed in $O(d\log d\log\log n)$ time and thereafter, connectivity queries are answered in $O(\log\log n)$ time, which are correct w.h.p.
Our data structures are based on a new decomposition theorem for an undirected graph $G=(V,E)$, which is of independent interest. It states that for any terminal set $U\subseteq V$ we can remove a set $B$ of $|U|/(s-2)$ vertices such that the remaining graph contains a Steiner forest for $U-B$ with maximum degree $s$.
△ Less
Submitted 6 September, 2017; v1 submitted 22 July, 2016;
originally announced July 2016.
-
Three Generalizations of Davenport-Schinzel Sequences
Authors:
Seth Pettie
Abstract:
We present new, and mostly sharp, bounds on the maximum length of certain generalizations of Davenport-Schinzel sequences. Among the results are sharp bounds on order-$s$ {\em double DS} sequences, for all $s$, sharp bounds on sequences avoiding {\em catenated permutations} (aka formation free sequences), and new lower bounds on sequences avoiding {\em zig-zagging} patterns.
We present new, and mostly sharp, bounds on the maximum length of certain generalizations of Davenport-Schinzel sequences. Among the results are sharp bounds on order-$s$ {\em double DS} sequences, for all $s$, sharp bounds on sequences avoiding {\em catenated permutations} (aka formation free sequences), and new lower bounds on sequences avoiding {\em zig-zagging} patterns.
△ Less
Submitted 22 January, 2014;
originally announced January 2014.
-
Sharp Bounds on Davenport-Schinzel Sequences of Every Order
Authors:
Seth Pettie
Abstract:
One of the longest-standing open problems in computational geometry is to bound the lower envelope of $n$ univariate functions, each pair of which crosses at most $s$ times, for some fixed $s$. This problem is known to be equivalent to bounding the length of an order-$s$ Davenport-Schinzel sequence, namely a sequence over an $n$-letter alphabet that avoids alternating subsequences of the form…
▽ More
One of the longest-standing open problems in computational geometry is to bound the lower envelope of $n$ univariate functions, each pair of which crosses at most $s$ times, for some fixed $s$. This problem is known to be equivalent to bounding the length of an order-$s$ Davenport-Schinzel sequence, namely a sequence over an $n$-letter alphabet that avoids alternating subsequences of the form $a \cdots b \cdots a \cdots b \cdots$ with length $s+2$. These sequences were introduced by Davenport and Schinzel in 1965 to model a certain problem in differential equations and have since been applied to bounding the running times of geometric algorithms, data structures, and the combinatorial complexity of geometric arrangements.
Let $λ_s(n)$ be the maximum length of an order-$s$ DS sequence over $n$ letters. What is $λ_s$ asymptotically? This question has been answered satisfactorily (by Hart and Sharir, Agarwal, Sharir, and Shor, Klazar, and Nivasch) when $s$ is even or $s\le 3$. However, since the work of Agarwal, Sharir, and Shor in the mid-1980s there has been a persistent gap in our understanding of the odd orders.
In this work we effectively close the problem by establishing sharp bounds on Davenport-Schinzel sequences of every order $s$. Our results reveal that, contrary to one's intuition, $λ_s(n)$ behaves essentially like $λ_{s-1}(n)$ when $s$ is odd. This refutes conjectures due to Alon et al. (2008) and Nivasch (2010).
△ Less
Submitted 18 May, 2013; v1 submitted 4 April, 2012;
originally announced April 2012.